public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
@ 2021-07-23 10:21 Chung-Lin Tang
  2021-07-23 10:39 ` Jakub Jelinek
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Chung-Lin Tang @ 2021-07-23 10:21 UTC (permalink / raw)
  To: gcc-patches, Thomas Schwinge, Jakub Jelinek, Andrew Stubbs, Tom de Vries
  Cc: Catherine Moore, Vollweiler, Marcel (DI SW CAS EPS STS)

[-- Attachment #1: Type: text/plain, Size: 2739 bytes --]

Hi all,
this patch implements the omp_get_device_num API function, which appears
to be a missing piece in the library routines implementation.

The host-side implementation is simple, which by specification is equivalent
to omp_get_initial_device.

Inside offloaded regions, the preferred way to should be that the device
already has this information initialized (once) when the device is initialized.
And the function merely returns the stored value.

This implementation adds a convention for an additional entry (dubbed under 'others'
in the code) returned by the 'load_image' plugin hook. Basically we define
a variable name in libgomp-plugin.h, which the device libgomp defines, and the
offload plugin searches for, and returns the variable device location start/end for
gomp_load_image_from_device to initialize. The device-side omp_get_device_num
then just returns that value.

This patch implements for gcn and nvptx offload targets. The icv-device.c file is
starting to look like a file ready to consolidate away the target specific versions,
but that's for later.

Basic libgomp tests were added for C/C++ and Fortran. Tested without regressions
with offloading for amdgcn and nvptx on x86_64-linux host. Okay for trunk?

Thanks,
Chung-Lin

2021-07-23  Chung-Lin Tang  <cltang@codesourcery.com>

libgomp/ChangeLog

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.1): Add omp_get_device_num, omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* config/plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* config/plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.



[-- Attachment #2: omp_get_device_num.patch --]
[-- Type: text/plain, Size: 15110 bytes --]

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..8f72028a6c8 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..e586da1d3a8 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 4ec39c4e61b..2360582e32e 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -598,6 +598,12 @@ omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_device ();
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 62645ce9954..3f0983c1ea4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
   uintptr_t end;
 };
 
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the  offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num
+
 /* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 8ea27b5565f..ffcb98ae99e 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -197,6 +197,8 @@ OMP_5.0.1 {
 	omp_get_supported_active_levels_;
 	omp_fulfill_event;
 	omp_fulfill_event_;
+	omp_get_device_num;
+	omp_get_device_num_;
 } OMP_5.0;
 
 GOMP_1.0 {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 2c1f1b5968b..fc9e708a8d2 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -165,6 +165,7 @@ linkage, and do not throw exceptions.
 * omp_get_ancestor_thread_num:: Ancestor thread ID
 * omp_get_cancellation::        Whether cancellation support is enabled
 * omp_get_default_device::      Get the default device for target regions
+* omp_get_device_num::          Get device that current thread is running on
 * omp_get_dynamic::             Dynamic teams setting
 * omp_get_initial_device::      Device number of host device
 * omp_get_level::               Number of parallel regions
@@ -385,6 +386,34 @@ For OpenMP 5.1, this must be equal to the value returned by the
 
 
 
+@node omp_get_device_num
+@section @code{omp_get_device_num} -- Return device number of current device
+@table @asis
+@item @emph{Description}:
+This function returns a device number that represents the device that the
+current thread is executing on. For OpenMP 5.0, this must be equal to the
+value returned by the @code{omp_get_initial_device} function when called
+from the host.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_initial_device}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
+@end table
+
+
+
 @node omp_get_level
 @section @code{omp_get_level} -- Obtain the current nesting level
 @table @asis
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 69f96f09124..3bfd0aee6db 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -243,6 +243,7 @@ extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
 extern int omp_get_num_devices (void) __GOMP_NOTHROW;
+extern int omp_get_device_num (void) __GOMP_NOTHROW;
 extern int omp_get_num_teams (void) __GOMP_NOTHROW;
 extern int omp_get_team_num (void) __GOMP_NOTHROW;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 851f85f5316..adc20726296 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -550,6 +550,12 @@
           end function omp_get_initial_device
         end interface
 
+        interface
+          function omp_get_device_num ()
+            integer (4) :: omp_get_device_num
+          end function omp_get_device_num
+        end interface
+
         interface
           function omp_get_max_task_priority ()
             integer (4) :: omp_get_max_task_priority
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 06d17b5fcdc..b006509d075 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -244,6 +244,9 @@
       external omp_get_initial_device
       integer(4) omp_get_initial_device
 
+      external omp_get_device_num
+      integer(4) omp_get_device_num
+
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index cfed42a2d4d..76906a52bc2 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -29,6 +29,7 @@
 /* {{{ Includes and defines  */
 
 #include "config.h"
+#include "symcat.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -3310,6 +3311,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   struct kernel_info *kernel;
   int kernel_count = image_desc->kernel_count;
   unsigned var_count = image_desc->global_variable_count;
+  int other_count = 1;
 
   agent = get_agent_info (ord);
   if (!agent)
@@ -3326,7 +3328,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
   GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
-  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count - 2)
+  GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
+  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
 			     * sizeof (struct addr_pair));
   *target_table = pair;
   module = (struct module_info *)
@@ -3401,6 +3404,35 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       pair++;
     }
 
+  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+
+  hsa_status_t status;
+  hsa_executable_symbol_t var_symbol;
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 STRINGX (GOMP_DEVICE_NUM_VAR),
+						 agent->id, 0, &var_symbol);
+  if (status == HSA_STATUS_SUCCESS)
+    {
+      uint64_t device_num_varptr;
+      uint32_t device_num_varsize;
+
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &device_num_varptr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+	 &device_num_varsize);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable size from its symbol", status);
+
+      pair->start = device_num_varptr;
+      pair->end = device_num_varptr + device_num_varsize;
+    }
+  else
+    pair->start = pair->end = 0;
+
   /* Ensure that constructors are run first.  */
   struct GOMP_kernel_launch_attributes kla =
     { 3,
@@ -3423,7 +3455,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
-  return kernel_count + var_count;
+  return kernel_count + var_count + other_count;
 }
 
 /* Unload GCN object-code module described by struct gcn_image_desc in
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1215212d501..8b34895a88f 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -34,6 +34,7 @@
 #define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
+#include "symcat.h"
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
 #include "gomp-constants.h"
@@ -1265,7 +1266,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   CUmodule module;
   const char *const *var_names;
   const struct targ_fn_launch *fn_descs;
-  unsigned int fn_entries, var_entries, i, j;
+  unsigned int fn_entries, var_entries, other_entries, i, j;
   struct targ_fn_descriptor *targ_fns;
   struct addr_pair *targ_tbl;
   const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1295,8 +1296,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   fn_entries = img_header->fn_num;
   fn_descs = img_header->fn_descs;
 
+  /* Currently, the only other entry kind is 'device number'.  */
+  other_entries = 1;
+
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
-				 * (fn_entries + var_entries));
+				 * (fn_entries + var_entries + other_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1345,9 +1349,22 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       targ_tbl->end = targ_tbl->start + bytes;
     }
 
+  CUdeviceptr device_num_varptr;
+  size_t device_num_varsize;
+  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
+				  &device_num_varsize, module,
+				  STRINGX (GOMP_DEVICE_NUM_VAR));
+  if (r == CUDA_SUCCESS)
+    {
+      targ_tbl->start = (uintptr_t) device_num_varptr;
+      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
+    }
+  else
+    targ_tbl->start = targ_tbl->end = 0;
+
   nvptx_set_clocktick (module, dev);
 
-  return fn_entries + var_entries;
+  return fn_entries + var_entries + other_entries;
 }
 
 /* Unload the program described by TARGET_DATA.  DEV_DATA is the
diff --git a/libgomp/target.c b/libgomp/target.c
index bb09d501dd6..118871269b6 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1920,6 +1920,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
+  /* Others currently is only 'device_num' */
+  int num_others = 1;
+
   /* Load image to device and get target addresses for the image.  */
   struct addr_pair *target_table = NULL;
   int i, num_target_entries;
@@ -1928,7 +1931,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
     = devicep->load_image_func (devicep->target_id, version,
 				target_data, &target_table);
 
-  if (num_target_entries != num_funcs + num_vars)
+  if (num_target_entries != num_funcs + num_vars
+      /* Others (device_num) are included as trailing entries in pair list.  */
+      && num_target_entries != num_funcs + num_vars + num_others)
     {
       gomp_mutex_unlock (&devicep->lock);
       if (is_register_lock)
@@ -2000,6 +2005,31 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       array++;
     }
 
+  /* Last entry is for the on-device 'device_num' variable. Tolerate case
+     where plugin does not return this entry.  */
+  if (num_funcs + num_vars < num_target_entries)
+    {
+      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
+      if (device_num_var->start != 0)
+	{
+	  /* The index of the devicep within devices[] is regarded as its
+	     'device number', which is different from the per-device type
+	     devicep->target_id.  */
+	  int device_num_val = (int) (devicep - &devices[0]);
+	  if (device_num_var->end - device_num_var->start != sizeof (int))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      gomp_fatal ("offload plugin managed 'device_num' not of expected "
+			  "format");
+	    }
+
+	  /* Copy device_num value to place on device memory, hereby actually
+	     designating its device number into effect.  */
+	  gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
+			      &device_num_val, sizeof (int), NULL);
+	}
+    }
+
   free (target_table);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
new file mode 100644
index 00000000000..b39a85154a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int main (void)
+{
+  int host_device_num = omp_get_device_num ();
+
+  if (host_device_num != omp_get_initial_device ())
+    abort ();
+
+  int device_num;
+  #pragma omp target map(from: device_num)
+  {
+    device_num = omp_get_device_num ();
+  }
+
+  if (host_device_num == device_num)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
new file mode 100644
index 00000000000..329438c8afc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -0,0 +1,17 @@
+! { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: device_num, host_device_num
+
+  host_device_num = omp_get_device_num ()
+  if (host_device_num .ne. omp_get_initial_device ()) stop 1
+
+  !$omp target map(from: device_num)
+  device_num = omp_get_device_num ()
+  !$omp end target
+
+  if (host_device_num .eq. device_num) stop 2
+
+end program main

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-07-23 10:21 [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num Chung-Lin Tang
@ 2021-07-23 10:39 ` Jakub Jelinek
  2021-08-02 13:10   ` [PATCH, v2, " Chung-Lin Tang
  2021-07-23 11:01 ` [PATCH, " Tobias Burnus
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2021-07-23 10:39 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: gcc-patches, Thomas Schwinge, Andrew Stubbs, Tom de Vries,
	Catherine Moore, Vollweiler, Marcel (DI SW CAS EPS STS)

On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
> --- a/libgomp/icv-device.c
> +++ b/libgomp/icv-device.c
> @@ -61,8 +61,17 @@ omp_is_initial_device (void)
>    return 1;
>  }
>  
> +int
> +omp_get_device_num (void)
> +{
> +  /* By specification, this is equivalent to omp_get_initial_device
> +     on the host.  */
> +  return omp_get_initial_device ();
> +}
> +

I think this won't work properly with the intel micoffload, where the host
libgomp is used in the offloaded code.
For omp_is_initial_device, the plugin solves it by:
liboffloadmic/plugin/offload_target_main.cpp
overriding it:
/* Override the corresponding functions from libgomp.  */
extern "C" int
omp_is_initial_device (void) __GOMP_NOTHROW
{
  return 0;
}
   
extern "C" int32_t
omp_is_initial_device_ (void)
{
  return omp_is_initial_device ();
}
but guess it will need slightly more work because we need to copy the value
to the offloading device too.
It can be done incrementally though.

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -102,6 +102,12 @@ struct addr_pair
>    uintptr_t end;
>  };
>  
> +/* This symbol is to name a target side variable that holds the designated
> +   'device number' of the target device. The symbol needs to be available to
> +   libgomp code and the  offload plugin (which in the latter case must be
> +   stringified).  */
> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num

For a single var it is acceptable (though, please avoid the double space
before offload plugin in the comment), but once we have more than one
variable, I think we should simply have a struct which will contain all the
parameters that need to be copied from the host to the offloading device at
image load time (and have eventually another struct that holds parameters
that we'll need to copy to the device on each kernel launch, I bet some ICVs
will be one category, other ICVs another one).

> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
> index 8ea27b5565f..ffcb98ae99e 100644
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -197,6 +197,8 @@ OMP_5.0.1 {
>  	omp_get_supported_active_levels_;
>  	omp_fulfill_event;
>  	omp_fulfill_event_;
> +	omp_get_device_num;
> +	omp_get_device_num_;
>  } OMP_5.0;

This is wrong.  We've already released GCC 11.1 with the OMP_5.0.1
symbol version, so we must not add any further symbols into that symbol
version.  OpenMP 5.0 routines added in GCC 12 should be OMP_5.0.2 symbol
version.

	Jakub


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-07-23 10:21 [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num Chung-Lin Tang
  2021-07-23 10:39 ` Jakub Jelinek
@ 2021-07-23 11:01 ` Tobias Burnus
  2021-08-02 13:09   ` Chung-Lin Tang
  2022-01-04  9:28 ` [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error Chung-Lin Tang
  2022-01-12  9:43 ` [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets Marcel Vollweiler
  3 siblings, 1 reply; 21+ messages in thread
From: Tobias Burnus @ 2021-07-23 11:01 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches, Thomas Schwinge, Jakub Jelinek,
	Andrew Stubbs, Tom de Vries
  Cc: Vollweiler, Marcel (DI SW CAS EPS STS)

On 23.07.21 12:21, Chung-Lin Tang wrote:
> Inside offloaded regions, the preferred way to should be that the device
> already has this information initialized (once) when the device is
> initialized.
> And the function merely returns the stored value.
...
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */
...
> +  int device_num;
> +  #pragma omp target map(from: device_num)
> +  {
> +    device_num = omp_get_device_num ();
> +  }
> +
> +  if (host_device_num == device_num)
> +    abort ();

I personally prefer having:
    int initial_dev;
and inside 'omp target' (with 'map(from:initial_dev)'):
    initial_device = omp_is_initial_device();

Then the check would be:
   if (initial_device && host_device_num != device_num)
     abort();
   if (!initial_device && host_device_num == device_num)
     abort();

(Likewise for Fortran.)

And instead of restricting the target to nvptx/gcn, we could just add
dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.

Additionally, offload_target_nvptx/...amdgcn only check whether
compilation support is available not whether a device exists
at run time.
(The device availability is checked by target_offload_device,
using omp_is_initial_device().)

Tobias

PS: For completeness, I want to note that OpenMP 5.1 supports
setting the per-device ICV as via the environment variables,
besides inheriting the generic ICV values, device-specific
settings are possible with:
   <ENVIRONMENT VARIABLE>_DEV[_<device>]
Thus, more data will be passed from libgomp to the plugins
in the future.

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-07-23 11:01 ` [PATCH, " Tobias Burnus
@ 2021-08-02 13:09   ` Chung-Lin Tang
  0 siblings, 0 replies; 21+ messages in thread
From: Chung-Lin Tang @ 2021-08-02 13:09 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches, Thomas Schwinge, Jakub Jelinek,
	Andrew Stubbs, Tom de Vries
  Cc: Vollweiler, Marcel (DI SW CAS EPS STS)



On 2021/7/23 7:01 PM, Tobias Burnus wrote:
> I personally prefer having:
>     int initial_dev;
> and inside 'omp target' (with 'map(from:initial_dev)'):
>     initial_device = omp_is_initial_device();
> 
> Then the check would be:
>    if (initial_device && host_device_num != device_num)
>      abort();
>    if (!initial_device && host_device_num == device_num)
>      abort();
> 
> (Likewise for Fortran.)

Thanks, I've adjusted the new testcases to use this style.

> And instead of restricting the target to nvptx/gcn, we could just add
> dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.

I've added a 'offload_target_intelmic' to use on the new testcases.

> Additionally, offload_target_nvptx/...amdgcn only check whether
> compilation support is available not whether a device exists
> at run time.
> (The device availability is checked by target_offload_device,
> using omp_is_initial_device().)

I guess there is value in testing compilation as long as the compiler
is properly configured, and leaving the execution as an independent test.
OTOH, I think the OpenMP execution tests are not properly forcing offload
(or not) using the environment variables, unlike what we have for OpenACC.

Thanks,
Chung-Lin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-07-23 10:39 ` Jakub Jelinek
@ 2021-08-02 13:10   ` Chung-Lin Tang
  2021-08-03 11:08     ` Jakub Jelinek
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Chung-Lin Tang @ 2021-08-02 13:10 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Thomas Schwinge, Andrew Stubbs, Tom de Vries,
	Catherine Moore, Vollweiler, Marcel (DI SW CAS EPS STS)

[-- Attachment #1: Type: text/plain, Size: 5768 bytes --]

On 2021/7/23 6:39 PM, Jakub Jelinek wrote:
> On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
>> --- a/libgomp/icv-device.c
>> +++ b/libgomp/icv-device.c
>> @@ -61,8 +61,17 @@ omp_is_initial_device (void)
>>     return 1;
>>   }
>>   
>> +int
>> +omp_get_device_num (void)
>> +{
>> +  /* By specification, this is equivalent to omp_get_initial_device
>> +     on the host.  */
>> +  return omp_get_initial_device ();
>> +}
>> +
> 
> I think this won't work properly with the intel micoffload, where the host
> libgomp is used in the offloaded code.
> For omp_is_initial_device, the plugin solves it by:
> liboffloadmic/plugin/offload_target_main.cpp
> overriding it:
> /* Override the corresponding functions from libgomp.  */
> extern "C" int
> omp_is_initial_device (void) __GOMP_NOTHROW
> {
>    return 0;
> }
>     
> extern "C" int32_t
> omp_is_initial_device_ (void)
> {
>    return omp_is_initial_device ();
> }
> but guess it will need slightly more work because we need to copy the value
> to the offloading device too.
> It can be done incrementally though.

I guess this part of intelmic functionality will just have to wait later.
There seem to be other parts of liboffloadmic that seems to need re-work,
e.g. omp_get_num_devices() return mic_engines_total, where it should actually
return the number of all devices (not just intelmic). omp_get_initial_device()
returning -1 (which I don't quite understand), etc.

Really suggest to have intelmic support be re-worked as an offload plugin inside
libgomp, rather than floating outside by itself.

>> --- a/libgomp/libgomp-plugin.h
>> +++ b/libgomp/libgomp-plugin.h
>> @@ -102,6 +102,12 @@ struct addr_pair
>>     uintptr_t end;
>>   };
>>   
>> +/* This symbol is to name a target side variable that holds the designated
>> +   'device number' of the target device. The symbol needs to be available to
>> +   libgomp code and the  offload plugin (which in the latter case must be
>> +   stringified).  */
>> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num
> 
> For a single var it is acceptable (though, please avoid the double space
> before offload plugin in the comment), but once we have more than one
> variable, I think we should simply have a struct which will contain all the
> parameters that need to be copied from the host to the offloading device at
> image load time (and have eventually another struct that holds parameters
> that we'll need to copy to the device on each kernel launch, I bet some ICVs
> will be one category, other ICVs another one).

Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
defined in terms of an ICV. Maybe it conceptually ought to be, but the current
description of "the device number of the device on which the calling thread is
executing" is not one if the defined ICVs.

It looks like there will eventually be some kind of ICV block handled in a similar
way, but I think that the modifications will be straightforward then. For now,
I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.

>> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
>> index 8ea27b5565f..ffcb98ae99e 100644
>> --- a/libgomp/libgomp.map
>> +++ b/libgomp/libgomp.map
>> @@ -197,6 +197,8 @@ OMP_5.0.1 {
>>   	omp_get_supported_active_levels_;
>>   	omp_fulfill_event;
>>   	omp_fulfill_event_;
>> +	omp_get_device_num;
>> +	omp_get_device_num_;
>>   } OMP_5.0;
> 
> This is wrong.  We've already released GCC 11.1 with the OMP_5.0.1
> symbol version, so we must not add any further symbols into that symbol
> version.  OpenMP 5.0 routines added in GCC 12 should be OMP_5.0.2 symbol
> version.

I've adjusted this into 5.0.2, in between 5.0.1 and the new 5.1 added by the recent
omp_display_env[_] routines. omp_get_device_num is a OpenMP 5.0 introduced
API function, so I think this is the correct handling (instead of stashing into 5.1).

There is a new function check_effective_target_offload_target_intelmic() in
testsuite/lib/libgomp.exp, used to test for non-intelmic offloading situations.

Re-tested with no regressions, seeking approval for trunk.

Thanks,
Chung-Lin

2021-08-02  Chung-Lin Tang  <cltang@codesourcery.com>

libgomp/ChangeLog

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
	omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* config/plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* config/plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): New function for
	testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.

[-- Attachment #2: omp_get_device_num-v2.patch --]
[-- Type: text/plain, Size: 16246 bytes --]

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..8f72028a6c8 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..e586da1d3a8 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index e042702ac91..5cff8840eab 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -599,6 +599,12 @@ omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_device ();
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 62645ce9954..cf24a2bee41 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
   uintptr_t end;
 };
 
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num
+
 /* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 69aa69562b8..cc44885cba9 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -199,12 +199,18 @@ OMP_5.0.1 {
 	omp_fulfill_event_;
 } OMP_5.0;
 
+OMP_5.0.2 {
+  global:
+	omp_get_device_num;
+	omp_get_device_num_;
+} OMP_5.0.1;
+
 OMP_5.1 {
   global:
 	omp_display_env;
 	omp_display_env_;
 	omp_display_env_8_;
-} OMP_5.0.1;
+} OMP_5.0.2;
 
 GOMP_1.0 {
   global:
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 2c1f1b5968b..fc9e708a8d2 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -165,6 +165,7 @@ linkage, and do not throw exceptions.
 * omp_get_ancestor_thread_num:: Ancestor thread ID
 * omp_get_cancellation::        Whether cancellation support is enabled
 * omp_get_default_device::      Get the default device for target regions
+* omp_get_device_num::          Get device that current thread is running on
 * omp_get_dynamic::             Dynamic teams setting
 * omp_get_initial_device::      Device number of host device
 * omp_get_level::               Number of parallel regions
@@ -385,6 +386,34 @@ For OpenMP 5.1, this must be equal to the value returned by the
 
 
 
+@node omp_get_device_num
+@section @code{omp_get_device_num} -- Return device number of current device
+@table @asis
+@item @emph{Description}:
+This function returns a device number that represents the device that the
+current thread is executing on. For OpenMP 5.0, this must be equal to the
+value returned by the @code{omp_get_initial_device} function when called
+from the host.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_initial_device}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
+@end table
+
+
+
 @node omp_get_level
 @section @code{omp_get_level} -- Obtain the current nesting level
 @table @asis
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index c93db968d2e..da34a9d98a6 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -243,6 +243,7 @@ extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
 extern int omp_get_num_devices (void) __GOMP_NOTHROW;
+extern int omp_get_device_num (void) __GOMP_NOTHROW;
 extern int omp_get_num_teams (void) __GOMP_NOTHROW;
 extern int omp_get_team_num (void) __GOMP_NOTHROW;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 5fc6587e49e..d7e804f4fd5 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -550,6 +550,12 @@
           end function omp_get_initial_device
         end interface
 
+        interface
+          function omp_get_device_num ()
+            integer (4) :: omp_get_device_num
+          end function omp_get_device_num
+        end interface
+
         interface
           function omp_get_max_task_priority ()
             integer (4) :: omp_get_max_task_priority
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 9873cea9ac1..20c32645e3c 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -244,6 +244,9 @@
       external omp_get_initial_device
       integer(4) omp_get_initial_device
 
+      external omp_get_device_num
+      integer(4) omp_get_device_num
+
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 2548614a2e5..e6d8b6aae25 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -29,6 +29,7 @@
 /* {{{ Includes and defines  */
 
 #include "config.h"
+#include "symcat.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -3305,6 +3306,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   struct kernel_info *kernel;
   int kernel_count = image_desc->kernel_count;
   unsigned var_count = image_desc->global_variable_count;
+  int other_count = 1;
 
   agent = get_agent_info (ord);
   if (!agent)
@@ -3321,7 +3323,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
   GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
-  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count - 2)
+  GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
+  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
 			     * sizeof (struct addr_pair));
   *target_table = pair;
   module = (struct module_info *)
@@ -3396,6 +3399,35 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       pair++;
     }
 
+  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+
+  hsa_status_t status;
+  hsa_executable_symbol_t var_symbol;
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 STRINGX (GOMP_DEVICE_NUM_VAR),
+						 agent->id, 0, &var_symbol);
+  if (status == HSA_STATUS_SUCCESS)
+    {
+      uint64_t device_num_varptr;
+      uint32_t device_num_varsize;
+
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &device_num_varptr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+	 &device_num_varsize);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable size from its symbol", status);
+
+      pair->start = device_num_varptr;
+      pair->end = device_num_varptr + device_num_varsize;
+    }
+  else
+    pair->start = pair->end = 0;
+
   /* Ensure that constructors are run first.  */
   struct GOMP_kernel_launch_attributes kla =
     { 3,
@@ -3418,7 +3450,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
-  return kernel_count + var_count;
+  return kernel_count + var_count + other_count;
 }
 
 /* Unload GCN object-code module described by struct gcn_image_desc in
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1215212d501..8b34895a88f 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -34,6 +34,7 @@
 #define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
+#include "symcat.h"
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
 #include "gomp-constants.h"
@@ -1265,7 +1266,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   CUmodule module;
   const char *const *var_names;
   const struct targ_fn_launch *fn_descs;
-  unsigned int fn_entries, var_entries, i, j;
+  unsigned int fn_entries, var_entries, other_entries, i, j;
   struct targ_fn_descriptor *targ_fns;
   struct addr_pair *targ_tbl;
   const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1295,8 +1296,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   fn_entries = img_header->fn_num;
   fn_descs = img_header->fn_descs;
 
+  /* Currently, the only other entry kind is 'device number'.  */
+  other_entries = 1;
+
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
-				 * (fn_entries + var_entries));
+				 * (fn_entries + var_entries + other_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1345,9 +1349,22 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       targ_tbl->end = targ_tbl->start + bytes;
     }
 
+  CUdeviceptr device_num_varptr;
+  size_t device_num_varsize;
+  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
+				  &device_num_varsize, module,
+				  STRINGX (GOMP_DEVICE_NUM_VAR));
+  if (r == CUDA_SUCCESS)
+    {
+      targ_tbl->start = (uintptr_t) device_num_varptr;
+      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
+    }
+  else
+    targ_tbl->start = targ_tbl->end = 0;
+
   nvptx_set_clocktick (module, dev);
 
-  return fn_entries + var_entries;
+  return fn_entries + var_entries + other_entries;
 }
 
 /* Unload the program described by TARGET_DATA.  DEV_DATA is the
diff --git a/libgomp/target.c b/libgomp/target.c
index 453b3210e40..96f3d6ad1cf 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1974,6 +1974,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
+  /* Others currently is only 'device_num' */
+  int num_others = 1;
+
   /* Load image to device and get target addresses for the image.  */
   struct addr_pair *target_table = NULL;
   int i, num_target_entries;
@@ -1982,7 +1985,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
     = devicep->load_image_func (devicep->target_id, version,
 				target_data, &target_table);
 
-  if (num_target_entries != num_funcs + num_vars)
+  if (num_target_entries != num_funcs + num_vars
+      /* Others (device_num) are included as trailing entries in pair list.  */
+      && num_target_entries != num_funcs + num_vars + num_others)
     {
       gomp_mutex_unlock (&devicep->lock);
       if (is_register_lock)
@@ -2054,6 +2059,31 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       array++;
     }
 
+  /* Last entry is for the on-device 'device_num' variable. Tolerate case
+     where plugin does not return this entry.  */
+  if (num_funcs + num_vars < num_target_entries)
+    {
+      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
+      if (device_num_var->start != 0)
+	{
+	  /* The index of the devicep within devices[] is regarded as its
+	     'device number', which is different from the per-device type
+	     devicep->target_id.  */
+	  int device_num_val = (int) (devicep - &devices[0]);
+	  if (device_num_var->end - device_num_var->start != sizeof (int))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      gomp_fatal ("offload plugin managed 'device_num' not of expected "
+			  "format");
+	    }
+
+	  /* Copy device_num value to place on device memory, hereby actually
+	     designating its device number into effect.  */
+	  gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
+			      &device_num_val, sizeof (int), false, NULL);
+	}
+    }
+
   free (target_table);
 }
 
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index a2050151e84..ba8a73275c5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -374,6 +374,11 @@ proc check_effective_target_offload_target_amdgcn { } {
     return [libgomp_check_effective_target_offload_target "amdgcn"]
 }
 
+# Return 1 if compiling for offload target intelmic
+proc check_effective_target_offload_target_intelmic { } {
+    return [libgomp_check_effective_target_offload_target "*-intelmic"]
+}
+
 # Return 1 if offload device is available.
 proc check_effective_target_offload_device { } {
     return [check_runtime_nocache offload_device_available_ {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
new file mode 100644
index 00000000000..ec0d202e51c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { ! offload_target_intelmic } } } */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int main (void)
+{
+
+  int host_device_num = omp_get_device_num ();
+
+  if (host_device_num != omp_get_initial_device ())
+    abort ();
+
+  int device_num;
+  int initial_device;
+
+  #pragma omp target map(from: device_num, initial_device)
+  {
+    initial_device = omp_is_initial_device ();
+    device_num = omp_get_device_num ();
+  }
+
+  if (initial_device && host_device_num != device_num)
+    abort ();
+
+  if (!initial_device && host_device_num == device_num)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
new file mode 100644
index 00000000000..a7a6422fdc4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -0,0 +1,20 @@
+! { dg-do run { target { ! offload_target_intelmic } } }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: device_num, host_device_num
+  logical :: initial_device
+
+  host_device_num = omp_get_device_num ()
+  if (host_device_num .ne. omp_get_initial_device ()) stop 1
+
+  !$omp target map(from: device_num, initial_device)
+  initial_device = omp_is_initial_device ()
+  device_num = omp_get_device_num ()
+  !$omp end target
+
+  if (initial_device .and. host_device_num .ne. device_num) stop 2
+  if (initial_device .and. host_device_num .eq. device_num) stop 3
+
+end program main

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-08-02 13:10   ` [PATCH, v2, " Chung-Lin Tang
@ 2021-08-03 11:08     ` Jakub Jelinek
  2021-08-03 12:07     ` Thomas Schwinge
  2021-08-03 12:22     ` [PATCH, v2, libgomp, OpenMP 5.0] " Thomas Schwinge
  2 siblings, 0 replies; 21+ messages in thread
From: Jakub Jelinek @ 2021-08-03 11:08 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Andrew Stubbs, gcc-patches, Thomas Schwinge, Vollweiler,
	Marcel (DI SW CAS EPS STS)

On Mon, Aug 02, 2021 at 09:10:57PM +0800, Chung-Lin Tang wrote:
> > I think this won't work properly with the intel micoffload, where the host
> > libgomp is used in the offloaded code.
> > For omp_is_initial_device, the plugin solves it by:
> > liboffloadmic/plugin/offload_target_main.cpp
> > overriding it:
> > /* Override the corresponding functions from libgomp.  */
> > extern "C" int
> > omp_is_initial_device (void) __GOMP_NOTHROW
> > {
> >    return 0;
> > }
> > extern "C" int32_t
> > omp_is_initial_device_ (void)
> > {
> >    return omp_is_initial_device ();
> > }
> > but guess it will need slightly more work because we need to copy the value
> > to the offloading device too.
> > It can be done incrementally though.
> 
> I guess this part of intelmic functionality will just have to wait later.
> There seem to be other parts of liboffloadmic that seems to need re-work,
> e.g. omp_get_num_devices() return mic_engines_total, where it should actually
> return the number of all devices (not just intelmic). omp_get_initial_device()
> returning -1 (which I don't quite understand), etc.

For omp_get_num_devices() the standard says:
When called from within a target region the effect of this routine is unspecified.
Ditto for omp_get_initial_device and various other routines.
So it is UB if those functions are called in offloaded regions.

> > For a single var it is acceptable (though, please avoid the double space
> > before offload plugin in the comment), but once we have more than one
> > variable, I think we should simply have a struct which will contain all the
> > parameters that need to be copied from the host to the offloading device at
> > image load time (and have eventually another struct that holds parameters
> > that we'll need to copy to the device on each kernel launch, I bet some ICVs
> > will be one category, other ICVs another one).
> 
> Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
> defined in terms of an ICV. Maybe it conceptually ought to be, but the current
> description of "the device number of the device on which the calling thread is
> executing" is not one if the defined ICVs.
> 
> It looks like there will eventually be some kind of ICV block handled in a similar
> way, but I think that the modifications will be straightforward then. For now,
> I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.

Yeah, it is ok for now, but even for the below mentioned omp_display_env
we'll need to replace it...

> There is a new function check_effective_target_offload_target_intelmic() in
> testsuite/lib/libgomp.exp, used to test for non-intelmic offloading situations.
> 
> Re-tested with no regressions, seeking approval for trunk.
> 
> Thanks,
> Chung-Lin
> 
> 2021-08-02  Chung-Lin Tang  <cltang@codesourcery.com>
> 
> libgomp/ChangeLog
> 
> 	* icv-device.c (omp_get_device_num): New API function, host side.
> 	* fortran.c (omp_get_device_num_): New interface function.
> 	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
> 	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
> 	omp_get_device_num_.
> 	* libgomp.texi (omp_get_device_num): Add documentation for new API
> 	function.
> 	* omp.h.in (omp_get_device_num): Add declaration.
> 	* omp_lib.f90.in (omp_get_device_num): Likewise.
> 	* omp_lib.h.in (omp_get_device_num): Likewise.
> 	* target.c (gomp_load_image_to_device): If additional entry for device
> 	number exists at end of returned entries from 'load_image_func' hook,
> 	copy the assigned device number over to the device variable.
> 
> 	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
> 	(omp_get_device_num): New API function, device side.
> 	* config/plugin/plugin-gcn.c ("symcat.h"): Add include.
> 	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
> 	at end of returned 'target_table' entries.
> 
> 	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
> 	(omp_get_device_num): New API function, device side.
> 	* config/plugin/plugin-nvptx.c ("symcat.h"): Add include.
> 	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
> 	at end of returned 'target_table' entries.
> 
> 	* testsuite/lib/libgomp.exp
> 	(check_effective_target_offload_target_intelmic): New function for
> 	testing for intelmic offloading.
> 	* testsuite/libgomp.c-c++-common/target-45.c: New test.
> 	* testsuite/libgomp.fortran/target10.f90: New test.

Ok, thanks.

	Jakub


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-08-02 13:10   ` [PATCH, v2, " Chung-Lin Tang
  2021-08-03 11:08     ` Jakub Jelinek
@ 2021-08-03 12:07     ` Thomas Schwinge
  2021-08-05 15:30       ` [PATCH, v3, libgomp, OpenMP 5.0, committed] " Chung-Lin Tang
  2021-08-03 12:22     ` [PATCH, v2, libgomp, OpenMP 5.0] " Thomas Schwinge
  2 siblings, 1 reply; 21+ messages in thread
From: Thomas Schwinge @ 2021-08-03 12:07 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Jakub Jelinek, Andrew Stubbs, gcc-patches, marcel_vollweiler,
	Tom de Vries

Hi Chung-Lin!

Just a few quick comments:

On 2021-08-02T21:10:57+0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> On 2021/7/23 6:39 PM, Jakub Jelinek wrote:
>> On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
>>> --- a/libgomp/icv-device.c
>>> +++ b/libgomp/icv-device.c
>>> @@ -61,8 +61,17 @@ omp_is_initial_device (void)
>>>     return 1;
>>>   }
>>>
>>> +int
>>> +omp_get_device_num (void)
>>> +{
>>> +  /* By specification, this is equivalent to omp_get_initial_device
>>> +     on the host.  */
>>> +  return omp_get_initial_device ();
>>> +}
>>> +
>>
>> I think this won't work properly with the intel micoffload, where the host
>> libgomp is used in the offloaded code.
>> For omp_is_initial_device, the plugin solves it by:
>> liboffloadmic/plugin/offload_target_main.cpp
>> overriding it:
>> /* Override the corresponding functions from libgomp.  */
>> extern "C" int
>> omp_is_initial_device (void) __GOMP_NOTHROW
>> {
>>    return 0;
>> }
>>
>> extern "C" int32_t
>> omp_is_initial_device_ (void)
>> {
>>    return omp_is_initial_device ();
>> }
>> but guess it will need slightly more work because we need to copy the value
>> to the offloading device too.
>> It can be done incrementally though.
>
> I guess this part of intelmic functionality will just have to wait later.
> There seem to be other parts of liboffloadmic that seems to need re-work,
> e.g. omp_get_num_devices() return mic_engines_total, where it should actually
> return the number of all devices (not just intelmic). omp_get_initial_device()
> returning -1 (which I don't quite understand), etc.

(I'm confirming there are such pre-existing problems with Intel MIC; I've
never looked up any details.)

> Really suggest to have intelmic support be re-worked as an offload plugin inside
> libgomp, rather than floating outside by itself.

Well, it is a regular libgomp plugin, just its sources are not in
'libgomp/plugin/' and it's not built during libgomp build.  Are you
suggesting just to move it into 'libgomp/plugin/'?  This may need some
more complicated setup because of its 'liboffloadmic' dependency?


>>> --- a/libgomp/libgomp-plugin.h
>>> +++ b/libgomp/libgomp-plugin.h
>>> @@ -102,6 +102,12 @@ struct addr_pair
>>>     uintptr_t end;
>>>   };
>>>
>>> +/* This symbol is to name a target side variable that holds the designated
>>> +   'device number' of the target device. The symbol needs to be available to
>>> +   libgomp code and the  offload plugin (which in the latter case must be
>>> +   stringified).  */
>>> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num
>>
>> For a single var it is acceptable (though, please avoid the double space
>> before offload plugin in the comment), but once we have more than one
>> variable, I think we should simply have a struct which will contain all the
>> parameters that need to be copied from the host to the offloading device at
>> image load time (and have eventually another struct that holds parameters
>> that we'll need to copy to the device on each kernel launch, I bet some ICVs
>> will be one category, other ICVs another one).

ACK.  Also other program state, like 'fenv' or the gfortran "state blob".
This is <https://gcc.gnu.org/PR92827> "Missing data/state
sharing/propagation between host and offloading devices".

> Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
> defined in terms of an ICV. Maybe it conceptually ought to be, but the current
> description of "the device number of the device on which the calling thread is
> executing" is not one if the defined ICVs.
>
> It looks like there will eventually be some kind of ICV block handled in a similar
> way, but I think that the modifications will be straightforward then. For now,
> I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.

There is, by the way, precedent for that:
'libgomp/config/nvptx/time.c:double __nvptx_clocktick', set up in
'libgomp/plugin/plugin-nvptx.c:nvptx_set_clocktick' ('cuModuleGetGlobal'
to get the device address, followed by 'cuMemcpyHtoD'), invoked from
'libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image', quite simple.

For the case discussed here, we're now adding more complex
'other_count'/'other_entries'/'num_others' bookkeeping.  (Great that all
of the plugins plus 'libgomp/target.c' invented their own terminology...)
;-)

> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c

> @@ -3305,6 +3306,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>    struct kernel_info *kernel;
>    int kernel_count = image_desc->kernel_count;
>    unsigned var_count = image_desc->global_variable_count;
> +  int other_count = 1;
>
>    agent = get_agent_info (ord);
>    if (!agent)
> @@ -3321,7 +3323,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>
>    GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
>    GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
> -  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count - 2)
> +  GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
> +  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
>                            * sizeof (struct addr_pair));
>    *target_table = pair;
>    module = (struct module_info *)
> @@ -3396,6 +3399,35 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>        pair++;
>      }
>
> +  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
> +
> +  hsa_status_t status;
> +  hsa_executable_symbol_t var_symbol;
> +  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
> +                                              STRINGX (GOMP_DEVICE_NUM_VAR),
> +                                              agent->id, 0, &var_symbol);
> +  if (status == HSA_STATUS_SUCCESS)
> +    {
> +      uint64_t device_num_varptr;
> +      uint32_t device_num_varsize;
> +
> +      status = hsa_fns.hsa_executable_symbol_get_info_fn
> +     (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
> +      &device_num_varptr);
> +      if (status != HSA_STATUS_SUCCESS)
> +     hsa_fatal ("Could not extract a variable from its symbol", status);
> +      status = hsa_fns.hsa_executable_symbol_get_info_fn
> +     (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
> +      &device_num_varsize);
> +      if (status != HSA_STATUS_SUCCESS)
> +     hsa_fatal ("Could not extract a variable size from its symbol", status);
> +
> +      pair->start = device_num_varptr;
> +      pair->end = device_num_varptr + device_num_varsize;
> +    }
> +  else
> +    pair->start = pair->end = 0;
> +

Is the 'else' branch to accomodate "old" executables running against
"new" libgomp?  If yes, then please add a comment, "for compatibility
with pre-GCC 12 executables" or similar.

Also, add 'pair++;', to avoid future confusion?

>    /* Ensure that constructors are run first.  */
>    struct GOMP_kernel_launch_attributes kla =
>      { 3,
> @@ -3418,7 +3450,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>    if (module->fini_array_func)
>      kernel_count--;
>
> -  return kernel_count + var_count;
> +  return kernel_count + var_count + other_count;
>  }
>
>  /* Unload GCN object-code module described by struct gcn_image_desc in

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

> @@ -1265,7 +1266,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>    CUmodule module;
>    const char *const *var_names;
>    const struct targ_fn_launch *fn_descs;
> -  unsigned int fn_entries, var_entries, i, j;
> +  unsigned int fn_entries, var_entries, other_entries, i, j;
>    struct targ_fn_descriptor *targ_fns;
>    struct addr_pair *targ_tbl;
>    const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
> @@ -1295,8 +1296,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>    fn_entries = img_header->fn_num;
>    fn_descs = img_header->fn_descs;
>
> +  /* Currently, the only other entry kind is 'device number'.  */
> +  other_entries = 1;
> +
>    targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
> -                              * (fn_entries + var_entries));
> +                              * (fn_entries + var_entries + other_entries));
>    targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
>                                * fn_entries);
>
> @@ -1345,9 +1349,22 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>        targ_tbl->end = targ_tbl->start + bytes;
>      }
>
> +  CUdeviceptr device_num_varptr;
> +  size_t device_num_varsize;
> +  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
> +                               &device_num_varsize, module,
> +                               STRINGX (GOMP_DEVICE_NUM_VAR));
> +  if (r == CUDA_SUCCESS)
> +    {
> +      targ_tbl->start = (uintptr_t) device_num_varptr;
> +      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
> +    }
> +  else
> +    targ_tbl->start = targ_tbl->end = 0;
> +

Same comment for 'else', please.

Also, 'targ_tbl++;', to avoid future confusion?

>    nvptx_set_clocktick (module, dev);
>
> -  return fn_entries + var_entries;
> +  return fn_entries + var_entries + other_entries;
>  }
>
>  /* Unload the program described by TARGET_DATA.  DEV_DATA is the

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1974,6 +1974,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>    int num_funcs = host_funcs_end - host_func_table;
>    int num_vars  = (host_vars_end - host_var_table) / 2;
>
> +  /* Others currently is only 'device_num' */
> +  int num_others = 1;
> +
>    /* Load image to device and get target addresses for the image.  */
>    struct addr_pair *target_table = NULL;
>    int i, num_target_entries;
> @@ -1982,7 +1985,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>      = devicep->load_image_func (devicep->target_id, version,
>                               target_data, &target_table);
>

Do I understand right that the special-casing here:

> -  if (num_target_entries != num_funcs + num_vars)
> +  if (num_target_entries != num_funcs + num_vars
> +      /* Others (device_num) are included as trailing entries in pair list.  */
> +      && num_target_entries != num_funcs + num_vars + num_others)
>      {
>        gomp_mutex_unlock (&devicep->lock);
>        if (is_register_lock)
> @@ -2054,6 +2059,31 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>        array++;
>      }
>
> +  /* Last entry is for the on-device 'device_num' variable. Tolerate case
> +     where plugin does not return this entry.  */
> +  if (num_funcs + num_vars < num_target_entries)

... and here is just for compatibility with the unmodified Intel MIC
plugin?  Wouldn't it be simpler to just add a dummy value to that one, to
avoid this special-casing?

> +    {
> +      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
> +      if (device_num_var->start != 0)

Please also here add a comment about the '0' case, for completeness.

> +     {
> +       /* The index of the devicep within devices[] is regarded as its
> +          'device number', which is different from the per-device type
> +          devicep->target_id.  */
> +       int device_num_val = (int) (devicep - &devices[0]);
> +       if (device_num_var->end - device_num_var->start != sizeof (int))
> +         {
> +           gomp_mutex_unlock (&devicep->lock);

Add:

    if (is_register_lock)
      gomp_mutex_unlock (&register_lock);

..., I suppose?

> +           gomp_fatal ("offload plugin managed 'device_num' not of expected "
> +                       "format");
> +         }
> +
> +       /* Copy device_num value to place on device memory, hereby actually
> +          designating its device number into effect.  */
> +       gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
> +                           &device_num_val, sizeof (int), false, NULL);
> +     }
> +    }
> +
>    free (target_table);
>  }

Am I understanding this correctly that instead of handling it all inside
the plugins (like '__nvptx_clocktick'), we're here doing the
'other_count'/'other_entries'/'num_others' bookkeeping plus
'gomp_copy_host2dev', because we don't know 'device_num_val' inside the
plugins, and also to introduce a more generic interface for future ICVs
etc.?

It feels to me as if this interface can be improved still.  (For example,
would it make sense to pass into the plugins (via 'load_image_func') some
kind of table, containing 'device_num_val' etc., and then let each plugin
to the setup?)  But I'm OK if we leave that for later, once we actually
add more ICVs etc.


> --- a/libgomp/config/gcn/icv-device.c
> +++ b/libgomp/config/gcn/icv-device.c
> @@ -70,6 +70,16 @@ omp_is_initial_device (void)
>    return 0;
>  }
>
> +/* This is set to the device number of current GPU during device initialization,
> +   when the offload image containing this libgomp portion is loaded.  */
> +static int GOMP_DEVICE_NUM_VAR;
> +
> +int
> +omp_get_device_num (void)
> +{
> +  return GOMP_DEVICE_NUM_VAR;
> +}
> +
>  ialias (omp_set_default_device)
>  ialias (omp_get_default_device)
>  ialias (omp_get_initial_device)

I suppose also add 'ialias (omp_get_device_num)' here, like...

> --- a/libgomp/config/nvptx/icv-device.c
> +++ b/libgomp/config/nvptx/icv-device.c
> @@ -58,8 +58,19 @@ omp_is_initial_device (void)
>    return 0;
>  }
>
> +/* This is set to the device number of current GPU during device initialization,
> +   when the offload image containing this libgomp portion is loaded.  */
> +static int GOMP_DEVICE_NUM_VAR;
> +
> +int
> +omp_get_device_num (void)
> +{
> +  return GOMP_DEVICE_NUM_VAR;
> +}
> +
>  ialias (omp_set_default_device)
>  ialias (omp_get_default_device)
>  ialias (omp_get_initial_device)
>  ialias (omp_get_num_devices)
>  ialias (omp_is_initial_device)
> +ialias (omp_get_device_num)

... here, and...

> --- a/libgomp/icv-device.c
> +++ b/libgomp/icv-device.c
> @@ -61,8 +61,17 @@ omp_is_initial_device (void)
>    return 1;
>  }
>
> +int
> +omp_get_device_num (void)
> +{
> +  /* By specification, this is equivalent to omp_get_initial_device
> +     on the host.  */
> +  return omp_get_initial_device ();
> +}
> +
>  ialias (omp_set_default_device)
>  ialias (omp_get_default_device)
>  ialias (omp_get_initial_device)
>  ialias (omp_get_num_devices)
>  ialias (omp_is_initial_device)
> +ialias (omp_get_device_num)

... here?


> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp

> +# Return 1 if compiling for offload target intelmic
> +proc check_effective_target_offload_target_intelmic { } {
> +    return [libgomp_check_effective_target_offload_target "*-intelmic"]
> +}

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
> @@ -0,0 +1,30 @@
> +/* { dg-do run { target { ! offload_target_intelmic } } } */

This means that the test case is skipped as soon as the compiler is
configured for Intel MIC offloading -- even if that's not used during
execution.

From some older experiment of mine, I do have a
'check_effective_target_offload_device_intel_mic', which I'll propose as
a follow-up, once this is in.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/target10.f90
> @@ -0,0 +1,20 @@
> +! { dg-do run { target { ! offload_target_intelmic } } }

Likewise.

> +program main
> +  use omp_lib
> +  implicit none
> +  integer :: device_num, host_device_num
> +  logical :: initial_device
> +
> +  host_device_num = omp_get_device_num ()
> +  if (host_device_num .ne. omp_get_initial_device ()) stop 1
> +
> +  !$omp target map(from: device_num, initial_device)
> +  initial_device = omp_is_initial_device ()
> +  device_num = omp_get_device_num ()
> +  !$omp end target
> +
> +  if (initial_device .and. host_device_num .ne. device_num) stop 2

That one matches 'libgomp.c-c++-common/target-45.c':

    if (initial_device && host_device_num != device_num)
      abort ();

..., but here:

> +  if (initial_device .and. host_device_num .eq. device_num) stop 3

... shouldn't that be '.not.initial_device', like in:

    if (!initial_device && host_device_num == device_num)
      abort ();


(Also, I'm not familiar with Fortran operator precedence rules, so
probably would put the individual expressions into braces.  ;-) -- But I
trust you know better than I do, of course.)


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-08-02 13:10   ` [PATCH, v2, " Chung-Lin Tang
  2021-08-03 11:08     ` Jakub Jelinek
  2021-08-03 12:07     ` Thomas Schwinge
@ 2021-08-03 12:22     ` Thomas Schwinge
  2021-08-05  8:34       ` Chung-Lin Tang
  2 siblings, 1 reply; 21+ messages in thread
From: Thomas Schwinge @ 2021-08-03 12:22 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Jakub Jelinek, gcc-patches, Andrew Stubbs, Tom de Vries,
	marcel_vollweiler

Hi Chung-Lin!

On 2021-08-02T21:10:57+0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c

> +int32_t
> +omp_get_device_num_ (void)
> +{
> +  return omp_get_device_num ();
> +}

Missing 'ialias_redirect (omp_get_device_num)'?


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
  2021-08-03 12:22     ` [PATCH, v2, libgomp, OpenMP 5.0] " Thomas Schwinge
@ 2021-08-05  8:34       ` Chung-Lin Tang
  0 siblings, 0 replies; 21+ messages in thread
From: Chung-Lin Tang @ 2021-08-05  8:34 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, gcc-patches, Andrew Stubbs, Tom de Vries,
	marcel_vollweiler



On 2021/8/3 8:22 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
> 
> On 2021-08-02T21:10:57+0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
>> --- a/libgomp/fortran.c
>> +++ b/libgomp/fortran.c
> 
>> +int32_t
>> +omp_get_device_num_ (void)
>> +{
>> +  return omp_get_device_num ();
>> +}
> 
> Missing 'ialias_redirect (omp_get_device_num)'?
> 
> 
> Grüße
>   Thomas
> 

Thanks, will fix before committing.

Chung-Lin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH, v3, libgomp, OpenMP 5.0, committed] Implement omp_get_device_num
  2021-08-03 12:07     ` Thomas Schwinge
@ 2021-08-05 15:30       ` Chung-Lin Tang
  2021-08-09  7:16         ` [PATCH, libgomp, OpenMP 5.0, OG11, " Chung-Lin Tang
  0 siblings, 1 reply; 21+ messages in thread
From: Chung-Lin Tang @ 2021-08-05 15:30 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Andrew Stubbs, gcc-patches, marcel_vollweiler,
	Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 12349 bytes --]



On 2021/8/3 8:07 PM, Thomas Schwinge wrote:
>> Really suggest to have intelmic support be re-worked as an offload plugin inside
>> libgomp, rather than floating outside by itself.
> Well, it is a regular libgomp plugin, just its sources are not in
> 'libgomp/plugin/' and it's not built during libgomp build.  Are you
> suggesting just to move it into 'libgomp/plugin/'?  This may need some
> more complicated setup because of its 'liboffloadmic' dependency?

Well it appears that liboffloadmic is layered atop of a COI API (Common Offload Interface?)
that is supposed to be the true proprietary interface to Intel MIC devices.

I think it is more reasonable to have each libgomp plugin to directly be built
atop of the vendor-specific interface for the accelerator. Having another in-tree library
serve in-between makes things a bit unnecessarily complex.

(I'm not sure if I recall correctly, but did liboffloadmic have another use besides for
libgomp?)

>>>> --- a/libgomp/libgomp-plugin.h
>>>> +++ b/libgomp/libgomp-plugin.h
>>>> @@ -102,6 +102,12 @@ struct addr_pair
>>>>      uintptr_t end;
>>>>    };
>>>>    
>>>> +/* This symbol is to name a target side variable that holds the designated
>>>> +   'device number' of the target device. The symbol needs to be available to
>>>> +   libgomp code and the  offload plugin (which in the latter case must be
>>>> +   stringified).  */
>>>> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num
>>> For a single var it is acceptable (though, please avoid the double space
>>> before offload plugin in the comment), but once we have more than one
>>> variable, I think we should simply have a struct which will contain all the
>>> parameters that need to be copied from the host to the offloading device at
>>> image load time (and have eventually another struct that holds parameters
>>> that we'll need to copy to the device on each kernel launch, I bet some ICVs
>>> will be one category, other ICVs another one).
> ACK.  Also other program state, like 'fenv' or the gfortran "state blob".
> This is<https://gcc.gnu.org/PR92827>  "Missing data/state
> sharing/propagation between host and offloading devices".

Okay, so we actually have a PR number for this :)


>> Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
>> defined in terms of an ICV. Maybe it conceptually ought to be, but the current
>> description of "the device number of the device on which the calling thread is
>> executing" is not one if the defined ICVs.
>>
>> It looks like there will eventually be some kind of ICV block handled in a similar
>> way, but I think that the modifications will be straightforward then. For now,
>> I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.
> There is, by the way, precedent for that:
> 'libgomp/config/nvptx/time.c:double __nvptx_clocktick', set up in
> 'libgomp/plugin/plugin-nvptx.c:nvptx_set_clocktick' ('cuModuleGetGlobal'
> to get the device address, followed by 'cuMemcpyHtoD'), invoked from
> 'libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image', quite simple.
> 
> For the case discussed here, we're now adding more complex
> 'other_count'/'other_entries'/'num_others' bookkeeping.  (Great that all
> of the plugins plus 'libgomp/target.c' invented their own terminology...)
> ;-)

Well, that is kind of what nvptx is doing by itself internally.
(e.g. libgomp/config/gcn/time.c does not use such external setting by the plugin)

Maybe that "last" entry handled by load_image will eventually turn into a large
block struct to handle all such cases.

>> --- a/libgomp/plugin/plugin-gcn.c
>> +++ b/libgomp/plugin/plugin-gcn.c
...
>> +  if (status == HSA_STATUS_SUCCESS)
>> +    {
>> +      uint64_t device_num_varptr;
>> +      uint32_t device_num_varsize;
>> +
>> +      status = hsa_fns.hsa_executable_symbol_get_info_fn
>> +	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
>> +	 &device_num_varptr);
>> +      if (status != HSA_STATUS_SUCCESS)
>> +	hsa_fatal ("Could not extract a variable from its symbol", status);
>> +      status = hsa_fns.hsa_executable_symbol_get_info_fn
>> +	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
>> +	 &device_num_varsize);
>> +      if (status != HSA_STATUS_SUCCESS)
>> +	hsa_fatal ("Could not extract a variable size from its symbol", status);
>> +
>> +      pair->start = device_num_varptr;
>> +      pair->end = device_num_varptr + device_num_varsize;
>> +    }
>> +  else
>> +    pair->start = pair->end = 0;
>> +
> Is the 'else' branch to accomodate "old" executables running against
> "new" libgomp?  If yes, then please add a comment, "for compatibility
> with pre-GCC 12 executables" or similar.

No, it's because GOMP_DEVICE_NUM_VAR is only linked in when the program
actually uses omp_get_device_num().

Even when a program does use omp_get_device_num(), only that offload image
which contains that part of libgomp has the device number variable defined.

So the else case should actually be quite common.

> Also, add 'pair++;', to avoid future confusion?

Done.

>> +  if (r == CUDA_SUCCESS)
>> +    {
>> +      targ_tbl->start = (uintptr_t) device_num_varptr;
>> +      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
>> +    }
>> +  else
>> +    targ_tbl->start = targ_tbl->end = 0;
>> +
> Same comment for 'else', please.
> 
> Also, 'targ_tbl++;', to avoid future confusion?

Done.

>>     nvptx_set_clocktick (module, dev);
>>   
>> -  return fn_entries + var_entries;
>> +  return fn_entries + var_entries + other_entries;
>>   }
>>   
>>   /* Unload the program described by TARGET_DATA.  DEV_DATA is the
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -1974,6 +1974,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>>     int num_funcs = host_funcs_end - host_func_table;
>>     int num_vars  = (host_vars_end - host_var_table) / 2;
>>   
>> +  /* Others currently is only 'device_num' */
>> +  int num_others = 1;
>> +
>>     /* Load image to device and get target addresses for the image.  */
>>     struct addr_pair *target_table = NULL;
>>     int i, num_target_entries;
>> @@ -1982,7 +1985,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>>       = devicep->load_image_func (devicep->target_id, version,
>>   				target_data, &target_table);
>>   
> Do I understand right that the special-casing here:
> 
>> -  if (num_target_entries != num_funcs + num_vars)
>> +  if (num_target_entries != num_funcs + num_vars
>> +      /* Others (device_num) are included as trailing entries in pair list.  */
>> +      && num_target_entries != num_funcs + num_vars + num_others)
>>       {
>>         gomp_mutex_unlock (&devicep->lock);
>>         if (is_register_lock)
>> @@ -2054,6 +2059,31 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>>         array++;
>>       }
>>   
>> +  /* Last entry is for the on-device 'device_num' variable. Tolerate case
>> +     where plugin does not return this entry.  */
>> +  if (num_funcs + num_vars < num_target_entries)
> ... and here is just for compatibility with the unmodified Intel MIC
> plugin?  Wouldn't it be simpler to just add a dummy value to that one, to
> avoid this special-casing?

Adding that "null" code to liboffloadmic is not lesser effort, and is not as robust as
simply checking here in gomp_load_image_to_device.

> 
>> +    {
>> +      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
>> +      if (device_num_var->start != 0)
> Please also here add a comment about the '0' case, for completeness.

Done.

>> +	{
>> +	  /* The index of the devicep within devices[] is regarded as its
>> +	     'device number', which is different from the per-device type
>> +	     devicep->target_id.  */
>> +	  int device_num_val = (int) (devicep - &devices[0]);
>> +	  if (device_num_var->end - device_num_var->start != sizeof (int))
>> +	    {
>> +	      gomp_mutex_unlock (&devicep->lock);
> Add:
> 
>      if (is_register_lock)
>        gomp_mutex_unlock (&register_lock);
> 
> ..., I suppose?

Done, thanks for catching this one.

>> +	  /* Copy device_num value to place on device memory, hereby actually
>> +	     designating its device number into effect.  */
>> +	  gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
>> +			      &device_num_val, sizeof (int), false, NULL);
>> +	}
>> +    }
>> +
>>     free (target_table);
>>   }
> Am I understanding this correctly that instead of handling it all inside
> the plugins (like '__nvptx_clocktick'), we're here doing the
> 'other_count'/'other_entries'/'num_others' bookkeeping plus
> 'gomp_copy_host2dev', because we don't know 'device_num_val' inside the
> plugins, and also to introduce a more generic interface for future ICVs
> etc.?

As mentioned above, cases like "__nvptx_clocktick" should be seen as target
specific, and can be implemented entirely within nvptx's domain. It's really
not a design issue, and any target can do similar device initialization
if they can.

> It feels to me as if this interface can be improved still.  (For example,
> would it make sense to pass into the plugins (via 'load_image_func') some
> kind of table, containing 'device_num_val' etc., and then let each plugin
> to the setup?)  But I'm OK if we leave that for later, once we actually
> add more ICVs etc.

I thought of that when implementing the current patch too, but:

1) It would likely be yet another plugin-hook added (not undoable, but kind
    of superfluous IMHO).

2) It would be further exposing higher-level concepts (OpenMP or OpenACC)
    into the plugin, and generally speaking the plugin has a more limited
    view of the entire libgomp source. This means that there will be cases
    where putting some kind of setup/initialization in the plugin will be
    awkward and hard to implement (without pulling even more stuff into the
    plugin).

    Having the plugin simply do the job of finding the device location of
    an opaque variable with pre-arranged name and size, and return it for
    libgomp to do the setup work, is a better separation of interface.

>> --- a/libgomp/config/gcn/icv-device.c
>> +++ b/libgomp/config/gcn/icv-device.c
>> @@ -70,6 +70,16 @@ omp_is_initial_device (void)
>>     return 0;
>>   }
>>   
>> +/* This is set to the device number of current GPU during device initialization,
>> +   when the offload image containing this libgomp portion is loaded.  */
>> +static int GOMP_DEVICE_NUM_VAR;
>> +
>> +int
>> +omp_get_device_num (void)
>> +{
>> +  return GOMP_DEVICE_NUM_VAR;
>> +}
>> +
>>   ialias (omp_set_default_device)
>>   ialias (omp_get_default_device)
>>   ialias (omp_get_initial_device)
> I suppose also add 'ialias (omp_get_device_num)' here, like...

Done, thanks for catching.

>> --- a/libgomp/testsuite/lib/libgomp.exp
>> +++ b/libgomp/testsuite/lib/libgomp.exp
>> +# Return 1 if compiling for offload target intelmic
>> +proc check_effective_target_offload_target_intelmic { } {
>> +    return [libgomp_check_effective_target_offload_target "*-intelmic"]
>> +}
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
>> @@ -0,0 +1,30 @@
>> +/* { dg-do run { target { ! offload_target_intelmic } } } */
> This means that the test case is skipped as soon as the compiler is
> configured for Intel MIC offloading -- even if that's not used during
> execution.
> 
>  From some older experiment of mine, I do have a
> 'check_effective_target_offload_device_intel_mic', which I'll propose as
> a follow-up, once this is in.

Great.

>> +  if (initial_device .and. host_device_num .ne. device_num) stop 2
> That one matches 'libgomp.c-c++-common/target-45.c':
> 
>      if (initial_device && host_device_num != device_num)
>        abort ();
> 
> ..., but here:
> 
>> +  if (initial_device .and. host_device_num .eq. device_num) stop 3
> ... shouldn't that be '.not.initial_device', like in:
> 
>      if (!initial_device && host_device_num == device_num)
>        abort ();

Yeah, Tobias also caught this as well :)

> 
> (Also, I'm not familiar with Fortran operator precedence rules, so
> probably would put the individual expressions into braces.;-)  -- But I
> trust you know better than I do, of course.)

Done.

Attached is the final "v3" patch that I committed.

Thanks,
Chung-Lin



[-- Attachment #2: omp_get_device_num-v3.patch --]
[-- Type: text/plain, Size: 20460 bytes --]

From 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang <cltang@codesourcery.com>
Date: Thu, 5 Aug 2021 23:29:03 +0800
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number"
variable, is defined on the device-side libgomp, has it's address returned to
host-side libgomp during device initialization, and the host libgomp then
sets its value to the designated device number.

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
	omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): New function for
	testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.
---
 libgomp/config/gcn/icv-device.c               | 11 ++++++
 libgomp/config/nvptx/icv-device.c             | 11 ++++++
 libgomp/fortran.c                             |  7 ++++
 libgomp/icv-device.c                          |  9 +++++
 libgomp/libgomp-plugin.h                      |  6 +++
 libgomp/libgomp.map                           |  8 +++-
 libgomp/libgomp.texi                          | 29 ++++++++++++++
 libgomp/omp.h.in                              |  1 +
 libgomp/omp_lib.f90.in                        |  6 +++
 libgomp/omp_lib.h.in                          |  3 ++
 libgomp/plugin/plugin-gcn.c                   | 38 ++++++++++++++++++-
 libgomp/plugin/plugin-nvptx.c                 | 25 ++++++++++--
 libgomp/target.c                              | 36 +++++++++++++++++-
 libgomp/testsuite/lib/libgomp.exp             |  5 +++
 .../libgomp.c-c++-common/target-45.c          | 30 +++++++++++++++
 .../testsuite/libgomp.fortran/target10.f90    | 20 ++++++++++
 16 files changed, 238 insertions(+), 7 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-45.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target10.f90

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..34e0f8346f2 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static volatile int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -77,3 +87,4 @@ ialias (omp_get_num_devices)
 ialias (omp_get_num_teams)
 ialias (omp_get_team_num)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..b63149d0c34 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static volatile int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index e042702ac91..07f97656e51 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -83,6 +83,7 @@ ialias_redirect (omp_get_partition_place_nums)
 ialias_redirect (omp_set_default_device)
 ialias_redirect (omp_get_default_device)
 ialias_redirect (omp_get_num_devices)
+ialias_redirect (omp_get_device_num)
 ialias_redirect (omp_get_num_teams)
 ialias_redirect (omp_get_team_num)
 ialias_redirect (omp_is_initial_device)
@@ -599,6 +600,12 @@ omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_device ();
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 62645ce9954..cf24a2bee41 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
   uintptr_t end;
 };
 
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num
+
 /* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 69aa69562b8..cc44885cba9 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -199,12 +199,18 @@ OMP_5.0.1 {
 	omp_fulfill_event_;
 } OMP_5.0;
 
+OMP_5.0.2 {
+  global:
+	omp_get_device_num;
+	omp_get_device_num_;
+} OMP_5.0.1;
+
 OMP_5.1 {
   global:
 	omp_display_env;
 	omp_display_env_;
 	omp_display_env_8_;
-} OMP_5.0.1;
+} OMP_5.0.2;
 
 GOMP_1.0 {
   global:
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 2c1f1b5968b..fc9e708a8d2 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -165,6 +165,7 @@ linkage, and do not throw exceptions.
 * omp_get_ancestor_thread_num:: Ancestor thread ID
 * omp_get_cancellation::        Whether cancellation support is enabled
 * omp_get_default_device::      Get the default device for target regions
+* omp_get_device_num::          Get device that current thread is running on
 * omp_get_dynamic::             Dynamic teams setting
 * omp_get_initial_device::      Device number of host device
 * omp_get_level::               Number of parallel regions
@@ -385,6 +386,34 @@ For OpenMP 5.1, this must be equal to the value returned by the
 
 
 
+@node omp_get_device_num
+@section @code{omp_get_device_num} -- Return device number of current device
+@table @asis
+@item @emph{Description}:
+This function returns a device number that represents the device that the
+current thread is executing on. For OpenMP 5.0, this must be equal to the
+value returned by the @code{omp_get_initial_device} function when called
+from the host.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_initial_device}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
+@end table
+
+
+
 @node omp_get_level
 @section @code{omp_get_level} -- Obtain the current nesting level
 @table @asis
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index c93db968d2e..da34a9d98a6 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -243,6 +243,7 @@ extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
 extern int omp_get_num_devices (void) __GOMP_NOTHROW;
+extern int omp_get_device_num (void) __GOMP_NOTHROW;
 extern int omp_get_num_teams (void) __GOMP_NOTHROW;
 extern int omp_get_team_num (void) __GOMP_NOTHROW;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 5fc6587e49e..d7e804f4fd5 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -550,6 +550,12 @@
           end function omp_get_initial_device
         end interface
 
+        interface
+          function omp_get_device_num ()
+            integer (4) :: omp_get_device_num
+          end function omp_get_device_num
+        end interface
+
         interface
           function omp_get_max_task_priority ()
             integer (4) :: omp_get_max_task_priority
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 9873cea9ac1..20c32645e3c 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -244,6 +244,9 @@
       external omp_get_initial_device
       integer(4) omp_get_initial_device
 
+      external omp_get_device_num
+      integer(4) omp_get_device_num
+
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 2548614a2e5..f26d7361106 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -29,6 +29,7 @@
 /* {{{ Includes and defines  */
 
 #include "config.h"
+#include "symcat.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -3305,6 +3306,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   struct kernel_info *kernel;
   int kernel_count = image_desc->kernel_count;
   unsigned var_count = image_desc->global_variable_count;
+  int other_count = 1;
 
   agent = get_agent_info (ord);
   if (!agent)
@@ -3321,7 +3323,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
   GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
-  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count - 2)
+  GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
+  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
 			     * sizeof (struct addr_pair));
   *target_table = pair;
   module = (struct module_info *)
@@ -3396,6 +3399,37 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       pair++;
     }
 
+  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+
+  hsa_status_t status;
+  hsa_executable_symbol_t var_symbol;
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 STRINGX (GOMP_DEVICE_NUM_VAR),
+						 agent->id, 0, &var_symbol);
+  if (status == HSA_STATUS_SUCCESS)
+    {
+      uint64_t device_num_varptr;
+      uint32_t device_num_varsize;
+
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &device_num_varptr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+	 &device_num_varsize);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable size from its symbol", status);
+
+      pair->start = device_num_varptr;
+      pair->end = device_num_varptr + device_num_varsize;
+    }
+  else
+    /* The 'GOMP_DEVICE_NUM_VAR' variable was not in this image.  */
+    pair->start = pair->end = 0;
+  pair++;
+
   /* Ensure that constructors are run first.  */
   struct GOMP_kernel_launch_attributes kla =
     { 3,
@@ -3418,7 +3452,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
-  return kernel_count + var_count;
+  return kernel_count + var_count + other_count;
 }
 
 /* Unload GCN object-code module described by struct gcn_image_desc in
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1215212d501..0f16e1cf00d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -34,6 +34,7 @@
 #define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
+#include "symcat.h"
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
 #include "gomp-constants.h"
@@ -1265,7 +1266,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   CUmodule module;
   const char *const *var_names;
   const struct targ_fn_launch *fn_descs;
-  unsigned int fn_entries, var_entries, i, j;
+  unsigned int fn_entries, var_entries, other_entries, i, j;
   struct targ_fn_descriptor *targ_fns;
   struct addr_pair *targ_tbl;
   const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1295,8 +1296,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   fn_entries = img_header->fn_num;
   fn_descs = img_header->fn_descs;
 
+  /* Currently, the only other entry kind is 'device number'.  */
+  other_entries = 1;
+
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
-				 * (fn_entries + var_entries));
+				 * (fn_entries + var_entries + other_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1345,9 +1349,24 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       targ_tbl->end = targ_tbl->start + bytes;
     }
 
+  CUdeviceptr device_num_varptr;
+  size_t device_num_varsize;
+  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
+				  &device_num_varsize, module,
+				  STRINGX (GOMP_DEVICE_NUM_VAR));
+  if (r == CUDA_SUCCESS)
+    {
+      targ_tbl->start = (uintptr_t) device_num_varptr;
+      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
+    }
+  else
+    /* The 'GOMP_DEVICE_NUM_VAR' variable was not in this image.  */
+    targ_tbl->start = targ_tbl->end = 0;
+  targ_tbl++;
+
   nvptx_set_clocktick (module, dev);
 
-  return fn_entries + var_entries;
+  return fn_entries + var_entries + other_entries;
 }
 
 /* Unload the program described by TARGET_DATA.  DEV_DATA is the
diff --git a/libgomp/target.c b/libgomp/target.c
index 453b3210e40..67fcf41cc2e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1974,6 +1974,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
+  /* Others currently is only 'device_num' */
+  int num_others = 1;
+
   /* Load image to device and get target addresses for the image.  */
   struct addr_pair *target_table = NULL;
   int i, num_target_entries;
@@ -1982,7 +1985,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
     = devicep->load_image_func (devicep->target_id, version,
 				target_data, &target_table);
 
-  if (num_target_entries != num_funcs + num_vars)
+  if (num_target_entries != num_funcs + num_vars
+      /* Others (device_num) are included as trailing entries in pair list.  */
+      && num_target_entries != num_funcs + num_vars + num_others)
     {
       gomp_mutex_unlock (&devicep->lock);
       if (is_register_lock)
@@ -2054,6 +2059,35 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       array++;
     }
 
+  /* Last entry is for the on-device 'device_num' variable. Tolerate case
+     where plugin does not return this entry.  */
+  if (num_funcs + num_vars < num_target_entries)
+    {
+      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
+      /* Start address will be non-zero for last entry if GOMP_DEVICE_NUM_VAR
+	 was found in this image.  */
+      if (device_num_var->start != 0)
+	{
+	  /* The index of the devicep within devices[] is regarded as its
+	     'device number', which is different from the per-device type
+	     devicep->target_id.  */
+	  int device_num_val = (int) (devicep - &devices[0]);
+	  if (device_num_var->end - device_num_var->start != sizeof (int))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      if (is_register_lock)
+		gomp_mutex_unlock (&register_lock);
+	      gomp_fatal ("offload plugin managed 'device_num' not of expected "
+			  "format");
+	    }
+
+	  /* Copy device_num value to place on device memory, hereby actually
+	     designating its device number into effect.  */
+	  gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
+			      &device_num_val, sizeof (int), false, NULL);
+	}
+    }
+
   free (target_table);
 }
 
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index a2050151e84..ba8a73275c5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -374,6 +374,11 @@ proc check_effective_target_offload_target_amdgcn { } {
     return [libgomp_check_effective_target_offload_target "amdgcn"]
 }
 
+# Return 1 if compiling for offload target intelmic
+proc check_effective_target_offload_target_intelmic { } {
+    return [libgomp_check_effective_target_offload_target "*-intelmic"]
+}
+
 # Return 1 if offload device is available.
 proc check_effective_target_offload_device { } {
     return [check_runtime_nocache offload_device_available_ {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
new file mode 100644
index 00000000000..ec0d202e51c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { ! offload_target_intelmic } } } */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int main (void)
+{
+
+  int host_device_num = omp_get_device_num ();
+
+  if (host_device_num != omp_get_initial_device ())
+    abort ();
+
+  int device_num;
+  int initial_device;
+
+  #pragma omp target map(from: device_num, initial_device)
+  {
+    initial_device = omp_is_initial_device ();
+    device_num = omp_get_device_num ();
+  }
+
+  if (initial_device && host_device_num != device_num)
+    abort ();
+
+  if (!initial_device && host_device_num == device_num)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
new file mode 100644
index 00000000000..0b939ad7a0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -0,0 +1,20 @@
+! { dg-do run { target { ! offload_target_intelmic } } }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: device_num, host_device_num
+  logical :: initial_device
+
+  host_device_num = omp_get_device_num ()
+  if (host_device_num .ne. omp_get_initial_device ()) stop 1
+
+  !$omp target map(from: device_num, initial_device)
+  initial_device = omp_is_initial_device ()
+  device_num = omp_get_device_num ()
+  !$omp end target
+
+  if (initial_device .and. (host_device_num .ne. device_num)) stop 2
+  if ((.not. initial_device) .and. (host_device_num .eq. device_num)) stop 3
+
+end program main
-- 
2.17.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH, libgomp, OpenMP 5.0, OG11, committed] Implement omp_get_device_num
  2021-08-05 15:30       ` [PATCH, v3, libgomp, OpenMP 5.0, committed] " Chung-Lin Tang
@ 2021-08-09  7:16         ` Chung-Lin Tang
  2021-08-09  9:30           ` Julian Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Chung-Lin Tang @ 2021-08-09  7:16 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Andrew Stubbs, gcc-patches, marcel_vollweiler,
	Tobias Burnus, Catherine Moore

[-- Attachment #1: Type: text/plain, Size: 582 bytes --]

The omp_get_device_num patch was merged to devel/omp/gcc-11 (OG11) after testing.
Commit was 83177ca9f262b230c892e667ebf685f96a718ec8.

This commit also effective reverts the one-liner patch by Cesar:
https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484844.html

(which was still kept in OG11 at 59ef9fea377db72f198b2bd5a95d5aef58b3f9c4)

That small patch is not on mainline, and conflicts with the current merge, and upon
review and test, appears isn't really needed anymore. Thus took the liberty to
overwrite it with the merge of this omp_get_device_num patch.

Chung-Lin


[-- Attachment #2: 0001-openmp-Implement-omp_get_device_num-routine.patch --]
[-- Type: text/plain, Size: 23180 bytes --]

From 83177ca9f262b230c892e667ebf685f96a718ec8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang <cltang@codesourcery.com>
Date: Mon, 9 Aug 2021 08:58:07 +0200
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number"
variable, is defined on the device-side libgomp, has it's address returned to
host-side libgomp during device initialization, and the host libgomp then
sets its value to the designated device number.

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
	omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): New function for
	testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.

(cherry picked from commit 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8)
---
 libgomp/ChangeLog.omp                              | 42 +++++++++++++++++++---
 libgomp/config/gcn/icv-device.c                    | 11 ++++++
 libgomp/config/nvptx/icv-device.c                  | 11 ++++++
 libgomp/fortran.c                                  |  7 ++++
 libgomp/icv-device.c                               |  9 +++++
 libgomp/libgomp-plugin.h                           |  6 ++++
 libgomp/libgomp.map                                |  8 ++++-
 libgomp/libgomp.texi                               | 29 +++++++++++++++
 libgomp/omp.h.in                                   |  1 +
 libgomp/omp_lib.f90.in                             |  6 ++++
 libgomp/omp_lib.h.in                               |  3 ++
 libgomp/plugin/plugin-gcn.c                        | 38 ++++++++++++++++++--
 libgomp/plugin/plugin-nvptx.c                      | 25 +++++++++++--
 libgomp/target.c                                   | 36 ++++++++++++++++++-
 libgomp/testsuite/lib/libgomp.exp                  |  5 +++
 libgomp/testsuite/libgomp.c-c++-common/target-45.c | 30 ++++++++++++++++
 libgomp/testsuite/libgomp.fortran/target10.f90     | 20 +++++++++++
 17 files changed, 276 insertions(+), 11 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-45.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target10.f90

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 9467e90..3a3299b 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,15 +1,49 @@
-2021-06-30  Tobias Burnus  <tobias@codesourcery.com>
+2021-08-09  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backported from master:
-	2021-06-29  Thomas Schwinge  <thomas@codesourcery.com>
+	2021-08-05  Chung-Lin Tang  <cltang@codesourcery.com>
+
+	* icv-device.c (omp_get_device_num): New API function, host side.
+	* fortran.c (omp_get_device_num_): New interface function.
+	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
+	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
+	omp_get_device_num_.
+	* libgomp.texi (omp_get_device_num): Add documentation for new API
+	function.
+	* omp.h.in (omp_get_device_num): Add declaration.
+	* omp_lib.f90.in (omp_get_device_num): Likewise.
+	* omp_lib.h.in (omp_get_device_num): Likewise.
+	* target.c (gomp_load_image_to_device): If additional entry for device
+	number exists at end of returned entries from 'load_image_func' hook,
+	copy the assigned device number over to the device variable.
+	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
+	(omp_get_device_num): New API function, device side.
+	* plugin/plugin-gcn.c ("symcat.h"): Add include.
+	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
+	at end of returned 'target_table' entries.
+	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
+	(omp_get_device_num): New API function, device side.
+	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
+	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
+	at end of returned 'target_table' entries.
+	* testsuite/lib/libgomp.exp
+	(check_effective_target_offload_target_intelmic): New function for
+	testing for intelmic offloading.
+	* testsuite/libgomp.c-c++-common/target-45.c: New test.
+	* testsuite/libgomp.fortran/target10.f90: New test.
+
+2021-07-30  Tobias Burnus  <tobias@codesourcery.com>
+
+	Backported from master:
+	2021-07-29  Thomas Schwinge  <thomas@codesourcery.com>
 		    Ulrich Drepper  <drepper@redhat.com>
 	* fortran.c (omp_display_env_, omp_display_env_8_): Only
 	'#ifndef LIBGOMP_OFFLOADED_ONLY'.
 
-2021-06-30  Tobias Burnus  <tobias@codesourcery.com>
+2021-07-30  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backported from master:
-	2021-06-29  Ulrich Drepper  <drepper@gmail.com>
+	2021-07-29  Ulrich Drepper  <drepper@gmail.com>
 
 	* env.c (wait_policy, stacksize): New static variables,
 	move out of handle_omp_display_env.
diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7c..34e0f83 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static volatile int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -77,3 +87,4 @@ ialias (omp_get_num_devices)
 ialias (omp_get_num_teams)
 ialias (omp_get_team_num)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890..b63149d 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static volatile int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index e042702..07f9765 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -83,6 +83,7 @@ ialias_redirect (omp_get_partition_place_nums)
 ialias_redirect (omp_set_default_device)
 ialias_redirect (omp_get_default_device)
 ialias_redirect (omp_get_num_devices)
+ialias_redirect (omp_get_device_num)
 ialias_redirect (omp_get_num_teams)
 ialias_redirect (omp_get_team_num)
 ialias_redirect (omp_is_initial_device)
@@ -600,6 +601,12 @@ omp_get_initial_device_ (void)
 }
 
 int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
+int32_t
 omp_get_max_task_priority_ (void)
 {
   return omp_get_max_task_priority ();
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf4..f11bdfa 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_device ();
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 1b28f57..65ba382 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
   uintptr_t end;
 };
 
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num
+
 /* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 8cbeb13..3859079 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -199,12 +199,18 @@ OMP_5.0.1 {
 	omp_fulfill_event_;
 } OMP_5.0;
 
+OMP_5.0.2 {
+  global:
+	omp_get_device_num;
+	omp_get_device_num_;
+} OMP_5.0.1;
+
 OMP_5.1 {
   global:
 	omp_display_env;
 	omp_display_env_;
 	omp_display_env_8_;
-} OMP_5.0.1;
+} OMP_5.0.2;
 
 GOMP_1.0 {
   global:
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 5bab28d..2ee6f31 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -165,6 +165,7 @@ linkage, and do not throw exceptions.
 * omp_get_ancestor_thread_num:: Ancestor thread ID
 * omp_get_cancellation::        Whether cancellation support is enabled
 * omp_get_default_device::      Get the default device for target regions
+* omp_get_device_num::          Get device that current thread is running on
 * omp_get_dynamic::             Dynamic teams setting
 * omp_get_initial_device::      Device number of host device
 * omp_get_level::               Number of parallel regions
@@ -385,6 +386,34 @@ For OpenMP 5.1, this must be equal to the value returned by the
 
 
 
+@node omp_get_device_num
+@section @code{omp_get_device_num} -- Return device number of current device
+@table @asis
+@item @emph{Description}:
+This function returns a device number that represents the device that the
+current thread is executing on. For OpenMP 5.0, this must be equal to the
+value returned by the @code{omp_get_initial_device} function when called
+from the host.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_initial_device}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
+@end table
+
+
+
 @node omp_get_level
 @section @code{omp_get_level} -- Obtain the current nesting level
 @table @asis
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index c93db96..da34a9d 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -243,6 +243,7 @@ extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
 extern int omp_get_num_devices (void) __GOMP_NOTHROW;
+extern int omp_get_device_num (void) __GOMP_NOTHROW;
 extern int omp_get_num_teams (void) __GOMP_NOTHROW;
 extern int omp_get_team_num (void) __GOMP_NOTHROW;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 5fc6587..d7e804f 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -551,6 +551,12 @@
         end interface
 
         interface
+          function omp_get_device_num ()
+            integer (4) :: omp_get_device_num
+          end function omp_get_device_num
+        end interface
+
+        interface
           function omp_get_max_task_priority ()
             integer (4) :: omp_get_max_task_priority
           end function omp_get_max_task_priority
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 9873cea..20c3264 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -244,6 +244,9 @@
       external omp_get_initial_device
       integer(4) omp_get_initial_device
 
+      external omp_get_device_num
+      integer(4) omp_get_device_num
+
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 1402e85..1455bdc 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -29,6 +29,7 @@
 /* {{{ Includes and defines  */
 
 #include "config.h"
+#include "symcat.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -3361,6 +3362,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   struct kernel_info *kernel;
   int kernel_count = image_desc->kernel_count;
   unsigned var_count = image_desc->global_variable_count;
+  int other_count = 1;
 
   agent = get_agent_info (ord);
   if (!agent)
@@ -3377,7 +3379,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
   GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
-  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count - 2)
+  GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
+  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
 			     * sizeof (struct addr_pair));
   *target_table = pair;
   module = (struct module_info *)
@@ -3452,6 +3455,37 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       pair++;
     }
 
+  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+
+  hsa_status_t status;
+  hsa_executable_symbol_t var_symbol;
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 STRINGX (GOMP_DEVICE_NUM_VAR),
+						 agent->id, 0, &var_symbol);
+  if (status == HSA_STATUS_SUCCESS)
+    {
+      uint64_t device_num_varptr;
+      uint32_t device_num_varsize;
+
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &device_num_varptr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+	 &device_num_varsize);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable size from its symbol", status);
+
+      pair->start = device_num_varptr;
+      pair->end = device_num_varptr + device_num_varsize;
+    }
+  else
+    /* The 'GOMP_DEVICE_NUM_VAR' variable was not in this image.  */
+    pair->start = pair->end = 0;
+  pair++;
+
   /* Ensure that constructors are run first.  */
   struct GOMP_kernel_launch_attributes kla =
     { 3,
@@ -3474,7 +3508,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
-  return kernel_count + var_count;
+  return kernel_count + var_count + other_count;
 }
 
 /* Unload GCN object-code module described by struct gcn_image_desc in
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 96f2c13..a968096 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -34,6 +34,7 @@
 #define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
+#include "symcat.h"
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
 #include "gomp-constants.h"
@@ -1280,7 +1281,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   CUmodule module;
   const char *const *var_names;
   const struct targ_fn_launch *fn_descs;
-  unsigned int fn_entries, var_entries, i, j;
+  unsigned int fn_entries, var_entries, other_entries, i, j;
   struct targ_fn_descriptor *targ_fns;
   struct addr_pair *targ_tbl;
   const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1310,8 +1311,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   fn_entries = img_header->fn_num;
   fn_descs = img_header->fn_descs;
 
+  /* Currently, the only other entry kind is 'device number'.  */
+  other_entries = 1;
+
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
-				 * (fn_entries + var_entries));
+				 * (fn_entries + var_entries + other_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1360,9 +1364,24 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       targ_tbl->end = targ_tbl->start + bytes;
     }
 
+  CUdeviceptr device_num_varptr;
+  size_t device_num_varsize;
+  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
+				  &device_num_varsize, module,
+				  STRINGX (GOMP_DEVICE_NUM_VAR));
+  if (r == CUDA_SUCCESS)
+    {
+      targ_tbl->start = (uintptr_t) device_num_varptr;
+      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
+    }
+  else
+    /* The 'GOMP_DEVICE_NUM_VAR' variable was not in this image.  */
+    targ_tbl->start = targ_tbl->end = 0;
+  targ_tbl++;
+
   nvptx_set_clocktick (module, dev);
 
-  return fn_entries + var_entries;
+  return fn_entries + var_entries + other_entries;
 }
 
 /* Unload the program described by TARGET_DATA.  DEV_DATA is the
diff --git a/libgomp/target.c b/libgomp/target.c
index f0a66fc..53ca4b0 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2230,6 +2230,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
+  /* Others currently is only 'device_num' */
+  int num_others = 1;
+
   /* Load image to device and get target addresses for the image.  */
   struct addr_pair *target_table = NULL;
   int i, num_target_entries;
@@ -2238,7 +2241,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
     = devicep->load_image_func (devicep->target_id, version,
 				target_data, &target_table);
 
-  if (num_target_entries < num_funcs + num_vars)
+  if (num_target_entries != num_funcs + num_vars
+      /* Others (device_num) are included as trailing entries in pair list.  */
+      && num_target_entries != num_funcs + num_vars + num_others)
     {
       gomp_mutex_unlock (&devicep->lock);
       if (is_register_lock)
@@ -2310,6 +2315,35 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       array++;
     }
 
+  /* Last entry is for the on-device 'device_num' variable. Tolerate case
+     where plugin does not return this entry.  */
+  if (num_funcs + num_vars < num_target_entries)
+    {
+      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
+      /* Start address will be non-zero for last entry if GOMP_DEVICE_NUM_VAR
+	 was found in this image.  */
+      if (device_num_var->start != 0)
+	{
+	  /* The index of the devicep within devices[] is regarded as its
+	     'device number', which is different from the per-device type
+	     devicep->target_id.  */
+	  int device_num_val = (int) (devicep - &devices[0]);
+	  if (device_num_var->end - device_num_var->start != sizeof (int))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      if (is_register_lock)
+		gomp_mutex_unlock (&register_lock);
+	      gomp_fatal ("offload plugin managed 'device_num' not of expected "
+			  "format");
+	    }
+
+	  /* Copy device_num value to place on device memory, hereby actually
+	     designating its device number into effect.  */
+	  gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
+			      &device_num_val, sizeof (int), false, NULL);
+	}
+    }
+
   free (target_table);
 }
 
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 15ce33b..83d1307 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -373,6 +373,11 @@ proc check_effective_target_offload_target_amdgcn { } {
     return [libgomp_check_effective_target_offload_target "amdgcn"]
 }
 
+# Return 1 if compiling for offload target intelmic
+proc check_effective_target_offload_target_intelmic { } {
+    return [libgomp_check_effective_target_offload_target "*-intelmic"]
+}
+
 # Return 1 if offload device is available.
 proc check_effective_target_offload_device { } {
     return [check_runtime_nocache offload_device_available_ {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
new file mode 100644
index 0000000..ec0d202
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { ! offload_target_intelmic } } } */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int main (void)
+{
+
+  int host_device_num = omp_get_device_num ();
+
+  if (host_device_num != omp_get_initial_device ())
+    abort ();
+
+  int device_num;
+  int initial_device;
+
+  #pragma omp target map(from: device_num, initial_device)
+  {
+    initial_device = omp_is_initial_device ();
+    device_num = omp_get_device_num ();
+  }
+
+  if (initial_device && host_device_num != device_num)
+    abort ();
+
+  if (!initial_device && host_device_num == device_num)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
new file mode 100644
index 0000000..0b939ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -0,0 +1,20 @@
+! { dg-do run { target { ! offload_target_intelmic } } }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: device_num, host_device_num
+  logical :: initial_device
+
+  host_device_num = omp_get_device_num ()
+  if (host_device_num .ne. omp_get_initial_device ()) stop 1
+
+  !$omp target map(from: device_num, initial_device)
+  initial_device = omp_is_initial_device ()
+  device_num = omp_get_device_num ()
+  !$omp end target
+
+  if (initial_device .and. (host_device_num .ne. device_num)) stop 2
+  if ((.not. initial_device) .and. (host_device_num .eq. device_num)) stop 3
+
+end program main
-- 
2.8.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, libgomp, OpenMP 5.0, OG11, committed] Implement omp_get_device_num
  2021-08-09  7:16         ` [PATCH, libgomp, OpenMP 5.0, OG11, " Chung-Lin Tang
@ 2021-08-09  9:30           ` Julian Brown
  2021-08-09  9:30             ` Julian Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Julian Brown @ 2021-08-09  9:30 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Jakub Jelinek, Andrew Stubbs, gcc-patches, Tobias Burnus,
	marcel_vollweiler

On Mon, 9 Aug 2021 15:16:45 +0800
Chung-Lin Tang <cltang@codesourcery.com> wrote:

> This commit also effective reverts the one-liner patch by Cesar:
> https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484844.html
> 
> (which was still kept in OG11 at
> 59ef9fea377db72f198b2bd5a95d5aef58b3f9c4)
> 
> That small patch is not on mainline, and conflicts with the current
> merge, and upon review and test, appears isn't really needed anymore.
> Thus took the liberty to overwrite it with the merge of this
> omp_get_device_num patch.

FWIW, though there probably isn't test coverage for this, I don't
recall it having been fixed in another way -- it probably only shows up
when linking with a static library with offload code in unused object
files, or something. Not particularly easy to do as a standalone test
with DejaGnu!

HTH,

Julian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, libgomp, OpenMP 5.0, OG11, committed] Implement omp_get_device_num
  2021-08-09  9:30           ` Julian Brown
@ 2021-08-09  9:30             ` Julian Brown
  0 siblings, 0 replies; 21+ messages in thread
From: Julian Brown @ 2021-08-09  9:30 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Andrew Stubbs, gcc-patches, Tobias Burnus,
	marcel_vollweiler

On Mon, 9 Aug 2021 15:16:45 +0800
Chung-Lin Tang <cltang@codesourcery.com> wrote:

> This commit also effective reverts the one-liner patch by Cesar:
> https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484844.html
> 
> (which was still kept in OG11 at
> 59ef9fea377db72f198b2bd5a95d5aef58b3f9c4)
> 
> That small patch is not on mainline, and conflicts with the current
> merge, and upon review and test, appears isn't really needed anymore.
> Thus took the liberty to overwrite it with the merge of this
> omp_get_device_num patch.

FWIW, though there probably isn't test coverage for this, I don't
recall it having been fixed in another way -- it probably only shows up
when linking with a static library with offload code in unused object
files, or something. Not particularly easy to do as a standalone test
with DejaGnu!

HTH,

Julian



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error
@ 2022-01-04  9:28 ` Chung-Lin Tang
  2022-01-04 14:12   ` [committed] libgomp/testsuite: Improve omp_get_device_num() tests (was: Re: [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error) Tobias Burnus
  0 siblings, 1 reply; 21+ messages in thread
From: Chung-Lin Tang @ 2022-01-04  9:28 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Thomas Schwinge, Andrew Stubbs,
	Vollweiler, Marcel (DI SW CAS EPS STS)

[-- Attachment #1: Type: text/plain, Size: 774 bytes --]

In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

This change was fairly obvious, so committed directly.

Thanks,
Chung-Lin

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
	into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.

[-- Attachment #2: 0001-libgomp-Fix-GOMP_DEVICE_NUM_VAR-stringification-duri.patch --]
[-- Type: text/plain, Size: 2423 bytes --]

From fbb592407c9dd244b4cea086cbb90d7bd0bf60bb Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang <cltang@codesourcery.com>
Date: Tue, 4 Jan 2022 17:26:23 +0800
Subject: [PATCH] libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during
 offload image load

In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
	into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 4 ++--
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 8ffd3d1a2cf..d0f05b28bf3 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3401,12 +3401,12 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 	}
     }
 
-  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+  GCN_DEBUG ("Looking for variable %s\n", XSTRING (GOMP_DEVICE_NUM_VAR));
 
   hsa_status_t status;
   hsa_executable_symbol_t var_symbol;
   status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
-						 STRINGX (GOMP_DEVICE_NUM_VAR),
+						 XSTRING (GOMP_DEVICE_NUM_VAR),
 						 agent->id, 0, &var_symbol);
   if (status == HSA_STATUS_SUCCESS)
     {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index f32276b0a18..b4f0a84d77a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1353,7 +1353,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   size_t device_num_varsize;
   CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
 				  &device_num_varsize, module,
-				  STRINGX (GOMP_DEVICE_NUM_VAR));
+				  XSTRING (GOMP_DEVICE_NUM_VAR));
   if (r == CUDA_SUCCESS)
     {
       targ_tbl->start = (uintptr_t) device_num_varptr;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [committed] libgomp/testsuite: Improve omp_get_device_num() tests (was: Re: [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error)
  2022-01-04  9:28 ` [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error Chung-Lin Tang
@ 2022-01-04 14:12   ` Tobias Burnus
  2022-01-13 12:22     ` [committed] libgomp/testsuite: Improve omp_get_device_num() tests Thomas Schwinge
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias Burnus @ 2022-01-04 14:12 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches, Jakub Jelinek, Thomas Schwinge,
	Andrew Stubbs, Vollweiler,  Marcel (DI SW CAS EPS STS)

[-- Attachment #1: Type: text/plain, Size: 847 bytes --]

On 04.01.22 10:28, Chung-Lin Tang wrote:

> In the patch that implemented omp_get_device_num(), there was an error
> where
> the stringification of GOMP_DEVICE_NUM_VAR, ...

... which caused that omp_get_device() == 0 (always) on nvptx/gcn.

That's fine if there is only a single non-host device (as often the
case), but not if there are multiples.

This commit r12-6209 now makes the testcases iterate over all devices
(including the initial/host device).

Hence, with multiple non-host devices and this test, the error had been
found before ... ;-)

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: committed.diff --]
[-- Type: text/x-patch, Size: 3179 bytes --]

commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Tue Jan 4 14:58:06 2022 +0100

    libgomp/testsuite: Improve omp_get_device_num() tests
    
    Related to r12-6208-gebc853deb7cc0487de9ef6e891a007ba853d1933
    "libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load"
    
    That commit fixed an issue with omp_get_device_num() on gcn/nvptx that
    resulted in having always the value 0.
    This commit modifies the tests to iterate over all devices such that on a
    multi-nonhost-device system it had detected that always-zero issue.
    
    libgomp/ChangeLog:
    
            * testsuite/libgomp.c-c++-common/target-45.c: Iterate over all devices.
            * testsuite/libgomp.fortran/target10.f90: Likewise.

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
index 81acee81064..837503996d7 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-45.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -14,17 +14,23 @@ int main (void)
   int device_num;
   int initial_device;
 
-  #pragma omp target map(from: device_num, initial_device)
-  {
-    initial_device = omp_is_initial_device ();
-    device_num = omp_get_device_num ();
-  }
-
-  if (initial_device && host_device_num != device_num)
-    abort ();
-
-  if (!initial_device && host_device_num == device_num)
-    abort ();
+  for (int i = 0; i <= omp_get_num_devices (); i++)
+    {
+      #pragma omp target map(from: device_num, initial_device) device(i)
+	{
+	  initial_device = omp_is_initial_device ();
+	  device_num = omp_get_device_num ();
+	}
+
+      if (i != device_num)
+	abort ();
+
+      if (initial_device && host_device_num != device_num)
+	abort ();
+
+      if (!initial_device && host_device_num == device_num)
+	abort ();
+    }
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
index f41a726de75..f6951fc9057 100644
--- a/libgomp/testsuite/libgomp.fortran/target10.f90
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -4,18 +4,20 @@
 program main
   use omp_lib
   implicit none
-  integer :: device_num, host_device_num
+  integer :: device_num, host_device_num, i
   logical :: initial_device
 
   host_device_num = omp_get_device_num ()
   if (host_device_num .ne. omp_get_initial_device ()) stop 1
 
-  !$omp target map(from: device_num, initial_device)
-  initial_device = omp_is_initial_device ()
-  device_num = omp_get_device_num ()
-  !$omp end target
-
-  if (initial_device .and. (host_device_num .ne. device_num)) stop 2
-  if ((.not. initial_device) .and. (host_device_num .eq. device_num)) stop 3
+  do i = 0, omp_get_num_devices ()
+    !$omp target map(from: device_num, initial_device) device(i)
+      initial_device = omp_is_initial_device ()
+      device_num = omp_get_device_num ()
+    !$omp end target
+    if (i /= device_num) stop 2
+    if (initial_device .and. (host_device_num .ne. device_num)) stop 3
+    if ((.not. initial_device) .and. (host_device_num .eq. device_num)) stop 4
+  end do
 
 end program main

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.
@ 2022-01-12  9:43 ` Marcel Vollweiler
  2022-01-18 12:25   ` Thomas Schwinge
  2022-01-18 13:47   ` Andrew Stubbs
  0 siblings, 2 replies; 21+ messages in thread
From: Marcel Vollweiler @ 2022-01-12  9:43 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 866 bytes --]

Hi,

Currently omp_get_device_num does not work on gcn targets with more than
one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload
image.

This patch removes "static" such that "__gomp_device_num" is now part of
the offload image and can now be found in GOMP_OFFLOAD_load_image in the
plugin.

This is not an issue for nvptx. There, "__gomp_device_num" is in the
offload image even with "static".

The patch was tested on x86_64-linux with amdgcn offloading with no
regressions.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: remove-static-for-omp-get-device-num-amd.diff --]
[-- Type: text/plain, Size: 1207 bytes --]

libgomp, OpenMP: Fix issue for omp_get_device_num on gcn targets.

Currently omp_get_device_num does not work on gcn targets with more than one
offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload image.

This patch removes "static" such that "__gomp_device_num" is now part of the
offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.

This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
image even with "static".

libgomp/ChangeLog:

	* config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
	"static") to make the device num available in the offload image.

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index fcfa0f3..f70b7e6 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -60,7 +60,7 @@ omp_is_initial_device (void)
 
 /* This is set to the device number of current GPU during device initialization,
    when the offload image containing this libgomp portion is loaded.  */
-static volatile int GOMP_DEVICE_NUM_VAR;
+volatile int GOMP_DEVICE_NUM_VAR;
 
 int
 omp_get_device_num (void)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [committed] libgomp/testsuite: Improve omp_get_device_num() tests
  2022-01-04 14:12   ` [committed] libgomp/testsuite: Improve omp_get_device_num() tests (was: Re: [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error) Tobias Burnus
@ 2022-01-13 12:22     ` Thomas Schwinge
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Schwinge @ 2022-01-13 12:22 UTC (permalink / raw)
  To: Tobias Burnus, Chung-Lin Tang, gcc-patches
  Cc: Jakub Jelinek, Andrew Stubbs, Marcel Vollweiler

[-- Attachment #1: Type: text/plain, Size: 1137 bytes --]

Hi!

On 2022-01-04T15:12:58+0100, Tobias Burnus <tobias@codesourcery.com> wrote:
> This commit r12-6209 now makes the testcases iterate over all devices
> (including the initial/host device).
>
> Hence, with multiple non-host devices and this test, the error had been
> found before ... ;-)

Yay for test cases!  :-)

... but we now run into issues if Intel MIC (emulated) offloading is
(additionally) enabled, because that one still doesn't properly implement
device-side 'omp_get_device_num'.  ;-)

Thus pushed to master branch
commit d97364aab1af361275b87713154c366ce2b9029a
"Improve Intel MIC offloading XFAILing for 'omp_get_device_num'", see
attached.

(It wasn't obvious to me how to implement that; very incomplete
"[WIP] Intel MIC 'omp_get_device_num'" attached, not planning on working
on this any further.)


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Improve-Intel-MIC-offloading-XFAILing-for-omp_get_de.patch --]
[-- Type: text/x-diff, Size: 5568 bytes --]

From d97364aab1af361275b87713154c366ce2b9029a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 5 Jan 2022 19:52:25 +0100
Subject: [PATCH] Improve Intel MIC offloading XFAILing for
 'omp_get_device_num'

After recent commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4
"libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating
over all OpenMP target devices.  Intel MIC (emulated) offloading still doesn't
properly implement device-side 'omp_get_device_num', and we thus regress:

    PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.fortran/target10.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O0  execution test
    PASS: libgomp.fortran/target10.f90   -O1  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O1  execution test
    PASS: libgomp.fortran/target10.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O2  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -g  execution test
    PASS: libgomp.fortran/target10.f90   -Os  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -Os  execution test

Improve the XFAILing added in commit bb75b22aba254e8ff144db27b1c8b4804bad73bb
"Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any*
Intel MIC offload device is available.

	libgomp/
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(any_device_arch, any_device_arch_intel_mic): New.
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_any_intel_mic): New.
	* testsuite/libgomp.c-c++-common/target-45.c: Use it.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp             | 12 +++++++++-
 .../libgomp.c-c++-common/on_device_arch.h     | 23 +++++++++++++++++++
 .../libgomp.c-c++-common/target-45.c          |  2 +-
 .../testsuite/libgomp.fortran/target10.f90    |  2 +-
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 57fb6b068f3..8c5ecfff0ac 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -451,7 +451,6 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 # Return 1 if using Intel MIC offload device.
 proc check_effective_target_offload_device_intel_mic { } {
     return [check_runtime_nocache offload_device_intel_mic {
-      #include <omp.h>
       #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
       int main ()
 	{
@@ -460,6 +459,17 @@ proc check_effective_target_offload_device_intel_mic { } {
     } ]
 }
 
+# Return 1 if any Intel MIC offload device is available.
+proc check_effective_target_offload_device_any_intel_mic { } {
+    return [check_runtime_nocache offload_device_any_intel_mic {
+      #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
+      int main ()
+	{
+	  return !any_device_arch_intel_mic ();
+	}
+    } ]
+}
+
 # Return 1 if the OpenACC 'host' device type is selected.
 
 proc check_effective_target_openacc_host_selected { } {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
index ee541dd2260..f92743b04d7 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
+++ b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
@@ -1,3 +1,4 @@
+#include <omp.h>
 #include <gomp-constants.h>
 
 /* static */ int
@@ -41,3 +42,25 @@ on_device_arch_intel_mic ()
 {
   return on_device_arch (GOMP_DEVICE_INTEL_MIC);
 }
+
+static int
+any_device_arch (int d)
+{
+  int nd = omp_get_num_devices ();
+  for (int i = 0; i < nd; ++i)
+    {
+      int d_cur;
+      #pragma omp target device(i) map(from:d_cur)
+      d_cur = device_arch ();
+      if (d_cur == d)
+	return 1;
+    }
+
+  return 0;
+}
+
+int
+any_device_arch_intel_mic ()
+{
+  return any_device_arch (GOMP_DEVICE_INTEL_MIC);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
index 837503996d7..27bbeddf7fd 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-45.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -1,4 +1,4 @@
-/* { dg-xfail-run-if TODO { offload_device_intel_mic } } */
+/* { dg-xfail-run-if TODO { offload_device_any_intel_mic } } */
 
 #include <omp.h>
 #include <stdlib.h>
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
index f6951fc9057..31452554d67 100644
--- a/libgomp/testsuite/libgomp.fortran/target10.f90
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -1,5 +1,5 @@
 ! { dg-do run }
-! { dg-xfail-run-if TODO { offload_device_intel_mic } }
+! { dg-xfail-run-if TODO { offload_device_any_intel_mic } }
 
 program main
   use omp_lib
-- 
2.34.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0001-WIP-Intel-MIC-omp_get_device_num.patch --]
[-- Type: text/x-diff, Size: 6223 bytes --]

From cea0fd2d10f7a9fd060543e59e142c8c9d06abb0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 5 Jan 2022 19:15:06 +0100
Subject: [PATCH] [WIP] Intel MIC 'omp_get_device_num'

See commit 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8 "openmp: Implement omp_get_device_num routine"

P commit fbb592407c9dd244b4cea086cbb90d7bd0bf60bb "libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load"
P commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4 "libgomp/testsuite: Improve omp_get_device_num() tests"

With Intel MIC (emulated) offloading enabled:

    PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.fortran/target10.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O0  execution test
    PASS: libgomp.fortran/target10.f90   -O1  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O1  execution test
    PASS: libgomp.fortran/target10.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O2  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -g  execution test
    PASS: libgomp.fortran/target10.f90   -Os  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -Os  execution test

TODO Not sure if this is the right approach for Intel MIC, over all.
TODO This is, at least, still missing corresponding changes to 'liboffloadmic/plugin/libgomp-plugin-intelmic.cpp': 'GOMP_OFFLOAD_load_image'/'offload_image'.
TODO I don't understand how 'libgomp/icv-device.c'/'libgomp/config/*/icv-device.c' are split over 'liboffloadmic/plugin/offload_target_main.cpp' vs. 'liboffloadmic/runtime/offload_omp_host.cpp'/'liboffloadmic/runtime/offload_omp_target.cpp'.
---
 liboffloadmic/plugin/Makefile.am             |  2 +-
 liboffloadmic/plugin/Makefile.in             |  4 +++-
 liboffloadmic/plugin/offload_target_main.cpp | 24 +++++++++++++++++++-
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/liboffloadmic/plugin/Makefile.am b/liboffloadmic/plugin/Makefile.am
index 7caea7894ac..840940cb760 100644
--- a/liboffloadmic/plugin/Makefile.am
+++ b/liboffloadmic/plugin/Makefile.am
@@ -57,7 +57,7 @@ if PLUGIN_HOST
 else # PLUGIN_TARGET
   plugin_includedir = $(libsubincludedir)
   plugin_include_HEADERS = main_target_image.h
-  AM_CPPFLAGS = $(CPPFLAGS) -DLINUX -DCOI_LIBRARY_VERSION=2 -DOFFLOAD_DEBUG=1 -DSEP_SUPPORT -DTIMING_SUPPORT -DHOST_LIBRARY=0 -I$(coi_inc_dir) -I$(liboffload_src_dir) -I$(libgomp_dir)
+  AM_CPPFLAGS = $(CPPFLAGS) -DLINUX -DCOI_LIBRARY_VERSION=2 -DOFFLOAD_DEBUG=1 -DSEP_SUPPORT -DTIMING_SUPPORT -DHOST_LIBRARY=0 -I$(coi_inc_dir) -I$(liboffload_src_dir) -I$(libgomp_src_dir) -I$(libgomp_dir)
   AM_CXXFLAGS = $(CXXFLAGS)
   AM_LDFLAGS = -L$(liboffload_dir)/.libs -L$(libgomp_dir)/.libs -loffloadmic_target -lcoi_device -lgomp -rdynamic
 endif
diff --git a/liboffloadmic/plugin/Makefile.in b/liboffloadmic/plugin/Makefile.in
index 8d5ad0025c2..45dcd01bab3 100644
--- a/liboffloadmic/plugin/Makefile.in
+++ b/liboffloadmic/plugin/Makefile.in
@@ -401,7 +401,7 @@ target_install_dir = $(accel_search_dir)/lib/gcc/$(accel_target)/$(gcc_version)$
 @PLUGIN_HOST_TRUE@libgomp_plugin_intelmic_la_LDFLAGS = -L$(liboffload_dir)/.libs -loffloadmic_host -version-info 1:0:0
 @PLUGIN_HOST_FALSE@plugin_includedir = $(libsubincludedir)
 @PLUGIN_HOST_FALSE@plugin_include_HEADERS = main_target_image.h
-@PLUGIN_HOST_FALSE@AM_CPPFLAGS = $(CPPFLAGS) -DLINUX -DCOI_LIBRARY_VERSION=2 -DOFFLOAD_DEBUG=1 -DSEP_SUPPORT -DTIMING_SUPPORT -DHOST_LIBRARY=0 -I$(coi_inc_dir) -I$(liboffload_src_dir) -I$(libgomp_dir)
+@PLUGIN_HOST_FALSE@AM_CPPFLAGS = $(CPPFLAGS) -DLINUX -DCOI_LIBRARY_VERSION=2 -DOFFLOAD_DEBUG=1 -DSEP_SUPPORT -DTIMING_SUPPORT -DHOST_LIBRARY=0 -I$(coi_inc_dir) -I$(liboffload_src_dir) -I$(libgomp_src_dir) -I$(libgomp_dir)
 @PLUGIN_HOST_FALSE@AM_CXXFLAGS = $(CXXFLAGS)
 @PLUGIN_HOST_FALSE@AM_LDFLAGS = -L$(liboffload_dir)/.libs -L$(libgomp_dir)/.libs -loffloadmic_target -lcoi_device -lgomp -rdynamic
 
@@ -838,3 +838,5 @@ maintainer-clean-local: maintainer-clean-multi
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
+
+#TODO Properly regenerate.
diff --git a/liboffloadmic/plugin/offload_target_main.cpp b/liboffloadmic/plugin/offload_target_main.cpp
index ccf7240e286..3b7e06e249c 100644
--- a/liboffloadmic/plugin/offload_target_main.cpp
+++ b/liboffloadmic/plugin/offload_target_main.cpp
@@ -31,6 +31,7 @@
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include "libgomp-plugin.h"
 #include "compiler_if_target.h"
 
 
@@ -91,7 +92,10 @@ static void *last_var_ptr = NULL;
 static int last_var_size = 0;
 
 
-/* Override the corresponding functions from libgomp.  */
+/* Override functions from 'libgomp/icv-device.c' (see
+   'libgomp/config/[...]/icv-device.c') as well as the corresponding
+   'libgomp/fortran.c' wrapper routines.  */
+
 extern "C" int
 omp_is_initial_device (void) __GOMP_NOTHROW
 {
@@ -104,6 +108,24 @@ omp_is_initial_device_ (void)
   return omp_is_initial_device ();
 }
 
+#ifndef GOMP_DEVICE_NUM_VAR
+# error
+#endif
+
+static volatile int GOMP_DEVICE_NUM_VAR;
+
+extern "C" int
+omp_get_device_num (void) __GOMP_NOTHROW
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
+extern "C" int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 
 /* Dummy function needed for the initialization of target process during the
    first call to __offload_offload1.  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.
  2022-01-12  9:43 ` [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets Marcel Vollweiler
@ 2022-01-18 12:25   ` Thomas Schwinge
  2022-01-18 13:55     ` Andrew Stubbs
  2022-01-18 14:31     ` Marcel Vollweiler
  2022-01-18 13:47   ` Andrew Stubbs
  1 sibling, 2 replies; 21+ messages in thread
From: Thomas Schwinge @ 2022-01-18 12:25 UTC (permalink / raw)
  To: Marcel Vollweiler, Chung-Lin Tang
  Cc: gcc-patches, Jakub Jelinek, Tom de Vries, Andrew Stubbs,
	Julian Brown, Tobias Burnus

Hi!

Maybe I'm just totally confused -- as so often ;-) -- but things seem
strange here:

On 2022-01-12T10:43:05+0100, Marcel Vollweiler <marcel@codesourcery.com> wrote:
> Currently omp_get_device_num does not work on gcn targets with more than
> one offload device. The reason is that GOMP_DEVICE_NUM_VAR

I understand the 'GOMP_DEVICE_NUM_VAR' "macro indirection" is so that we
define the actual symbol name ('__gomp_device_num') in one place
('libgomp/libgomp-plugin.h'), and then use it (via macro expansion) in
several places, right?

> is static in
> icv-device.c and thus "__gomp_device_num" is not visible in the offload
> image.

That behavior seems correct -- but undesired indeed?

> This patch removes "static" such that "__gomp_device_num" is now part of
> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
> plugin.

That seems correct?

Or, is there a reason to have it 'static', say, so that several such
local variables can co-exist, instead of just one global one?

> This is not an issue for nvptx. There, "__gomp_device_num" is in the
> offload image even with "static".

That's unexpected then, and should be looked into?

Still, should 'static' be removed here, too?


Grüße
 Thomas


> The patch was tested on x86_64-linux with amdgcn offloading with no
> regressions.


> libgomp, OpenMP: Fix issue for omp_get_device_num on gcn targets.
>
> Currently omp_get_device_num does not work on gcn targets with more than one
> offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
> icv-device.c and thus "__gomp_device_num" is not visible in the offload image.
>
> This patch removes "static" such that "__gomp_device_num" is now part of the
> offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.
>
> This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
> image even with "static".
>
> libgomp/ChangeLog:
>
>       * config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
>       "static") to make the device num available in the offload image.
>
> diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
> index fcfa0f3..f70b7e6 100644
> --- a/libgomp/config/gcn/icv-device.c
> +++ b/libgomp/config/gcn/icv-device.c
> @@ -60,7 +60,7 @@ omp_is_initial_device (void)
>
>  /* This is set to the device number of current GPU during device initialization,
>     when the offload image containing this libgomp portion is loaded.  */
> -static volatile int GOMP_DEVICE_NUM_VAR;
> +volatile int GOMP_DEVICE_NUM_VAR;
>
>  int
>  omp_get_device_num (void)
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.
  2022-01-12  9:43 ` [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets Marcel Vollweiler
  2022-01-18 12:25   ` Thomas Schwinge
@ 2022-01-18 13:47   ` Andrew Stubbs
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Stubbs @ 2022-01-18 13:47 UTC (permalink / raw)
  To: Marcel Vollweiler, gcc-patches

Sorry, I had not seen that this was entirely within my amdgcn remit....

On 12/01/2022 09:43, Marcel Vollweiler wrote:
> Hi,
> 
> Currently omp_get_device_num does not work on gcn targets with more than
> one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
> icv-device.c and thus "__gomp_device_num" is not visible in the offload
> image.
> 
> This patch removes "static" such that "__gomp_device_num" is now part of
> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
> plugin.

To be clear, omp_get_device_num doesn't work correctly with any number 
of offload devices. It so happens that the uninitialized value (zero) is 
the correct answer on the first device so the failure was hidden on 
single-device machines.

> This is not an issue for nvptx. There, "__gomp_device_num" is in the
> offload image even with "static".

This is probably related to the unusual way PTX files are "linked".

> The patch was tested on x86_64-linux with amdgcn offloading with no
> regressions.

OK to commit. I know this isn't a regression, but it's a silly bug in a 
new feature on a secondary target, so I think we can have this in stage 4.

Andrew

P.S. Like Thomas says, the static can probably be safely removed in the 
NVPTX file also, but then I think you're planning to unify the two files 
in any case?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.
  2022-01-18 12:25   ` Thomas Schwinge
@ 2022-01-18 13:55     ` Andrew Stubbs
  2022-01-18 14:31     ` Marcel Vollweiler
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Stubbs @ 2022-01-18 13:55 UTC (permalink / raw)
  To: Thomas Schwinge, Marcel Vollweiler, Chung-Lin Tang
  Cc: Jakub Jelinek, Julian Brown, gcc-patches, Tobias Burnus

On 18/01/2022 12:25, Thomas Schwinge wrote:
> Hi!
> 
> Maybe I'm just totally confused -- as so often ;-) -- but things seem
> strange here:
> 
> On 2022-01-12T10:43:05+0100, Marcel Vollweiler <marcel@codesourcery.com> wrote:
>> Currently omp_get_device_num does not work on gcn targets with more than
>> one offload device. The reason is that GOMP_DEVICE_NUM_VAR
> 
> I understand the 'GOMP_DEVICE_NUM_VAR' "macro indirection" is so that we
> define the actual symbol name ('__gomp_device_num') in one place
> ('libgomp/libgomp-plugin.h'), and then use it (via macro expansion) in
> several places, right?

Right. I don't know what the motivation behind that design was, but 
Marcel has not invented this.

>> is static in
>> icv-device.c and thus "__gomp_device_num" is not visible in the offload
>> image.
> 
> That behavior seems correct -- but undesired indeed?
> 
>> This patch removes "static" such that "__gomp_device_num" is now part of
>> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
>> plugin.
> 
> That seems correct?
> 
> Or, is there a reason to have it 'static', say, so that several such
> local variables can co-exist, instead of just one global one?

If there were several similar symbols "coexisting" then the libgomp 
plugin would not know which to update at kernel launch time, and 
different parts of libgomp would see different values for the same ICV. 
I can't think of any reason why we'd want that?

My guess is that it started out as a static variable inside a function 
and was then moved to file-scope without removing the keyword.

> 
>> This is not an issue for nvptx. There, "__gomp_device_num" is in the
>> offload image even with "static".
> 
> That's unexpected then, and should be looked into?

I suspect it's just because NVPTX doesn't use ELF at link time. Multiple 
static variables with the same name in different translation units might 
even be broken?

> Still, should 'static' be removed here, too?

I would have thought so. In fact there's no need for nvptx and amdgcn to 
have independent icv-device.c files, given that they must both implement 
the same features and there's nothing architecture-specific going on 
here (thus far).

Andrew

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.
  2022-01-18 12:25   ` Thomas Schwinge
  2022-01-18 13:55     ` Andrew Stubbs
@ 2022-01-18 14:31     ` Marcel Vollweiler
  1 sibling, 0 replies; 21+ messages in thread
From: Marcel Vollweiler @ 2022-01-18 14:31 UTC (permalink / raw)
  To: Thomas Schwinge, Chung-Lin Tang
  Cc: gcc-patches, Jakub Jelinek, Tom de Vries, Andrew Stubbs,
	Julian Brown, Tobias Burnus

Hi Thomas,

Am 18.01.2022 um 13:25 schrieb Thomas Schwinge:
> Hi!
>
> Maybe I'm just totally confused -- as so often ;-) -- but things seem
> strange here:
>
> On 2022-01-12T10:43:05+0100, Marcel Vollweiler <marcel@codesourcery.com> wrote:
>> Currently omp_get_device_num does not work on gcn targets with more than
>> one offload device. The reason is that GOMP_DEVICE_NUM_VAR
>
> I understand the 'GOMP_DEVICE_NUM_VAR' "macro indirection" is so that we
> define the actual symbol name ('__gomp_device_num') in one place
> ('libgomp/libgomp-plugin.h'), and then use it (via macro expansion) in
> several places, right?

Yes, as far as I understood.

>
>> is static in
>> icv-device.c and thus "__gomp_device_num" is not visible in the offload
>> image.
>
> That behavior seems correct -- but undesired indeed?

Good question. In contrast to nvptx I observed that __gomp_device_num is
not part of the offload image which we read out in
GOMP_OFFLOAD_load_image ("if (status != HSA_STATUS_SUCCESS)" in
libgomp/plugin/plugin-gcn.c returns false). I validated it with some
additional output in the if-branches.

>
>> This patch removes "static" such that "__gomp_device_num" is now part of
>> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
>> plugin.
>
> That seems correct?
>
> Or, is there a reason to have it 'static', say, so that several such
> local variables can co-exist, instead of just one global one?
>
>> This is not an issue for nvptx. There, "__gomp_device_num" is in the
>> offload image even with "static".
>
> That's unexpected then, and should be looked into?

Actually, I don't see the reason for the different behaviour for nvptx.
I just tested that for nvptx the correct device number is returned by
omp_get_device_num on the device - also if we have more than one device.

>
> Still, should 'static' be removed here, too?

I wouldn't suggest unless it is really necessary? I mean, there we don't
have any issue. Although I aggree with Andrew that we could combine both
icv-device.c files into one common file.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-01-18 14:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-23 10:21 [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num Chung-Lin Tang
2021-07-23 10:39 ` Jakub Jelinek
2021-08-02 13:10   ` [PATCH, v2, " Chung-Lin Tang
2021-08-03 11:08     ` Jakub Jelinek
2021-08-03 12:07     ` Thomas Schwinge
2021-08-05 15:30       ` [PATCH, v3, libgomp, OpenMP 5.0, committed] " Chung-Lin Tang
2021-08-09  7:16         ` [PATCH, libgomp, OpenMP 5.0, OG11, " Chung-Lin Tang
2021-08-09  9:30           ` Julian Brown
2021-08-09  9:30             ` Julian Brown
2021-08-03 12:22     ` [PATCH, v2, libgomp, OpenMP 5.0] " Thomas Schwinge
2021-08-05  8:34       ` Chung-Lin Tang
2021-07-23 11:01 ` [PATCH, " Tobias Burnus
2021-08-02 13:09   ` Chung-Lin Tang
2022-01-04  9:28 ` [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error Chung-Lin Tang
2022-01-04 14:12   ` [committed] libgomp/testsuite: Improve omp_get_device_num() tests (was: Re: [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error) Tobias Burnus
2022-01-13 12:22     ` [committed] libgomp/testsuite: Improve omp_get_device_num() tests Thomas Schwinge
2022-01-12  9:43 ` [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets Marcel Vollweiler
2022-01-18 12:25   ` Thomas Schwinge
2022-01-18 13:55     ` Andrew Stubbs
2022-01-18 14:31     ` Marcel Vollweiler
2022-01-18 13:47   ` Andrew Stubbs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).