public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)
@ 2015-05-01  9:47 Julian Brown
  2015-05-06  8:33 ` Thomas Schwinge
  0 siblings, 1 reply; 4+ messages in thread
From: Julian Brown @ 2015-05-01  9:47 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches, Jakub Jelinek, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 1641 bytes --]

Hi,

This patch fixes PR65904, a double-free error that started occurring
after recent libgomp changes to the way offload images are registered
with the runtime.

Offload images now map all functions/data using just two malloc'ed
blocks, but the function gomp_free_memmap did not take that into
account, and treated all mappings as if they had their own blocks (as
they do if created by gomp_map_vars): so attempting to free the whole
map at once failed when it hit mappings for an offload image.

The fix is to split offload-image freeing out of GOMP_offload_unregister
into a separate function, and call that from gomp_free_memmap for the
given device before freeing the rest of the memory map.

The patch also fixes a thinko that was revealed in image unloading in
the NVPTX backend. Tested for libgomp with PTX offloading.

OK for trunk?

Thanks,

Julian

ChangeLog

    libgomp/
    * libgomp.h (gomp_free_memmap): Update prototype.
    * oacc-init.c (acc_shutdown_1): Pass device descriptor to
    gomp_free_memmap. Don't lock device around call.
    * target.c (gomp_map_vars): Initialise tgt->array to NULL before
    early exit.
    (GOMP_offload_unregister): Split out and call...
    (gomp_deoffload_image_from_device): This new function.
    (gomp_free_memmap): Call gomp_deoffload_image_from_device.
    
    * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields.
    (GOMP_OFFLOAD_load_image): Populate above fields.
    (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing
    images, and use fn_descs field from ptx_image_data instead of
    incorrectly using a pointer derived from target_data.
 

[-- Attachment #2: openacc-shutdown-unloading-fixes-1.diff --]
[-- Type: text/x-patch, Size: 10259 bytes --]

commit 14e8e35a494a5a8231ab1a3cad38a2157bca7e4a
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Apr 30 10:19:58 2015 -0700

    Fix freeing of memory maps during acc shutdown.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5272f01..5e0e09c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -777,7 +777,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
+extern void gomp_free_memmap (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 503f8b8..f2c60ec 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -245,9 +245,7 @@ acc_shutdown_1 (acc_device_t d)
 
       if (walk->dev)
 	{
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
+	  gomp_free_memmap (walk->dev);
 
 	  walk->dev = NULL;
 	  walk->base_dev = NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 583ec87..2cc0ae0 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -334,8 +334,10 @@ struct ptx_event
 
 struct ptx_image_data
 {
+  int ord;
   void *target_data;
   CUmodule module;
+  struct targ_fn_descriptor *fn_descs;
   struct ptx_image_data *next;
 };
 
@@ -1625,13 +1627,6 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
 
   link_ptx (&module, img_header[0]);
 
-  pthread_mutex_lock (&ptx_image_lock);
-  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
-  new_image->target_data = target_data;
-  new_image->module = module;
-  new_image->next = ptx_images;
-  ptx_images = new_image;
-  pthread_mutex_unlock (&ptx_image_lock);
 
   /* The mkoffload utility emits a table of pointers/integers at the start of
      each offload image:
@@ -1652,8 +1647,21 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
 
   *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				      * (fn_entries + var_entries));
-  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
-				 * fn_entries);
+  if (fn_entries > 0)
+    targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				   * fn_entries);
+  else
+    targ_fns = NULL;
+
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->ord = ord;
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->fn_descs = targ_fns;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
   for (i = 0; i < fn_entries; i++)
     {
@@ -1687,23 +1695,22 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
 }
 
 void
-GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+GOMP_OFFLOAD_unload_image (int ord, void *target_data)
 {
-  void **img_header = (void **) target_data;
-  struct targ_fn_descriptor *targ_fns
-    = (struct targ_fn_descriptor *) img_header[0];
   struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
 
-  free (targ_fns);
+  nvptx_attach_host_thread_to_device (ord);
 
   pthread_mutex_lock (&ptx_image_lock);
   for (image = ptx_images; image != NULL;)
     {
       struct ptx_image_data *next = image->next;
 
-      if (image->target_data == target_data)
+      if (image->ord == ord && image->target_data == target_data)
 	{
 	  cuModuleUnload (image->module);
+	  if (image->fn_descs)
+	    free (image->fn_descs);
 	  free (image);
 	  if (prev)
 	    prev->next = next;
diff --git a/libgomp/target.c b/libgomp/target.c
index d8da783..3ac674c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -178,6 +178,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->array = NULL;
 
   if (mapnum == 0)
     return tgt;
@@ -275,7 +276,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   if (is_target)
     tgt_size = mapnum * sizeof (void *);
 
-  tgt->array = NULL;
   if (not_found_cnt)
     {
       tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
@@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   gomp_mutex_unlock (&register_lock);
 }
 
-/* This function should be called from every offload image while unloading.
-   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+/* DEVICEP should be locked on entry, and remains locked on exit.  */
 
-void
-GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
-			 void *target_data)
+static void
+gomp_deoffload_image_from_device (struct gomp_device_descr *devicep,
+				  void *host_table, void *target_data)
 {
   void **host_func_table = ((void ***) host_table)[0];
   void **host_funcs_end  = ((void ***) host_table)[1];
   void **host_var_table  = ((void ***) host_table)[2];
   void **host_vars_end   = ((void ***) host_table)[3];
-  int i;
 
   /* The func table contains only addresses, the var table contains addresses
      and corresponding sizes.  */
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
+  int j;
+
+  devicep->unload_image_func (devicep->target_id, target_data);
+
+  /* Remove mapping from splay tree.  */
+  struct splay_tree_key_s k;
+  splay_tree_key node = NULL;
+  if (num_funcs > 0)
+    {
+      k.host_start = (uintptr_t) host_func_table[0];
+      k.host_end = k.host_start + 1;
+      node = splay_tree_lookup (&devicep->mem_map, &k);
+    }
+  else if (num_vars > 0)
+    {
+      k.host_start = (uintptr_t) host_var_table[0];
+      k.host_end = k.host_start + (uintptr_t) host_var_table[1];
+      node = splay_tree_lookup (&devicep->mem_map, &k);
+    }
+
+  for (j = 0; j < num_funcs; j++)
+    {
+      k.host_start = (uintptr_t) host_func_table[j];
+      k.host_end = k.host_start + 1;
+      splay_tree_remove (&devicep->mem_map, &k);
+    }
+
+  for (j = 0; j < num_vars; j++)
+    {
+      k.host_start = (uintptr_t) host_var_table[j * 2];
+      k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+      splay_tree_remove (&devicep->mem_map, &k);
+    }
+
+  if (node)
+    {
+      free (node->tgt);
+      free (node);
+    }
+}
+
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
+
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
+{
+  int i;
 
   gomp_mutex_lock (&register_lock);
 
   /* Unload image from all initialized devices.  */
   for (i = 0; i < num_devices; i++)
     {
-      int j;
       struct gomp_device_descr *devicep = &devices[i];
+      
       gomp_mutex_lock (&devicep->lock);
       if (devicep->type != target_type || !devicep->is_initialized)
 	{
@@ -830,44 +877,7 @@ GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
 	  continue;
 	}
 
-      devicep->unload_image_func (devicep->target_id, target_data);
-
-      /* Remove mapping from splay tree.  */
-      struct splay_tree_key_s k;
-      splay_tree_key node = NULL;
-      if (num_funcs > 0)
-	{
-	  k.host_start = (uintptr_t) host_func_table[0];
-	  k.host_end = k.host_start + 1;
-	  node = splay_tree_lookup (&devicep->mem_map, &k);
-	}
-      else if (num_vars > 0)
-	{
-	  k.host_start = (uintptr_t) host_var_table[0];
-	  k.host_end = k.host_start + (uintptr_t) host_var_table[1];
-	  node = splay_tree_lookup (&devicep->mem_map, &k);
-	}
-
-      for (j = 0; j < num_funcs; j++)
-	{
-	  k.host_start = (uintptr_t) host_func_table[j];
-	  k.host_end = k.host_start + 1;
-	  splay_tree_remove (&devicep->mem_map, &k);
-	}
-
-      for (j = 0; j < num_vars; j++)
-	{
-	  k.host_start = (uintptr_t) host_var_table[j * 2];
-	  k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
-	  splay_tree_remove (&devicep->mem_map, &k);
-	}
-
-      if (node)
-	{
-	  free (node->tgt);
-	  free (node);
-	}
-
+      gomp_deoffload_image_from_device (devicep, host_table, target_data);
       gomp_mutex_unlock (&devicep->lock);
     }
 
@@ -903,20 +913,50 @@ gomp_init_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = true;
 }
 
-/* Free address mapping tables.  MM must be locked on entry, and remains locked
-   on return.  */
+/* Free address mapping tables for an active device DEVICEP.  This includes
+   both mapped offload functions/variables, and mapped user data regions.
+   To be used before shutting a device down: subsequently reinitialising the
+   device will repopulate the offload image mappings.  */
 
 attribute_hidden void
-gomp_free_memmap (struct splay_tree_s *mem_map)
+gomp_free_memmap (struct gomp_device_descr *devicep)
 {
+  int i;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
+
+  assert (devicep->is_initialized);
+
+  gomp_mutex_lock (&devicep->lock);
+  
+  /* Unmap offload images that are registered to this device.  */
+  for (i = 0; i < num_offload_images; i++)
+    {
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+        {
+	  void *host_table = image->host_table;
+	  void *target_data = image->target_data;
+
+	  gomp_deoffload_image_from_device (devicep, host_table, target_data);
+	}
+    }
+
+  /* Clear other mappings.  */
   while (mem_map->root)
     {
       struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
       splay_tree_remove (mem_map, &mem_map->root->key);
-      free (tgt->array);
+
+      if (tgt->list_count == 0)
+	assert (!tgt->array);
+      else
+	free (tgt->array);
+
       free (tgt);
     }
+
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)
  2015-05-01  9:47 [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) Julian Brown
@ 2015-05-06  8:33 ` Thomas Schwinge
  2015-05-07 10:38   ` Julian Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Schwinge @ 2015-05-06  8:33 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Jakub Jelinek, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 5348 bytes --]

Hi!

On Fri, 1 May 2015 10:47:19 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch fixes PR65904, a double-free error that started occurring
> after recent libgomp changes to the way offload images are registered
> with the runtime.
> 
> Offload images now map all functions/data using just two malloc'ed
> blocks, but the function gomp_free_memmap did not take that into
> account, and treated all mappings as if they had their own blocks (as
> they do if created by gomp_map_vars): so attempting to free the whole
> map at once failed when it hit mappings for an offload image.
> 
> The fix is to split offload-image freeing out of GOMP_offload_unregister
> into a separate function, and call that from gomp_free_memmap for the
> given device before freeing the rest of the memory map.
> 
> The patch also fixes a thinko that was revealed in image unloading in
> the NVPTX backend. Tested for libgomp with PTX offloading.

Confirming that both nvptx (PR65904 fixed) and (emulated) intelmic (no
changes) testing look fine.

> OK for trunk?

> ChangeLog
> 
>     libgomp/

Don't forget to add the PR marker here.

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1687,23 +1695,22 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
>  }
>  
>  void
> -GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
> +GOMP_OFFLOAD_unload_image (int ord, void *target_data)
>  {
> -  void **img_header = (void **) target_data;
> -  struct targ_fn_descriptor *targ_fns
> -    = (struct targ_fn_descriptor *) img_header[0];
>    struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
>  
> -  free (targ_fns);
> +  nvptx_attach_host_thread_to_device (ord);

By the way, do we need to lock ptx_devices in
libgomp/plugin/plugin-nvptx.c:nvptx_attach_host_thread_to_device and
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_openacc_create_thread_data?
(ptx_dev_lock?  If yes, its definition as well as instantiated_devices
should be moved to a more appropriate place, probably?)  Also, several
accesses to instantiated_devices are not locked by ptx_dev_lock but
should be, from my cursory review.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
>    gomp_mutex_unlock (&register_lock);
>  }
>  
> -/* This function should be called from every offload image while unloading.
> -   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> -   the target, and TARGET_DATA needed by target plugin.  */
> +/* DEVICEP should be locked on entry, and remains locked on exit.  */

(I'm not a native speaker, but would use what I consider to be more
explicit/precise language: »must be locked« instead of »should be«.  I'll
be happy to learn should they mean the same thing?)

>  
> -void
> -GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
> -			 void *target_data)
> +static void
> +gomp_deoffload_image_from_device (struct gomp_device_descr *devicep,
> +				  void *host_table, void *target_data)
>  {

> +/* This function should be called from every offload image while unloading.

s%from%for%, I think?  (And, s%should%must%, again?)

> +   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> +   the target, and TARGET_DATA needed by target plugin.  */
> +
> +void
> +GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
> +			 void *target_data)
> +{

> -/* Free address mapping tables.  MM must be locked on entry, and remains locked
> -   on return.  */
> +/* Free address mapping tables for an active device DEVICEP.  This includes
> +   both mapped offload functions/variables, and mapped user data regions.
> +   To be used before shutting a device down: subsequently reinitialising the
> +   device will repopulate the offload image mappings.  */
>  
>  attribute_hidden void
> -gomp_free_memmap (struct splay_tree_s *mem_map)
> +gomp_free_memmap (struct gomp_device_descr *devicep)
>  {
> +  int i;
> +  struct splay_tree_s *mem_map = &devicep->mem_map;
> +
> +  assert (devicep->is_initialized);
> +
> +  gomp_mutex_lock (&devicep->lock);

Need to lock before first access to *devicep?

> +  
> +  /* Unmap offload images that are registered to this device.  */
> +  for (i = 0; i < num_offload_images; i++)
> +    {
> +      struct offload_image_descr *image = &offload_images[i];

Need to take register_lock when accessing offload_images?

> +      if (image->type == devicep->type)
> +        {
> +	  void *host_table = image->host_table;
> +	  void *target_data = image->target_data;
> +
> +	  gomp_deoffload_image_from_device (devicep, host_table, target_data);
> +	}
> +    }
> +
> +  /* Clear other mappings.  */
>    while (mem_map->root)
>      {
>        struct target_mem_desc *tgt = mem_map->root->key.tgt;
>  
>        splay_tree_remove (mem_map, &mem_map->root->key);
> -      free (tgt->array);
> +
> +      if (tgt->list_count == 0)
> +	assert (!tgt->array);
> +      else
> +	free (tgt->array);
> +
>        free (tgt);
>      }
> +
> +  gomp_mutex_unlock (&devicep->lock);
>  }


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)
  2015-05-06  8:33 ` Thomas Schwinge
@ 2015-05-07 10:38   ` Julian Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Julian Brown @ 2015-05-07 10:38 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Jakub Jelinek, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 5193 bytes --]

On Wed, 6 May 2015 10:32:56 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi!
> 
> On Fri, 1 May 2015 10:47:19 +0100, Julian Brown
> <julian@codesourcery.com> wrote:
> > The patch also fixes a thinko that was revealed in image unloading
> > in the NVPTX backend. Tested for libgomp with PTX offloading.
> 
> Confirming that both nvptx (PR65904 fixed) and (emulated) intelmic (no
> changes) testing look fine.

Thanks for testing!

> By the way, do we need to lock ptx_devices in
> libgomp/plugin/plugin-nvptx.c:nvptx_attach_host_thread_to_device and
> libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_openacc_create_thread_data?
> (ptx_dev_lock?  If yes, its definition as well as instantiated_devices
> should be moved to a more appropriate place, probably?)

Probably yes (though I'm not sure what you mean about moving the
instantiated_devices and ptx_dev_lock to a more appropriate place?).

> Also, several
> accesses to instantiated_devices are not locked by ptx_dev_lock but
> should be, from my cursory review.

I'm not sure about that.

> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum
> > offload_target_type target_type, gomp_mutex_unlock (&register_lock);
> >  }
> >  
> > -/* This function should be called from every offload image while
> > unloading.
> > -   It gets the descriptor of the host func and var tables
> > HOST_TABLE, TYPE of
> > -   the target, and TARGET_DATA needed by target plugin.  */
> > +/* DEVICEP should be locked on entry, and remains locked on exit.
> > */
> 
> (I'm not a native speaker, but would use what I consider to be more
> explicit/precise language: »must be locked« instead of »should be«.
> I'll be happy to learn should they mean the same thing?)

I've changed the wording in a couple of comments.

> >  
> > -void
> > -GOMP_offload_unregister (void *host_table, enum
> > offload_target_type target_type,
> > -			 void *target_data)
> > +static void
> > +gomp_deoffload_image_from_device (struct gomp_device_descr
> > *devicep,
> > +				  void *host_table, void
> > *target_data) {
> 
> > +/* This function should be called from every offload image while
> > unloading.
> 
> s%from%for%, I think?  (And, s%should%must%, again?)

No, this really is "from" -- this comment wasn't actually added by my
patch, just moved. I'm also not sure about "should" in this instance --
unloading an image is already a corner-case, and maybe there are
circumstances in which it'd be impossible for some given object to call
the function?

> > +   It gets the descriptor of the host func and var tables
> > HOST_TABLE, TYPE of
> > +   the target, and TARGET_DATA needed by target plugin.  */
> > +
> > +void
> > +GOMP_offload_unregister (void *host_table, enum
> > offload_target_type target_type,
> > +			 void *target_data)
> > +{
> 
> > -/* Free address mapping tables.  MM must be locked on entry, and
> > remains locked
> > -   on return.  */
> > +/* Free address mapping tables for an active device DEVICEP.  This
> > includes
> > +   both mapped offload functions/variables, and mapped user data
> > regions.
> > +   To be used before shutting a device down: subsequently
> > reinitialising the
> > +   device will repopulate the offload image mappings.  */
> >  
> >  attribute_hidden void
> > -gomp_free_memmap (struct splay_tree_s *mem_map)
> > +gomp_free_memmap (struct gomp_device_descr *devicep)
> >  {
> > +  int i;
> > +  struct splay_tree_s *mem_map = &devicep->mem_map;
> > +
> > +  assert (devicep->is_initialized);
> > +
> > +  gomp_mutex_lock (&devicep->lock);
> 
> Need to lock before first access to *devicep?

Fixed.

> > +  
> > +  /* Unmap offload images that are registered to this device.  */
> > +  for (i = 0; i < num_offload_images; i++)
> > +    {
> > +      struct offload_image_descr *image = &offload_images[i];
> 
> Need to take register_lock when accessing offload_images?

This too. Retested for libgomp/NVPTX.

OK for trunk now?

Thanks,

Julian

ChangeLog

    PR libgomp/65904

    libgomp/
    * libgomp.h (gomp_free_memmap): Update prototype.
    * oacc-init.c (acc_shutdown_1): Pass device descriptor to
    gomp_free_memmap. Don't lock device around call.
    * target.c (gomp_map_vars): Initialise tgt->array to NULL before
    early exit.
    (GOMP_offload_unregister): Split out and call...
    (gomp_deoffload_image_from_device): This new function.
    (gomp_free_memmap): Call gomp_deoffload_image_from_device.
    * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields.
    (nvptx_init): Tweak comment.
    (nvptx_attach_host_thread_to_device): Add locking with ptx_dev_lock
    around ptx_devices accesses.
    (GOMP_OFFLOAD_load_image): Populate new ptx_image_data fields.
    (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing
    images, and use fn_descs field from ptx_image_data instead of
    incorrectly using a pointer derived from target_data.
    (GOMP_OFFLOAD_openacc_create_thread_data): Add locking around
    ptx_devices accesses.

[-- Attachment #2: openacc-shutdown-unloading-fixes-2.diff --]
[-- Type: text/x-patch, Size: 12525 bytes --]

Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 222861)
+++ libgomp/target.c	(working copy)
@@ -178,6 +178,7 @@ gomp_map_vars (struct gomp_device_descr
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->array = NULL;
 
   if (mapnum == 0)
     return tgt;
@@ -275,7 +276,6 @@ gomp_map_vars (struct gomp_device_descr
   if (is_target)
     tgt_size = mapnum * sizeof (void *);
 
-  tgt->array = NULL;
   if (not_found_cnt)
     {
       tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
@@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table,
   gomp_mutex_unlock (&register_lock);
 }
 
-/* This function should be called from every offload image while unloading.
-   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+/* DEVICEP must be locked on entry, and remains locked on exit.  */
 
-void
-GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
-			 void *target_data)
+static void
+gomp_deoffload_image_from_device (struct gomp_device_descr *devicep,
+				  void *host_table, void *target_data)
 {
   void **host_func_table = ((void ***) host_table)[0];
   void **host_funcs_end  = ((void ***) host_table)[1];
   void **host_var_table  = ((void ***) host_table)[2];
   void **host_vars_end   = ((void ***) host_table)[3];
-  int i;
 
   /* The func table contains only addresses, the var table contains addresses
      and corresponding sizes.  */
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
+  int j;
+
+  devicep->unload_image_func (devicep->target_id, target_data);
+
+  /* Remove mapping from splay tree.  */
+  struct splay_tree_key_s k;
+  splay_tree_key node = NULL;
+  if (num_funcs > 0)
+    {
+      k.host_start = (uintptr_t) host_func_table[0];
+      k.host_end = k.host_start + 1;
+      node = splay_tree_lookup (&devicep->mem_map, &k);
+    }
+  else if (num_vars > 0)
+    {
+      k.host_start = (uintptr_t) host_var_table[0];
+      k.host_end = k.host_start + (uintptr_t) host_var_table[1];
+      node = splay_tree_lookup (&devicep->mem_map, &k);
+    }
+
+  for (j = 0; j < num_funcs; j++)
+    {
+      k.host_start = (uintptr_t) host_func_table[j];
+      k.host_end = k.host_start + 1;
+      splay_tree_remove (&devicep->mem_map, &k);
+    }
+
+  for (j = 0; j < num_vars; j++)
+    {
+      k.host_start = (uintptr_t) host_var_table[j * 2];
+      k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+      splay_tree_remove (&devicep->mem_map, &k);
+    }
+
+  if (node)
+    {
+      free (node->tgt);
+      free (node);
+    }
+}
+
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
+
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
+{
+  int i;
 
   gomp_mutex_lock (&register_lock);
 
   /* Unload image from all initialized devices.  */
   for (i = 0; i < num_devices; i++)
     {
-      int j;
       struct gomp_device_descr *devicep = &devices[i];
+      
       gomp_mutex_lock (&devicep->lock);
       if (devicep->type != target_type || !devicep->is_initialized)
 	{
@@ -830,44 +877,7 @@ GOMP_offload_unregister (void *host_tabl
 	  continue;
 	}
 
-      devicep->unload_image_func (devicep->target_id, target_data);
-
-      /* Remove mapping from splay tree.  */
-      struct splay_tree_key_s k;
-      splay_tree_key node = NULL;
-      if (num_funcs > 0)
-	{
-	  k.host_start = (uintptr_t) host_func_table[0];
-	  k.host_end = k.host_start + 1;
-	  node = splay_tree_lookup (&devicep->mem_map, &k);
-	}
-      else if (num_vars > 0)
-	{
-	  k.host_start = (uintptr_t) host_var_table[0];
-	  k.host_end = k.host_start + (uintptr_t) host_var_table[1];
-	  node = splay_tree_lookup (&devicep->mem_map, &k);
-	}
-
-      for (j = 0; j < num_funcs; j++)
-	{
-	  k.host_start = (uintptr_t) host_func_table[j];
-	  k.host_end = k.host_start + 1;
-	  splay_tree_remove (&devicep->mem_map, &k);
-	}
-
-      for (j = 0; j < num_vars; j++)
-	{
-	  k.host_start = (uintptr_t) host_var_table[j * 2];
-	  k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
-	  splay_tree_remove (&devicep->mem_map, &k);
-	}
-
-      if (node)
-	{
-	  free (node->tgt);
-	  free (node);
-	}
-
+      gomp_deoffload_image_from_device (devicep, host_table, target_data);
       gomp_mutex_unlock (&devicep->lock);
     }
 
@@ -903,20 +913,54 @@ gomp_init_device (struct gomp_device_des
   devicep->is_initialized = true;
 }
 
-/* Free address mapping tables.  MM must be locked on entry, and remains locked
-   on return.  */
+/* Free address mapping tables for an active device DEVICEP.  This includes
+   both mapped offload functions/variables, and mapped user data regions.
+   To be used before shutting a device down: subsequently reinitialising the
+   device will repopulate the offload image mappings.  */
 
 attribute_hidden void
-gomp_free_memmap (struct splay_tree_s *mem_map)
+gomp_free_memmap (struct gomp_device_descr *devicep)
 {
+  int i;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
+
+  gomp_mutex_lock (&devicep->lock);
+
+  assert (devicep->is_initialized);
+
+  gomp_mutex_lock (&register_lock);
+
+  /* Unmap offload images that are registered to this device.  */
+  for (i = 0; i < num_offload_images; i++)
+    {
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+        {
+	  void *host_table = image->host_table;
+	  void *target_data = image->target_data;
+
+	  gomp_deoffload_image_from_device (devicep, host_table, target_data);
+	}
+    }
+
+  gomp_mutex_unlock (&register_lock);
+
+  /* Clear other mappings.  */
   while (mem_map->root)
     {
       struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
       splay_tree_remove (mem_map, &mem_map->root->key);
-      free (tgt->array);
+
+      if (tgt->list_count == 0)
+	assert (!tgt->array);
+      else
+	free (tgt->array);
+
       free (tgt);
     }
+
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
Index: libgomp/libgomp.h
===================================================================
--- libgomp/libgomp.h	(revision 222861)
+++ libgomp/libgomp.h	(working copy)
@@ -777,7 +777,7 @@ extern struct target_mem_desc *gomp_map_
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
+extern void gomp_free_memmap (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
Index: libgomp/plugin/plugin-nvptx.c
===================================================================
--- libgomp/plugin/plugin-nvptx.c	(revision 222861)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -334,8 +334,10 @@ struct ptx_event
 
 struct ptx_image_data
 {
+  int ord;
   void *target_data;
   CUmodule module;
+  struct targ_fn_descriptor *fn_descs;
   struct ptx_image_data *next;
 };
 
@@ -589,7 +591,7 @@ select_stream_for_async (int async, pthr
 }
 
 /* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
-   should be locked on entry and remains locked on exit.  */
+   must be locked on entry and remains locked on exit.  */
 static bool
 nvptx_init (void)
 {
@@ -643,12 +645,17 @@ nvptx_attach_host_thread_to_device (int
     {
       CUcontext old_ctx;
 
+      pthread_mutex_lock (&ptx_dev_lock);
+
       ptx_dev = ptx_devices[n];
       assert (ptx_dev);
 
       r = cuCtxGetCurrent (&thd_ctx);
       if (r != CUDA_SUCCESS)
-        GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+	{
+	  pthread_mutex_unlock (&ptx_dev_lock);
+          GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+	}
 
       /* We don't necessarily have a current context (e.g. if it has been
          destroyed.  Pop it if we do though.  */
@@ -656,10 +663,14 @@ nvptx_attach_host_thread_to_device (int
 	{
 	  r = cuCtxPopCurrent (&old_ctx);
 	  if (r != CUDA_SUCCESS)
-            GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+	    {
+	      pthread_mutex_unlock (&ptx_dev_lock);
+	      GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+	    }
 	}
 
       r = cuCtxPushCurrent (ptx_dev->ctx);
+      pthread_mutex_unlock (&ptx_dev_lock);
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
     }
@@ -1625,13 +1636,6 @@ GOMP_OFFLOAD_load_image (int ord, void *
 
   link_ptx (&module, img_header[0]);
 
-  pthread_mutex_lock (&ptx_image_lock);
-  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
-  new_image->target_data = target_data;
-  new_image->module = module;
-  new_image->next = ptx_images;
-  ptx_images = new_image;
-  pthread_mutex_unlock (&ptx_image_lock);
 
   /* The mkoffload utility emits a table of pointers/integers at the start of
      each offload image:
@@ -1652,8 +1656,21 @@ GOMP_OFFLOAD_load_image (int ord, void *
 
   *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				      * (fn_entries + var_entries));
-  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
-				 * fn_entries);
+  if (fn_entries > 0)
+    targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				   * fn_entries);
+  else
+    targ_fns = NULL;
+
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->ord = ord;
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->fn_descs = targ_fns;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
   for (i = 0; i < fn_entries; i++)
     {
@@ -1687,23 +1704,22 @@ GOMP_OFFLOAD_load_image (int ord, void *
 }
 
 void
-GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+GOMP_OFFLOAD_unload_image (int ord, void *target_data)
 {
-  void **img_header = (void **) target_data;
-  struct targ_fn_descriptor *targ_fns
-    = (struct targ_fn_descriptor *) img_header[0];
   struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
 
-  free (targ_fns);
+  nvptx_attach_host_thread_to_device (ord);
 
   pthread_mutex_lock (&ptx_image_lock);
   for (image = ptx_images; image != NULL;)
     {
       struct ptx_image_data *next = image->next;
 
-      if (image->target_data == target_data)
+      if (image->ord == ord && image->target_data == target_data)
 	{
 	  cuModuleUnload (image->module);
+	  if (image->fn_descs)
+	    free (image->fn_descs);
 	  free (image);
 	  if (prev)
 	    prev->next = next;
@@ -1833,13 +1849,18 @@ GOMP_OFFLOAD_openacc_create_thread_data
   CUresult r;
   CUcontext thd_ctx;
 
+  pthread_mutex_lock (&ptx_dev_lock);
+
   ptx_dev = ptx_devices[ord];
 
   assert (ptx_dev);
 
   r = cuCtxGetCurrent (&thd_ctx);
   if (r != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+    {
+      pthread_mutex_unlock (&ptx_dev_lock);
+      GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+    }
 
   assert (ptx_dev->ctx);
 
@@ -1847,12 +1868,17 @@ GOMP_OFFLOAD_openacc_create_thread_data
     {
       r = cuCtxPushCurrent (ptx_dev->ctx);
       if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+	{
+	  pthread_mutex_unlock (&ptx_dev_lock);
+	  GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+	}
     }
 
   nvthd->current_stream = ptx_dev->null_stream;
   nvthd->ptx_dev = ptx_dev;
 
+  pthread_mutex_unlock (&ptx_dev_lock);
+
   return (void *) nvthd;
 }
 
Index: libgomp/oacc-init.c
===================================================================
--- libgomp/oacc-init.c	(revision 222862)
+++ libgomp/oacc-init.c	(working copy)
@@ -244,9 +244,7 @@ acc_shutdown_1 (acc_device_t d)
 
       if (walk->dev)
 	{
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
+	  gomp_free_memmap (walk->dev);
 
 	  walk->dev = NULL;
 	  walk->base_dev = NULL;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)
       [not found] <f2cefaab-aaa4-03e0-ea6f-5e22ccd44997@suse.de>
@ 2019-01-15 10:38 ` Tom de Vries
  0 siblings, 0 replies; 4+ messages in thread
From: Tom de Vries @ 2019-01-15 10:38 UTC (permalink / raw)
  To: Julian Brown; +Cc: Thomas Schwinge, Jakub Jelinek, gcc-patches

[ add gcc-patches@ ]

On 15-01-19 11:38, Tom de Vries wrote:
> Hi
> 
> Copied from here (
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00532.html ):
>> This too. Retested for libgomp/NVPTX.
>>
>> OK for trunk now?
>>
> 
> The plugin-nvptx.c part looks ok to me, for stage 1.
> 
> Thanks,
> - Tom
> 
>> Thanks,
>>
>> Julian
>>
>> ChangeLog
>>
>>     PR libgomp/65904
>>
>>     libgomp/
>>     * libgomp.h (gomp_free_memmap): Update prototype.
>>     * oacc-init.c (acc_shutdown_1): Pass device descriptor to
>>     gomp_free_memmap. Don't lock device around call.
>>     * target.c (gomp_map_vars): Initialise tgt->array to NULL before
>>     early exit.
>>     (GOMP_offload_unregister): Split out and call...
>>     (gomp_deoffload_image_from_device): This new function.
>>     (gomp_free_memmap): Call gomp_deoffload_image_from_device.
>>     * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields.
>>     (nvptx_init): Tweak comment.
>>     (nvptx_attach_host_thread_to_device): Add locking with ptx_dev_lock
>>     around ptx_devices accesses.
>>     (GOMP_OFFLOAD_load_image): Populate new ptx_image_data fields.
>>     (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing
>>     images, and use fn_descs field from ptx_image_data instead of
>>     incorrectly using a pointer derived from target_data.
>>     (GOMP_OFFLOAD_openacc_create_thread_data): Add locking around
>>     ptx_devices accesses.
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-01-15 10:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-01  9:47 [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) Julian Brown
2015-05-06  8:33 ` Thomas Schwinge
2015-05-07 10:38   ` Julian Brown
     [not found] <f2cefaab-aaa4-03e0-ea6f-5e22ccd44997@suse.de>
2019-01-15 10:38 ` Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).