* [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) @ 2015-05-01 9:47 Julian Brown 2015-05-06 8:33 ` Thomas Schwinge 0 siblings, 1 reply; 4+ messages in thread From: Julian Brown @ 2015-05-01 9:47 UTC (permalink / raw) To: Thomas Schwinge, gcc-patches, Jakub Jelinek, Ilya Verbin [-- Attachment #1: Type: text/plain, Size: 1641 bytes --] Hi, This patch fixes PR65904, a double-free error that started occurring after recent libgomp changes to the way offload images are registered with the runtime. Offload images now map all functions/data using just two malloc'ed blocks, but the function gomp_free_memmap did not take that into account, and treated all mappings as if they had their own blocks (as they do if created by gomp_map_vars): so attempting to free the whole map at once failed when it hit mappings for an offload image. The fix is to split offload-image freeing out of GOMP_offload_unregister into a separate function, and call that from gomp_free_memmap for the given device before freeing the rest of the memory map. The patch also fixes a thinko that was revealed in image unloading in the NVPTX backend. Tested for libgomp with PTX offloading. OK for trunk? Thanks, Julian ChangeLog libgomp/ * libgomp.h (gomp_free_memmap): Update prototype. * oacc-init.c (acc_shutdown_1): Pass device descriptor to gomp_free_memmap. Don't lock device around call. * target.c (gomp_map_vars): Initialise tgt->array to NULL before early exit. (GOMP_offload_unregister): Split out and call... (gomp_deoffload_image_from_device): This new function. (gomp_free_memmap): Call gomp_deoffload_image_from_device. * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields. (GOMP_OFFLOAD_load_image): Populate above fields. (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing images, and use fn_descs field from ptx_image_data instead of incorrectly using a pointer derived from target_data. [-- Attachment #2: openacc-shutdown-unloading-fixes-1.diff --] [-- Type: text/x-patch, Size: 10259 bytes --] commit 14e8e35a494a5a8231ab1a3cad38a2157bca7e4a Author: Julian Brown <julian@codesourcery.com> Date: Thu Apr 30 10:19:58 2015 -0700 Fix freeing of memory maps during acc shutdown. diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 5272f01..5e0e09c 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -777,7 +777,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *, extern void gomp_copy_from_async (struct target_mem_desc *); extern void gomp_unmap_vars (struct target_mem_desc *, bool); extern void gomp_init_device (struct gomp_device_descr *); -extern void gomp_free_memmap (struct splay_tree_s *); +extern void gomp_free_memmap (struct gomp_device_descr *); extern void gomp_fini_device (struct gomp_device_descr *); /* work.c */ diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index 503f8b8..f2c60ec 100644 --- a/libgomp/oacc-init.c +++ b/libgomp/oacc-init.c @@ -245,9 +245,7 @@ acc_shutdown_1 (acc_device_t d) if (walk->dev) { - gomp_mutex_lock (&walk->dev->lock); - gomp_free_memmap (&walk->dev->mem_map); - gomp_mutex_unlock (&walk->dev->lock); + gomp_free_memmap (walk->dev); walk->dev = NULL; walk->base_dev = NULL; diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 583ec87..2cc0ae0 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -334,8 +334,10 @@ struct ptx_event struct ptx_image_data { + int ord; void *target_data; CUmodule module; + struct targ_fn_descriptor *fn_descs; struct ptx_image_data *next; }; @@ -1625,13 +1627,6 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data, link_ptx (&module, img_header[0]); - pthread_mutex_lock (&ptx_image_lock); - new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data)); - new_image->target_data = target_data; - new_image->module = module; - new_image->next = ptx_images; - ptx_images = new_image; - pthread_mutex_unlock (&ptx_image_lock); /* The mkoffload utility emits a table of pointers/integers at the start of each offload image: @@ -1652,8 +1647,21 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data, *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair) * (fn_entries + var_entries)); - targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor) - * fn_entries); + if (fn_entries > 0) + targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor) + * fn_entries); + else + targ_fns = NULL; + + pthread_mutex_lock (&ptx_image_lock); + new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data)); + new_image->ord = ord; + new_image->target_data = target_data; + new_image->module = module; + new_image->fn_descs = targ_fns; + new_image->next = ptx_images; + ptx_images = new_image; + pthread_mutex_unlock (&ptx_image_lock); for (i = 0; i < fn_entries; i++) { @@ -1687,23 +1695,22 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data, } void -GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data) +GOMP_OFFLOAD_unload_image (int ord, void *target_data) { - void **img_header = (void **) target_data; - struct targ_fn_descriptor *targ_fns - = (struct targ_fn_descriptor *) img_header[0]; struct ptx_image_data *image, *prev = NULL, *newhd = NULL; - free (targ_fns); + nvptx_attach_host_thread_to_device (ord); pthread_mutex_lock (&ptx_image_lock); for (image = ptx_images; image != NULL;) { struct ptx_image_data *next = image->next; - if (image->target_data == target_data) + if (image->ord == ord && image->target_data == target_data) { cuModuleUnload (image->module); + if (image->fn_descs) + free (image->fn_descs); free (image); if (prev) prev->next = next; diff --git a/libgomp/target.c b/libgomp/target.c index d8da783..3ac674c 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -178,6 +178,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum, tgt->list_count = mapnum; tgt->refcount = 1; tgt->device_descr = devicep; + tgt->array = NULL; if (mapnum == 0) return tgt; @@ -275,7 +276,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum, if (is_target) tgt_size = mapnum * sizeof (void *); - tgt->array = NULL; if (not_found_cnt) { tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array)); @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type, gomp_mutex_unlock (®ister_lock); } -/* This function should be called from every offload image while unloading. - It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of - the target, and TARGET_DATA needed by target plugin. */ +/* DEVICEP should be locked on entry, and remains locked on exit. */ -void -GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, - void *target_data) +static void +gomp_deoffload_image_from_device (struct gomp_device_descr *devicep, + void *host_table, void *target_data) { void **host_func_table = ((void ***) host_table)[0]; void **host_funcs_end = ((void ***) host_table)[1]; void **host_var_table = ((void ***) host_table)[2]; void **host_vars_end = ((void ***) host_table)[3]; - int i; /* The func table contains only addresses, the var table contains addresses and corresponding sizes. */ int num_funcs = host_funcs_end - host_func_table; int num_vars = (host_vars_end - host_var_table) / 2; + int j; + + devicep->unload_image_func (devicep->target_id, target_data); + + /* Remove mapping from splay tree. */ + struct splay_tree_key_s k; + splay_tree_key node = NULL; + if (num_funcs > 0) + { + k.host_start = (uintptr_t) host_func_table[0]; + k.host_end = k.host_start + 1; + node = splay_tree_lookup (&devicep->mem_map, &k); + } + else if (num_vars > 0) + { + k.host_start = (uintptr_t) host_var_table[0]; + k.host_end = k.host_start + (uintptr_t) host_var_table[1]; + node = splay_tree_lookup (&devicep->mem_map, &k); + } + + for (j = 0; j < num_funcs; j++) + { + k.host_start = (uintptr_t) host_func_table[j]; + k.host_end = k.host_start + 1; + splay_tree_remove (&devicep->mem_map, &k); + } + + for (j = 0; j < num_vars; j++) + { + k.host_start = (uintptr_t) host_var_table[j * 2]; + k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1]; + splay_tree_remove (&devicep->mem_map, &k); + } + + if (node) + { + free (node->tgt); + free (node); + } +} + +/* This function should be called from every offload image while unloading. + It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of + the target, and TARGET_DATA needed by target plugin. */ + +void +GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, + void *target_data) +{ + int i; gomp_mutex_lock (®ister_lock); /* Unload image from all initialized devices. */ for (i = 0; i < num_devices; i++) { - int j; struct gomp_device_descr *devicep = &devices[i]; + gomp_mutex_lock (&devicep->lock); if (devicep->type != target_type || !devicep->is_initialized) { @@ -830,44 +877,7 @@ GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, continue; } - devicep->unload_image_func (devicep->target_id, target_data); - - /* Remove mapping from splay tree. */ - struct splay_tree_key_s k; - splay_tree_key node = NULL; - if (num_funcs > 0) - { - k.host_start = (uintptr_t) host_func_table[0]; - k.host_end = k.host_start + 1; - node = splay_tree_lookup (&devicep->mem_map, &k); - } - else if (num_vars > 0) - { - k.host_start = (uintptr_t) host_var_table[0]; - k.host_end = k.host_start + (uintptr_t) host_var_table[1]; - node = splay_tree_lookup (&devicep->mem_map, &k); - } - - for (j = 0; j < num_funcs; j++) - { - k.host_start = (uintptr_t) host_func_table[j]; - k.host_end = k.host_start + 1; - splay_tree_remove (&devicep->mem_map, &k); - } - - for (j = 0; j < num_vars; j++) - { - k.host_start = (uintptr_t) host_var_table[j * 2]; - k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1]; - splay_tree_remove (&devicep->mem_map, &k); - } - - if (node) - { - free (node->tgt); - free (node); - } - + gomp_deoffload_image_from_device (devicep, host_table, target_data); gomp_mutex_unlock (&devicep->lock); } @@ -903,20 +913,50 @@ gomp_init_device (struct gomp_device_descr *devicep) devicep->is_initialized = true; } -/* Free address mapping tables. MM must be locked on entry, and remains locked - on return. */ +/* Free address mapping tables for an active device DEVICEP. This includes + both mapped offload functions/variables, and mapped user data regions. + To be used before shutting a device down: subsequently reinitialising the + device will repopulate the offload image mappings. */ attribute_hidden void -gomp_free_memmap (struct splay_tree_s *mem_map) +gomp_free_memmap (struct gomp_device_descr *devicep) { + int i; + struct splay_tree_s *mem_map = &devicep->mem_map; + + assert (devicep->is_initialized); + + gomp_mutex_lock (&devicep->lock); + + /* Unmap offload images that are registered to this device. */ + for (i = 0; i < num_offload_images; i++) + { + struct offload_image_descr *image = &offload_images[i]; + if (image->type == devicep->type) + { + void *host_table = image->host_table; + void *target_data = image->target_data; + + gomp_deoffload_image_from_device (devicep, host_table, target_data); + } + } + + /* Clear other mappings. */ while (mem_map->root) { struct target_mem_desc *tgt = mem_map->root->key.tgt; splay_tree_remove (mem_map, &mem_map->root->key); - free (tgt->array); + + if (tgt->list_count == 0) + assert (!tgt->array); + else + free (tgt->array); + free (tgt); } + + gomp_mutex_unlock (&devicep->lock); } /* This function de-initializes the target device, specified by DEVICEP. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) 2015-05-01 9:47 [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) Julian Brown @ 2015-05-06 8:33 ` Thomas Schwinge 2015-05-07 10:38 ` Julian Brown 0 siblings, 1 reply; 4+ messages in thread From: Thomas Schwinge @ 2015-05-06 8:33 UTC (permalink / raw) To: Julian Brown; +Cc: gcc-patches, Jakub Jelinek, Ilya Verbin [-- Attachment #1: Type: text/plain, Size: 5348 bytes --] Hi! On Fri, 1 May 2015 10:47:19 +0100, Julian Brown <julian@codesourcery.com> wrote: > This patch fixes PR65904, a double-free error that started occurring > after recent libgomp changes to the way offload images are registered > with the runtime. > > Offload images now map all functions/data using just two malloc'ed > blocks, but the function gomp_free_memmap did not take that into > account, and treated all mappings as if they had their own blocks (as > they do if created by gomp_map_vars): so attempting to free the whole > map at once failed when it hit mappings for an offload image. > > The fix is to split offload-image freeing out of GOMP_offload_unregister > into a separate function, and call that from gomp_free_memmap for the > given device before freeing the rest of the memory map. > > The patch also fixes a thinko that was revealed in image unloading in > the NVPTX backend. Tested for libgomp with PTX offloading. Confirming that both nvptx (PR65904 fixed) and (emulated) intelmic (no changes) testing look fine. > OK for trunk? > ChangeLog > > libgomp/ Don't forget to add the PR marker here. > --- a/libgomp/plugin/plugin-nvptx.c > +++ b/libgomp/plugin/plugin-nvptx.c > @@ -1687,23 +1695,22 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data, > } > > void > -GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data) > +GOMP_OFFLOAD_unload_image (int ord, void *target_data) > { > - void **img_header = (void **) target_data; > - struct targ_fn_descriptor *targ_fns > - = (struct targ_fn_descriptor *) img_header[0]; > struct ptx_image_data *image, *prev = NULL, *newhd = NULL; > > - free (targ_fns); > + nvptx_attach_host_thread_to_device (ord); By the way, do we need to lock ptx_devices in libgomp/plugin/plugin-nvptx.c:nvptx_attach_host_thread_to_device and libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_openacc_create_thread_data? (ptx_dev_lock? If yes, its definition as well as instantiated_devices should be moved to a more appropriate place, probably?) Also, several accesses to instantiated_devices are not locked by ptx_dev_lock but should be, from my cursory review. > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type, > gomp_mutex_unlock (®ister_lock); > } > > -/* This function should be called from every offload image while unloading. > - It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of > - the target, and TARGET_DATA needed by target plugin. */ > +/* DEVICEP should be locked on entry, and remains locked on exit. */ (I'm not a native speaker, but would use what I consider to be more explicit/precise language: »must be locked« instead of »should be«. I'll be happy to learn should they mean the same thing?) > > -void > -GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, > - void *target_data) > +static void > +gomp_deoffload_image_from_device (struct gomp_device_descr *devicep, > + void *host_table, void *target_data) > { > +/* This function should be called from every offload image while unloading. s%from%for%, I think? (And, s%should%must%, again?) > + It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of > + the target, and TARGET_DATA needed by target plugin. */ > + > +void > +GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, > + void *target_data) > +{ > -/* Free address mapping tables. MM must be locked on entry, and remains locked > - on return. */ > +/* Free address mapping tables for an active device DEVICEP. This includes > + both mapped offload functions/variables, and mapped user data regions. > + To be used before shutting a device down: subsequently reinitialising the > + device will repopulate the offload image mappings. */ > > attribute_hidden void > -gomp_free_memmap (struct splay_tree_s *mem_map) > +gomp_free_memmap (struct gomp_device_descr *devicep) > { > + int i; > + struct splay_tree_s *mem_map = &devicep->mem_map; > + > + assert (devicep->is_initialized); > + > + gomp_mutex_lock (&devicep->lock); Need to lock before first access to *devicep? > + > + /* Unmap offload images that are registered to this device. */ > + for (i = 0; i < num_offload_images; i++) > + { > + struct offload_image_descr *image = &offload_images[i]; Need to take register_lock when accessing offload_images? > + if (image->type == devicep->type) > + { > + void *host_table = image->host_table; > + void *target_data = image->target_data; > + > + gomp_deoffload_image_from_device (devicep, host_table, target_data); > + } > + } > + > + /* Clear other mappings. */ > while (mem_map->root) > { > struct target_mem_desc *tgt = mem_map->root->key.tgt; > > splay_tree_remove (mem_map, &mem_map->root->key); > - free (tgt->array); > + > + if (tgt->list_count == 0) > + assert (!tgt->array); > + else > + free (tgt->array); > + > free (tgt); > } > + > + gomp_mutex_unlock (&devicep->lock); > } Grüße, Thomas [-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) 2015-05-06 8:33 ` Thomas Schwinge @ 2015-05-07 10:38 ` Julian Brown 0 siblings, 0 replies; 4+ messages in thread From: Julian Brown @ 2015-05-07 10:38 UTC (permalink / raw) To: Thomas Schwinge; +Cc: gcc-patches, Jakub Jelinek, Ilya Verbin [-- Attachment #1: Type: text/plain, Size: 5193 bytes --] On Wed, 6 May 2015 10:32:56 +0200 Thomas Schwinge <thomas@codesourcery.com> wrote: > Hi! > > On Fri, 1 May 2015 10:47:19 +0100, Julian Brown > <julian@codesourcery.com> wrote: > > The patch also fixes a thinko that was revealed in image unloading > > in the NVPTX backend. Tested for libgomp with PTX offloading. > > Confirming that both nvptx (PR65904 fixed) and (emulated) intelmic (no > changes) testing look fine. Thanks for testing! > By the way, do we need to lock ptx_devices in > libgomp/plugin/plugin-nvptx.c:nvptx_attach_host_thread_to_device and > libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_openacc_create_thread_data? > (ptx_dev_lock? If yes, its definition as well as instantiated_devices > should be moved to a more appropriate place, probably?) Probably yes (though I'm not sure what you mean about moving the instantiated_devices and ptx_dev_lock to a more appropriate place?). > Also, several > accesses to instantiated_devices are not locked by ptx_dev_lock but > should be, from my cursory review. I'm not sure about that. > > --- a/libgomp/target.c > > +++ b/libgomp/target.c > > @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum > > offload_target_type target_type, gomp_mutex_unlock (®ister_lock); > > } > > > > -/* This function should be called from every offload image while > > unloading. > > - It gets the descriptor of the host func and var tables > > HOST_TABLE, TYPE of > > - the target, and TARGET_DATA needed by target plugin. */ > > +/* DEVICEP should be locked on entry, and remains locked on exit. > > */ > > (I'm not a native speaker, but would use what I consider to be more > explicit/precise language: »must be locked« instead of »should be«. > I'll be happy to learn should they mean the same thing?) I've changed the wording in a couple of comments. > > > > -void > > -GOMP_offload_unregister (void *host_table, enum > > offload_target_type target_type, > > - void *target_data) > > +static void > > +gomp_deoffload_image_from_device (struct gomp_device_descr > > *devicep, > > + void *host_table, void > > *target_data) { > > > +/* This function should be called from every offload image while > > unloading. > > s%from%for%, I think? (And, s%should%must%, again?) No, this really is "from" -- this comment wasn't actually added by my patch, just moved. I'm also not sure about "should" in this instance -- unloading an image is already a corner-case, and maybe there are circumstances in which it'd be impossible for some given object to call the function? > > + It gets the descriptor of the host func and var tables > > HOST_TABLE, TYPE of > > + the target, and TARGET_DATA needed by target plugin. */ > > + > > +void > > +GOMP_offload_unregister (void *host_table, enum > > offload_target_type target_type, > > + void *target_data) > > +{ > > > -/* Free address mapping tables. MM must be locked on entry, and > > remains locked > > - on return. */ > > +/* Free address mapping tables for an active device DEVICEP. This > > includes > > + both mapped offload functions/variables, and mapped user data > > regions. > > + To be used before shutting a device down: subsequently > > reinitialising the > > + device will repopulate the offload image mappings. */ > > > > attribute_hidden void > > -gomp_free_memmap (struct splay_tree_s *mem_map) > > +gomp_free_memmap (struct gomp_device_descr *devicep) > > { > > + int i; > > + struct splay_tree_s *mem_map = &devicep->mem_map; > > + > > + assert (devicep->is_initialized); > > + > > + gomp_mutex_lock (&devicep->lock); > > Need to lock before first access to *devicep? Fixed. > > + > > + /* Unmap offload images that are registered to this device. */ > > + for (i = 0; i < num_offload_images; i++) > > + { > > + struct offload_image_descr *image = &offload_images[i]; > > Need to take register_lock when accessing offload_images? This too. Retested for libgomp/NVPTX. OK for trunk now? Thanks, Julian ChangeLog PR libgomp/65904 libgomp/ * libgomp.h (gomp_free_memmap): Update prototype. * oacc-init.c (acc_shutdown_1): Pass device descriptor to gomp_free_memmap. Don't lock device around call. * target.c (gomp_map_vars): Initialise tgt->array to NULL before early exit. (GOMP_offload_unregister): Split out and call... (gomp_deoffload_image_from_device): This new function. (gomp_free_memmap): Call gomp_deoffload_image_from_device. * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields. (nvptx_init): Tweak comment. (nvptx_attach_host_thread_to_device): Add locking with ptx_dev_lock around ptx_devices accesses. (GOMP_OFFLOAD_load_image): Populate new ptx_image_data fields. (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing images, and use fn_descs field from ptx_image_data instead of incorrectly using a pointer derived from target_data. (GOMP_OFFLOAD_openacc_create_thread_data): Add locking around ptx_devices accesses. [-- Attachment #2: openacc-shutdown-unloading-fixes-2.diff --] [-- Type: text/x-patch, Size: 12525 bytes --] Index: libgomp/target.c =================================================================== --- libgomp/target.c (revision 222861) +++ libgomp/target.c (working copy) @@ -178,6 +178,7 @@ gomp_map_vars (struct gomp_device_descr tgt->list_count = mapnum; tgt->refcount = 1; tgt->device_descr = devicep; + tgt->array = NULL; if (mapnum == 0) return tgt; @@ -275,7 +276,6 @@ gomp_map_vars (struct gomp_device_descr if (is_target) tgt_size = mapnum * sizeof (void *); - tgt->array = NULL; if (not_found_cnt) { tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array)); @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, gomp_mutex_unlock (®ister_lock); } -/* This function should be called from every offload image while unloading. - It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of - the target, and TARGET_DATA needed by target plugin. */ +/* DEVICEP must be locked on entry, and remains locked on exit. */ -void -GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, - void *target_data) +static void +gomp_deoffload_image_from_device (struct gomp_device_descr *devicep, + void *host_table, void *target_data) { void **host_func_table = ((void ***) host_table)[0]; void **host_funcs_end = ((void ***) host_table)[1]; void **host_var_table = ((void ***) host_table)[2]; void **host_vars_end = ((void ***) host_table)[3]; - int i; /* The func table contains only addresses, the var table contains addresses and corresponding sizes. */ int num_funcs = host_funcs_end - host_func_table; int num_vars = (host_vars_end - host_var_table) / 2; + int j; + + devicep->unload_image_func (devicep->target_id, target_data); + + /* Remove mapping from splay tree. */ + struct splay_tree_key_s k; + splay_tree_key node = NULL; + if (num_funcs > 0) + { + k.host_start = (uintptr_t) host_func_table[0]; + k.host_end = k.host_start + 1; + node = splay_tree_lookup (&devicep->mem_map, &k); + } + else if (num_vars > 0) + { + k.host_start = (uintptr_t) host_var_table[0]; + k.host_end = k.host_start + (uintptr_t) host_var_table[1]; + node = splay_tree_lookup (&devicep->mem_map, &k); + } + + for (j = 0; j < num_funcs; j++) + { + k.host_start = (uintptr_t) host_func_table[j]; + k.host_end = k.host_start + 1; + splay_tree_remove (&devicep->mem_map, &k); + } + + for (j = 0; j < num_vars; j++) + { + k.host_start = (uintptr_t) host_var_table[j * 2]; + k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1]; + splay_tree_remove (&devicep->mem_map, &k); + } + + if (node) + { + free (node->tgt); + free (node); + } +} + +/* This function should be called from every offload image while unloading. + It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of + the target, and TARGET_DATA needed by target plugin. */ + +void +GOMP_offload_unregister (void *host_table, enum offload_target_type target_type, + void *target_data) +{ + int i; gomp_mutex_lock (®ister_lock); /* Unload image from all initialized devices. */ for (i = 0; i < num_devices; i++) { - int j; struct gomp_device_descr *devicep = &devices[i]; + gomp_mutex_lock (&devicep->lock); if (devicep->type != target_type || !devicep->is_initialized) { @@ -830,44 +877,7 @@ GOMP_offload_unregister (void *host_tabl continue; } - devicep->unload_image_func (devicep->target_id, target_data); - - /* Remove mapping from splay tree. */ - struct splay_tree_key_s k; - splay_tree_key node = NULL; - if (num_funcs > 0) - { - k.host_start = (uintptr_t) host_func_table[0]; - k.host_end = k.host_start + 1; - node = splay_tree_lookup (&devicep->mem_map, &k); - } - else if (num_vars > 0) - { - k.host_start = (uintptr_t) host_var_table[0]; - k.host_end = k.host_start + (uintptr_t) host_var_table[1]; - node = splay_tree_lookup (&devicep->mem_map, &k); - } - - for (j = 0; j < num_funcs; j++) - { - k.host_start = (uintptr_t) host_func_table[j]; - k.host_end = k.host_start + 1; - splay_tree_remove (&devicep->mem_map, &k); - } - - for (j = 0; j < num_vars; j++) - { - k.host_start = (uintptr_t) host_var_table[j * 2]; - k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1]; - splay_tree_remove (&devicep->mem_map, &k); - } - - if (node) - { - free (node->tgt); - free (node); - } - + gomp_deoffload_image_from_device (devicep, host_table, target_data); gomp_mutex_unlock (&devicep->lock); } @@ -903,20 +913,54 @@ gomp_init_device (struct gomp_device_des devicep->is_initialized = true; } -/* Free address mapping tables. MM must be locked on entry, and remains locked - on return. */ +/* Free address mapping tables for an active device DEVICEP. This includes + both mapped offload functions/variables, and mapped user data regions. + To be used before shutting a device down: subsequently reinitialising the + device will repopulate the offload image mappings. */ attribute_hidden void -gomp_free_memmap (struct splay_tree_s *mem_map) +gomp_free_memmap (struct gomp_device_descr *devicep) { + int i; + struct splay_tree_s *mem_map = &devicep->mem_map; + + gomp_mutex_lock (&devicep->lock); + + assert (devicep->is_initialized); + + gomp_mutex_lock (®ister_lock); + + /* Unmap offload images that are registered to this device. */ + for (i = 0; i < num_offload_images; i++) + { + struct offload_image_descr *image = &offload_images[i]; + if (image->type == devicep->type) + { + void *host_table = image->host_table; + void *target_data = image->target_data; + + gomp_deoffload_image_from_device (devicep, host_table, target_data); + } + } + + gomp_mutex_unlock (®ister_lock); + + /* Clear other mappings. */ while (mem_map->root) { struct target_mem_desc *tgt = mem_map->root->key.tgt; splay_tree_remove (mem_map, &mem_map->root->key); - free (tgt->array); + + if (tgt->list_count == 0) + assert (!tgt->array); + else + free (tgt->array); + free (tgt); } + + gomp_mutex_unlock (&devicep->lock); } /* This function de-initializes the target device, specified by DEVICEP. Index: libgomp/libgomp.h =================================================================== --- libgomp/libgomp.h (revision 222861) +++ libgomp/libgomp.h (working copy) @@ -777,7 +777,7 @@ extern struct target_mem_desc *gomp_map_ extern void gomp_copy_from_async (struct target_mem_desc *); extern void gomp_unmap_vars (struct target_mem_desc *, bool); extern void gomp_init_device (struct gomp_device_descr *); -extern void gomp_free_memmap (struct splay_tree_s *); +extern void gomp_free_memmap (struct gomp_device_descr *); extern void gomp_fini_device (struct gomp_device_descr *); /* work.c */ Index: libgomp/plugin/plugin-nvptx.c =================================================================== --- libgomp/plugin/plugin-nvptx.c (revision 222861) +++ libgomp/plugin/plugin-nvptx.c (working copy) @@ -334,8 +334,10 @@ struct ptx_event struct ptx_image_data { + int ord; void *target_data; CUmodule module; + struct targ_fn_descriptor *fn_descs; struct ptx_image_data *next; }; @@ -589,7 +591,7 @@ select_stream_for_async (int async, pthr } /* Initialize the device. Return TRUE on success, else FALSE. PTX_DEV_LOCK - should be locked on entry and remains locked on exit. */ + must be locked on entry and remains locked on exit. */ static bool nvptx_init (void) { @@ -643,12 +645,17 @@ nvptx_attach_host_thread_to_device (int { CUcontext old_ctx; + pthread_mutex_lock (&ptx_dev_lock); + ptx_dev = ptx_devices[n]; assert (ptx_dev); r = cuCtxGetCurrent (&thd_ctx); if (r != CUDA_SUCCESS) - GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r)); + { + pthread_mutex_unlock (&ptx_dev_lock); + GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r)); + } /* We don't necessarily have a current context (e.g. if it has been destroyed. Pop it if we do though. */ @@ -656,10 +663,14 @@ nvptx_attach_host_thread_to_device (int { r = cuCtxPopCurrent (&old_ctx); if (r != CUDA_SUCCESS) - GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r)); + { + pthread_mutex_unlock (&ptx_dev_lock); + GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r)); + } } r = cuCtxPushCurrent (ptx_dev->ctx); + pthread_mutex_unlock (&ptx_dev_lock); if (r != CUDA_SUCCESS) GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r)); } @@ -1625,13 +1636,6 @@ GOMP_OFFLOAD_load_image (int ord, void * link_ptx (&module, img_header[0]); - pthread_mutex_lock (&ptx_image_lock); - new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data)); - new_image->target_data = target_data; - new_image->module = module; - new_image->next = ptx_images; - ptx_images = new_image; - pthread_mutex_unlock (&ptx_image_lock); /* The mkoffload utility emits a table of pointers/integers at the start of each offload image: @@ -1652,8 +1656,21 @@ GOMP_OFFLOAD_load_image (int ord, void * *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair) * (fn_entries + var_entries)); - targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor) - * fn_entries); + if (fn_entries > 0) + targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor) + * fn_entries); + else + targ_fns = NULL; + + pthread_mutex_lock (&ptx_image_lock); + new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data)); + new_image->ord = ord; + new_image->target_data = target_data; + new_image->module = module; + new_image->fn_descs = targ_fns; + new_image->next = ptx_images; + ptx_images = new_image; + pthread_mutex_unlock (&ptx_image_lock); for (i = 0; i < fn_entries; i++) { @@ -1687,23 +1704,22 @@ GOMP_OFFLOAD_load_image (int ord, void * } void -GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data) +GOMP_OFFLOAD_unload_image (int ord, void *target_data) { - void **img_header = (void **) target_data; - struct targ_fn_descriptor *targ_fns - = (struct targ_fn_descriptor *) img_header[0]; struct ptx_image_data *image, *prev = NULL, *newhd = NULL; - free (targ_fns); + nvptx_attach_host_thread_to_device (ord); pthread_mutex_lock (&ptx_image_lock); for (image = ptx_images; image != NULL;) { struct ptx_image_data *next = image->next; - if (image->target_data == target_data) + if (image->ord == ord && image->target_data == target_data) { cuModuleUnload (image->module); + if (image->fn_descs) + free (image->fn_descs); free (image); if (prev) prev->next = next; @@ -1833,13 +1849,18 @@ GOMP_OFFLOAD_openacc_create_thread_data CUresult r; CUcontext thd_ctx; + pthread_mutex_lock (&ptx_dev_lock); + ptx_dev = ptx_devices[ord]; assert (ptx_dev); r = cuCtxGetCurrent (&thd_ctx); if (r != CUDA_SUCCESS) - GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r)); + { + pthread_mutex_unlock (&ptx_dev_lock); + GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r)); + } assert (ptx_dev->ctx); @@ -1847,12 +1868,17 @@ GOMP_OFFLOAD_openacc_create_thread_data { r = cuCtxPushCurrent (ptx_dev->ctx); if (r != CUDA_SUCCESS) - GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r)); + { + pthread_mutex_unlock (&ptx_dev_lock); + GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r)); + } } nvthd->current_stream = ptx_dev->null_stream; nvthd->ptx_dev = ptx_dev; + pthread_mutex_unlock (&ptx_dev_lock); + return (void *) nvthd; } Index: libgomp/oacc-init.c =================================================================== --- libgomp/oacc-init.c (revision 222862) +++ libgomp/oacc-init.c (working copy) @@ -244,9 +244,7 @@ acc_shutdown_1 (acc_device_t d) if (walk->dev) { - gomp_mutex_lock (&walk->dev->lock); - gomp_free_memmap (&walk->dev->mem_map); - gomp_mutex_unlock (&walk->dev->lock); + gomp_free_memmap (walk->dev); walk->dev = NULL; walk->base_dev = NULL; ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <f2cefaab-aaa4-03e0-ea6f-5e22ccd44997@suse.de>]
* Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) [not found] <f2cefaab-aaa4-03e0-ea6f-5e22ccd44997@suse.de> @ 2019-01-15 10:38 ` Tom de Vries 0 siblings, 0 replies; 4+ messages in thread From: Tom de Vries @ 2019-01-15 10:38 UTC (permalink / raw) To: Julian Brown; +Cc: Thomas Schwinge, Jakub Jelinek, gcc-patches [ add gcc-patches@ ] On 15-01-19 11:38, Tom de Vries wrote: > Hi > > Copied from here ( > https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00532.html ): >> This too. Retested for libgomp/NVPTX. >> >> OK for trunk now? >> > > The plugin-nvptx.c part looks ok to me, for stage 1. > > Thanks, > - Tom > >> Thanks, >> >> Julian >> >> ChangeLog >> >> PR libgomp/65904 >> >> libgomp/ >> * libgomp.h (gomp_free_memmap): Update prototype. >> * oacc-init.c (acc_shutdown_1): Pass device descriptor to >> gomp_free_memmap. Don't lock device around call. >> * target.c (gomp_map_vars): Initialise tgt->array to NULL before >> early exit. >> (GOMP_offload_unregister): Split out and call... >> (gomp_deoffload_image_from_device): This new function. >> (gomp_free_memmap): Call gomp_deoffload_image_from_device. >> * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields. >> (nvptx_init): Tweak comment. >> (nvptx_attach_host_thread_to_device): Add locking with ptx_dev_lock >> around ptx_devices accesses. >> (GOMP_OFFLOAD_load_image): Populate new ptx_image_data fields. >> (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing >> images, and use fn_descs field from ptx_image_data instead of >> incorrectly using a pointer derived from target_data. >> (GOMP_OFFLOAD_openacc_create_thread_data): Add locking around >> ptx_devices accesses. > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-01-15 10:38 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-01 9:47 [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904) Julian Brown 2015-05-06 8:33 ` Thomas Schwinge 2015-05-07 10:38 ` Julian Brown [not found] <f2cefaab-aaa4-03e0-ea6f-5e22ccd44997@suse.de> 2019-01-15 10:38 ` Tom de Vries
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).