* Merge current set of OpenACC changes from gomp-4_0-branch
@ 2015-01-15 20:44 Thomas Schwinge
From: Thomas Schwinge @ 2015-01-15 20:44 UTC
  To: gcc-patches


Hi!

In r219682, I have committed to trunk our current set of OpenACC changes,
which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
been contributing!

Note that this is an experimental feature: it is incomplete and subject
to change in future versions of GCC.  We shall update -- and keep updated --
<https://gcc.gnu.org/wiki/OpenACC> to track the current status.  (Please
come back to that page in a few days; it has not yet been updated.)

Please note that a handful of patches needed for nvptx offloading are
still pending (posted weeks ago; I need to ping them), so nvptx
offloading is not yet functional.
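
For anyone wanting to try the new -fopenacc option, here is a minimal
sketch of the kind of C code the front end now parses.  It is not taken
from the patch or its testsuite, and the function names vec_add and
openacc_enabled are made up for illustration.  The directives and clauses
it uses (parallel, loop, copyin, copyout) and the _OPENACC macro are among
those listed in the commit log.  Without -fopenacc the pragmas are ignored
and the loop simply runs serially on the host.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if this file was compiled with OpenACC
   enabled (the compiler defines _OPENACC in that case), else 0.  */
int openacc_enabled(void)
{
#ifdef _OPENACC
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical example: element-wise c[i] = a[i] + b[i] for n elements.
   The array sections use the OpenACC [start:length] notation.  */
void vec_add(const double *a, const double *b, double *c, int n)
{
    assert(n >= 0);

    /* a and b are only read in the region, c is only written.  */
#pragma acc parallel copyin(a[0:n], b[0:n]) copyout(c[0:n])
    {
#pragma acc loop
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
}
```

Assuming a trunk build and a file named vec_add.c (a placeholder name),
this could be compiled with something like "gcc -fopenacc -c vec_add.c".
Given that nvptx offloading is not yet functional, the region would
presumably execute on the host.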

Here's the commit log.  The patch itself is too big to post inline, so
please find it attached, gzipped.

commit ca4c354552639d3bac6d1f690d9e04017d7d80ed
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jan 15 20:11:12 2015 +0000

    Merge current set of OpenACC changes from gomp-4_0-branch.
    
    	contrib/
    	* gcc_update (files_and_dependencies): Update rules for new
    	libgomp/plugin/Makefrag.am and libgomp/plugin/configfrag.ac files.
    	gcc/
    	* builtin-types.def (BT_FN_VOID_INT_INT_VAR)
    	(BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
    	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	New function types.
    	* builtins.c: Include "gomp-constants.h".
    	(expand_builtin_acc_on_device): New function.
    	(expand_builtin, is_inexpensive_builtin): Handle
    	BUILT_IN_ACC_ON_DEVICE.
    	* builtins.def (DEF_GOACC_BUILTIN, DEF_GOACC_BUILTIN_COMPILER):
    	New macros.
    	* cgraph.c (cgraph_node::create): Consider flag_openacc next to
    	flag_openmp.
    	* config.gcc <nvptx-*> (tm_file): Add nvptx/offload.h.
    	<*-intelmic-* | *-intelmicemul-*> (tm_file): Add
    	i386/intelmic-offload.h.
    	* gcc.c (LINK_COMMAND_SPEC, GOMP_SELF_SPECS): For -fopenacc, link
    	to libgomp and its dependencies.
    	* config/arc/arc.h (LINK_COMMAND_SPEC): Likewise.
    	* config/darwin.h (LINK_COMMAND_SPEC_A): Likewise.
    	* config/i386/mingw32.h (GOMP_SELF_SPECS): Likewise.
    	* config/ia64/hpux.h (LIB_SPEC): Likewise.
    	* config/pa/pa-hpux11.h (LIB_SPEC): Likewise.
    	* config/pa/pa64-hpux.h (LIB_SPEC): Likewise.
    	* doc/generic.texi: Update for OpenACC changes.
    	* doc/gimple.texi: Likewise.
    	* doc/invoke.texi: Likewise.
    	* doc/sourcebuild.texi: Likewise.
    	* gimple-pretty-print.c (dump_gimple_omp_for): Handle
    	GF_OMP_FOR_KIND_OACC_LOOP.
    	(dump_gimple_omp_target): Handle GF_OMP_TARGET_KIND_OACC_KERNELS,
    	GF_OMP_TARGET_KIND_OACC_PARALLEL, GF_OMP_TARGET_KIND_OACC_DATA,
    	GF_OMP_TARGET_KIND_OACC_UPDATE,
    	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA.
    	Dump more data.
    	* gimple.c: Update comments for OpenACC changes.
    	* gimple.def: Likewise.
    	* gimple.h: Likewise.
    	(enum gf_mask): Add GF_OMP_FOR_KIND_OACC_LOOP,
    	GF_OMP_TARGET_KIND_OACC_PARALLEL, GF_OMP_TARGET_KIND_OACC_KERNELS,
    	GF_OMP_TARGET_KIND_OACC_DATA, GF_OMP_TARGET_KIND_OACC_UPDATE,
    	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA.
    	(gimple_omp_for_cond, gimple_omp_for_set_cond): Sort in the
    	appropriate place.
    	(is_gimple_omp_oacc, is_gimple_omp_offloaded): New functions.
    	* gimplify.c: Include "gomp-constants.h".
    	Update comments for OpenACC changes.
    	(is_gimple_stmt): Handle OACC_PARALLEL, OACC_KERNELS, OACC_DATA,
    	OACC_HOST_DATA, OACC_DECLARE, OACC_UPDATE, OACC_ENTER_DATA,
    	OACC_EXIT_DATA, OACC_CACHE, OACC_LOOP.
    	(gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle
    	OMP_CLAUSE__CACHE_, OMP_CLAUSE_ASYNC, OMP_CLAUSE_WAIT,
    	OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS,
    	OMP_CLAUSE_VECTOR_LENGTH, OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER,
    	OMP_CLAUSE_VECTOR, OMP_CLAUSE_DEVICE_RESIDENT,
    	OMP_CLAUSE_USE_DEVICE, OMP_CLAUSE_INDEPENDENT, OMP_CLAUSE_AUTO,
    	OMP_CLAUSE_SEQ.
    	(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Use
    	GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.  Use
    	OMP_CLAUSE_SET_MAP_KIND.
    	(gimplify_oacc_cache): New function.
    	(gimplify_omp_for): Handle OACC_LOOP.
    	(gimplify_omp_workshare): Handle OACC_KERNELS, OACC_PARALLEL,
    	OACC_DATA.
    	(gimplify_omp_target_update): Handle OACC_ENTER_DATA,
    	OACC_EXIT_DATA, OACC_UPDATE.
    	(gimplify_expr): Handle OACC_LOOP, OACC_CACHE, OACC_HOST_DATA,
    	OACC_DECLARE, OACC_KERNELS, OACC_PARALLEL, OACC_DATA,
    	OACC_ENTER_DATA, OACC_EXIT_DATA, OACC_UPDATE.
    	(gimplify_body): Consider flag_openacc next to flag_openmp.
    	* lto-streamer-out.c: Include "gomp-constants.h".
    	* omp-builtins.def (BUILT_IN_ACC_GET_DEVICE_TYPE)
    	(BUILT_IN_GOACC_DATA_START, BUILT_IN_GOACC_DATA_END)
    	(BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL)
    	(BUILT_IN_GOACC_UPDATE, BUILT_IN_GOACC_WAIT)
    	(BUILT_IN_GOACC_GET_THREAD_NUM, BUILT_IN_GOACC_GET_NUM_THREADS)
    	(BUILT_IN_ACC_ON_DEVICE): New builtins.
    	* omp-low.c: Include "gomp-constants.h".
    	Update comments for OpenACC changes.
    	(struct omp_context): Add reduction_map, gwv_below, gwv_this
    	members.
    	(extract_omp_for_data, use_pointer_for_field, install_var_field)
    	(new_omp_context, delete_omp_context, scan_sharing_clauses)
    	(create_omp_child_function, scan_omp_for, scan_omp_target)
    	(check_omp_nesting_restrictions, lower_reduction_clauses)
    	(build_omp_regions_1, diagnose_sb_0, make_gimple_omp_edges):
    	Update for OpenACC changes.
    	(scan_sharing_clauses): Handle OMP_CLAUSE_NUM_GANGS,
    	OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH,
    	OMP_CLAUSE_ASYNC, OMP_CLAUSE_WAIT, OMP_CLAUSE_GANG,
    	OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR, OMP_CLAUSE_DEVICE_RESIDENT,
    	OMP_CLAUSE_USE_DEVICE, OMP_CLAUSE__CACHE_, OMP_CLAUSE_INDEPENDENT,
    	OMP_CLAUSE_AUTO, OMP_CLAUSE_SEQ.  Use GOMP_MAP_* instead of
    	OMP_CLAUSE_MAP_*.
    	(expand_omp_for_static_nochunk, expand_omp_for_static_chunk):
    	Handle GF_OMP_FOR_KIND_OACC_LOOP.
    	(expand_omp_target, lower_omp_target): Handle
    	GF_OMP_TARGET_KIND_OACC_PARALLEL, GF_OMP_TARGET_KIND_OACC_KERNELS,
    	GF_OMP_TARGET_KIND_OACC_UPDATE,
    	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA,
    	GF_OMP_TARGET_KIND_OACC_DATA.
    	(pass_expand_omp::execute, execute_lower_omp)
    	(pass_diagnose_omp_blocks::gate): Consider flag_openacc next to
    	flag_openmp.
    	(offload_symbol_decl): New variable.
    	(oacc_get_reduction_array_id, oacc_max_threads)
    	(get_offload_symbol_decl, get_base_type, lookup_oacc_reduction)
    	(maybe_lookup_oacc_reduction, enclosing_target_ctx)
    	(oacc_loop_or_target_p, oacc_lower_reduction_var_helper)
    	(oacc_gimple_assign, oacc_initialize_reduction_data)
    	(oacc_finalize_reduction_data, oacc_process_reduction_data): New
    	functions.
    	(is_targetreg_ctx): Remove function.
    	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__CACHE_,
    	OMP_CLAUSE_DEVICE_RESIDENT, OMP_CLAUSE_USE_DEVICE,
    	OMP_CLAUSE_GANG, OMP_CLAUSE_ASYNC, OMP_CLAUSE_WAIT,
    	OMP_CLAUSE_AUTO, OMP_CLAUSE_SEQ, OMP_CLAUSE_INDEPENDENT,
    	OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR, OMP_CLAUSE_NUM_GANGS,
    	OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH.
    	* tree.c (omp_clause_code_name, walk_tree_1): Update accordingly.
    	* tree.h (OMP_CLAUSE_GANG_EXPR, OMP_CLAUSE_GANG_STATIC_EXPR)
    	(OMP_CLAUSE_ASYNC_EXPR, OMP_CLAUSE_WAIT_EXPR)
    	(OMP_CLAUSE_VECTOR_EXPR, OMP_CLAUSE_WORKER_EXPR)
    	(OMP_CLAUSE_NUM_GANGS_EXPR, OMP_CLAUSE_NUM_WORKERS_EXPR)
    	(OMP_CLAUSE_VECTOR_LENGTH_EXPR): New macros.
    	* tree-core.h: Update comments for OpenACC changes.
    	(enum omp_clause_map_kind): Remove.
    	(struct tree_omp_clause): Change type of map_kind member from enum
    	omp_clause_map_kind to unsigned char.
    	* tree-inline.c: Update comments for OpenACC changes.
    	* tree-nested.c: Likewise.  Include "gomp-constants.h".
    	(convert_nonlocal_reference_stmt, convert_local_reference_stmt)
    	(convert_tramp_reference_stmt, convert_gimple_call): Update for
    	OpenACC changes.  Use GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.  Use
    	OMP_CLAUSE_SET_MAP_KIND.
    	* tree-pretty-print.c: Include "gomp-constants.h".
    	(dump_omp_clause): Handle OMP_CLAUSE_DEVICE_RESIDENT,
    	OMP_CLAUSE_USE_DEVICE, OMP_CLAUSE__CACHE_, OMP_CLAUSE_GANG,
    	OMP_CLAUSE_ASYNC, OMP_CLAUSE_AUTO, OMP_CLAUSE_SEQ,
    	OMP_CLAUSE_WAIT, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR,
    	OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS,
    	OMP_CLAUSE_VECTOR_LENGTH, OMP_CLAUSE_INDEPENDENT.  Use GOMP_MAP_*
    	instead of OMP_CLAUSE_MAP_*.
    	(dump_generic_node): Handle OACC_PARALLEL, OACC_KERNELS,
    	OACC_DATA, OACC_HOST_DATA, OACC_DECLARE, OACC_UPDATE,
    	OACC_ENTER_DATA, OACC_EXIT_DATA, OACC_CACHE, OACC_LOOP.
    	* tree-streamer-in.c: Include "gomp-constants.h".
    	(unpack_ts_omp_clause_value_fields): Use GOMP_MAP_* instead of
    	OMP_CLAUSE_MAP_*.  Use OMP_CLAUSE_SET_MAP_KIND.
    	* tree-streamer-out.c: Include "gomp-constants.h".
    	(pack_ts_omp_clause_value_fields): Use GOMP_MAP_* instead of
    	OMP_CLAUSE_MAP_*.
    	* tree.def (OACC_PARALLEL, OACC_KERNELS, OACC_DATA)
    	(OACC_HOST_DATA, OACC_LOOP, OACC_CACHE, OACC_DECLARE)
    	(OACC_ENTER_DATA, OACC_EXIT_DATA, OACC_UPDATE): New tree codes.
    	* tree.c (omp_clause_num_ops): Update accordingly.
    	* tree.h (OMP_BODY, OMP_CLAUSES, OMP_LOOP_CHECK, OMP_CLAUSE_SIZE):
    	Likewise.
    	(OACC_PARALLEL_BODY, OACC_PARALLEL_CLAUSES, OACC_KERNELS_BODY)
    	(OACC_KERNELS_CLAUSES, OACC_DATA_BODY, OACC_DATA_CLAUSES)
    	(OACC_HOST_DATA_BODY, OACC_HOST_DATA_CLAUSES, OACC_CACHE_CLAUSES)
    	(OACC_DECLARE_CLAUSES, OACC_ENTER_DATA_CLAUSES)
    	(OACC_EXIT_DATA_CLAUSES, OACC_UPDATE_CLAUSES)
    	(OACC_KERNELS_COMBINED, OACC_PARALLEL_COMBINED): New macros.
    	* tree.h (OMP_CLAUSE_MAP_KIND): Cast it to enum gomp_map_kind.
    	(OMP_CLAUSE_SET_MAP_KIND): New macro.
    	* varpool.c (varpool_node::get_create): Consider flag_openacc next
    	to flag_openmp.
    	* config/i386/intelmic-offload.h: New file.
    	* config/nvptx/offload.h: Likewise.
    	gcc/ada/
    	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
    	(DEF_FUNCTION_TYPE_VAR_12): New macros.
    	gcc/c-family/
    	* c.opt (fopenacc): New option.
    	* c-cppbuiltin.c (c_cpp_builtins): Conditionally define _OPENACC.
    	* c-common.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
    	New macros.
    	* c-common.h (c_finish_oacc_wait): New prototype.
    	* c-omp.c: Include "omp-low.h" and "gomp-constants.h".
    	(c_finish_oacc_wait): New function.
    	* c-pragma.c (oacc_pragmas): New variable.
    	(c_pp_lookup_pragma, init_pragma): Handle it.
    	* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_CACHE,
    	PRAGMA_OACC_DATA, PRAGMA_OACC_ENTER_DATA, PRAGMA_OACC_EXIT_DATA,
    	PRAGMA_OACC_KERNELS, PRAGMA_OACC_LOOP, PRAGMA_OACC_PARALLEL,
    	PRAGMA_OACC_UPDATE, PRAGMA_OACC_WAIT.
    	(enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_ASYNC,
    	PRAGMA_OACC_CLAUSE_AUTO, PRAGMA_OACC_CLAUSE_COLLAPSE,
    	PRAGMA_OACC_CLAUSE_COPY, PRAGMA_OACC_CLAUSE_COPYIN,
    	PRAGMA_OACC_CLAUSE_COPYOUT, PRAGMA_OACC_CLAUSE_CREATE,
    	PRAGMA_OACC_CLAUSE_DELETE, PRAGMA_OACC_CLAUSE_DEVICE,
    	PRAGMA_OACC_CLAUSE_DEVICEPTR, PRAGMA_OACC_CLAUSE_FIRSTPRIVATE,
    	PRAGMA_OACC_CLAUSE_GANG, PRAGMA_OACC_CLAUSE_HOST,
    	PRAGMA_OACC_CLAUSE_IF, PRAGMA_OACC_CLAUSE_NUM_GANGS,
    	PRAGMA_OACC_CLAUSE_NUM_WORKERS, PRAGMA_OACC_CLAUSE_PRESENT,
    	PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY,
    	PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN,
    	PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT,
    	PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE, PRAGMA_OACC_CLAUSE_PRIVATE,
    	PRAGMA_OACC_CLAUSE_REDUCTION, PRAGMA_OACC_CLAUSE_SELF,
    	PRAGMA_OACC_CLAUSE_SEQ, PRAGMA_OACC_CLAUSE_VECTOR,
    	PRAGMA_OACC_CLAUSE_VECTOR_LENGTH, PRAGMA_OACC_CLAUSE_WAIT,
    	PRAGMA_OACC_CLAUSE_WORKER.
    	gcc/c/
    	* c-parser.c: Include "gomp-constants.h".
    	(c_parser_omp_clause_map): Use enum gomp_map_kind instead of enum
    	omp_clause_map_kind.  Use GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.
    	Use OMP_CLAUSE_SET_MAP_KIND.
    	(c_parser_pragma): Handle PRAGMA_OACC_ENTER_DATA,
    	PRAGMA_OACC_EXIT_DATA, PRAGMA_OACC_UPDATE.
    	(c_parser_omp_construct): Handle PRAGMA_OACC_CACHE,
    	PRAGMA_OACC_DATA, PRAGMA_OACC_KERNELS, PRAGMA_OACC_LOOP,
    	PRAGMA_OACC_PARALLEL, PRAGMA_OACC_WAIT.
    	(c_parser_omp_clause_name): Handle "auto", "async", "copy",
    	"copyout", "create", "delete", "deviceptr", "gang", "host",
    	"num_gangs", "num_workers", "present", "present_or_copy", "pcopy",
    	"present_or_copyin", "pcopyin", "present_or_copyout", "pcopyout",
    	"present_or_create", "pcreate", "seq", "self", "vector",
    	"vector_length", "wait", "worker".
    	(OACC_DATA_CLAUSE_MASK, OACC_KERNELS_CLAUSE_MASK)
    	(OACC_ENTER_DATA_CLAUSE_MASK, OACC_EXIT_DATA_CLAUSE_MASK)
    	(OACC_LOOP_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK)
    	(OACC_UPDATE_CLAUSE_MASK, OACC_WAIT_CLAUSE_MASK): New macros.
    	(c_parser_omp_variable_list): Handle OMP_CLAUSE__CACHE_.
    	(c_parser_oacc_wait_list, c_parser_oacc_data_clause)
    	(c_parser_oacc_data_clause_deviceptr)
    	(c_parser_omp_clause_num_gangs, c_parser_omp_clause_num_workers)
    	(c_parser_oacc_clause_async, c_parser_oacc_clause_wait)
    	(c_parser_omp_clause_vector_length, c_parser_oacc_all_clauses)
    	(c_parser_oacc_cache, c_parser_oacc_data, c_parser_oacc_kernels)
    	(c_parser_oacc_enter_exit_data, c_parser_oacc_loop)
    	(c_parser_oacc_parallel, c_parser_oacc_update)
    	(c_parser_oacc_wait): New functions.
    	* c-tree.h (c_finish_oacc_parallel, c_finish_oacc_kernels)
    	(c_finish_oacc_data): New prototypes.
    	* c-typeck.c: Include "gomp-constants.h".
    	(handle_omp_array_sections): Handle GOMP_MAP_FORCE_DEVICEPTR.  Use
    	GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.  Use
    	OMP_CLAUSE_SET_MAP_KIND.
    	(c_finish_oacc_parallel, c_finish_oacc_kernels)
    	(c_finish_oacc_data): New functions.
    	(c_finish_omp_clauses): Handle OMP_CLAUSE__CACHE_,
    	OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS,
    	OMP_CLAUSE_VECTOR_LENGTH, OMP_CLAUSE_ASYNC, OMP_CLAUSE_WAIT,
    	OMP_CLAUSE_AUTO, OMP_CLAUSE_SEQ, OMP_CLAUSE_GANG,
    	OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR, and OMP_CLAUSE_MAP's
    	GOMP_MAP_FORCE_DEVICEPTR.
    	gcc/cp/
    	* parser.c: Include "gomp-constants.h".
    	(cp_parser_omp_clause_map): Use enum gomp_map_kind instead of enum
    	omp_clause_map_kind.  Use GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.
    	Use OMP_CLAUSE_SET_MAP_KIND.
    	(cp_parser_omp_construct, cp_parser_pragma): Handle
    	PRAGMA_OACC_CACHE, PRAGMA_OACC_DATA, PRAGMA_OACC_ENTER_DATA,
    	PRAGMA_OACC_EXIT_DATA, PRAGMA_OACC_KERNELS, PRAGMA_OACC_PARALLEL,
    	PRAGMA_OACC_LOOP, PRAGMA_OACC_UPDATE, PRAGMA_OACC_WAIT.
    	(cp_parser_omp_clause_name): Handle "async", "copy", "copyout",
    	"create", "delete", "deviceptr", "host", "num_gangs",
    	"num_workers", "present", "present_or_copy", "pcopy",
    	"present_or_copyin", "pcopyin", "present_or_copyout", "pcopyout",
    	"present_or_create", "pcreate", "vector_length", "wait".
    	(OACC_DATA_CLAUSE_MASK, OACC_ENTER_DATA_CLAUSE_MASK)
    	(OACC_EXIT_DATA_CLAUSE_MASK, OACC_KERNELS_CLAUSE_MASK)
    	(OACC_LOOP_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK)
    	(OACC_UPDATE_CLAUSE_MASK, OACC_WAIT_CLAUSE_MASK): New macros.
    	(cp_parser_omp_var_list_no_open): Handle OMP_CLAUSE__CACHE_.
    	(cp_parser_oacc_data_clause, cp_parser_oacc_data_clause_deviceptr)
    	(cp_parser_oacc_clause_vector_length, cp_parser_oacc_wait_list)
    	(cp_parser_oacc_clause_wait, cp_parser_omp_clause_num_gangs)
    	(cp_parser_omp_clause_num_workers, cp_parser_oacc_clause_async)
    	(cp_parser_oacc_all_clauses, cp_parser_oacc_cache)
    	(cp_parser_oacc_data, cp_parser_oacc_enter_exit_data)
    	(cp_parser_oacc_kernels, cp_parser_oacc_loop)
    	(cp_parser_oacc_parallel, cp_parser_oacc_update)
    	(cp_parser_oacc_wait): New functions.
    	* cp-tree.h (finish_oacc_data, finish_oacc_kernels)
    	(finish_oacc_parallel): New prototypes.
    	* semantics.c: Include "gomp-constants.h".
    	(handle_omp_array_sections): Handle GOMP_MAP_FORCE_DEVICEPTR.  Use
    	GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.  Use
    	OMP_CLAUSE_SET_MAP_KIND.
    	(finish_omp_clauses): Handle OMP_CLAUSE_ASYNC,
    	OMP_CLAUSE_VECTOR_LENGTH, OMP_CLAUSE_WAIT, OMP_CLAUSE__CACHE_.
    	Use GOMP_MAP_* instead of OMP_CLAUSE_MAP_*.
    	(finish_oacc_data, finish_oacc_kernels, finish_oacc_parallel): New
    	functions.
    	gcc/fortran/
    	* lang.opt (fopenacc): New option.
    	* cpp.c (cpp_define_builtins): Conditionally define _OPENACC.
    	* dump-parse-tree.c (show_omp_node): Split part of it into...
    	(show_omp_clauses): ... this new function.
    	(show_omp_node, show_code_node): Handle EXEC_OACC_PARALLEL_LOOP,
    	EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS_LOOP, EXEC_OACC_KERNELS,
    	EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP,
    	EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE,
    	EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA.
    	(show_namespace): Update for OpenACC.
    	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_2, DEF_FUNCTION_TYPE_VAR_8)
    	(DEF_FUNCTION_TYPE_VAR_12, DEF_GOACC_BUILTIN)
    	(DEF_GOACC_BUILTIN_COMPILER): New macros.
    	* types.def (BT_FN_VOID_INT_INT_VAR)
    	(BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
    	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	New function types.
    	* gfortran.h (gfc_statement): Add ST_OACC_PARALLEL_LOOP,
    	ST_OACC_END_PARALLEL_LOOP, ST_OACC_PARALLEL, ST_OACC_END_PARALLEL,
    	ST_OACC_KERNELS, ST_OACC_END_KERNELS, ST_OACC_DATA,
    	ST_OACC_END_DATA, ST_OACC_HOST_DATA, ST_OACC_END_HOST_DATA,
    	ST_OACC_LOOP, ST_OACC_END_LOOP, ST_OACC_DECLARE, ST_OACC_UPDATE,
    	ST_OACC_WAIT, ST_OACC_CACHE, ST_OACC_KERNELS_LOOP,
    	ST_OACC_END_KERNELS_LOOP, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA,
    	ST_OACC_ROUTINE.
    	(struct gfc_expr_list): New data type.
    	(gfc_get_expr_list): New macro.
    	(gfc_omp_map_op): Add OMP_MAP_FORCE_ALLOC, OMP_MAP_FORCE_DEALLOC,
    	OMP_MAP_FORCE_TO, OMP_MAP_FORCE_FROM, OMP_MAP_FORCE_TOFROM,
    	OMP_MAP_FORCE_PRESENT, OMP_MAP_FORCE_DEVICEPTR.
    	(OMP_LIST_FIRST, OMP_LIST_DEVICE_RESIDENT, OMP_LIST_USE_DEVICE)
    	(OMP_LIST_CACHE): New enumerators.
    	(struct gfc_omp_clauses): Add async_expr, gang_expr, worker_expr,
    	vector_expr, num_gangs_expr, num_workers_expr, vector_length_expr,
    	wait_list, tile_list, async, gang, worker, vector, seq,
    	independent, wait, par_auto, gang_static, and loc members.
    	(struct gfc_namespace): Add oacc_declare_clauses member.
    	(gfc_exec_op): Add EXEC_OACC_KERNELS_LOOP,
    	EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS,
    	EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP,
    	EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE,
    	EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA.
    	(gfc_free_expr_list, gfc_resolve_oacc_directive)
    	(gfc_resolve_oacc_declare, gfc_resolve_oacc_parallel_loop_blocks)
    	(gfc_resolve_oacc_blocks): New prototypes.
    	* match.c (match_exit_cycle): Handle EXEC_OACC_LOOP and
    	EXEC_OACC_PARALLEL_LOOP.
    	* match.h (gfc_match_oacc_cache, gfc_match_oacc_wait)
    	(gfc_match_oacc_update, gfc_match_oacc_declare)
    	(gfc_match_oacc_loop, gfc_match_oacc_host_data)
    	(gfc_match_oacc_data, gfc_match_oacc_kernels)
    	(gfc_match_oacc_kernels_loop, gfc_match_oacc_parallel)
    	(gfc_match_oacc_parallel_loop, gfc_match_oacc_enter_data)
    	(gfc_match_oacc_exit_data, gfc_match_oacc_routine): New
    	prototypes.
    	* openmp.c: Include "diagnostic.h" and "gomp-constants.h".
    	(gfc_free_omp_clauses): Update for members added to struct
    	gfc_omp_clauses.
    	(gfc_match_omp_clauses): Change mask parameter to uint64_t.  Add
    	openacc parameter.
    	(resolve_omp_clauses): Add openacc parameter.  Update for OpenACC.
    	(struct fortran_omp_context): Add is_openmp member.
    	(gfc_resolve_omp_parallel_blocks): Initialize it.
    	(gfc_resolve_do_iterator): Update for OpenACC.
    	(gfc_resolve_omp_directive): Call
    	resolve_omp_directive_inside_oacc_region.
    	(OMP_CLAUSE_PRIVATE, OMP_CLAUSE_FIRSTPRIVATE)
    	(OMP_CLAUSE_LASTPRIVATE, OMP_CLAUSE_COPYPRIVATE)
    	(OMP_CLAUSE_SHARED, OMP_CLAUSE_COPYIN, OMP_CLAUSE_REDUCTION)
    	(OMP_CLAUSE_IF, OMP_CLAUSE_NUM_THREADS, OMP_CLAUSE_SCHEDULE)
    	(OMP_CLAUSE_DEFAULT, OMP_CLAUSE_ORDERED, OMP_CLAUSE_COLLAPSE)
    	(OMP_CLAUSE_UNTIED, OMP_CLAUSE_FINAL, OMP_CLAUSE_MERGEABLE)
    	(OMP_CLAUSE_ALIGNED, OMP_CLAUSE_DEPEND, OMP_CLAUSE_INBRANCH)
    	(OMP_CLAUSE_LINEAR, OMP_CLAUSE_NOTINBRANCH, OMP_CLAUSE_PROC_BIND)
    	(OMP_CLAUSE_SAFELEN, OMP_CLAUSE_SIMDLEN, OMP_CLAUSE_UNIFORM)
    	(OMP_CLAUSE_DEVICE, OMP_CLAUSE_MAP, OMP_CLAUSE_TO)
    	(OMP_CLAUSE_FROM, OMP_CLAUSE_NUM_TEAMS, OMP_CLAUSE_THREAD_LIMIT)
    	(OMP_CLAUSE_DIST_SCHEDULE): Use uint64_t.
    	(OMP_CLAUSE_ASYNC, OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS)
    	(OMP_CLAUSE_VECTOR_LENGTH, OMP_CLAUSE_COPY, OMP_CLAUSE_COPYOUT)
    	(OMP_CLAUSE_CREATE, OMP_CLAUSE_PRESENT)
    	(OMP_CLAUSE_PRESENT_OR_COPY, OMP_CLAUSE_PRESENT_OR_COPYIN)
    	(OMP_CLAUSE_PRESENT_OR_COPYOUT, OMP_CLAUSE_PRESENT_OR_CREATE)
    	(OMP_CLAUSE_DEVICEPTR, OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER)
    	(OMP_CLAUSE_VECTOR, OMP_CLAUSE_SEQ, OMP_CLAUSE_INDEPENDENT)
    	(OMP_CLAUSE_USE_DEVICE, OMP_CLAUSE_DEVICE_RESIDENT)
    	(OMP_CLAUSE_HOST_SELF, OMP_CLAUSE_OACC_DEVICE, OMP_CLAUSE_WAIT)
    	(OMP_CLAUSE_DELETE, OMP_CLAUSE_AUTO, OMP_CLAUSE_TILE): New macros.
    	(gfc_match_omp_clauses): Handle those.
    	(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_DATA_CLAUSES)
    	(OACC_LOOP_CLAUSES, OACC_PARALLEL_LOOP_CLAUSES)
    	(OACC_KERNELS_LOOP_CLAUSES, OACC_HOST_DATA_CLAUSES)
    	(OACC_DECLARE_CLAUSES, OACC_UPDATE_CLAUSES)
    	(OACC_ENTER_DATA_CLAUSES, OACC_EXIT_DATA_CLAUSES)
    	(OACC_WAIT_CLAUSES): New macros.
    	(gfc_free_expr_list, match_oacc_expr_list, match_oacc_clause_gang)
    	(gfc_match_omp_map_clause, gfc_match_oacc_parallel_loop)
    	(gfc_match_oacc_parallel, gfc_match_oacc_kernels_loop)
    	(gfc_match_oacc_kernels, gfc_match_oacc_data)
    	(gfc_match_oacc_host_data, gfc_match_oacc_loop)
    	(gfc_match_oacc_declare, gfc_match_oacc_update)
    	(gfc_match_oacc_enter_data, gfc_match_oacc_exit_data)
    	(gfc_match_oacc_wait, gfc_match_oacc_cache)
    	(gfc_match_oacc_routine, oacc_is_loop)
    	(resolve_oacc_scalar_int_expr, resolve_oacc_positive_int_expr)
    	(check_symbol_not_pointer, check_array_not_assumed)
    	(resolve_oacc_data_clauses, resolve_oacc_deviceptr_clause)
    	(oacc_compatible_clauses, oacc_is_parallel, oacc_is_kernels)
    	(omp_code_to_statement, oacc_code_to_statement)
    	(resolve_oacc_directive_inside_omp_region)
    	(resolve_omp_directive_inside_oacc_region)
    	(resolve_oacc_nested_loops, resolve_oacc_params_in_parallel)
    	(resolve_oacc_loop_blocks, gfc_resolve_oacc_blocks)
    	(resolve_oacc_loop, resolve_oacc_cache, gfc_resolve_oacc_declare)
    	(gfc_resolve_oacc_directive): New functions.
    	* parse.c (next_free): Update for OpenACC.  Move some code into...
    	(verify_token_free): ... this new function.
    	(next_fixed): Update for OpenACC.  Move some code into...
    	(verify_token_fixed): ... this new function.
    	(case_executable): Add ST_OACC_UPDATE, ST_OACC_WAIT,
    	ST_OACC_CACHE, ST_OACC_ENTER_DATA, and ST_OACC_EXIT_DATA.
    	(case_exec_markers): Add ST_OACC_PARALLEL_LOOP, ST_OACC_PARALLEL,
    	ST_OACC_KERNELS, ST_OACC_DATA, ST_OACC_HOST_DATA, ST_OACC_LOOP,
    	ST_OACC_KERNELS_LOOP.
    	(case_decl): Add ST_OACC_ROUTINE.
    	(push_state, parse_critical_block, parse_progunit): Update for
    	OpenACC.
    	(gfc_ascii_statement): Handle ST_OACC_PARALLEL_LOOP,
    	ST_OACC_END_PARALLEL_LOOP, ST_OACC_PARALLEL, ST_OACC_END_PARALLEL,
    	ST_OACC_KERNELS, ST_OACC_END_KERNELS, ST_OACC_KERNELS_LOOP,
    	ST_OACC_END_KERNELS_LOOP, ST_OACC_DATA, ST_OACC_END_DATA,
    	ST_OACC_HOST_DATA, ST_OACC_END_HOST_DATA, ST_OACC_LOOP,
    	ST_OACC_END_LOOP, ST_OACC_DECLARE, ST_OACC_UPDATE, ST_OACC_WAIT,
    	ST_OACC_CACHE, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA,
    	ST_OACC_ROUTINE.
    	(verify_st_order, parse_spec): Handle ST_OACC_DECLARE.
    	(parse_executable): Handle ST_OACC_PARALLEL_LOOP,
    	ST_OACC_KERNELS_LOOP, ST_OACC_LOOP, ST_OACC_PARALLEL,
    	ST_OACC_KERNELS, ST_OACC_DATA, ST_OACC_HOST_DATA.
    	(decode_oacc_directive, parse_oacc_structured_block)
    	(parse_oacc_loop, is_oacc): New functions.
    	* parse.h (struct gfc_state_data): Add oacc_declare_clauses
    	member.
    	(is_oacc): New prototype.
    	* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Handle
    	EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_PARALLEL,
    	EXEC_OACC_KERNELS_LOOP, EXEC_OACC_KERNELS, EXEC_OACC_DATA,
    	EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE,
    	EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA,
    	EXEC_OACC_EXIT_DATA.
    	(resolve_codes): Call gfc_resolve_oacc_declare.
    	* scanner.c (openacc_flag, openacc_locus): New variables.
    	(skip_free_comments): Update for OpenACC.  Move some code into...
    	(skip_omp_attribute): ... this new function.
    	(skip_oacc_attribute): New function.
    	(skip_fixed_comments, gfc_next_char_literal): Update for OpenACC.
    	* st.c (gfc_free_statement): Handle EXEC_OACC_PARALLEL_LOOP,
    	EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS_LOOP, EXEC_OACC_KERNELS,
    	EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP,
    	EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE,
    	EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA.
    	* trans-decl.c (gfc_generate_function_code): Update for OpenACC.
    	* trans-openmp.c: Include "gomp-constants.h".
    	(gfc_omp_finish_clause, gfc_trans_omp_clauses): Use GOMP_MAP_*
    	instead of OMP_CLAUSE_MAP_*.  Use OMP_CLAUSE_SET_MAP_KIND.
    	(gfc_trans_omp_clauses): Handle OMP_LIST_USE_DEVICE,
    	OMP_LIST_DEVICE_RESIDENT, OMP_LIST_CACHE, and OMP_MAP_FORCE_ALLOC,
    	OMP_MAP_FORCE_DEALLOC, OMP_MAP_FORCE_TO, OMP_MAP_FORCE_FROM,
    	OMP_MAP_FORCE_TOFROM, OMP_MAP_FORCE_PRESENT,
    	OMP_MAP_FORCE_DEVICEPTR, and gfc_omp_clauses' async, seq,
    	independent, wait_list, num_gangs_expr, num_workers_expr,
    	vector_length_expr, vector, vector_expr, worker, worker_expr,
    	gang, gang_expr members.
    	(gfc_trans_omp_do): Handle EXEC_OACC_LOOP.
    	(gfc_convert_expr_to_tree, gfc_trans_oacc_construct)
    	(gfc_trans_oacc_executable_directive)
    	(gfc_trans_oacc_wait_directive, gfc_trans_oacc_combined_directive)
    	(gfc_trans_oacc_declare, gfc_trans_oacc_directive): New functions.
    	* trans-stmt.c (gfc_trans_block_construct): Update for OpenACC.
    	* trans-stmt.h (gfc_trans_oacc_directive, gfc_trans_oacc_declare):
    	New prototypes.
    	* trans.c (trans_code): Handle EXEC_OACC_CACHE, EXEC_OACC_WAIT,
    	EXEC_OACC_UPDATE, EXEC_OACC_LOOP, EXEC_OACC_HOST_DATA,
    	EXEC_OACC_DATA, EXEC_OACC_KERNELS, EXEC_OACC_KERNELS_LOOP,
    	EXEC_OACC_PARALLEL, EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_ENTER_DATA,
    	EXEC_OACC_EXIT_DATA.
    	* gfortran.texi: Update for OpenACC.
    	* intrinsic.texi: Likewise.
    	* invoke.texi: Likewise.
    	gcc/lto/
    	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
    	New macros.
    	* lto.c: Include "gomp-constants.h".
    	gcc/testsuite/
    	* lib/target-supports.exp (check_effective_target_fopenacc): New
    	procedure.
    	* g++.dg/goacc-gomp/goacc-gomp.exp: New file.
    	* g++.dg/goacc/goacc.exp: Likewise.
    	* gcc.dg/goacc-gomp/goacc-gomp.exp: Likewise.
    	* gcc.dg/goacc/goacc.exp: Likewise.
    	* gfortran.dg/goacc/goacc.exp: Likewise.
    	* c-c++-common/cpp/openacc-define-1.c: New file.
    	* c-c++-common/cpp/openacc-define-2.c: Likewise.
    	* c-c++-common/cpp/openacc-define-3.c: Likewise.
    	* c-c++-common/goacc-gomp/nesting-1.c: Likewise.
    	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
    	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
    	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
    	* c-c++-common/goacc/asyncwait-1.c: Likewise.
    	* c-c++-common/goacc/cache-1.c: Likewise.
    	* c-c++-common/goacc/clauses-fail.c: Likewise.
    	* c-c++-common/goacc/collapse-1.c: Likewise.
    	* c-c++-common/goacc/data-1.c: Likewise.
    	* c-c++-common/goacc/data-2.c: Likewise.
    	* c-c++-common/goacc/data-clause-duplicate-1.c: Likewise.
    	* c-c++-common/goacc/deviceptr-1.c: Likewise.
    	* c-c++-common/goacc/deviceptr-2.c: Likewise.
    	* c-c++-common/goacc/deviceptr-3.c: Likewise.
    	* c-c++-common/goacc/if-clause-1.c: Likewise.
    	* c-c++-common/goacc/if-clause-2.c: Likewise.
    	* c-c++-common/goacc/kernels-1.c: Likewise.
    	* c-c++-common/goacc/loop-1.c: Likewise.
    	* c-c++-common/goacc/loop-private-1.c: Likewise.
    	* c-c++-common/goacc/nesting-1.c: Likewise.
    	* c-c++-common/goacc/nesting-data-1.c: Likewise.
    	* c-c++-common/goacc/nesting-fail-1.c: Likewise.
    	* c-c++-common/goacc/parallel-1.c: Likewise.
    	* c-c++-common/goacc/pcopy.c: Likewise.
    	* c-c++-common/goacc/pcopyin.c: Likewise.
    	* c-c++-common/goacc/pcopyout.c: Likewise.
    	* c-c++-common/goacc/pcreate.c: Likewise.
    	* c-c++-common/goacc/pragma_context.c: Likewise.
    	* c-c++-common/goacc/present-1.c: Likewise.
    	* c-c++-common/goacc/reduction-1.c: Likewise.
    	* c-c++-common/goacc/reduction-2.c: Likewise.
    	* c-c++-common/goacc/reduction-3.c: Likewise.
    	* c-c++-common/goacc/reduction-4.c: Likewise.
    	* c-c++-common/goacc/sb-1.c: Likewise.
    	* c-c++-common/goacc/sb-2.c: Likewise.
    	* c-c++-common/goacc/sb-3.c: Likewise.
    	* c-c++-common/goacc/update-1.c: Likewise.
    	* gcc.dg/goacc/acc_on_device-1.c: Likewise.
    	* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
    	* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.
    	* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
    	* gfortran.dg/goacc/assumed.f95: Likewise.
    	* gfortran.dg/goacc/asyncwait-1.f95: Likewise.
    	* gfortran.dg/goacc/asyncwait-2.f95: Likewise.
    	* gfortran.dg/goacc/asyncwait-3.f95: Likewise.
    	* gfortran.dg/goacc/asyncwait-4.f95: Likewise.
    	* gfortran.dg/goacc/branch.f95: Likewise.
    	* gfortran.dg/goacc/cache-1.f95: Likewise.
    	* gfortran.dg/goacc/coarray.f95: Likewise.
    	* gfortran.dg/goacc/continuation-free-form.f95: Likewise.
    	* gfortran.dg/goacc/cray.f95: Likewise.
    	* gfortran.dg/goacc/critical.f95: Likewise.
    	* gfortran.dg/goacc/data-clauses.f95: Likewise.
    	* gfortran.dg/goacc/data-tree.f95: Likewise.
    	* gfortran.dg/goacc/declare-1.f95: Likewise.
    	* gfortran.dg/goacc/enter-exit-data.f95: Likewise.
    	* gfortran.dg/goacc/fixed-1.f: Likewise.
    	* gfortran.dg/goacc/fixed-2.f: Likewise.
    	* gfortran.dg/goacc/fixed-3.f: Likewise.
    	* gfortran.dg/goacc/fixed-4.f: Likewise.
    	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
    	* gfortran.dg/goacc/if.f95: Likewise.
    	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
    	* gfortran.dg/goacc/list.f95: Likewise.
    	* gfortran.dg/goacc/literal.f95: Likewise.
    	* gfortran.dg/goacc/loop-1.f95: Likewise.
    	* gfortran.dg/goacc/loop-2.f95: Likewise.
    	* gfortran.dg/goacc/loop-3.f95: Likewise.
    	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.
    	* gfortran.dg/goacc/omp.f95: Likewise.
    	* gfortran.dg/goacc/parallel-kernels-clauses.f95: Likewise.
    	* gfortran.dg/goacc/parallel-kernels-regions.f95: Likewise.
    	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
    	* gfortran.dg/goacc/parameter.f95: Likewise.
    	* gfortran.dg/goacc/private-1.f95: Likewise.
    	* gfortran.dg/goacc/private-2.f95: Likewise.
    	* gfortran.dg/goacc/private-3.f95: Likewise.
    	* gfortran.dg/goacc/pure-elemental-procedures.f95: Likewise.
    	* gfortran.dg/goacc/reduction-2.f95: Likewise.
    	* gfortran.dg/goacc/reduction.f95: Likewise.
    	* gfortran.dg/goacc/routine-1.f90: Likewise.
    	* gfortran.dg/goacc/routine-2.f90: Likewise.
    	* gfortran.dg/goacc/sentinel-free-form.f95: Likewise.
    	* gfortran.dg/goacc/several-directives.f95: Likewise.
    	* gfortran.dg/goacc/sie.f95: Likewise.
    	* gfortran.dg/goacc/subarrays.f95: Likewise.
    	* gfortran.dg/gomp/map-1.f90: Likewise.
    	* gfortran.dg/openacc-define-1.f90: Likewise.
    	* gfortran.dg/openacc-define-2.f90: Likewise.
    	* gfortran.dg/openacc-define-3.f90: Likewise.
    	* g++.dg/gomp/block-1.C: Update for changed compiler output.
    	* g++.dg/gomp/block-2.C: Likewise.
    	* g++.dg/gomp/block-3.C: Likewise.
    	* g++.dg/gomp/block-5.C: Likewise.
    	* g++.dg/gomp/target-1.C: Likewise.
    	* g++.dg/gomp/target-2.C: Likewise.
    	* g++.dg/gomp/taskgroup-1.C: Likewise.
    	* g++.dg/gomp/teams-1.C: Likewise.
    	* gcc.dg/cilk-plus/jump-openmp.c: Likewise.
    	* gcc.dg/cilk-plus/jump.c: Likewise.
    	* gcc.dg/gomp/block-1.c: Likewise.
    	* gcc.dg/gomp/block-10.c: Likewise.
    	* gcc.dg/gomp/block-2.c: Likewise.
    	* gcc.dg/gomp/block-3.c: Likewise.
    	* gcc.dg/gomp/block-4.c: Likewise.
    	* gcc.dg/gomp/block-5.c: Likewise.
    	* gcc.dg/gomp/block-6.c: Likewise.
    	* gcc.dg/gomp/block-7.c: Likewise.
    	* gcc.dg/gomp/block-8.c: Likewise.
    	* gcc.dg/gomp/block-9.c: Likewise.
    	* gcc.dg/gomp/target-1.c: Likewise.
    	* gcc.dg/gomp/target-2.c: Likewise.
    	* gcc.dg/gomp/taskgroup-1.c: Likewise.
    	* gcc.dg/gomp/teams-1.c: Likewise.
    	include/
    	* gomp-constants.h: New file.
    	libgomp/
    	* Makefile.am (search_path): Add $(top_srcdir)/../include.
    	(libgomp_la_SOURCES): Add splay-tree.c, libgomp-plugin.c,
    	oacc-parallel.c, oacc-host.c, oacc-init.c, oacc-mem.c,
    	oacc-async.c, oacc-plugin.c, oacc-cuda.c.
    	[USE_FORTRAN] (libgomp_la_SOURCES): Add openacc.f90.
    	Include $(top_srcdir)/plugin/Makefrag.am.
    	(nodist_libsubinclude_HEADERS): Add openacc.h.
    	[USE_FORTRAN] (nodist_finclude_HEADERS): Add openacc_lib.h,
    	openacc.f90, openacc.mod, openacc_kinds.mod.
    	(omp_lib.mod): Generalize into...
    	(%.mod): ... this new rule.
    	(openacc_kinds.mod, openacc.mod): New rules.
    	* plugin/configfrag.ac: New file.
    	* configure.ac: Move plugin/offloading support into it.  Include
    	it.  Instantiate testsuite/libgomp-test-support.pt.exp.
    	* plugin/Makefrag.am: New file.
    	* testsuite/Makefile.am (OFFLOAD_TARGETS)
    	(OFFLOAD_ADDITIONAL_OPTIONS, OFFLOAD_ADDITIONAL_LIB_PATHS): Don't
    	export.
    	(libgomp-test-support.exp): New rule.
    	(all-local): Depend on it.
    	* Makefile.in: Regenerate.
    	* testsuite/Makefile.in: Regenerate.
    	* config.h.in: Likewise.
    	* configure: Likewise.
    	* configure.tgt: Harden shell syntax.
    	* env.c: Include "oacc-int.h".
    	(parse_acc_device_type): New function.
    	(gomp_debug_var, goacc_device_type, goacc_device_num): New
    	variables.
    	(initialize_env): Initialize those.  Call
    	goacc_runtime_initialize.
    	* error.c (gomp_vdebug, gomp_debug, gomp_vfatal): New functions.
    	(gomp_fatal): Call gomp_vfatal.
    	* libgomp.h: Include "libgomp-plugin.h" and <stdarg.h>.
    	(gomp_debug_var, goacc_device_type, goacc_device_num, gomp_vdebug)
    	(gomp_debug, gomp_verror, gomp_vfatal, gomp_init_targets_once)
    	(splay_tree_node, splay_tree, splay_tree_key)
    	(struct target_mem_desc, struct splay_tree_key_s)
    	(struct gomp_memory_mapping, struct acc_dispatch_t)
    	(struct gomp_device_descr, gomp_acc_insert_pointer)
    	(gomp_acc_remove_pointer, target_mem_desc, gomp_copy_from_async)
    	(gomp_unmap_vars, gomp_init_device, gomp_init_tables)
    	(gomp_free_memmap, gomp_fini_device): New declarations.
    	(gomp_vdebug, gomp_debug): New macros.
    	Include "splay-tree.h".
    	* libgomp.map (OACC_2.0): New symbol version.  Use for
    	acc_get_num_devices, acc_get_num_devices_h_, acc_set_device_type,
    	acc_set_device_type_h_, acc_get_device_type,
    	acc_get_device_type_h_, acc_set_device_num, acc_set_device_num_h_,
    	acc_get_device_num, acc_get_device_num_h_, acc_async_test,
    	acc_async_test_h_, acc_async_test_all, acc_async_test_all_h_,
    	acc_wait, acc_wait_h_, acc_wait_async, acc_wait_async_h_,
    	acc_wait_all, acc_wait_all_h_, acc_wait_all_async,
    	acc_wait_all_async_h_, acc_init, acc_init_h_, acc_shutdown,
    	acc_shutdown_h_, acc_on_device, acc_on_device_h_, acc_malloc,
    	acc_free, acc_copyin, acc_copyin_32_h_, acc_copyin_64_h_,
    	acc_copyin_array_h_, acc_present_or_copyin,
    	acc_present_or_copyin_32_h_, acc_present_or_copyin_64_h_,
    	acc_present_or_copyin_array_h_, acc_create, acc_create_32_h_,
    	acc_create_64_h_, acc_create_array_h_, acc_present_or_create,
    	acc_present_or_create_32_h_, acc_present_or_create_64_h_,
    	acc_present_or_create_array_h_, acc_copyout, acc_copyout_32_h_,
    	acc_copyout_64_h_, acc_copyout_array_h_, acc_delete,
    	acc_delete_32_h_, acc_delete_64_h_, acc_delete_array_h_,
    	acc_update_device, acc_update_device_32_h_,
    	acc_update_device_64_h_, acc_update_device_array_h_,
    	acc_update_self, acc_update_self_32_h_, acc_update_self_64_h_,
    	acc_update_self_array_h_, acc_map_data, acc_unmap_data,
    	acc_deviceptr, acc_hostptr, acc_is_present, acc_is_present_32_h_,
    	acc_is_present_64_h_, acc_is_present_array_h_,
    	acc_memcpy_to_device, acc_memcpy_from_device,
    	acc_get_current_cuda_device, acc_get_current_cuda_context,
    	acc_get_cuda_stream, acc_set_cuda_stream.
    	(GOACC_2.0): New symbol version.  Use for GOACC_data_end,
    	GOACC_data_start, GOACC_enter_exit_data, GOACC_parallel,
    	GOACC_update, GOACC_wait, GOACC_get_thread_num,
    	GOACC_get_num_threads.
    	(GOMP_PLUGIN_1.0): New symbol version.  Use for
    	GOMP_PLUGIN_malloc, GOMP_PLUGIN_malloc_cleared,
    	GOMP_PLUGIN_realloc, GOMP_PLUGIN_debug, GOMP_PLUGIN_error,
    	GOMP_PLUGIN_fatal, GOMP_PLUGIN_async_unmap_vars,
    	GOMP_PLUGIN_acc_thread.
    	* libgomp.texi: Update for OpenACC changes, and GOMP_DEBUG
    	environment variable.
    	* libgomp_g.h (GOACC_data_start, GOACC_data_end)
    	(GOACC_enter_exit_data, GOACC_parallel, GOACC_update, GOACC_wait)
    	(GOACC_get_num_threads, GOACC_get_thread_num): New declarations.
    	* splay-tree.h (splay_tree_lookup, splay_tree_insert)
    	(splay_tree_remove): New declarations.
    	(rotate_left, rotate_right, splay_tree_splay, splay_tree_insert)
    	(splay_tree_remove, splay_tree_lookup): Move into...
    	* splay-tree.c: ... this new file.
    	* target.c: Include "oacc-plugin.h", "oacc-int.h", <assert.h>.
    	(splay_tree_node, splay_tree, splay_tree_key)
    	(struct target_mem_desc, struct splay_tree_key_s)
    	(struct gomp_device_descr): Don't declare.
    	(num_devices_openmp): New variable.
    	(gomp_get_num_devices): Use it.
    	(gomp_init_targets_once): New function.
    	(gomp_get_num_devices): Use it.
    	(get_kind, gomp_copy_from_async, gomp_free_memmap)
    	(gomp_fini_device, gomp_register_image_for_device): New functions.
    	(gomp_map_vars): Add devaddrs parameter.
    	(gomp_update): Add mm parameter.
    	(gomp_init_device): Move most of it into...
    	(gomp_init_tables): ... this new function.
    	(gomp_register_images_for_device): Remove function.
    	(splay_compare, gomp_map_vars, gomp_unmap_vars, gomp_init_device):
    	Make them hidden instead of static.
    	(gomp_map_vars_existing, gomp_map_vars, gomp_unmap_vars)
    	(gomp_update, gomp_init_device, GOMP_target, GOMP_target_data)
    	(GOMP_target_end_data, GOMP_target_update)
    	(gomp_load_plugin_for_device, gomp_target_init): Update for
    	OpenACC changes.
    	* oacc-async.c: New file.
    	* oacc-cuda.c: Likewise.
    	* oacc-host.c: Likewise.
    	* oacc-init.c: Likewise.
    	* oacc-int.h: Likewise.
    	* oacc-mem.c: Likewise.
    	* oacc-parallel.c: Likewise.
    	* oacc-plugin.c: Likewise.
    	* oacc-plugin.h: Likewise.
    	* oacc-ptx.h: Likewise.
    	* openacc.f90: Likewise.
    	* openacc.h: Likewise.
    	* openacc_lib.h: Likewise.
    	* plugin/plugin-host.c: Likewise.
    	* plugin/plugin-nvptx.c: Likewise.
    	* libgomp-plugin.c: Likewise.
    	* libgomp-plugin.h: Likewise.
    	* libgomp_target.h: Remove file after merging content into the
    	former file.  Update all users.
    	* testsuite/lib/libgomp.exp: Load libgomp-test-support.exp.
    	(offload_targets_s, offload_targets_s_openacc): New variables.
    	(check_effective_target_openacc_nvidia_accel_present)
    	(check_effective_target_openacc_nvidia_accel_selected): New
    	procedures.
    	(libgomp_init): Update for OpenACC changes.
    	* testsuite/libgomp-test-support.exp.in: New file.
    	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
    	* testsuite/libgomp.oacc-c/c.exp: Likewise.
    	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/abort-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/abort-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/cache-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/clauses-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/collapse-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/collapse-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/context-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/context-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/context-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-5.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-6.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-7.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-already-8.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-10.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-11.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-12.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-16.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-17.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-19.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-21.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-24.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-25.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-26.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-27.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-28.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-29.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-31.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-32.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-33.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-34.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-35.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-36.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-37.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-38.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-39.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-40.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-41.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-42.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-44.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-45.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-46.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-48.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-49.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-5.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-50.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-51.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-52.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-53.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-54.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-55.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-56.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-57.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-58.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-59.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-6.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-61.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-62.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-63.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-64.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-65.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-66.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-67.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-68.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-69.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-7.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-70.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-71.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-72.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-73.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-74.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-75.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-76.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-77.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-78.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-80.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-81.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-82.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-83.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-86.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-87.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-88.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-89.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-9.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-90.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-91.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-92.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/offset-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-empty.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/pointer-align-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/present-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/subr.h: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/subr.ptx: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/timer.h: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/update-1.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/abort-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/abort-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-5.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-6.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-7.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-8.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-4-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-1.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-2.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-3.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-4.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-5.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-6.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-7.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-already-8.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-10.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-2.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-3.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-5.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-6.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-7.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-8.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/map-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/pointer-align-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/pset-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/routine-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/routine-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/routine-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/routine-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/subarrays-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/subarrays-2.f90: Likewise.
    	liboffloadmic/
    	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_name)
    	(GOMP_OFFLOAD_get_caps, GOMP_OFFLOAD_fini_device): New functions.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219682 138bc75d-0d04-0410-961f-82ee72b054a4

 contrib/ChangeLog                                  |    5 +
 contrib/gcc_update                                 |    2 +
 gcc/ChangeLog                                      |  180 ++
 gcc/ada/ChangeLog                                  |    5 +
 gcc/ada/gcc-interface/utils.c                      |   18 +
 gcc/builtin-types.def                              |   11 +
 gcc/builtins.c                                     |   49 +
 gcc/builtins.def                                   |   16 +-
 gcc/c-family/ChangeLog                             |   38 +
 gcc/c-family/c-common.c                            |   17 +
 gcc/c-family/c-common.h                            |    1 +
 gcc/c-family/c-cppbuiltin.c                        |    3 +
 gcc/c-family/c-omp.c                               |   46 +-
 gcc/c-family/c-pragma.c                            |   31 +
 gcc/c-family/c-pragma.h                            |   44 +-
 gcc/c-family/c.opt                                 |    4 +
 gcc/c/ChangeLog                                    |   51 +
 gcc/c/c-parser.c                                   | 1036 ++++++++++-
 gcc/c/c-tree.h                                     |    3 +
 gcc/c/c-typeck.c                                   |   78 +-
 gcc/cgraph.c                                       |    2 +-
 gcc/config.gcc                                     |    2 +
 gcc/config/arc/arc.h                               |    2 +-
 gcc/config/darwin.h                                |    2 +-
 gcc/config/i386/intelmic-mkoffload.c               |    2 +-
 gcc/config/i386/intelmic-offload.h                 |   35 +
 gcc/config/i386/mingw32.h                          |    2 +-
 gcc/config/ia64/hpux.h                             |    2 +-
 gcc/config/nvptx/offload.h                         |   35 +
 gcc/config/pa/pa-hpux11.h                          |    4 +-
 gcc/config/pa/pa64-hpux.h                          |   24 +-
 gcc/cp/ChangeLog                                   |   45 +
 gcc/cp/cp-tree.h                                   |    3 +
 gcc/cp/parser.c                                    |  880 +++++++++-
 gcc/cp/semantics.c                                 |  102 +-
 gcc/doc/generic.texi                               |   71 +-
 gcc/doc/gimple.texi                                |    5 +-
 gcc/doc/invoke.texi                                |   18 +-
 gcc/doc/sourcebuild.texi                           |    3 +
 gcc/fortran/ChangeLog                              |  196 +++
 gcc/fortran/cpp.c                                  |    3 +
 gcc/fortran/dump-parse-tree.c                      |  478 ++++--
 gcc/fortran/f95-lang.c                             |   70 +
 gcc/fortran/gfortran.h                             |   58 +-
 gcc/fortran/gfortran.texi                          |   44 +-
 gcc/fortran/intrinsic.texi                         |   29 +
 gcc/fortran/invoke.texi                            |   16 +-
 gcc/fortran/lang.opt                               |    4 +
 gcc/fortran/match.c                                |   31 +-
 gcc/fortran/match.h                                |   16 +
 gcc/fortran/openmp.c                               | 1521 ++++++++++++++++-
 gcc/fortran/parse.c                                |  477 +++++-
 gcc/fortran/parse.h                                |    2 +
 gcc/fortran/resolve.c                              |   37 +
 gcc/fortran/scanner.c                              |  365 +++-
 gcc/fortran/st.c                                   |   12 +
 gcc/fortran/trans-decl.c                           |    7 +
 gcc/fortran/trans-openmp.c                         |  396 ++++-
 gcc/fortran/trans-stmt.c                           |    8 +
 gcc/fortran/trans-stmt.h                           |    4 +
 gcc/fortran/trans.c                                |   15 +
 gcc/fortran/types.def                              |   12 +
 gcc/gcc.c                                          |    5 +-
 gcc/gimple-pretty-print.c                          |   69 +-
 gcc/gimple.c                                       |    6 +-
 gcc/gimple.def                                     |   12 +-
 gcc/gimple.h                                       |  177 +-
 gcc/gimplify.c                                     |  254 ++-
 gcc/lto-streamer-out.c                             |    1 +
 gcc/lto/ChangeLog                                  |    7 +
 gcc/lto/lto-lang.c                                 |   17 +
 gcc/lto/lto.c                                      |    1 +
 gcc/omp-builtins.def                               |   33 +-
 gcc/omp-low.c                                      | 1598 ++++++++++++++---
 gcc/testsuite/ChangeLog                            |  130 ++
 gcc/testsuite/c-c++-common/cpp/openacc-define-1.c  |    6 +
 gcc/testsuite/c-c++-common/cpp/openacc-define-2.c  |    7 +
 gcc/testsuite/c-c++-common/cpp/openacc-define-3.c  |   11 +
 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c  |   12 +
 .../c-c++-common/goacc-gomp/nesting-fail-1.c       |  457 +++++
 .../c-c++-common/goacc/acc_on_device-2-off.c       |   25 +
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c |   29 +
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c     |  213 +++
 gcc/testsuite/c-c++-common/goacc/cache-1.c         |   88 +
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c    |   18 +
 gcc/testsuite/c-c++-common/goacc/collapse-1.c      |   97 ++
 gcc/testsuite/c-c++-common/goacc/data-1.c          |    6 +
 gcc/testsuite/c-c++-common/goacc/data-2.c          |   21 +
 .../c-c++-common/goacc/data-clause-duplicate-1.c   |   13 +
 gcc/testsuite/c-c++-common/goacc/deviceptr-1.c     |   86 +
 gcc/testsuite/c-c++-common/goacc/deviceptr-2.c     |   23 +
 gcc/testsuite/c-c++-common/goacc/deviceptr-3.c     |   11 +
 gcc/testsuite/c-c++-common/goacc/if-clause-1.c     |   10 +
 gcc/testsuite/c-c++-common/goacc/if-clause-2.c     |   11 +
 gcc/testsuite/c-c++-common/goacc/kernels-1.c       |    6 +
 gcc/testsuite/c-c++-common/goacc/loop-1.c          |   72 +
 gcc/testsuite/c-c++-common/goacc/loop-private-1.c  |   14 +
 gcc/testsuite/c-c++-common/goacc/nesting-1.c       |  101 ++
 gcc/testsuite/c-c++-common/goacc/nesting-data-1.c  |   61 +
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |   39 +
 gcc/testsuite/c-c++-common/goacc/parallel-1.c      |    6 +
 gcc/testsuite/c-c++-common/goacc/pcopy.c           |   11 +
 gcc/testsuite/c-c++-common/goacc/pcopyin.c         |   11 +
 gcc/testsuite/c-c++-common/goacc/pcopyout.c        |   11 +
 gcc/testsuite/c-c++-common/goacc/pcreate.c         |   11 +
 gcc/testsuite/c-c++-common/goacc/pragma_context.c  |   34 +
 gcc/testsuite/c-c++-common/goacc/present-1.c       |   11 +
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |   71 +
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |   50 +
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |   50 +
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |   52 +
 gcc/testsuite/c-c++-common/goacc/sb-1.c            |   75 +
 gcc/testsuite/c-c++-common/goacc/sb-2.c            |   22 +
 gcc/testsuite/c-c++-common/goacc/sb-3.c            |   18 +
 gcc/testsuite/c-c++-common/goacc/update-1.c        |   17 +
 gcc/testsuite/g++.dg/goacc-gomp/goacc-gomp.exp     |   36 +
 gcc/testsuite/g++.dg/goacc/goacc.exp               |   35 +
 gcc/testsuite/g++.dg/gomp/block-1.C                |    2 +-
 gcc/testsuite/g++.dg/gomp/block-2.C                |    2 +-
 gcc/testsuite/g++.dg/gomp/block-3.C                |    4 +-
 gcc/testsuite/g++.dg/gomp/block-5.C                |    2 +-
 gcc/testsuite/g++.dg/gomp/target-1.C               |    2 +-
 gcc/testsuite/g++.dg/gomp/target-2.C               |    2 +-
 gcc/testsuite/g++.dg/gomp/taskgroup-1.C            |    2 +-
 gcc/testsuite/g++.dg/gomp/teams-1.C                |    4 +-
 gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c       |    4 +-
 gcc/testsuite/gcc.dg/cilk-plus/jump.c              |    4 +-
 gcc/testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp     |   38 +
 gcc/testsuite/gcc.dg/goacc/acc_on_device-1.c       |   20 +
 gcc/testsuite/gcc.dg/goacc/goacc.exp               |   37 +
 gcc/testsuite/gcc.dg/gomp/block-1.c                |    4 +-
 gcc/testsuite/gcc.dg/gomp/block-10.c               |   12 +-
 gcc/testsuite/gcc.dg/gomp/block-2.c                |    4 +-
 gcc/testsuite/gcc.dg/gomp/block-3.c                |    8 +-
 gcc/testsuite/gcc.dg/gomp/block-4.c                |    2 +-
 gcc/testsuite/gcc.dg/gomp/block-5.c                |    4 +-
 gcc/testsuite/gcc.dg/gomp/block-6.c                |    2 +-
 gcc/testsuite/gcc.dg/gomp/block-7.c                |   12 +-
 gcc/testsuite/gcc.dg/gomp/block-8.c                |    2 +-
 gcc/testsuite/gcc.dg/gomp/block-9.c                |    2 +-
 gcc/testsuite/gcc.dg/gomp/target-1.c               |    6 +-
 gcc/testsuite/gcc.dg/gomp/target-2.c               |    6 +-
 gcc/testsuite/gcc.dg/gomp/taskgroup-1.c            |    6 +-
 gcc/testsuite/gcc.dg/gomp/teams-1.c                |   12 +-
 .../gfortran.dg/goacc/acc_on_device-1.f95          |   22 +
 .../gfortran.dg/goacc/acc_on_device-2-off.f95      |   39 +
 .../gfortran.dg/goacc/acc_on_device-2.f95          |   40 +
 gcc/testsuite/gfortran.dg/goacc/assumed.f95        |   47 +
 gcc/testsuite/gfortran.dg/goacc/asyncwait-1.f95    |   91 +
 gcc/testsuite/gfortran.dg/goacc/asyncwait-2.f95    |   91 +
 gcc/testsuite/gfortran.dg/goacc/asyncwait-3.f95    |   41 +
 gcc/testsuite/gfortran.dg/goacc/asyncwait-4.f95    |   37 +
 gcc/testsuite/gfortran.dg/goacc/branch.f95         |   53 +
 gcc/testsuite/gfortran.dg/goacc/cache-1.f95        |   12 +
 gcc/testsuite/gfortran.dg/goacc/coarray.f95        |   35 +
 .../gfortran.dg/goacc/continuation-free-form.f95   |   23 +
 gcc/testsuite/gfortran.dg/goacc/cray.f95           |   56 +
 gcc/testsuite/gfortran.dg/goacc/critical.f95       |   27 +
 gcc/testsuite/gfortran.dg/goacc/data-clauses.f95   |  259 +++
 gcc/testsuite/gfortran.dg/goacc/data-tree.f95      |   30 +
 gcc/testsuite/gfortran.dg/goacc/declare-1.f95      |   20 +
 .../gfortran.dg/goacc/enter-exit-data.f95          |   88 +
 gcc/testsuite/gfortran.dg/goacc/fixed-1.f          |   12 +
 gcc/testsuite/gfortran.dg/goacc/fixed-2.f          |   15 +
 gcc/testsuite/gfortran.dg/goacc/fixed-3.f          |   13 +
 gcc/testsuite/gfortran.dg/goacc/fixed-4.f          |    6 +
 gcc/testsuite/gfortran.dg/goacc/goacc.exp          |   36 +
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 |   13 +
 gcc/testsuite/gfortran.dg/goacc/if.f95             |   52 +
 gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95   |   32 +
 gcc/testsuite/gfortran.dg/goacc/list.f95           |  111 ++
 gcc/testsuite/gfortran.dg/goacc/literal.f95        |   30 +
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |  171 ++
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95         |  649 +++++++
 gcc/testsuite/gfortran.dg/goacc/loop-3.f95         |   55 +
 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90    |   48 +
 gcc/testsuite/gfortran.dg/goacc/omp.f95            |   66 +
 .../gfortran.dg/goacc/parallel-kernels-clauses.f95 |   96 ++
 .../gfortran.dg/goacc/parallel-kernels-regions.f95 |   55 +
 gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95  |   41 +
 gcc/testsuite/gfortran.dg/goacc/parameter.f95      |   32 +
 gcc/testsuite/gfortran.dg/goacc/private-1.f95      |   37 +
 gcc/testsuite/gfortran.dg/goacc/private-2.f95      |   39 +
 gcc/testsuite/gfortran.dg/goacc/private-3.f95      |   23 +
 .../goacc/pure-elemental-procedures.f95            |   78 +
 gcc/testsuite/gfortran.dg/goacc/reduction-2.f95    |   21 +
 gcc/testsuite/gfortran.dg/goacc/reduction.f95      |  138 ++
 gcc/testsuite/gfortran.dg/goacc/routine-1.f90      |   37 +
 gcc/testsuite/gfortran.dg/goacc/routine-2.f90      |   17 +
 .../gfortran.dg/goacc/sentinel-free-form.f95       |   21 +
 .../gfortran.dg/goacc/several-directives.f95       |    6 +
 gcc/testsuite/gfortran.dg/goacc/sie.f95            |  252 +++
 gcc/testsuite/gfortran.dg/goacc/subarrays.f95      |   41 +
 gcc/testsuite/gfortran.dg/gomp/map-1.f90           |  110 ++
 gcc/testsuite/gfortran.dg/openacc-define-1.f90     |    7 +
 gcc/testsuite/gfortran.dg/openacc-define-2.f90     |    7 +
 gcc/testsuite/gfortran.dg/openacc-define-3.f90     |   11 +
 gcc/testsuite/lib/target-supports.exp              |    9 +
 gcc/tree-core.h                                    |   88 +-
 gcc/tree-inline.c                                  |    2 +-
 gcc/tree-nested.c                                  |   20 +-
 gcc/tree-pretty-print.c                            |  195 ++-
 gcc/tree-streamer-in.c                             |    6 +-
 gcc/tree-streamer-out.c                            |    4 +-
 gcc/tree.c                                         |   47 +-
 gcc/tree.def                                       |   56 +-
 gcc/tree.h                                         |   92 +-
 gcc/varpool.c                                      |    2 +-
 include/ChangeLog                                  |    6 +
 include/gomp-constants.h                           |  116 ++
 libgomp/ChangeLog                                  |  351 ++++
 libgomp/Makefile.am                                |   24 +-
 libgomp/Makefile.in                                |  169 +-
 libgomp/config.h.in                                |    9 +-
 libgomp/configure                                  |  249 ++-
 libgomp/configure.ac                               |   42 +-
 libgomp/configure.tgt                              |    2 +-
 libgomp/env.c                                      |   24 +
 libgomp/error.c                                    |   38 +-
 libgomp/{libgomp_target.h => libgomp-plugin.c}     |   67 +-
 libgomp/{libgomp_target.h => libgomp-plugin.h}     |   47 +-
 libgomp/libgomp.h                                  |  217 ++-
 libgomp/libgomp.map                                |  104 ++
 libgomp/libgomp.texi                               |   40 +-
 libgomp/libgomp_g.h                                |   16 +
 libgomp/{error.c => oacc-async.c}                  |   64 +-
 libgomp/oacc-cuda.c                                |   84 +
 libgomp/oacc-host.c                                |  100 ++
 libgomp/oacc-init.c                                |  636 +++++++
 libgomp/oacc-int.h                                 |  105 ++
 libgomp/oacc-mem.c                                 |  585 +++++++
 libgomp/oacc-parallel.c                            |  490 ++++++
 libgomp/{libgomp_target.h => oacc-plugin.c}        |   37 +-
 libgomp/{libgomp_target.h => oacc-plugin.h}        |   26 +-
 libgomp/oacc-ptx.h                                 |  202 +++
 libgomp/openacc.f90                                |  956 +++++++++++
 libgomp/openacc.h                                  |  118 ++
 libgomp/openacc_lib.h                              |  381 +++++
 libgomp/plugin/Makefrag.am                         |   49 +
 libgomp/plugin/configfrag.ac                       |  148 ++
 libgomp/plugin/plugin-host.c                       |  266 +++
 libgomp/plugin/plugin-nvptx.c                      | 1791 ++++++++++++++++++++
 libgomp/{splay-tree.h => splay-tree.c}             |   31 +-
 libgomp/splay-tree.h                               |  180 +-
 libgomp/target.c                                   |  640 ++++---
 libgomp/testsuite/Makefile.am                      |   17 +-
 libgomp/testsuite/Makefile.in                      |   56 +-
 libgomp/testsuite/lib/libgomp.exp                  |   77 +-
 libgomp/testsuite/libgomp-test-support.exp.in      |    4 +
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  107 ++
 .../testsuite/libgomp.oacc-c-c++-common/abort-1.c  |   17 +
 .../testsuite/libgomp.oacc-c-c++-common/abort-2.c  |   17 +
 .../testsuite/libgomp.oacc-c-c++-common/abort-3.c  |   17 +
 .../testsuite/libgomp.oacc-c-c++-common/abort-4.c  |   17 +
 .../libgomp.oacc-c-c++-common/acc_on_device-1.c    |   75 +
 .../libgomp.oacc-c-c++-common/asyncwait-1.c        |  466 +++++
 .../testsuite/libgomp.oacc-c-c++-common/cache-1.c  |   48 +
 .../libgomp.oacc-c-c++-common/clauses-1.c          |  623 +++++++
 .../libgomp.oacc-c-c++-common/clauses-2.c          |   67 +
 .../libgomp.oacc-c-c++-common/collapse-1.c         |   31 +
 .../libgomp.oacc-c-c++-common/collapse-2.c         |   37 +
 .../libgomp.oacc-c-c++-common/collapse-3.c         |   40 +
 .../libgomp.oacc-c-c++-common/collapse-4.c         |   27 +
 .../libgomp.oacc-c-c++-common/context-1.c          |  213 +++
 .../libgomp.oacc-c-c++-common/context-2.c          |  223 +++
 .../libgomp.oacc-c-c++-common/context-3.c          |  200 +++
 .../libgomp.oacc-c-c++-common/context-4.c          |  213 +++
 .../testsuite/libgomp.oacc-c-c++-common/data-1.c   |  188 ++
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |  162 ++
 .../testsuite/libgomp.oacc-c-c++-common/data-3.c   |  166 ++
 .../libgomp.oacc-c-c++-common/data-already-1.c     |   19 +
 .../libgomp.oacc-c-c++-common/data-already-2.c     |   16 +
 .../libgomp.oacc-c-c++-common/data-already-3.c     |   17 +
 .../libgomp.oacc-c-c++-common/data-already-4.c     |   17 +
 .../libgomp.oacc-c-c++-common/data-already-5.c     |   17 +
 .../libgomp.oacc-c-c++-common/data-already-6.c     |   17 +
 .../libgomp.oacc-c-c++-common/data-already-7.c     |   17 +
 .../libgomp.oacc-c-c++-common/data-already-8.c     |   16 +
 .../libgomp.oacc-c-c++-common/deviceptr-1.c        |   32 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c |  613 +++++++
 .../libgomp.oacc-c-c++-common/kernels-1.c          |  184 ++
 .../libgomp.oacc-c-c++-common/kernels-empty.c      |    6 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-1.c    |   24 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-10.c   |   58 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-11.c   |   23 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-12.c   |   37 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-13.c   |   60 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-14.c   |   61 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-15.c   |   33 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-16.c   |   29 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-17.c   |   31 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-18.c   |   34 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-19.c   |   60 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-2.c    |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-20.c   |   29 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-21.c   |   29 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-22.c   |   29 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-23.c   |   39 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-24.c   |   55 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-25.c   |   30 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-26.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-27.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-28.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-29.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-3.c    |   15 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-30.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-31.c   |   27 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-32.c   |   38 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-33.c   |   31 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-34.c   |   33 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-35.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-36.c   |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-37.c   |   40 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-38.c   |   64 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-39.c   |   41 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-4.c    |   13 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-40.c   |   42 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-41.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-42.c   |   35 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-43.c   |   45 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-44.c   |   45 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-45.c   |   50 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-46.c   |   42 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-47.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-48.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-49.c   |   48 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-5.c    |   40 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-50.c   |   30 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-51.c   |   41 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-52.c   |   28 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-53.c   |   28 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-54.c   |   28 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-55.c   |   48 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-56.c   |   33 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-57.c   |   28 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-58.c   |   28 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-59.c   |   55 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-6.c    |   39 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-60.c   |   54 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-61.c   |   70 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-62.c   |   49 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-63.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-64.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-65.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-66.c   |   48 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-67.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-68.c   |   43 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c   |  124 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-7.c    |   18 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-70.c   |  136 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c   |  119 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-72.c   |  121 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-73.c   |  134 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c   |  139 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-75.c   |  141 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-76.c   |  147 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c   |  135 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-78.c   |  140 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c   |  167 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-80.c   |  132 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-81.c   |  211 +++
 .../testsuite/libgomp.oacc-c-c++-common/lib-82.c   |  144 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-83.c   |   58 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-84.c   |   66 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-85.c   |   52 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-86.c   |   42 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-87.c   |   42 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-88.c   |  111 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-89.c   |  118 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-9.c    |   70 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-90.c   |  137 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-91.c   |   84 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-92.c   |  112 ++
 .../testsuite/libgomp.oacc-c-c++-common/nested-1.c |  680 ++++++++
 .../testsuite/libgomp.oacc-c-c++-common/nested-2.c |  141 ++
 .../testsuite/libgomp.oacc-c-c++-common/offset-1.c |   97 ++
 .../libgomp.oacc-c-c++-common/parallel-1.c         |  206 +++
 .../libgomp.oacc-c-c++-common/parallel-empty.c     |    6 +
 .../libgomp.oacc-c-c++-common/pointer-align-1.c    |   35 +
 .../libgomp.oacc-c-c++-common/present-1.c          |   48 +
 .../libgomp.oacc-c-c++-common/present-2.c          |   48 +
 .../libgomp.oacc-c-c++-common/reduction-1.c        |  174 ++
 .../libgomp.oacc-c-c++-common/reduction-2.c        |  126 ++
 .../libgomp.oacc-c-c++-common/reduction-3.c        |  126 ++
 .../libgomp.oacc-c-c++-common/reduction-4.c        |  129 ++
 .../libgomp.oacc-c-c++-common/reduction-5.c        |   32 +
 .../reduction-initial-1.c                          |   25 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h |   46 +
 .../testsuite/libgomp.oacc-c-c++-common/subr.ptx   |  148 ++
 .../testsuite/libgomp.oacc-c-c++-common/timer.h    |  103 ++
 .../libgomp.oacc-c-c++-common/update-1-2.c         |  282 +++
 .../testsuite/libgomp.oacc-c-c++-common/update-1.c |  280 +++
 libgomp/testsuite/libgomp.oacc-c/c.exp             |   71 +
 libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90 |   10 +
 libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90 |   13 +
 .../libgomp.oacc-fortran/acc_on_device-1-1.f90     |   52 +
 .../libgomp.oacc-fortran/acc_on_device-1-2.f       |   52 +
 .../libgomp.oacc-fortran/acc_on_device-1-3.f       |   52 +
 .../testsuite/libgomp.oacc-fortran/asyncwait-1.f90 |  135 ++
 .../testsuite/libgomp.oacc-fortran/asyncwait-2.f90 |   40 +
 .../testsuite/libgomp.oacc-fortran/asyncwait-3.f90 |   42 +
 .../testsuite/libgomp.oacc-fortran/collapse-1.f90  |   27 +
 .../testsuite/libgomp.oacc-fortran/collapse-2.f90  |   25 +
 .../testsuite/libgomp.oacc-fortran/collapse-3.f90  |   28 +
 .../testsuite/libgomp.oacc-fortran/collapse-4.f90  |   40 +
 .../testsuite/libgomp.oacc-fortran/collapse-5.f90  |   48 +
 .../testsuite/libgomp.oacc-fortran/collapse-6.f90  |   50 +
 .../testsuite/libgomp.oacc-fortran/collapse-7.f90  |   40 +
 .../testsuite/libgomp.oacc-fortran/collapse-8.f90  |   47 +
 libgomp/testsuite/libgomp.oacc-fortran/data-1.f90  |   45 +
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   31 +
 libgomp/testsuite/libgomp.oacc-fortran/data-3.f90  |  131 ++
 .../testsuite/libgomp.oacc-fortran/data-4-2.f90    |  138 ++
 libgomp/testsuite/libgomp.oacc-fortran/data-4.f90  |  136 ++
 .../libgomp.oacc-fortran/data-already-1.f          |   17 +
 .../libgomp.oacc-fortran/data-already-2.f          |   16 +
 .../libgomp.oacc-fortran/data-already-3.f          |   15 +
 .../libgomp.oacc-fortran/data-already-4.f          |   14 +
 .../libgomp.oacc-fortran/data-already-5.f          |   14 +
 .../libgomp.oacc-fortran/data-already-6.f          |   14 +
 .../libgomp.oacc-fortran/data-already-7.f          |   14 +
 .../libgomp.oacc-fortran/data-already-8.f          |   16 +
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |   98 ++
 libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90   |   13 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-10.f90  |   82 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-2.f     |   13 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-3.f     |   13 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-4.f90   |   35 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-5.f90   |   31 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-6.f90   |   35 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-7.f90   |   31 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-8.f90   |   83 +
 libgomp/testsuite/libgomp.oacc-fortran/map-1.f90   |   97 ++
 .../libgomp.oacc-fortran/openacc_version-1.f       |    9 +
 .../libgomp.oacc-fortran/openacc_version-2.f90     |    9 +
 .../libgomp.oacc-fortran/pointer-align-1.f90       |   21 +
 libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90  |  229 +++
 .../testsuite/libgomp.oacc-fortran/reduction-1.f90 |  225 +++
 .../testsuite/libgomp.oacc-fortran/reduction-2.f90 |  170 ++
 .../testsuite/libgomp.oacc-fortran/reduction-3.f90 |  170 ++
 .../testsuite/libgomp.oacc-fortran/reduction-4.f90 |   54 +
 .../testsuite/libgomp.oacc-fortran/reduction-5.f90 |   32 +
 .../testsuite/libgomp.oacc-fortran/reduction-6.f90 |   30 +
 .../testsuite/libgomp.oacc-fortran/routine-1.f90   |   32 +
 .../testsuite/libgomp.oacc-fortran/routine-2.f90   |   29 +
 .../testsuite/libgomp.oacc-fortran/routine-3.f90   |   27 +
 .../testsuite/libgomp.oacc-fortran/routine-4.f90   |   23 +
 .../testsuite/libgomp.oacc-fortran/subarrays-1.f90 |   97 ++
 .../testsuite/libgomp.oacc-fortran/subarrays-2.f90 |  100 ++
 liboffloadmic/ChangeLog                            |    5 +
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp   |   26 +-
 451 files changed, 37297 insertions(+), 1546 deletions(-)


Grüße,
 Thomas



[-- Attachment #1.2: gcc-trunk-r219682.patch.gz --]
[-- Type: application/x-gzip, Size: 220224 bytes --]

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
@ 2015-01-15 20:47 ` Jeff Law
  2015-01-15 22:47 ` Tobias Burnus
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 92+ messages in thread
From: Jeff Law @ 2015-01-15 20:47 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches

On 01/15/15 13:20, Thomas Schwinge wrote:
> Hi!
>
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!
>
> Note that this is an experimental feature, incomplete, and subject to
> change in future versions of GCC.  We shall update -- and keep updated --
> <https://gcc.gnu.org/wiki/OpenACC>, to track the current status.  (Please
> come back to that page in a few days, it has not yet been updated.)
>
> Please note that there are still a handful of patches pending (posted
> weeks ago, need to ping) that are needed for nvptx offloading, so that's
> not yet functional.
Definitely ping them.  We're trying hard to get things closed down and 
knowing what's still out there is very important.

jeff

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
  2015-01-15 20:47 ` Jeff Law
@ 2015-01-15 22:47 ` Tobias Burnus
  2015-01-16 12:34 ` Gerald Pfeifer
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 92+ messages in thread
From: Tobias Burnus @ 2015-01-15 22:47 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches; +Cc: gfortran

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

Hi Thomas,

thanks to you and all others involved for the OpenACC merge.


Attached is a patch which converts '%s' into %qs in the Fortran front 
end, as mentioned before. (This wasn't possible when the original patch 
was reviewed, as the common diagnostic patches came later.)
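
For readers unfamiliar with the directive: with '%s' the quotation marks 
are hard-coded in the message text, whereas with %qs the diagnostic 
formatter supplies them. The following is only a stand-alone sketch of 
that idea; format_quoted and its parameters are hypothetical and are not 
part of GCC's diagnostic machinery.

```c
#include <stdio.h>

/* Hypothetical stand-in for a %qs-style directive: the formatter, not
   the message text, decides which quotation marks surround NAME.
   Returns what snprintf returns.  */
static int
format_quoted (char *buf, size_t size, const char *open_quote,
	       const char *close_quote, const char *name)
{
  return snprintf (buf, size, "%s%s%s", open_quote, name, close_quote);
}
```

A caller could pass plain ASCII quotes or locale-specific ones without 
touching the message string, which is the point of moving the quotes out 
of the format text.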

Committed as Rev. 219694.


On the Fortran side: Compared with C/C++, support for "acc cache" seems 
to be missing (PR63865, contains draft-patch link). (Other PRs: the 
device_resident clause is not supported (PR63859) - I don't know whether 
it is supported in C/C++ or not. And there are two ICEs (PR63865, 
PR63858).) Is work on those planned on your side for GCC 5, or more 
likely not?

Tobias


Thomas Schwinge wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!
>
> Note that this is an experimental feature, incomplete, and subject to
> change in future versions of GCC.  We shall update -- and keep updated --
> <https://gcc.gnu.org/wiki/OpenACC>, to track the current status.  (Please
> come back to that page in a few days, it has not yet been updated.)
>
> Please note that there are still a handful of patches pending (posted
> weeks ago, need to ping) that are needed for nvptx offloading, so that's
> not yet functional.
>
> Here's the commit log.  The patch itself is too big to post inline, so
> please find it attached, gzipped.


[-- Attachment #2: openacc.diff --]
[-- Type: text/x-patch, Size: 7546 bytes --]

2015-01-15  Tobias Burnus  <burnus@net-b.de>

	* openmp.c (check_symbol_not_pointer, resolve_oacc_data_clauses,
	resolve_oacc_deviceptr_clause, resolve_omp_clauses,
	gfc_resolve_oacc_declare): Replace '%s' by %qs.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 005739b..422e977 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1457,7 +1457,7 @@ gfc_match_oacc_routine (void)
   if (!sym->attr.external && !sym->attr.function && !sym->attr.subroutine)
     {
       gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, invalid"
-		 " function name '%s'", sym->name);
+		 " function name %qs", sym->name);
       gfc_current_locus = old_loc;
       return MATCH_ERROR;
     }
@@ -2649,29 +2649,29 @@ static void
 check_symbol_not_pointer (gfc_symbol *sym, locus loc, const char *name)
 {
   if (sym->ts.type == BT_DERIVED && sym->attr.pointer)
-    gfc_error ("POINTER object '%s' of derived type in %s clause at %L",
+    gfc_error ("POINTER object %qs of derived type in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->ts.type == BT_DERIVED && sym->attr.cray_pointer)
-    gfc_error ("Cray pointer object of derived type '%s' in %s clause at %L",
+    gfc_error ("Cray pointer object of derived type %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->ts.type == BT_DERIVED && sym->attr.cray_pointee)
-    gfc_error ("Cray pointee object of derived type '%s' in %s clause at %L",
+    gfc_error ("Cray pointee object of derived type %qs in %s clause at %L",
 	       sym->name, name, &loc);
 
   if ((sym->ts.type == BT_ASSUMED && sym->attr.pointer)
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.pointer))
-    gfc_error ("POINTER object '%s' of polymorphic type in %s clause at %L",
+    gfc_error ("POINTER object %qs of polymorphic type in %s clause at %L",
 	       sym->name, name, &loc);
   if ((sym->ts.type == BT_ASSUMED && sym->attr.cray_pointer)
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.cray_pointer))
-    gfc_error ("Cray pointer object of polymorphic type '%s' in %s clause at %L",
+    gfc_error ("Cray pointer object of polymorphic type %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if ((sym->ts.type == BT_ASSUMED && sym->attr.cray_pointee)
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.cray_pointee))
-    gfc_error ("Cray pointee object of polymorphic type '%s' in %s clause at %L",
+    gfc_error ("Cray pointee object of polymorphic type %qs in %s clause at %L",
 	       sym->name, name, &loc);
 }
 
@@ -2681,14 +2681,14 @@ static void
 check_array_not_assumed (gfc_symbol *sym, locus loc, const char *name)
 {
   if (sym->as && sym->as->type == AS_ASSUMED_SIZE)
-    gfc_error ("Assumed size array '%s' in %s clause at %L",
+    gfc_error ("Assumed size array %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->as && sym->as->type == AS_ASSUMED_RANK)
-    gfc_error ("Assumed rank array '%s' in %s clause at %L",
+    gfc_error ("Assumed rank array %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->as && sym->as->type == AS_DEFERRED && sym->attr.pointer
       && !sym->attr.contiguous)
-    gfc_error ("Noncontiguous deferred shape array '%s' in %s clause at %L",
+    gfc_error ("Noncontiguous deferred shape array %qs in %s clause at %L",
 	       sym->name, name, &loc);
 }
 
@@ -2696,12 +2696,12 @@ static void
 resolve_oacc_data_clauses (gfc_symbol *sym, locus loc, const char *name)
 {
   if (sym->ts.type == BT_DERIVED && sym->attr.allocatable)
-    gfc_error ("ALLOCATABLE object '%s' of derived type in %s clause at %L",
+    gfc_error ("ALLOCATABLE object %qs of derived type in %s clause at %L",
 	       sym->name, name, &loc);
   if ((sym->ts.type == BT_ASSUMED && sym->attr.allocatable)
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.allocatable))
-    gfc_error ("ALLOCATABLE object '%s' of polymorphic type "
+    gfc_error ("ALLOCATABLE object %qs of polymorphic type "
 	       "in %s clause at %L", sym->name, name, &loc);
   check_symbol_not_pointer (sym, loc, name);
   check_array_not_assumed (sym, loc, name);
@@ -2713,25 +2713,25 @@ resolve_oacc_deviceptr_clause (gfc_symbol *sym, locus loc, const char *name)
   if (sym->attr.pointer
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.class_pointer))
-    gfc_error ("POINTER object '%s' in %s clause at %L",
+    gfc_error ("POINTER object %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->attr.cray_pointer
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.cray_pointer))
-    gfc_error ("Cray pointer object '%s' in %s clause at %L",
+    gfc_error ("Cray pointer object %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->attr.cray_pointee
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.cray_pointee))
-    gfc_error ("Cray pointee object '%s' in %s clause at %L",
+    gfc_error ("Cray pointee object %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->attr.allocatable
       || (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->attr.allocatable))
-    gfc_error ("ALLOCATABLE object '%s' in %s clause at %L",
+    gfc_error ("ALLOCATABLE object %qs in %s clause at %L",
 	       sym->name, name, &loc);
   if (sym->attr.value)
-    gfc_error ("VALUE object '%s' in %s clause at %L",
+    gfc_error ("VALUE object %qs in %s clause at %L",
 	       sym->name, name, &loc);
   check_array_not_assumed (sym, loc, name);
 }
@@ -3367,18 +3367,18 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 		      if (n->sym->attr.allocatable
 			  || (n->sym->ts.type == BT_CLASS && CLASS_DATA (n->sym)
 			      && CLASS_DATA (n->sym)->attr.allocatable))
-			gfc_error ("ALLOCATABLE object '%s' in %s clause at %L",
+			gfc_error ("ALLOCATABLE object %qs in %s clause at %L",
 				   n->sym->name, name, where);
 		      if (n->sym->attr.pointer
 			  || (n->sym->ts.type == BT_CLASS && CLASS_DATA (n->sym)
 			      && CLASS_DATA (n->sym)->attr.class_pointer))
-			gfc_error ("POINTER object '%s' in %s clause at %L",
+			gfc_error ("POINTER object %qs in %s clause at %L",
 				   n->sym->name, name, where);
 		      if (n->sym->attr.cray_pointer)
-			gfc_error ("Cray pointer object '%s' in %s clause at %L",
+			gfc_error ("Cray pointer object %qs in %s clause at %L",
 				   n->sym->name, name, where);
 		      if (n->sym->attr.cray_pointee)
-			gfc_error ("Cray pointee object '%s' in %s clause at %L",
+			gfc_error ("Cray pointee object %qs in %s clause at %L",
 				   n->sym->name, name, where);
 		      /* FALLTHRU */
 		  case OMP_LIST_DEVICE_RESIDENT:
@@ -4624,7 +4624,7 @@ gfc_resolve_oacc_declare (gfc_namespace *ns)
       {
 	n->sym->mark = 0;
 	if (n->sym->attr.flavor == FL_PARAMETER)
-	  gfc_error ("PARAMETER object '%s' is not allowed at %L", n->sym->name, &loc);
+	  gfc_error ("PARAMETER object %qs is not allowed at %L", n->sym->name, &loc);
       }
 
   for (list = OMP_LIST_DEVICE_RESIDENT;
@@ -4632,7 +4632,7 @@ gfc_resolve_oacc_declare (gfc_namespace *ns)
     for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
       {
 	if (n->sym->mark)
-	  gfc_error ("Symbol '%s' present on multiple clauses at %L",
+	  gfc_error ("Symbol %qs present on multiple clauses at %L",
 		     n->sym->name, &loc);
 	else
 	  n->sym->mark = 1;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
  2015-01-15 20:47 ` Jeff Law
  2015-01-15 22:47 ` Tobias Burnus
@ 2015-01-16 12:34 ` Gerald Pfeifer
  2015-01-16 20:37   ` Thomas Schwinge
  2015-01-16 15:04 ` Gerald Pfeifer
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 92+ messages in thread
From: Gerald Pfeifer @ 2015-01-16 12:34 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

Hi Thomas,

On Thursday 2015-01-15 21:20, Thomas Schwinge wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!

this breaks bootstrap on FreeBSD 8/amd64 from what I can tell:

libtool: compile:  /scratch/tmp/gerald/OBJ-0116-1138/./gcc/xgcc 
-B/scratch/tmp/gerald/OBJ-0116-1138/./gcc/ 
-B/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/bin/ 
-B/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/lib/ -isystem 
/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/include -isystem 
/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/sys-include 
-DHAVE_CONFIG_H -I. -I/scratch/tmp/gerald/gcc-HEAD/libgomp 
-I/scratch/tmp/gerald/gcc-HEAD/libgomp/config/posix 
-I/scratch/tmp/gerald/gcc-HEAD/libgomp 
-I/scratch/tmp/gerald/gcc-HEAD/libgomp/../include -Wall -pthread 
-Werror -g -O2 -MT work.lo -MD -MP -MF .deps/work.Tpo -c 
/scratch/tmp/gerald/gcc-HEAD/libgomp/work.c -o work.o >/dev/null 2>&1 
/scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: 
alloca.h: No such file or directory
compilation terminated.


% find /usr/include/ -name alloca.h
%

i.e., FreeBSD does not feature the alloca.h header and declares
alloca() in stdlib.h.
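
One common way to cope with this is to include <alloca.h> only when 
configure has found it. The following is a minimal sketch, assuming a 
HAVE_ALLOCA_H macro produced by an autoconf header check; the helper 
function is hypothetical, and this is not necessarily how libgomp will 
end up fixing it.

```c
/* Sketch only: HAVE_ALLOCA_H is assumed to come from a configure check
   and is not defined by this snippet.  */
#include <stdlib.h>	/* FreeBSD declares alloca() here.  */
#include <string.h>
#ifdef HAVE_ALLOCA_H
# include <alloca.h>
#elif !defined alloca && defined __GNUC__
# define alloca __builtin_alloca
#endif

/* Hypothetical helper: sum the first N bytes of SRC via a stack copy.  */
static unsigned
sum_bytes_via_stack_copy (const unsigned char *src, size_t n)
{
  unsigned char *tmp = (unsigned char *) alloca (n);
  unsigned sum = 0;
  memcpy (tmp, src, n);
  for (size_t i = 0; i < n; i++)
    sum += tmp[i];
  return sum;
}
```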

Gerald

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (2 preceding siblings ...)
  2015-01-16 12:34 ` Gerald Pfeifer
@ 2015-01-16 15:04 ` Gerald Pfeifer
  2015-01-16 15:06 ` Jakub Jelinek
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 92+ messages in thread
From: Gerald Pfeifer @ 2015-01-16 15:04 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

Once I work around the previous failure, I quickly get another one on
FreeBSD 8.4/amd64:

/scratch/tmp/gerald/gcc-HEAD/libgomp/target.c:67:12: error: 
‘num_devices’ defined but not used [-Werror=unused-variable]
 static int num_devices;
            ^

This one did not require autoconf, so I went ahead and fixed it myself. 
;-)  Committed after successful bootstrap on x86_64-unknown-freebsd8.4 
(after hacking around the other failure).
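
To spell out the pattern for anyone hitting the same warning: a 
file-scope static that is only referenced inside an #ifdef block is 
flagged by -Wunused-variable when that block is compiled out, and 
-Werror turns that into a build failure. A minimal sketch follows; 
PLUGIN_SUPPORT mirrors the macro named in the patch, while count_devices 
is a hypothetical stand-in and not a function in target.c.

```c
/* Sketch only: the accessor and its body are hypothetical.  */
#ifdef PLUGIN_SUPPORT
/* Only referenced by plugin code, so only defined alongside it.  */
static int num_devices;
#endif

/* Hypothetical accessor: report how many devices were discovered.  */
static int
count_devices (void)
{
#ifdef PLUGIN_SUPPORT
  return num_devices;
#else
  return 0;	/* No plugin support: no offload devices.  */
#endif
}
```

Guarding the definition with the same macro as its uses keeps both 
configurations free of the warning.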

Gerald


2015-01-16  Gerald Pfeifer  <gerald@pfeifer.com>

	* target.c (num_devices): Guard with PLUGIN_SUPPORT.

Index: target.c
===================================================================
--- target.c	(revision 219741)
+++ target.c	(revision 219742)
@@ -63,8 +63,10 @@
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
+#ifdef PLUGIN_SUPPORT
 /* Total number of available devices.  */
 static int num_devices;
+#endif
 
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (3 preceding siblings ...)
  2015-01-16 15:04 ` Gerald Pfeifer
@ 2015-01-16 15:06 ` Jakub Jelinek
  2015-01-16 15:37   ` David Malcolm
  2015-01-16 21:13 ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 92+ messages in thread
From: Jakub Jelinek @ 2015-01-16 15:06 UTC (permalink / raw)
  To: Thomas Schwinge, David Malcolm; +Cc: gcc-patches

On Thu, Jan 15, 2015 at 09:20:07PM +0100, Thomas Schwinge wrote:
>     	* builtin-types.def (BT_FN_VOID_INT_INT_VAR)
>     	(BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
>     	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
>     	New function types.

This broke bootstrap with --enable-languages=jit.  Fixed thusly, committed
as obvious:

2015-01-16  Jakub Jelinek  <jakub@redhat.com>

	* jit-builtins.h (DEF_FUNCTION_TYPE_VAR_5): Fix spelling of
	last argument.
	(DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Define and
	undef afterwards.
	* jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
	Likewise.

--- gcc/jit/jit-builtins.h.jj	2015-01-14 11:00:22.000000000 +0100
+++ gcc/jit/jit-builtins.h	2015-01-16 15:35:13.440088390 +0100
@@ -45,7 +45,12 @@ enum jit_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
 #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
-#define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
+#define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
+  NAME,
+#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7, ARG8) NAME,
+#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
   NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
@@ -65,6 +70,8 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_8
+#undef DEF_FUNCTION_TYPE_VAR_12
 #undef DEF_POINTER_TYPE
   BT_LAST
 }; /* enum jit_builtin_type */
--- gcc/jit/jit-builtins.c.jj	2015-01-14 11:00:22.000000000 +0100
+++ gcc/jit/jit-builtins.c	2015-01-16 15:35:02.176276537 +0100
@@ -313,6 +313,15 @@ builtins_manager::make_type (enum jit_bu
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7, ARG8) \
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6, ARG7, ARG8);
+#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6, ARG7, ARG8, ARG9, \
+				      ARG10, ARG11, ARG12);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
       case ENUM: return make_ptr_type (ENUM, TYPE);
 
@@ -334,6 +343,8 @@ builtins_manager::make_type (enum jit_bu
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_8
+#undef DEF_FUNCTION_TYPE_VAR_12
 #undef DEF_POINTER_TYPE
 
     default:
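
For readers unfamiliar with the pattern behind this breakage, here is a
minimal sketch of why every includer of a .def-style list has to define
each DEF_* macro the list uses.  All names (SKETCH_TYPE_LIST,
DEF_SKETCH_VAR_*, sketch_num_args) are hypothetical stand-ins, not the
real builtin-types.def or jit-builtins code; the list is a macro here only
so that the example fits in one file.

```c
#include <assert.h>

/* Hypothetical stand-in for builtin-types.def: one macro call per entry.  */
#define SKETCH_TYPE_LIST \
  DEF_SKETCH_VAR_5 (BT_SKETCH_FIVE, 5) \
  DEF_SKETCH_VAR_8 (BT_SKETCH_EIGHT, 8) \
  DEF_SKETCH_VAR_12 (BT_SKETCH_TWELVE, 12)

/* Consumer 1: build the enum.  If one of these three definitions were
   missing, the unexpanded macro call would land inside the enum body and
   the build would fail, which is the kind of breakage fixed above.  */
#define DEF_SKETCH_VAR_5(NAME, NARGS) NAME,
#define DEF_SKETCH_VAR_8(NAME, NARGS) NAME,
#define DEF_SKETCH_VAR_12(NAME, NARGS) NAME,
enum sketch_builtin_type
{
  SKETCH_TYPE_LIST
  BT_SKETCH_LAST
};
#undef DEF_SKETCH_VAR_5
#undef DEF_SKETCH_VAR_8
#undef DEF_SKETCH_VAR_12

/* Consumer 2: build a switch from the same list, with different
   definitions of the same macros.  */
static int
sketch_num_args (enum sketch_builtin_type type)
{
  switch (type)
    {
#define DEF_SKETCH_VAR_5(NAME, NARGS) case NAME: return NARGS;
#define DEF_SKETCH_VAR_8(NAME, NARGS) case NAME: return NARGS;
#define DEF_SKETCH_VAR_12(NAME, NARGS) case NAME: return NARGS;
      SKETCH_TYPE_LIST
#undef DEF_SKETCH_VAR_5
#undef DEF_SKETCH_VAR_8
#undef DEF_SKETCH_VAR_12
    default:
      /* Only the end marker is expected to reach this point.  */
      assert (type == BT_SKETCH_LAST);
      return -1;
    }
}
```

Adding a new entry kind to the list therefore means touching every
consumer, which is what the jit-builtins.h and jit-builtins.c hunks do for
the _VAR_8 and _VAR_12 variants.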


	Jakub

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-16 15:06 ` Jakub Jelinek
@ 2015-01-16 15:37   ` David Malcolm
  0 siblings, 0 replies; 92+ messages in thread
From: David Malcolm @ 2015-01-16 15:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches

On Fri, 2015-01-16 at 15:39 +0100, Jakub Jelinek wrote:
> On Thu, Jan 15, 2015 at 09:20:07PM +0100, Thomas Schwinge wrote:
> >     	* builtin-types.def (BT_FN_VOID_INT_INT_VAR)
> >     	(BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
> >     	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
> >     	New function types.
> 
> This broke bootstrap with --enable-languages=jit.  Fixed thusly, committed
> as obvious:
> 
> 2015-01-16  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* jit-builtins.h (DEF_FUNCTION_TYPE_VAR_5): Fix spelling of
> 	last argument.
> 	(DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Define and
> 	undef afterwards.
> 	* jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
> 	Likewise.

Thanks!


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-16 12:34 ` Gerald Pfeifer
@ 2015-01-16 20:37   ` Thomas Schwinge
  0 siblings, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-01-16 20:37 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2599 bytes --]

Hi Gerald!

On Fri, 16 Jan 2015 13:32:10 +0100 (CET), Gerald Pfeifer <gerald@pfeifer.com> wrote:
> On Thursday 2015-01-15 21:20, Thomas Schwinge wrote:
> > In r219682, I have committed to trunk our current set of OpenACC changes,
> > which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> > been contributing!
> 
> this breaks bootstrap on FreeBSD 8/amd64 from what I can tell:

Sorry for that.  And, thanks for fixing the num_devices issue.

> /scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: 
> alloca.h: No such file or directory
> compilation terminated.
> 
> 
> % find /usr/include/ -name alloca.h
> %
> 
> i.e., FreeBSD does not feature the alloca.h header and declares
> alloca() in stdlib.h.

The fix is simple enough; committed to trunk in r219771, as obvious:

commit a6f19a7c6b55f96d0c6dc65914857fc8e9b30aaf
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Jan 16 20:05:21 2015 +0000

    libgomp: Don't use <alloca.h>.
    
    	libgomp/
    	* oacc-parallel.c: Don't include <alloca.h>.
    	(GOACC_parallel): Use gomp_alloca instead of alloca.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219771 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog       | 5 +++++
 libgomp/oacc-parallel.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 7c106d4..065dfd4 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-16  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* oacc-parallel.c: Don't include <alloca.h>.
+	(GOACC_parallel): Use gomp_alloca instead of alloca.
+
 2015-01-16  Gerald Pfeifer  <gerald@pfeifer.com>
 
 	* target.c (num_devices): Guard with PLUGIN_SUPPORT.
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 6d5386b..b5e8060 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -34,7 +34,6 @@
 #include <string.h>
 #include <stdarg.h>
 #include <assert.h>
-#include <alloca.h>
 
 static int
 find_pset (int pos, size_t mapnum, unsigned short *kinds)
@@ -151,7 +150,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void *offload_table,
   tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
 		       false);
 
-  devaddrs = alloca (sizeof (void *) * mapnum);
+  devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
     devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
 			    + tgt->list[i]->tgt_offset);
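
As a rough illustration of why this avoids the missing header: stack
allocation can go through the compiler builtin, which needs no
<alloca.h>.  The sketch below assumes GCC or Clang; sketch_alloca and
sum_offsets are hypothetical names, and I am only assuming that
gomp_alloca is a thin wrapper of this kind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper.  __builtin_alloca is provided by GCC and Clang
   without any header, so this also builds where <alloca.h> is absent.  */
#define sketch_alloca(n) __builtin_alloca (n)

/* Copy COUNT offsets into a stack buffer and return their sum.  The buffer
   is released automatically when the function returns.  */
static size_t
sum_offsets (const size_t *offsets, size_t count)
{
  assert (offsets != NULL || count == 0);

  size_t *copy = sketch_alloca (sizeof (size_t) * count);
  size_t total = 0;

  if (count > 0)
    memcpy (copy, offsets, sizeof (size_t) * count);
  for (size_t i = 0; i < count; i++)
    total += copy[i];
  return total;
}
```

The usual caveat applies: the size is not checked against the available
stack, so this is only appropriate for small, bounded counts.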


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (4 preceding siblings ...)
  2015-01-16 15:06 ` Jakub Jelinek
@ 2015-01-16 21:13 ` Thomas Schwinge
  2015-01-16 23:19   ` Ilya Verbin
                     ` (2 more replies)
  2015-01-16 22:41 ` Merge current set of OpenACC changes from gomp-4_0-branch Andreas Schwab
                   ` (4 subsequent siblings)
  10 siblings, 3 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-01-16 21:13 UTC (permalink / raw)
  To: jakub, gcc-patches
  Cc: howarth, dominiq, andrey.turetskiy, bernds, iverbin, kyukhin

[-- Attachment #1: Type: text/plain, Size: 8596 bytes --]

Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,

Here is a patch to remove the __OFFLOAD_TABLE__ variable/formal
parameter, as discussed in <https://gcc.gnu.org/PR64625>.

But -- I now wonder whether that's actually the issue that has been
reported in the PR; doesn't that look more like a problem with the
__OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
anyone guess what's going on?

Anyway, as discussed in <https://gcc.gnu.org/PR64625>, I'd like to commit
this patch either way, OK?

commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Fri Jan 16 20:12:12 2015 +0100

    [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.
    
    	gcc/
    	* omp-low.c (offload_symbol_decl): Remove variable.
    	(get_offload_symbol_decl): Remove function.
    	(expand_omp_target): For BUILT_IN_GOMP_TARGET,
    	BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
    	instead of &__OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
    	BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
    	BUILT_IN_GOACC_UPDATE don't pass it at all.
    	libgomp/
    	* libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
    	(GOACC_parallel, GOACC_update): Remove const_void *offload_table
    	formal parameter.  Update all users.
    	* target.c (GOMP_target, GOMP_target_data, GOMP_target_update):
    	Document unused formal parameter.
---
 gcc/omp-low.c           | 45 ++++++++++++++++++---------------------------
 libgomp/libgomp_g.h     | 10 +++++-----
 libgomp/oacc-parallel.c |  8 ++++----
 libgomp/target.c        | 11 +++++------
 4 files changed, 32 insertions(+), 42 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index b7bf338..1589310 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -340,30 +340,6 @@ oacc_max_threads (omp_context *ctx)
 /* Holds offload tables with decls.  */
 vec<tree, va_gc> *offload_funcs, *offload_vars;
 
-/* Holds a decl for __OFFLOAD_TABLE__.  */
-static GTY(()) tree offload_symbol_decl;
-
-/* Get the __OFFLOAD_TABLE__ symbol.  */
-static tree
-get_offload_symbol_decl (void)
-{
-  if (!offload_symbol_decl)
-    {
-      tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
-			      get_identifier ("__OFFLOAD_TABLE__"),
-			      ptr_type_node);
-      TREE_ADDRESSABLE (decl) = 1;
-      TREE_PUBLIC (decl) = 1;
-      DECL_EXTERNAL (decl) = 1;
-      DECL_WEAK (decl) = 1;
-      DECL_ATTRIBUTES (decl)
-	= tree_cons (get_identifier ("weak"),
-		     NULL_TREE, DECL_ATTRIBUTES (decl));
-      offload_symbol_decl = decl;
-    }
-  return offload_symbol_decl;
-}
-
 /* Convenience function for calling scan_omp_1_op on tree operands.  */
 
 static inline tree
@@ -9119,16 +9095,31 @@ expand_omp_target (struct omp_region *region)
     }
 
   gimple g;
-  tree offload_table = get_offload_symbol_decl ();
   vec<tree> *args;
   /* The maximum number used by any start_ix, without varargs.  */
-  unsigned int argcnt = 12;
+  unsigned int argcnt = 11;
 
   vec_alloc (args, argcnt);
   args->quick_push (device);
   if (offloaded)
     args->quick_push (build_fold_addr_expr (child_fn));
-  args->quick_push (build_fold_addr_expr (offload_table));
+  switch (start_ix)
+    {
+    case BUILT_IN_GOMP_TARGET:
+    case BUILT_IN_GOMP_TARGET_DATA:
+    case BUILT_IN_GOMP_TARGET_UPDATE:
+      /* This const void * is part of the current ABI, but we're not actually
+	 using it.  */
+      args->quick_push (build_zero_cst (ptr_type_node));
+      break;
+    case BUILT_IN_GOACC_DATA_START:
+    case BUILT_IN_GOACC_ENTER_EXIT_DATA:
+    case BUILT_IN_GOACC_PARALLEL:
+    case BUILT_IN_GOACC_UPDATE:
+      break;
+    default:
+      gcc_unreachable ();
+    }
   args->quick_push (t1);
   args->quick_push (t2);
   args->quick_push (t3);
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index c1e4e63..5e88d45 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -217,15 +217,15 @@ extern void GOMP_teams (unsigned int, unsigned int);
 
 /* oacc-parallel.c */
 
-extern void GOACC_data_start (int, const void *,
-			      size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_start (int, size_t, void **, size_t *,
+			      unsigned short *);
 extern void GOACC_data_end (void);
-extern void GOACC_enter_exit_data (int, const void *, size_t, void **,
+extern void GOACC_enter_exit_data (int, size_t, void **,
 				   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel (int, void (*) (void *), const void *, size_t,
+extern void GOACC_parallel (int, void (*) (void *), size_t,
 			    void **, size_t *, unsigned short *, int, int, int,
 			    int, int, ...);
-extern void GOACC_update (int, const void *, size_t, void **, size_t *,
+extern void GOACC_update (int, size_t, void **, size_t *,
 			  unsigned short *, int, int, ...);
 extern void GOACC_wait (int, int, ...);
 extern int GOACC_get_num_threads (void);
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index b5e8060..a300742 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -75,7 +75,7 @@ select_acc_device (int device_type)
 static void goacc_wait (int async, int num_waits, va_list ap);
 
 void
-GOACC_parallel (int device, void (*fn) (void *), const void *offload_table,
+GOACC_parallel (int device, void (*fn) (void *),
 		size_t mapnum, void **hostaddrs, size_t *sizes,
 		unsigned short *kinds,
 		int num_gangs, int num_workers, int vector_length,
@@ -172,7 +172,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void *offload_table,
 }
 
 void
-GOACC_data_start (int device, const void *offload_table, size_t mapnum,
+GOACC_data_start (int device, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
 {
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
@@ -218,7 +218,7 @@ GOACC_data_end (void)
 }
 
 void
-GOACC_enter_exit_data (int device, const void *offload_table, size_t mapnum,
+GOACC_enter_exit_data (int device, size_t mapnum,
 		       void **hostaddrs, size_t *sizes, unsigned short *kinds,
 		       int async, int num_waits, ...)
 {
@@ -408,7 +408,7 @@ goacc_wait (int async, int num_waits, va_list ap)
 }
 
 void
-GOACC_update (int device, const void *offload_table, size_t mapnum,
+GOACC_update (int device, size_t mapnum,
 	      void **hostaddrs, size_t *sizes, unsigned short *kinds,
 	      int async, int num_waits, ...)
 {
diff --git libgomp/target.c libgomp/target.c
index 72d64fc..ebff55e 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -738,15 +738,14 @@ gomp_fini_device (struct gomp_device_descr *devicep)
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
    larger than last available hw device), use host fallback.
-   FN is address of host code, OFFLOAD_TABLE contains value of the
-   __OFFLOAD_TABLE__ symbol in the shared library or binary that invokes
-   GOMP_target.  HOSTADDRS, SIZES and KINDS are arrays
+   FN is address of host code, UNUSED is part of the current ABI, but
+   we're not actually using it.  HOSTADDRS, SIZES and KINDS are arrays
    with MAPNUM entries, with addresses of the host objects,
    sizes of the host objects (resp. for pointer kind pointer bias
    and assumed sizeof (void *) size) and kinds.  */
 
 void
-GOMP_target (int device, void (*fn) (void *), const void *offload_table,
+GOMP_target (int device, void (*fn) (void *), const void *unused,
 	     size_t mapnum, void **hostaddrs, size_t *sizes,
 	     unsigned char *kinds)
 {
@@ -817,7 +816,7 @@ GOMP_target (int device, void (*fn) (void *), const void *offload_table,
 }
 
 void
-GOMP_target_data (int device, const void *offload_table, size_t mapnum,
+GOMP_target_data (int device, const void *unused, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
@@ -873,7 +872,7 @@ GOMP_target_end_data (void)
 }
 
 void
-GOMP_target_update (int device, const void *offload_table, size_t mapnum,
+GOMP_target_update (int device, const void *unused, size_t mapnum,
 		    void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (5 preceding siblings ...)
  2015-01-16 21:13 ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
@ 2015-01-16 22:41 ` Andreas Schwab
  2015-02-04  9:41   ` [RFC testsuite] Fix PR64850, tweak acc_on_device* tests Thomas Schwinge
  2015-01-16 23:22 ` Merge current set of OpenACC changes from gomp-4_0-branch Ilya Verbin
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 92+ messages in thread
From: Andreas Schwab @ 2015-01-16 22:41 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++98  scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 1
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++11  scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 1
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++14  scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 1
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++98  scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 0
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++11  scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 0
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++14  scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 0
FAIL: c-c++-common/goacc/acc_on_device-2-off.c scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 1
FAIL: gcc.dg/goacc/acc_on_device-1.c scan-rtl-dump-times expand "\\\\(call [^\\\\n]*\\\\"acc_on_device" 4

You are making invalid assumptions about the form of a call pattern.

(call_insn 7 6 8 2 (set (reg:SI 0 %d0)
        (call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])
            (const_int 4 [0x4]))) /daten/aranym/gcc/gcc-20150116/gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c:19 -1
     (nil)
    (nil))

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-16 21:13 ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
@ 2015-01-16 23:19   ` Ilya Verbin
  2015-01-16 23:38     ` Jack Howarth
  2015-01-17  3:09   ` Jack Howarth
  2015-02-24 17:23   ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter Thomas Schwinge
  2 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-01-16 23:19 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: jakub, gcc-patches, howarth, dominiq, andrey.turetskiy, bernds, kyukhin

On 16 Jan 21:34, Thomas Schwinge wrote:
> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> Here is a patch to remove the __OFFLOAD_TABLE__ variable/formal
> parameter, as discussed in <https://gcc.gnu.org/PR64625>.
> 
> But -- I now wonder whether that's actually the issue that has been
> reported in the PR; doesn't that look more like a problem with the
> __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
> the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
> anyone guess what's going on?

Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
since November without regressions.

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (6 preceding siblings ...)
  2015-01-16 22:41 ` Merge current set of OpenACC changes from gomp-4_0-branch Andreas Schwab
@ 2015-01-16 23:22 ` Ilya Verbin
  2015-01-23 18:28   ` Ilya Verbin
  2015-02-17 18:06 ` Thomas Schwinge
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-01-16 23:22 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Jakub Jelinek, Kirill Yukhin

Hi!

On 15 Jan 21:20, Thomas Schwinge wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!

Unfortunately, it broke offloading from shared libraries (I mean common libs
with NEEDED entries, not dlopened).  Such things are not covered by the
testsuite, that's why you missed this issue.  Here is a simple testcase:

+++++ test.c: +++++

int f_aaa (void);

int main ()
{
  int x = f_aaa ();
  #pragma omp target
    x++;
  return x;
}

+++++ libaaa.c: +++++

int f_aaa (void)
{
  int x = 0;
  #pragma omp target
    x = 10;
  return x;
}

++++++++++

$ gcc -fopenmp -shared -fPIC libaaa.c -o libaaa.so
$ gcc -fopenmp -L. -laaa test.c
$ ./a.out
libgomp: Target function wasn't mapped


The problem seems to be here:

-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+				struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type
+	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = &offload_images[i];
-      if (image->type == device->type)
-	device->register_image_func (image->host_table, image->target_data);
+      device->register_image_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }

So, you don't assume that a device can have multiple images from multiple libs?
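
A stripped-down model of the two variants, to show the effect I mean.  The
struct and function names are hypothetical, the host-type special case is
left out, and "registering" is reduced to a counter; this is only a sketch
of the control flow in the hunk above, not the libgomp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_image { int type; };

struct sketch_device
{
  int type;
  bool regions_registered;
  size_t images_registered;
};

/* Shape of the new per-image function: the flag latches after the first
   matching image, so later images for the same device are skipped.  */
static void
sketch_register_image (struct sketch_device *dev,
		       const struct sketch_image *img)
{
  if (!dev->regions_registered && dev->type == img->type)
    {
      dev->images_registered++;
      dev->regions_registered = true;
    }
}

/* Shape of the old function: every matching image is registered.  */
static void
sketch_register_images (struct sketch_device *dev,
			const struct sketch_image *imgs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (imgs[i].type == dev->type)
      dev->images_registered++;
}

/* Offer N (at most 2) same-type images one by one to the latched variant.  */
static size_t
sketch_count_latched (size_t n)
{
  struct sketch_device dev = { 1, false, 0 };
  struct sketch_image imgs[2] = { { 1 }, { 1 } };
  assert (n <= 2);
  for (size_t i = 0; i < n; i++)
    sketch_register_image (&dev, &imgs[i]);
  return dev.images_registered;
}

/* Offer the same N images to the looping variant.  */
static size_t
sketch_count_looped (size_t n)
{
  struct sketch_device dev = { 1, false, 0 };
  struct sketch_image imgs[2] = { { 1 }, { 1 } };
  assert (n <= 2);
  sketch_register_images (&dev, imgs, n);
  return dev.images_registered;
}
```

With two images (think a.out plus libaaa.so) the latched variant records
one of them and the loop records both, which would explain the "Target
function wasn't mapped" message for the second image.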

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-16 23:19   ` Ilya Verbin
@ 2015-01-16 23:38     ` Jack Howarth
  2015-01-16 23:48       ` Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Jack Howarth @ 2015-01-16 23:38 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, jakub, GCC Patches, howarth,
	Dominique d'Humières, andrey.turetskiy, bernds, kyukhin

On x86_64 Fedora 15, current gcc trunk only produces…

nm libgcc_s.so.1 | grep OFF
0000000000215478 d _GLOBAL_OFFSET_TABLE_

and not __OFFLOAD_TABLE__.  The libgcc_s.so.1 built on
x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
symbol.

On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> On 16 Jan 21:34, Thomas Schwinge wrote:
>> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
>> Here is a patch to remove the __OFFLOAD_TABLE__ variable/formal
>> parameter, as discussed in <https://gcc.gnu.org/PR64625>.
>>
>> But -- I now wonder whether that's actually the issue that has been
>> reported in the PR; doesn't that look more like a problem with the
>> __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
>> the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
>> anyone guess what's going on?
>
> Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
> since November without regressions.
>
>   -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-16 23:38     ` Jack Howarth
@ 2015-01-16 23:48       ` Ilya Verbin
  2015-01-17  0:37         ` Jack Howarth
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-01-16 23:48 UTC (permalink / raw)
  To: Jack Howarth
  Cc: Thomas Schwinge, jakub, GCC Patches, howarth,
	Dominique d'Humières, andrey.turetskiy, bernds, kyukhin

On 16 Jan 18:22, Jack Howarth wrote:
> On x86_64 Fedora 15, current gcc trunk only produces…
> 
> nm libgcc_s.so.1 | grep OFF
> 0000000000215478 d _GLOBAL_OFFSET_TABLE_
> 
> and not __OFFLOAD_TABLE__.  The libgcc_s.so.1 built on
> x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
> symbol.
> 
> On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> > Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
> > since November without regressions.

That's correct.
1. offloadstuff.c isn't linked into libgcc_s.so.1
2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in
default configuration.

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-16 23:48       ` Ilya Verbin
@ 2015-01-17  0:37         ` Jack Howarth
  2015-01-17  1:23           ` Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Jack Howarth @ 2015-01-17  0:37 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, jakub, GCC Patches, howarth,
	Dominique d'Humières, Andrey Turetskiy, bernds, kyukhin

As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the
requirement for __OFFLOAD_TABLE__ was no longer present and the
residual usages of it just had to be removed. The weak symbol on
darwin is fragile and seems to trip up on the existing code which
produces undefined symbols for ___OFFLOAD_TABLE__...

# nm e.50.1.o | grep OFF
         U ___OFFLOAD_TABLE__

rather than

$ nm e.50.1.o | grep OFF
         w __OFFLOAD_TABLE__

for all of the test cases.

On Fri, Jan 16, 2015 at 6:30 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> On 16 Jan 18:22, Jack Howarth wrote:
>> On x86_64 Fedora 15, current gcc trunk only produces…
>>
>> nm libgcc_s.so.1 | grep OFF
>> 0000000000215478 d _GLOBAL_OFFSET_TABLE_
>>
>> and not __OFFLOAD_TABLE__.  The libgcc_s.so.1 built on
>> x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
>> symbol.
>>
>> On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin <iverbin@gmail.com> wrote:
>> > Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
>> > since November without regressions.
>
> That's correct.
> 1. offloadstuff.c isn't linked into libgcc_s.so.1
> 2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in
> default configuration.
>
>   -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-17  0:37         ` Jack Howarth
@ 2015-01-17  1:23           ` Ilya Verbin
  0 siblings, 0 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-01-17  1:23 UTC (permalink / raw)
  To: Jack Howarth
  Cc: Thomas Schwinge, jakub, GCC Patches, howarth,
	Dominique d'Humières, Andrey Turetskiy, bernds, kyukhin

On 16 Jan 19:23, Jack Howarth wrote:
> As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the
> requirement for __OFFLOAD_TABLE__ was no longer present and the
> residual usages of it just had to be removed. The weak symbol on
> darwin is fragile and seems to trip up on the existing code which
> produces undefined symbols for ___OFFLOAD_TABLE__...
> 
> # nm e.50.1.o | grep OFF
>          U ___OFFLOAD_TABLE__
> 
> rather than
> 
> $ nm e.50.1.o | grep OFF
>          w __OFFLOAD_TABLE__
> 
> for all of the test cases.

I believe that the initial patch, which removes get_offload_symbol_decl, will
fix this.

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-16 21:13 ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
  2015-01-16 23:19   ` Ilya Verbin
@ 2015-01-17  3:09   ` Jack Howarth
  2015-02-24 17:23   ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter Thomas Schwinge
  2 siblings, 0 replies; 92+ messages in thread
From: Jack Howarth @ 2015-01-17  3:09 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: jakub, GCC Patches, howarth, Dominique d'Humières,
	Andrey Turetskiy, bernds, Ilya Verbin, kyukhin

Confirmed that this patch eliminates

[Bug libgomp/64625] ___OFFLOAD_TABLE__ symbol not produced on x86_64 darwin

and thus exposes

[Bug libgomp/64635] New: darwin produces
libgomp-plugin-host_nonshm.1.dylib but tries to load
libgomp-plugin-host_nonshm.so.1

The additional hack (which should be replaced by configure/Makefile
changes that detect SHLIBEXT)...

@@ -1055,7 +1054,7 @@ static void
 gomp_target_init (void)
 {
   const char *prefix ="libgomp-plugin-";
-  const char *suffix = ".so.1";
+  const char *suffix = ".1.dylib";
   const char *cur, *next;
   char *plugin_name;

to target.c in libgomp eliminates the second bug.
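
Until the suffix comes from configure, something along these lines would
at least avoid hard-coding one platform.  This is only a sketch:
SKETCH_PLUGIN_SUFFIX and sketch_plugin_name are hypothetical names, and
the __APPLE__ check is a stand-in for a proper configure-time test.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compile-time choice of the plugin suffix.  */
#ifdef __APPLE__
# define SKETCH_PLUGIN_SUFFIX ".1.dylib"
#else
# define SKETCH_PLUGIN_SUFFIX ".so.1"
#endif

/* Build "libgomp-plugin-<target><suffix>".  The caller frees the result;
   NULL is returned if allocation fails.  */
static char *
sketch_plugin_name (const char *target, const char *suffix)
{
  const char *prefix = "libgomp-plugin-";

  assert (target != NULL && suffix != NULL);

  size_t len = strlen (prefix) + strlen (target) + strlen (suffix) + 1;
  char *name = malloc (len);
  if (name != NULL)
    snprintf (name, len, "%s%s%s", prefix, target, suffix);
  return name;
}
```

Calling sketch_plugin_name ("host_nonshm", SKETCH_PLUGIN_SUFFIX) would then
yield the .1.dylib name on darwin and the .so.1 name elsewhere.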

Native configuration is x86_64-apple-darwin14.1.0

=== libgomp tests ===

Schedule of variations:
    unix/-m32
    unix/-m64

Running target unix/-m32
Using /sw/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /sw/share/dejagnu/config/unix.exp as generic interface file for target.
Using /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary for unix/-m32 ===

# of expected passes 5715
# of unsupported tests 281
Running target unix/-m64
Using /sw/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /sw/share/dejagnu/config/unix.exp as generic interface file for target.
Using /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary for unix/-m64 ===

# of expected passes 5715
# of unsupported tests 281

=== libgomp Summary ===

# of expected passes 11430
# of unsupported tests 562


On Fri, Jan 16, 2015 at 3:34 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
>> In r219682, I have committed to trunk our current set of OpenACC changes,
>
> Here is a patch to remove the __OFFLOAD_TABLE__ variable/formal
> parameter, as discussed in <https://gcc.gnu.org/PR64625>.
>
> But -- I now wonder whether that's actually the issue that has been
> reported in the PR; doesn't that more look like a problem with the
> __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
> the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
> anyone guess what's going on?
>
> Anyway, as discussed in <https://gcc.gnu.org/PR64625>, I'd like to commit
> this patch either way, OK?
>
> commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Fri Jan 16 20:12:12 2015 +0100
>
>     [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.
>
>         gcc/
>         * omp-low.c (offload_symbol_decl): Remove variable.
>         (get_offload_symbol_decl): Remove function.
>         (expand_omp_target): For BUILT_IN_GOMP_TARGET,
>         BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
>         instead of &__OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
>         BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
>         BUILT_IN_GOACC_UPDATE don't pass it at all.
>         libgomp/
>         * libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
>         (GOACC_parallel, GOACC_update): Remove const_void *offload_table
>         formal parameter.  Update all users.
>         * target.c (GOMP_target, GOMP_target_data, GOMP_target_update):
>         Document unused formal parameter.
> ---
>  gcc/omp-low.c           | 45 ++++++++++++++++++---------------------------
>  libgomp/libgomp_g.h     | 10 +++++-----
>  libgomp/oacc-parallel.c |  8 ++++----
>  libgomp/target.c        | 11 +++++------
>  4 files changed, 32 insertions(+), 42 deletions(-)
>
> diff --git gcc/omp-low.c gcc/omp-low.c
> index b7bf338..1589310 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -340,30 +340,6 @@ oacc_max_threads (omp_context *ctx)
>  /* Holds offload tables with decls.  */
>  vec<tree, va_gc> *offload_funcs, *offload_vars;
>
> -/* Holds a decl for __OFFLOAD_TABLE__.  */
> -static GTY(()) tree offload_symbol_decl;
> -
> -/* Get the __OFFLOAD_TABLE__ symbol.  */
> -static tree
> -get_offload_symbol_decl (void)
> -{
> -  if (!offload_symbol_decl)
> -    {
> -      tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> -                             get_identifier ("__OFFLOAD_TABLE__"),
> -                             ptr_type_node);
> -      TREE_ADDRESSABLE (decl) = 1;
> -      TREE_PUBLIC (decl) = 1;
> -      DECL_EXTERNAL (decl) = 1;
> -      DECL_WEAK (decl) = 1;
> -      DECL_ATTRIBUTES (decl)
> -       = tree_cons (get_identifier ("weak"),
> -                    NULL_TREE, DECL_ATTRIBUTES (decl));
> -      offload_symbol_decl = decl;
> -    }
> -  return offload_symbol_decl;
> -}
> -
>  /* Convenience function for calling scan_omp_1_op on tree operands.  */
>
>  static inline tree
> @@ -9119,16 +9095,31 @@ expand_omp_target (struct omp_region *region)
>      }
>
>    gimple g;
> -  tree offload_table = get_offload_symbol_decl ();
>    vec<tree> *args;
>    /* The maximum number used by any start_ix, without varargs.  */
> -  unsigned int argcnt = 12;
> +  unsigned int argcnt = 11;
>
>    vec_alloc (args, argcnt);
>    args->quick_push (device);
>    if (offloaded)
>      args->quick_push (build_fold_addr_expr (child_fn));
> -  args->quick_push (build_fold_addr_expr (offload_table));
> +  switch (start_ix)
> +    {
> +    case BUILT_IN_GOMP_TARGET:
> +    case BUILT_IN_GOMP_TARGET_DATA:
> +    case BUILT_IN_GOMP_TARGET_UPDATE:
> +      /* This const void * is part of the current ABI, but we're not actually
> +        using it.  */
> +      args->quick_push (build_zero_cst (ptr_type_node));
> +      break;
> +    case BUILT_IN_GOACC_DATA_START:
> +    case BUILT_IN_GOACC_ENTER_EXIT_DATA:
> +    case BUILT_IN_GOACC_PARALLEL:
> +    case BUILT_IN_GOACC_UPDATE:
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
>    args->quick_push (t1);
>    args->quick_push (t2);
>    args->quick_push (t3);
> diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
> index c1e4e63..5e88d45 100644
> --- libgomp/libgomp_g.h
> +++ libgomp/libgomp_g.h
> @@ -217,15 +217,15 @@ extern void GOMP_teams (unsigned int, unsigned int);
>
>  /* oacc-parallel.c */
>
> -extern void GOACC_data_start (int, const void *,
> -                             size_t, void **, size_t *, unsigned short *);
> +extern void GOACC_data_start (int, size_t, void **, size_t *,
> +                             unsigned short *);
>  extern void GOACC_data_end (void);
> -extern void GOACC_enter_exit_data (int, const void *, size_t, void **,
> +extern void GOACC_enter_exit_data (int, size_t, void **,
>                                    size_t *, unsigned short *, int, int, ...);
> -extern void GOACC_parallel (int, void (*) (void *), const void *, size_t,
> +extern void GOACC_parallel (int, void (*) (void *), size_t,
>                             void **, size_t *, unsigned short *, int, int, int,
>                             int, int, ...);
> -extern void GOACC_update (int, const void *, size_t, void **, size_t *,
> +extern void GOACC_update (int, size_t, void **, size_t *,
>                           unsigned short *, int, int, ...);
>  extern void GOACC_wait (int, int, ...);
>  extern int GOACC_get_num_threads (void);
> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
> index b5e8060..a300742 100644
> --- libgomp/oacc-parallel.c
> +++ libgomp/oacc-parallel.c
> @@ -75,7 +75,7 @@ select_acc_device (int device_type)
>  static void goacc_wait (int async, int num_waits, va_list ap);
>
>  void
> -GOACC_parallel (int device, void (*fn) (void *), const void *offload_table,
> +GOACC_parallel (int device, void (*fn) (void *),
>                 size_t mapnum, void **hostaddrs, size_t *sizes,
>                 unsigned short *kinds,
>                 int num_gangs, int num_workers, int vector_length,
> @@ -172,7 +172,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void *offload_table,
>  }
>
>  void
> -GOACC_data_start (int device, const void *offload_table, size_t mapnum,
> +GOACC_data_start (int device, size_t mapnum,
>                   void **hostaddrs, size_t *sizes, unsigned short *kinds)
>  {
>    bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
> @@ -218,7 +218,7 @@ GOACC_data_end (void)
>  }
>
>  void
> -GOACC_enter_exit_data (int device, const void *offload_table, size_t mapnum,
> +GOACC_enter_exit_data (int device, size_t mapnum,
>                        void **hostaddrs, size_t *sizes, unsigned short *kinds,
>                        int async, int num_waits, ...)
>  {
> @@ -408,7 +408,7 @@ goacc_wait (int async, int num_waits, va_list ap)
>  }
>
>  void
> -GOACC_update (int device, const void *offload_table, size_t mapnum,
> +GOACC_update (int device, size_t mapnum,
>               void **hostaddrs, size_t *sizes, unsigned short *kinds,
>               int async, int num_waits, ...)
>  {
> diff --git libgomp/target.c libgomp/target.c
> index 72d64fc..ebff55e 100644
> --- libgomp/target.c
> +++ libgomp/target.c
> @@ -738,15 +738,14 @@ gomp_fini_device (struct gomp_device_descr *devicep)
>     is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
>     GOMP_DEVICE_HOST_FALLBACK (or any value
>     larger than last available hw device), use host fallback.
> -   FN is address of host code, OFFLOAD_TABLE contains value of the
> -   __OFFLOAD_TABLE__ symbol in the shared library or binary that invokes
> -   GOMP_target.  HOSTADDRS, SIZES and KINDS are arrays
> +   FN is address of host code, UNUSED is part of the current ABI, but
> +   we're not actually using it.  HOSTADDRS, SIZES and KINDS are arrays
>     with MAPNUM entries, with addresses of the host objects,
>     sizes of the host objects (resp. for pointer kind pointer bias
>     and assumed sizeof (void *) size) and kinds.  */
>
>  void
> -GOMP_target (int device, void (*fn) (void *), const void *offload_table,
> +GOMP_target (int device, void (*fn) (void *), const void *unused,
>              size_t mapnum, void **hostaddrs, size_t *sizes,
>              unsigned char *kinds)
>  {
> @@ -817,7 +816,7 @@ GOMP_target (int device, void (*fn) (void *), const void *offload_table,
>  }
>
>  void
> -GOMP_target_data (int device, const void *offload_table, size_t mapnum,
> +GOMP_target_data (int device, const void *unused, size_t mapnum,
>                   void **hostaddrs, size_t *sizes, unsigned char *kinds)
>  {
>    struct gomp_device_descr *devicep = resolve_device (device);
> @@ -873,7 +872,7 @@ GOMP_target_end_data (void)
>  }
>
>  void
> -GOMP_target_update (int device, const void *offload_table, size_t mapnum,
> +GOMP_target_update (int device, const void *unused, size_t mapnum,
>                     void **hostaddrs, size_t *sizes, unsigned char *kinds)
>  {
>    struct gomp_device_descr *devicep = resolve_device (device);
>
>
> Grüße,
>  Thomas


* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-16 23:22 ` Merge current set of OpenACC changes from gomp-4_0-branch Ilya Verbin
@ 2015-01-23 18:28   ` Ilya Verbin
  2015-01-23 19:11     ` Jakub Jelinek
  2015-01-26 14:01     ` Thomas Schwinge
  0 siblings, 2 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-01-23 18:28 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Jakub Jelinek, Kirill Yukhin

On 17 Jan 02:16, Ilya Verbin wrote:
> Hi!
> 
> Unfortunately, it broke offloading from shared libraries (I mean common libs
> with NEEDED entries, not dlopened).  Such things are not covered by the
> testsuite, that's why you missed this issue.  Here is a simple testcase:
> ... 
> So, you don't assume that a device can have multiple images from multiple libs?

Ping?

Also, could you please explain why you divided device initialization into
two functions -- gomp_init_device and gomp_init_tables?

Currently I'm trying to rebase my old patch, which fixes offloading from
dlopened libraries, on trunk: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
It works for OpenMP and MIC, but I don't know how to avoid breaking OpenACC and PTX.

  -- Ilya


* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-23 18:28   ` Ilya Verbin
@ 2015-01-23 19:11     ` Jakub Jelinek
  2015-01-26 14:01     ` Thomas Schwinge
  1 sibling, 0 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-01-23 19:11 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Fri, Jan 23, 2015 at 08:20:53PM +0300, Ilya Verbin wrote:
> On 17 Jan 02:16, Ilya Verbin wrote:
> > Hi!
> > 
> > Unfortunately, it broke offloading from shared libraries (I mean common libs
> > with NEEDED entries, not dlopened).  Such things are not covered by the
> > testsuite, that's why you missed this issue.  Here is a simple testcase:
> > ... 
> > So, you don't assume that a device can have multiple images from multiple libs?
> 
> Ping?
> 
> Also, could you please explain, why did you divide a device initialization into
> two functions -- gomp_init_device and gomp_init_tables?
> 
> Currently I'm trying to rebase on trunk my old patch, which fixes offloading
> from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
> It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX.

There is also the problem that GOMP_offload_register doesn't use any
locking, so one thread could be in the middle of dlopening some shared
library and doing GOMP_offload_register in there, and another thread
calling gomp_target_init at the same time, so you could reference freed
memory if GOMP_offload_register had to reallocate etc.
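
A minimal sketch of the kind of locking that would close this race,
assuming a single mutex that guards the image array.  The names used here
(struct image_descr, register_lock, register_image, count_images) are
hypothetical stand-ins, not libgomp's actual data structures, and plain
pthreads is used instead of libgomp's own mutex wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for libgomp's offload image descriptor.  */
struct image_descr
{
  void *host_table;
  void *target_data;
};

static struct image_descr *images;
static int num_images;
static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append an image while holding the lock, so that no other thread
   holding the same lock can observe the array while realloc moves it.
   Returns 0 on success, -1 if the allocation fails.  */
static int
register_image (void *host_table, void *target_data)
{
  int ret = -1;

  pthread_mutex_lock (&register_lock);
  struct image_descr *p
    = realloc (images, (num_images + 1) * sizeof (*p));
  if (p != NULL)
    {
      images = p;
      images[num_images].host_table = host_table;
      images[num_images].target_data = target_data;
      num_images++;
      ret = 0;
    }
  pthread_mutex_unlock (&register_lock);
  return ret;
}

/* Readers take the same lock before touching the shared state.  */
static int
count_images (void)
{
  pthread_mutex_lock (&register_lock);
  int n = num_images;
  pthread_mutex_unlock (&register_lock);
  return n;
}
```

The point is only that the registration path and any code that reads the
image array must agree on one lock; which lock that should be in libgomp
is left open here.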

	Jakub


* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-23 18:28   ` Ilya Verbin
  2015-01-23 19:11     ` Jakub Jelinek
@ 2015-01-26 14:01     ` Thomas Schwinge
  2015-01-26 15:23       ` Ilya Verbin
                         ` (2 more replies)
  1 sibling, 3 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-01-26 14:01 UTC (permalink / raw)
  To: Ilya Verbin, Julian Brown; +Cc: gcc-patches, Jakub Jelinek, Kirill Yukhin


Hi!

Sorry for the late answer -- I've been on sick leave, and just now
returning to work.  Julian, would you please have a look at the following
issues?

> > > In r219682, I have committed to trunk our current set of OpenACC changes,
> > > which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> > > been contributing!

On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On 17 Jan 02:16, Ilya Verbin wrote:
> > Unfortunately, it broke offloading from shared libraries (I mean common libs
> > with NEEDED entries, not dlopened).

Sorry for that!

> > Such things are not covered by the
> > testsuite, that's why you missed this issue.  Here is a simple testcase:

<http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E>

Probably a good motivation for adding such a test case.  ;-)

> > So, you don't assume that a device can have multiple images from multiple libs?
> 
> Ping?

This probably is "just" a bug that we introduced with our changes?
(Julian?)


> Also, could you please explain, why did you divide a device initialization into
> two functions -- gomp_init_device and gomp_init_tables?

As I understand it (again, Julian, please correct me if I got that
wrong), the reason is that for OpenACC support, we need these as two
separate (independent) actions.  Is this causing problems for OpenMP
offloading?


> Currently I'm trying to rebase on trunk my old patch, which fixes offloading
> from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
> It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX.


Grüße,
 Thomas



* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-26 14:01     ` Thomas Schwinge
@ 2015-01-26 15:23       ` Ilya Verbin
  2015-01-27 14:41         ` Julian Brown
  2015-01-27 13:43       ` Merge current set of OpenACC changes from gomp-4_0-branch Julian Brown
  2015-01-27 19:50       ` Jack Howarth
  2 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-01-26 15:23 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Julian Brown, gcc-patches, Jakub Jelinek, Kirill Yukhin

On 26 Jan 14:44, Thomas Schwinge wrote:
> On 17 Jan 02:16, Ilya Verbin wrote:
> > Such things are not covered by the
> > testsuite, that's why you missed this issue.  Here is a simple testcase:
> 
> <http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E>
> 
> Probably a good motivation for adding such a test case.  ;-)

I thought about it, but I don't know how to compile 2 binaries and run one of
them using dejagnu.

> > So, you don't assume that a device can have multiple images from multiple libs?
> 
> This probably is "just" a bug that we introduced with our changes?
> (Julian?)
> 
> > Also, could you please explain, why did you divide a device initialization into
> > two functions -- gomp_init_device and gomp_init_tables?
> 
> As I understand it (again, Julian, please correct me if I got that
> wrong), the reason is that for OpenACC support, we need these as two
> separate (independent) actions.  Is this causing problems for OpenMP
> offloading?

I'm asking because in this patch http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
I tried to change the libgomp<->plugin interface to enable offloading from
libraries loaded at any time.
My proposal was to replace GOMP_OFFLOAD_register_image and
GOMP_OFFLOAD_get_table with GOMP_OFFLOAD_[un]load_image.
When a target device is initialized, GOMP_OFFLOAD_load_image registers one image
in the plugin and returns the corresponding target addresses for the image.
The mapping between host and target addresses is set up as before.
I hope that this approach is suitable for both MIC and PTX.

Here is my current patch; it works for OpenMP->MIC, but obviously will not work
for PTX, since it requires symmetrical changes in the plugin.  Could you please
take a look at whether it is possible to support this new interface in the PTX plugin?
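
For reference, a small standalone sketch of the host table layout that the
patch below relies on: four pointers (start and end of the func table,
start and end of the var table), with one slot per function and two slots
(address, size) per variable.  The helper name count_host_entries is
hypothetical; the patch does this computation inline.

```c
#include <assert.h>
#include <stddef.h>

/* Count the entries described by HOST_TABLE, which is assumed to hold
   four pointers: func table start/end, then var table start/end.  */
static void
count_host_entries (void *host_table, int *num_funcs, int *num_vars)
{
  void **func_begin = ((void ***) host_table)[0];
  void **func_end   = ((void ***) host_table)[1];
  void **var_begin  = ((void ***) host_table)[2];
  void **var_end    = ((void ***) host_table)[3];

  /* One slot per function; two slots (address, size) per variable.  */
  *num_funcs = (int) (func_end - func_begin);
  *num_vars = (int) ((var_end - var_begin) / 2);
}
```

A plugin implementing the proposed load_image hook would then have to
return exactly num_funcs + num_vars address pairs, functions first, since
that is what the patch checks before inserting the mappings.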


diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..4e021f9 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -773,10 +773,10 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
@@ -793,9 +793,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
+	GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..5d67c6c 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,10 +43,10 @@ static struct gomp_device_descr host_dispatch =
     .get_caps_func = GOMP_OFFLOAD_get_caps,
     .get_type_func = GOMP_OFFLOAD_get_type,
     .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-    .register_image_func = GOMP_OFFLOAD_register_image,
     .init_device_func = GOMP_OFFLOAD_init_device,
     .fini_device_func = GOMP_OFFLOAD_fini_device,
-    .get_table_func = GOMP_OFFLOAD_get_table,
+    .load_image_func = GOMP_OFFLOAD_load_image,
+    .unload_image_func = GOMP_OFFLOAD_unload_image,
     .alloc_func = GOMP_OFFLOAD_alloc,
     .free_func = GOMP_OFFLOAD_free,
     .dev2host_func = GOMP_OFFLOAD_dev2host,
@@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch =
     .mem_map.is_initialized = false,
     .mem_map.splay_tree.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 166eb55..19b937a 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -284,12 +284,6 @@ lazy_open (int ord)
     = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
-
-  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
-  gomp_mutex_lock (&mem_map->lock);
-  if (!mem_map->is_initialized)
-    gomp_init_tables (acc_dev, mem_map);
-  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index ebf7f11..bc60f72 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -95,12 +95,6 @@ GOMP_OFFLOAD_get_num_devices (void)
 }
 
 STATIC void
-GOMP_OFFLOAD_register_image (void *host_table __attribute__ ((unused)),
-			     void *target_data __attribute__ ((unused)))
-{
-}
-
-STATIC void
 GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
 {
 }
@@ -111,12 +105,19 @@ GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
 }
 
 STATIC int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **table __attribute__ ((unused)))
+GOMP_OFFLOAD_load_image (int n __attribute__ ((unused)),
+			 void *i __attribute__ ((unused)),
+			 struct addr_pair **r __attribute__ ((unused)))
 {
   return 0;
 }
 
+STATIC void
+GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
+			   void *i __attribute__ ((unused)))
+{
+}
+
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index ebff55e..ce2017c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -635,7 +635,84 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   gomp_mutex_unlock (&mm->lock);
 }
 
-/* This function should be called from every offload image.
+
+/* Insert mapping of host -> target address pairs to splay tree.  */
+
+static void
+gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
+				struct addr_pair *host_addr,
+				struct addr_pair *tgt_addr)
+{
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->refcount = 1;
+  tgt->array = gomp_malloc (sizeof (*tgt->array));
+  tgt->tgt_start = tgt_addr->start;
+  tgt->tgt_end = tgt_addr->end;
+  tgt->to_free = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+  splay_tree_node node = tgt->array;
+  splay_tree_key k = &node->key;
+  k->host_start = host_addr->start;
+  k->host_end = host_addr->end;
+  k->tgt_offset = 0;
+  k->refcount = 1;
+  k->copy_from = false;
+  k->tgt = tgt;
+  node->left = NULL;
+  node->right = NULL;
+  splay_tree_insert (&mm->splay_tree, node);
+}
+
+/* Load image pointed by TARGET_DATA to the device, specified by DEVICEP.
+   And insert to splay tree the mapping between addresses from HOST_TABLE and
+   from loaded target image.  */
+
+static void
+gomp_offload_image_to_device (struct gomp_device_descr *devicep,
+			      void *host_table, void *target_data)
+{
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Load image to device and get target addresses for the image.  */
+  struct addr_pair *target_table = NULL;
+  int i, num_target_entries
+    = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+
+  if (num_target_entries != num_funcs + num_vars)
+    gomp_fatal ("Can't map target functions or variables");
+
+  /* Insert host-target address mapping into devicep->dev_splay_tree.  */
+  for (i = 0; i < num_funcs; i++)
+    {
+      struct addr_pair host_addr;
+      host_addr.start = (uintptr_t) host_func_table[i];
+      host_addr.end = host_addr.start + 1;
+      gomp_splay_tree_insert_mapping (devicep, &host_addr, &target_table[i]);
+    }
+
+  for (i = 0; i < num_vars; i++)
+    {
+      struct addr_pair host_addr;
+      host_addr.start = (uintptr_t) host_var_table[i*2];
+      host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1];
+      gomp_splay_tree_insert_mapping (devicep, &host_addr,
+				      &target_table[num_funcs+i]);
+    }
+
+  free (target_table);
+}
+
+/* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
 
@@ -643,6 +720,17 @@ void
 GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 		       void *target_data)
 {
+  int i;
+
+  /* Load image to all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      if (devicep->type == target_type && devicep->is_initialized)
+	gomp_offload_image_to_device (devicep, host_table, target_data);
+    }
+
+  /* Insert image to array of pending images.  */
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
@@ -654,54 +742,83 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   num_offload_images++;
 }
 
-/* This function initializes the target device, specified by DEVICEP.  DEVICEP
-   must be locked on entry, and remains locked on return.  */
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
 
-attribute_hidden void
-gomp_init_device (struct gomp_device_descr *devicep)
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
 {
-  devicep->init_device_func (devicep->target_id);
-  devicep->is_initialized = true;
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+  int i;
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Unload image from all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+      struct gomp_device_descr *devicep = &devices[i];
+      struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+      if (devicep->type != target_type || !devicep->is_initialized)
+	continue;
+
+      devicep->unload_image_func (devicep->target_id, target_data);
+
+      /* Remove mapping from splay tree.  */
+      for (j = 0; j < num_funcs; j++)
+	{
+	  struct splay_tree_key_s k;
+	  k.host_start = (uintptr_t) host_func_table[j];
+	  k.host_end = k.host_start + 1;
+	  splay_tree_remove (&mm->splay_tree, &k);
+	}
+
+      for (j = 0; j < num_vars; j++)
+	{
+	  struct splay_tree_key_s k;
+	  k.host_start = (uintptr_t) host_var_table[j*2];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
+	  splay_tree_remove (&mm->splay_tree, &k);
+	}
+    }
+
+  /* Remove image from array of pending images.  */
+  for (i = 0; i < num_offload_images; i++)
+    if (offload_images[i].target_data == target_data)
+      {
+	offload_images[i] = offload_images[--num_offload_images];
+	break;
+      }
 }
 
-/* Initialize address mapping tables.  MM must be locked on entry, and remains
-   locked on return.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
-gomp_init_tables (struct gomp_device_descr *devicep,
-		  struct gomp_memory_mapping *mm)
+gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Get address mapping table for device.  */
-  struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
-
-  /* Insert host-target address mapping into dev_splay_tree.  */
   int i;
-  for (i = 0; i < num_entries; i++)
+  devicep->init_device_func (devicep->target_id);
+
+  /* Load to device all images registered by the moment.  */
+  for (i = 0; i < num_offload_images; i++)
     {
-      struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-      tgt->refcount = 1;
-      tgt->array = gomp_malloc (sizeof (*tgt->array));
-      tgt->tgt_start = table[i].tgt_start;
-      tgt->tgt_end = table[i].tgt_end;
-      tgt->to_free = NULL;
-      tgt->list_count = 0;
-      tgt->device_descr = devicep;
-      splay_tree_node node = tgt->array;
-      splay_tree_key k = &node->key;
-      k->host_start = table[i].host_start;
-      k->host_end = table[i].host_end;
-      k->tgt_offset = 0;
-      k->refcount = 1;
-      k->copy_from = false;
-      k->tgt = tgt;
-      node->left = NULL;
-      node->right = NULL;
-      splay_tree_insert (&mm->splay_tree, node);
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+	gomp_offload_image_to_device (devicep, image->host_table,
+				      image->target_data);
     }
 
-  free (table);
-  mm->is_initialized = true;
+  devicep->is_initialized = true;
 }
 
 /* Free address mapping tables.  MM must be locked on entry, and remains locked
@@ -750,6 +867,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
 
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
@@ -780,21 +898,12 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     fn_addr = (void *) fn;
   else
     {
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
-      gomp_mutex_lock (&mm->lock);
-
-      if (!mm->is_initialized)
-	gomp_init_tables (devicep, mm);
-
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
       splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
       if (tgt_fn == NULL)
 	gomp_fatal ("Target function wasn't mapped");
-
-      gomp_mutex_unlock (&mm->lock);
-
       fn_addr = (void *) tgt_fn->tgt->tgt_start;
     }
 
@@ -845,12 +954,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     false);
@@ -886,13 +989,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
-  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -961,10 +1058,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
-  DLSYM (register_image);
   DLSYM (init_device);
   DLSYM (fini_device);
-  DLSYM (get_table);
+  DLSYM (load_image);
+  DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
@@ -1027,22 +1124,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return err == NULL;
 }
 
-/* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
-
-static void
-gomp_register_image_for_device (struct gomp_device_descr *device,
-				struct offload_image_descr *image)
-{
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
-}
-
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1104,7 +1185,6 @@ gomp_target_init (void)
 		current_device.mem_map.is_initialized = false;
 		current_device.mem_map.splay_tree.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
@@ -1146,21 +1226,12 @@ gomp_target_init (void)
 
   for (i = 0; i < num_devices; i++)
     {
-      int j;
-
-      for (j = 0; j < num_offload_images; j++)
-	gomp_register_image_for_device (&devices[i], &offload_images[j]);
-
       /* The 'devices' array can be moved (by the realloc call) until we have
 	 found all the plugins, so registering with the OpenACC runtime (which
 	 takes a copy of the pointer argument) must be delayed until now.  */
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
-
-  free (offload_images);
-  offload_images = NULL;
-  num_offload_images = 0;
 }
 
 #else /* PLUGIN_SUPPORT */


Thanks,
  -- Ilya

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-26 14:01     ` Thomas Schwinge
  2015-01-26 15:23       ` Ilya Verbin
@ 2015-01-27 13:43       ` Julian Brown
  2015-01-27 19:50       ` Jack Howarth
  2 siblings, 0 replies; 92+ messages in thread
From: Julian Brown @ 2015-01-27 13:43 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Ilya Verbin, gcc-patches, Jakub Jelinek, Kirill Yukhin

On Mon, 26 Jan 2015 14:44:19 +0100
Thomas Schwinge <thomas@codesourcery.com> wrote:

> > On 17 Jan 02:16, Ilya Verbin wrote:
> > > Unfortunately, it broke offloading from shared libraries (I mean
> > > common libs with NEEDED entries, not dlopened).
> 
> Sorry for that!
> 
> > > Such things are not covered by the
> > > testsuite, that's why you missed this issue.  Here is a simple
> > > testcase:
> 
> <http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E>
> 
> Probably a good motivation for adding such a test case.  ;-)
> 
> > > So, you don't assume that a device can have multiple images from
> > > multiple libs?
> > 
> > Ping?
> 
> This probably is "just" a bug that we introduced with our changes?
> (Julian?)

AFAICR, we haven't yet figured out how to make (shared) libraries work
with PTX. Actually I'm not entirely sure if static libraries containing
PTX code will work either. But, multiple images (e.g. from different
object files) are supported, via the loop in gomp_target_init.

(The semantics of gomp_register_image_for_device were changed, but not
-- intentionally! -- to limit the number of offloaded images to one.)

> > Also, could you please explain, why did you divide a device
> > initialization into two functions -- gomp_init_device and
> > gomp_init_tables?
> 
> As I understand it (again, Julian, please correct me if I got that
> wrong), the reason is that for OpenACC support, we need these as two
> separate (independent) actions.  Is this causing problems for OpenMP
> offloading?

This was certainly necessary at some point, when the support for
multiple devices of the same type in the OpenACC runtime was delegated
entirely to target-dependent code. Later (after one round of
refactoring), the gomp_device_descr and the memory map were still
separate, with the former possibly representing a number of devices,
and the latter having independent copies for each instance of a device.

That's largely been refactored (again) away now though -- a
gomp_device_descr and its memory map are stored together, per-device
instance. So this separation of their initialisation can probably go
away, although some (somewhat delicate) code in oacc-init.c would need
to be tweaked.

Julian

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-26 15:23       ` Ilya Verbin
@ 2015-01-27 14:41         ` Julian Brown
  2015-02-03 11:28           ` Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Julian Brown @ 2015-01-27 14:41 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Jakub Jelinek, Kirill Yukhin

On Mon, 26 Jan 2015 17:34:26 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> Here is my current patch, it works for OpenMP->MIC, but obviously
> will not work for PTX, since it requires symmetrical changes in the
> plugin.  Could you please take a look, whether it is possible to
> support this new interface in PTX plugin?

I think it can probably be made to work. I'll have a look in more
detail.

Thanks,

Julian

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-26 14:01     ` Thomas Schwinge
  2015-01-26 15:23       ` Ilya Verbin
  2015-01-27 13:43       ` Merge current set of OpenACC changes from gomp-4_0-branch Julian Brown
@ 2015-01-27 19:50       ` Jack Howarth
  2 siblings, 0 replies; 92+ messages in thread
From: Jack Howarth @ 2015-01-27 19:50 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Ilya Verbin, Julian Brown, GCC Patches, Jakub Jelinek, Kirill Yukhin

Thomas,
     Any plans to fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64635 soon? On x86_64
darwin, the OpenACC merge resulted in a huge number of failures in the
libgomp test suite…


		=== libgomp Summary ===

# of expected passes		10628
# of unexpected failures	724
# of unsupported tests		562

which are resolved with a fix similar to
https://gcc.gnu.org/bugzilla/attachment.cgi?id=34480.
               Jack


On Mon, Jan 26, 2015 at 8:44 AM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> Sorry for the late answer -- I've been on sick leave, and just now
> returning to work.  Julian, would you please have a look at the following
> issues?
>
>> > > In r219682, I have committed to trunk our current set of OpenACC changes,
>> > > which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
>> > > been contributing!
>
> On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
>> On 17 Jan 02:16, Ilya Verbin wrote:
>> > Unfortunately, it broke offloading from shared libraries (I mean common libs
>> > with NEEDED entries, not dlopened).
>
> Sorry for that!
>
>> > Such things are not covered by the
>> > testsuite, that's why you missed this issue.  Here is a simple testcase:
>
> <http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E>
>
> Probably a good motivation for adding such a test case.  ;-)
>
>> > So, you don't assume that a device can have multiple images from multiple libs?
>>
>> Ping?
>
> This probably is "just" a bug that we introduced with our changes?
> (Julian?)
>
>
>> Also, could you please explain, why did you divide a device initialization into
>> two functions -- gomp_init_device and gomp_init_tables?
>
> As I understand it (again, Julian, please correct me if I got that
> wrong), the reason is that for OpenACC support, we need these as two
> separate (independent) actions.  Is this causing problems for OpenMP
> offloading?
>
>
>> Currently I'm trying to rebase on trunk my old patch, which fixes offloading
>> from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
>> It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX.
>
>
> Grüße,
>  Thomas

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-27 14:41         ` Julian Brown
@ 2015-02-03 11:28           ` Ilya Verbin
  2015-02-03 13:00             ` Julian Brown
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-02-03 11:28 UTC (permalink / raw)
  To: Julian Brown, Thomas Schwinge; +Cc: gcc-patches, Jakub Jelinek, Kirill Yukhin

Hi Julian!

On 27 Jan 14:07, Julian Brown wrote:
> On Mon, 26 Jan 2015 17:34:26 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> > Here is my current patch, it works for OpenMP->MIC, but obviously
> > will not work for PTX, since it requires symmetrical changes in the
> > plugin.  Could you please take a look, whether it is possible to
> > support this new interface in PTX plugin?
> 
> I think it can probably be made to work. I'll have a look in more
> detail.

Do you have any progress on this?

Thanks,
  -- Ilya

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-02-03 11:28           ` Ilya Verbin
@ 2015-02-03 13:00             ` Julian Brown
  2015-02-03 20:01               ` Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Julian Brown @ 2015-02-03 13:00 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Jakub Jelinek, Kirill Yukhin

On Tue, 3 Feb 2015 14:28:44 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> Hi Julian!
> 
> On 27 Jan 14:07, Julian Brown wrote:
> > On Mon, 26 Jan 2015 17:34:26 +0300
> > Ilya Verbin <iverbin@gmail.com> wrote:
> > > Here is my current patch, it works for OpenMP->MIC, but obviously
> > > will not work for PTX, since it requires symmetrical changes in
> > > the plugin.  Could you please take a look, whether it is possible
> > > to support this new interface in PTX plugin?
> > 
> > I think it can probably be made to work. I'll have a look in more
> > detail.
> 
> Do you have any progress on this?

I'm still working on a patch to update OpenACC support and the PTX
backend to use load/unload_image and to unify initialisation/"opening".
So far I think the answer is basically "yes, the new interface can be
supported", though I might request a minor tweak -- e.g. that
load_image takes an extra "void **" argument so that a libgomp backend
can allocate a block of generic metadata relating to the image, then
that same block would be passed (void *) to the unload hook so the
backend can use it there and deallocate it when it's finished with.

Would that be possible? (It'd mostly be for a "CUmodule" handle: this
could be stashed away somewhere within the nvptx backend, but it might
be neater to put it in generic code since it'll probably be useful for
other backends anyway.)
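
The proposed tweak can be sketched as follows.  This is only an
illustration under stated assumptions: the names example_load_image,
example_unload_image and struct example_image_meta are hypothetical and
are not the actual libgomp plugin interface, and the integer module_id
merely stands in for a backend handle such as a "CUmodule".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-image metadata owned by the backend.  A real nvptx
   plugin might keep a CUmodule handle here; an int stands in for it.  */
struct example_image_meta
{
  const void *target_data;
  int module_id;
};

/* Number of images currently loaded, for illustration only.  */
static int example_live_images;

/* Hypothetical load hook: allocates the metadata block and hands it
   back to generic code through META_OUT.  Returns 0 on success.  */
int
example_load_image (int device, const void *target_data, void **meta_out)
{
  assert (meta_out != NULL);
  struct example_image_meta *meta = malloc (sizeof (*meta));
  if (meta == NULL)
    return -1;
  meta->target_data = target_data;
  meta->module_id = device * 1000 + example_live_images;
  example_live_images++;
  *meta_out = meta;
  return 0;
}

/* Hypothetical unload hook: receives the same block back and frees it.  */
void
example_unload_image (int device, void *meta)
{
  (void) device;
  free (meta);
  example_live_images--;
}
```

In this arrangement the generic code only stores and returns the
pointer; it never looks inside the block, so each backend is free to
define its own layout.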

Thanks,

Julian

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-02-03 13:00             ` Julian Brown
@ 2015-02-03 20:01               ` Ilya Verbin
  2015-02-04 15:06                 ` Julian Brown
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-02-03 20:01 UTC (permalink / raw)
  To: Julian Brown; +Cc: Thomas Schwinge, gcc-patches, Jakub Jelinek, Kirill Yukhin

On 03 Feb 13:00, Julian Brown wrote:
> On Tue, 3 Feb 2015 14:28:44 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> > On 27 Jan 14:07, Julian Brown wrote:
> > > On Mon, 26 Jan 2015 17:34:26 +0300
> > > Ilya Verbin <iverbin@gmail.com> wrote:
> > > > Here is my current patch, it works for OpenMP->MIC, but obviously
> > > > will not work for PTX, since it requires symmetrical changes in
> > > > the plugin.  Could you please take a look, whether it is possible
> > > > to support this new interface in PTX plugin?
> > > 
> > > I think it can probably be made to work. I'll have a look in more
> > > detail.
> > 
> > Do you have any progress on this?
> 
> I'm still working on a patch to update OpenACC support and the PTX
> backend to use load/unload_image and to unify initialisation/"opening".
> So far I think the answer is basically "yes, the new interface can be
> supported", though I might request a minor tweak -- e.g. that
> load_image takes an extra "void **" argument so that a libgomp backend
> can allocate a block of generic metadata relating to the image, then
> that same block would be passed (void *) to the unload hook so the
> backend can use it there and deallocate it when it's finished with.
> 
> Would that be possible? (It'd mostly be for a "CUmodule" handle: this
> could be stashed away somewhere within the nvptx backend, but it might
> be neater to put it in generic code since it'll probably be useful for
> other backends anyway.)

An extra argument is not a problem, but I don't quite get the idea.
The PTX plugin allocates some data while loading and needs that data while
unloading?  Then why not create a hash table with an image_ptr -> metadata
mapping inside the plugin?  In that case, the unload hook can deallocate the
metadata using the image_ptr key.  Since this metadata is target-specific,
I believe it would be better to keep it inside the plugin.

  -- Ilya

* [RFC testsuite] Fix PR64850, tweak acc_on_device* tests
@ 2015-02-03 23:41 Kaz Kojima
  0 siblings, 0 replies; 92+ messages in thread
From: Kaz Kojima @ 2015-02-03 23:41 UTC (permalink / raw)
  To: gcc-patches

Hi,

Several goacc/acc_on_device tests fail for a few targets:

hppa2.0w-hp-hpux11.11 (PR testsuite/64850)
https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02659.html

m68k-unknown-linux-gnu
https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02960.html

sh4-unknown-linux-gnu
https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02930.html

Also they fail with special options
x86_64-unknown-linux-gnu -fpic -mcmodel=large
https://gcc.gnu.org/ml/gcc-testresults/2015-02/msg00198.html

Those tests scan the .expand RTL dumps to count the calls to the
acc_on_device function.  For most targets, the call rtx looks
something like

  (call (mem:QI (symbol_ref:SI ("acc_on_device") [flags 0x41]  <function_decl 0xb7614100 acc_on_device>) [0 acc_on_device S1 A8])

and the tests use the regular expression "\\\(call \[^\\n\]*\\\"acc_on_device"
to detect it.
This expression doesn't match the corresponding call rtx

  (call (mem:SI (symbol_ref/v:SI ("@acc_on_device") [flags 0x41]  <function_decl 0xb764d900 acc_on_device>) [0 acc_on_device S4 A32])

for hppa and something like

  (call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])

for m68k and sh.  All call rtxes have the function name in
the memory attributes of their mem rtx (the "[0 acc_on_device ...]"
part), and it seems that the regular
expression "\\\(call \[^\\n\]* acc_on_device" works for all
cases.  The attached patch has been tested on i686-pc-linux-gnu and
sh4-unknown-linux-gnu.
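
The difference between the two expressions can be checked outside the
testsuite with a small sketch like the one below.  It rests on a few
assumptions: the Tcl patterns are translated by hand into POSIX
extended regular expressions, the helper count_matching_lines is a
hypothetical name, and it counts matching lines rather than
reproducing what scan-rtl-dump-times does on a full dump.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The three call rtxes quoted above: generic, hppa, and m68k/sh.  */
static const char *const sample_rtl[] = {
  "(call (mem:QI (symbol_ref:SI (\"acc_on_device\") [flags 0x41]  "
  "<function_decl 0xb7614100 acc_on_device>) [0 acc_on_device S1 A8])",
  "(call (mem:SI (symbol_ref/v:SI (\"@acc_on_device\") [flags 0x41]  "
  "<function_decl 0xb764d900 acc_on_device>) [0 acc_on_device S4 A32])",
  "(call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])"
};

/* Hand-written POSIX ERE equivalents of the old and new Tcl patterns.  */
static const char old_pattern[] = "\\(call [^\n]*\"acc_on_device";
static const char new_pattern[] = "\\(call [^\n]* acc_on_device";

/* Return how many of the N strings in LINES match PATTERN, or -1 if
   PATTERN does not compile.  */
int
count_matching_lines (const char *pattern, const char *const *lines,
		      size_t n)
{
  regex_t re;
  int count = 0;

  assert (pattern != NULL && lines != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (size_t i = 0; i < n; i++)
    if (regexec (&re, lines[i], 0, NULL, 0) == 0)
      count++;
  regfree (&re);
  return count;
}
```

With these samples the old pattern only matches the first rtx, because
the hppa symbol name starts with "@" and the m68k/sh form has no
symbol_ref at all, while the new pattern matches all three.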

Regards,
	kaz
--
	PR testsuite/64850
	* gcc.dg/goacc/acc_on_device-1.c: Use a space instead of \\\" in
	the expression to find calls.
	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
	* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
	* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
	* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.

diff --git a/c-c++-common/goacc/acc_on_device-2-off.c b/c-c++-common/goacc/acc_on_device-2-off.c
index 25d21ad..ea31047 100644
--- a/c-c++-common/goacc/acc_on_device-2-off.c
+++ b/c-c++-common/goacc/acc_on_device-2-off.c
@@ -20,6 +20,6 @@ f (void)
 }
 
 /* Without -fopenacc, we're expecting one call.
-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } } */
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 1 "expand" } } */
 
 /* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git a/c-c++-common/goacc/acc_on_device-2.c b/c-c++-common/goacc/acc_on_device-2.c
index d5389a9..2f4ee2b 100644
--- a/c-c++-common/goacc/acc_on_device-2.c
+++ b/c-c++-common/goacc/acc_on_device-2.c
@@ -24,6 +24,6 @@ f (void)
    perturbs expansion as a builtin, which expects an int parameter.  It's fine
    when changing acc_device_t to plain int, but that's not what we're doing in
    <openacc.h>.
-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail c++ } } } */
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail c++ } } } */
 
 /* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git a/gcc.dg/goacc/acc_on_device-1.c b/gcc.dg/goacc/acc_on_device-1.c
index 1a0276e..d0dbc82 100644
--- a/gcc.dg/goacc/acc_on_device-1.c
+++ b/gcc.dg/goacc/acc_on_device-1.c
@@ -15,6 +15,6 @@ f (void)
 }
 
 /* Unsuitable to be handled as a builtin, so we're expecting four calls.
-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } } */
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 4 "expand" } } */
 
 /* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git a/gfortran.dg/goacc/acc_on_device-1.f95 b/gfortran.dg/goacc/acc_on_device-1.f95
index 9dfde26..0126d9c 100644
--- a/gfortran.dg/goacc/acc_on_device-1.f95
+++ b/gfortran.dg/goacc/acc_on_device-1.f95
@@ -17,6 +17,6 @@ logical function f ()
 end function f
 
 ! Unsuitable to be handled as a builtin, so we're expecting four calls.
-! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } }
+! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 4 "expand" } }
 
 ! { dg-final { cleanup-rtl-dump "expand" } }
diff --git a/gfortran.dg/goacc/acc_on_device-2-off.f95 b/gfortran.dg/goacc/acc_on_device-2-off.f95
index cf28264..0a4978e 100644
--- a/gfortran.dg/goacc/acc_on_device-2-off.f95
+++ b/gfortran.dg/goacc/acc_on_device-2-off.f95
@@ -34,6 +34,6 @@ logical (4) function f ()
 end function f
 
 ! Without -fopenacc, we're expecting one call.
-! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } }
+! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 1 "expand" } }
 
 ! { dg-final { cleanup-rtl-dump "expand" } }
diff --git a/gfortran.dg/goacc/acc_on_device-2.f95 b/gfortran.dg/goacc/acc_on_device-2.f95
index 7730a60..43ad022 100644
--- a/gfortran.dg/goacc/acc_on_device-2.f95
+++ b/gfortran.dg/goacc/acc_on_device-2.f95
@@ -35,6 +35,6 @@ end function f
 
 ! With -fopenacc, we're expecting the builtin to be expanded, so no calls.
 ! TODO: not working.
-! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail *-*-* } } }
+! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail *-*-* } } }
 
 ! { dg-final { cleanup-rtl-dump "expand" } }

* Re: [RFC testsuite] Fix PR64850, tweak acc_on_device* tests
  2015-01-16 22:41 ` Merge current set of OpenACC changes from gomp-4_0-branch Andreas Schwab
@ 2015-02-04  9:41   ` Thomas Schwinge
  2015-02-10 12:02     ` Thomas Schwinge
  0 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-02-04  9:41 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: gcc-patches, Andreas Schwab, danglin

[-- Attachment #1: Type: text/plain, Size: 5974 bytes --]

Hi Kaz!

On Wed, 04 Feb 2015 08:41:28 +0900 (JST), Kaz Kojima <kkojima@rr.iij4u.or.jp> wrote:
> Several goacc/acc_on_device tests fail for a few targets:
> 
> hppa2.0w-hp-hpux11.11 (PR testsuite/64850)
> https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02659.html
> 
> m68k-unknown-linux-gnu
> https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02960.html
> 
> sh4-unknown-linux-gnu
> https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02930.html
> 
> Also they fail with special options
> x86_64-unknown-linux-gnu -fpic -mcmodel=large
> https://gcc.gnu.org/ml/gcc-testresults/2015-02/msg00198.html

Thanks for looking into this -- incidentally, I also started looking into
it yesterday...  :-)

> Those tests scan .expand rtl dumps to get the number of calls for
> acc_on_device function.  For most targets, the call rtx looks
> something like
> 
>   (call (mem:QI (symbol_ref:SI ("acc_on_device") [flags 0x41]  <function_decl 0xb7614100 acc_on_device>) [0 acc_on_device S1 A8])
> 
> and tests use the regular expression "\\\(call \[^\\n\]*\\\"acc_on_device"
> to detect it.
> This expression doesn't match with the corresponding call rtx
> 
>   (call (mem:SI (symbol_ref/v:SI ("@acc_on_device") [flags 0x41]  <function_decl 0xb764d900 acc_on_device>) [0 acc_on_device S4 A32])
> 
> for hppa and something like
> 
>   (call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])
> 
> for m68k and sh.  All call rtxes have the function name in
> the alias set of its mem rtx and it seems that the regular
> expression "\\\(call \[^\\n\]* acc_on_device" works for all
> cases.  The attached patch is tested on i686-pc-linux-gnu and
> sh4-unknown-linux-gnu.

> 	PR testsuite/64850
> 	* gcc.dg/goacc/acc_on_device-1.c: Use a space instead of \\\" in
> 	the expression to find calls.
> 	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
> 	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
> 	* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
> 	* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
> 	* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.

The other idea that I had is to separately scan/count the symbol_ref and
the call (or call_insn?), but I'm not sure if that is "better".  So, your
patch looks good to me, thanks!

> diff --git a/c-c++-common/goacc/acc_on_device-2-off.c b/c-c++-common/goacc/acc_on_device-2-off.c
> index 25d21ad..ea31047 100644
> --- a/c-c++-common/goacc/acc_on_device-2-off.c
> +++ b/c-c++-common/goacc/acc_on_device-2-off.c
> @@ -20,6 +20,6 @@ f (void)
>  }
>  
>  /* Without -fopenacc, we're expecting one call.
> -   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } } */
> +   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 1 "expand" } } */
>  
>  /* { dg-final { cleanup-rtl-dump "expand" } } */
> diff --git a/c-c++-common/goacc/acc_on_device-2.c b/c-c++-common/goacc/acc_on_device-2.c
> index d5389a9..2f4ee2b 100644
> --- a/c-c++-common/goacc/acc_on_device-2.c
> +++ b/c-c++-common/goacc/acc_on_device-2.c
> @@ -24,6 +24,6 @@ f (void)
>     perturbs expansion as a builtin, which expects an int parameter.  It's fine
>     when changing acc_device_t to plain int, but that's not what we're doing in
>     <openacc.h>.
> -   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail c++ } } } */
> +   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail c++ } } } */
>  
>  /* { dg-final { cleanup-rtl-dump "expand" } } */
> diff --git a/gcc.dg/goacc/acc_on_device-1.c b/gcc.dg/goacc/acc_on_device-1.c
> index 1a0276e..d0dbc82 100644
> --- a/gcc.dg/goacc/acc_on_device-1.c
> +++ b/gcc.dg/goacc/acc_on_device-1.c
> @@ -15,6 +15,6 @@ f (void)
>  }
>  
>  /* Unsuitable to be handled as a builtin, so we're expecting four calls.
> -   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } } */
> +   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 4 "expand" } } */
>  
>  /* { dg-final { cleanup-rtl-dump "expand" } } */
> diff --git a/gfortran.dg/goacc/acc_on_device-1.f95 b/gfortran.dg/goacc/acc_on_device-1.f95
> index 9dfde26..0126d9c 100644
> --- a/gfortran.dg/goacc/acc_on_device-1.f95
> +++ b/gfortran.dg/goacc/acc_on_device-1.f95
> @@ -17,6 +17,6 @@ logical function f ()
>  end function f
>  
>  ! Unsuitable to be handled as a builtin, so we're expecting four calls.
> -! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } }
> +! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 4 "expand" } }
>  
>  ! { dg-final { cleanup-rtl-dump "expand" } }
> diff --git a/gfortran.dg/goacc/acc_on_device-2-off.f95 b/gfortran.dg/goacc/acc_on_device-2-off.f95
> index cf28264..0a4978e 100644
> --- a/gfortran.dg/goacc/acc_on_device-2-off.f95
> +++ b/gfortran.dg/goacc/acc_on_device-2-off.f95
> @@ -34,6 +34,6 @@ logical (4) function f ()
>  end function f
>  
>  ! Without -fopenacc, we're expecting one call.
> -! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } }
> +! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 1 "expand" } }
>  
>  ! { dg-final { cleanup-rtl-dump "expand" } }
> diff --git a/gfortran.dg/goacc/acc_on_device-2.f95 b/gfortran.dg/goacc/acc_on_device-2.f95
> index 7730a60..43ad022 100644
> --- a/gfortran.dg/goacc/acc_on_device-2.f95
> +++ b/gfortran.dg/goacc/acc_on_device-2.f95
> @@ -35,6 +35,6 @@ end function f
>  
>  ! With -fopenacc, we're expecting the builtin to be expanded, so no calls.
>  ! TODO: not working.
> -! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail *-*-* } } }
> +! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail *-*-* } } }
>  
>  ! { dg-final { cleanup-rtl-dump "expand" } }


Grüße,
 Thomas

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-02-03 20:01               ` Ilya Verbin
@ 2015-02-04 15:06                 ` Julian Brown
  2015-02-18 12:25                   ` Ilya Verbin
  2015-02-24 12:49                   ` Julian Brown
  0 siblings, 2 replies; 92+ messages in thread
From: Julian Brown @ 2015-02-04 15:06 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Jakub Jelinek, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 7512 bytes --]

On Tue, 3 Feb 2015 23:01:04 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> On 03 Feb 13:00, Julian Brown wrote:
> > On Tue, 3 Feb 2015 14:28:44 +0300
> > Ilya Verbin <iverbin@gmail.com> wrote:
> > > On 27 Jan 14:07, Julian Brown wrote:
> > > > On Mon, 26 Jan 2015 17:34:26 +0300
> > > > Ilya Verbin <iverbin@gmail.com> wrote:
> > > > > Here is my current patch, it works for OpenMP->MIC, but
> > > > > obviously will not work for PTX, since it requires
> > > > > symmetrical changes in the plugin.  Could you please take a
> > > > > look, whether it is possible to support this new interface in
> > > > > PTX plugin?
> > > > 
> > > > I think it can probably be made to work. I'll have a look in
> > > > more detail.
> > > 
> > > Do you have any progress on this?
> > 
> > I'm still working on a patch to update OpenACC support and the PTX
> > backend to use load/unload_image and to unify
> > initialisation/"opening". So far I think the answer is basically
> > "yes, the new interface can be supported", though I might request a
> > minor tweak -- e.g. that load_image takes an extra "void **"
> > argument so that a libgomp backend can allocate a block of generic
> > metadata relating to the image, then that same block would be
> > passed (void *) to the unload hook so the backend can use it there
> > and deallocate it when it's finished with.
> > 
> > Would that be possible? (It'd mostly be for a "CUmodule" handle:
> > this could be stashed away somewhere within the nvptx backend, but
> > it might be neater to put it in generic code since it'll probably
> > be useful for other backends anyway.)
> 
> An extra argument is not a problem, however I don't quite get the
> idea. PTX plugin allocates some data while loading, and needs this
> data while unloading?  Then why not create a hash table with
> image_ptr -> metadata mapping inside the plugin? [...]

Right -- that's what I meant by "could be stashed away somewhere within
the nvptx backend". I just thought that retaining a generic chunk of
state for each (JIT-compiled, in this case) block of code might be
something that would be useful for other targets too. I've kept
the required information (for now at least) within the nvptx backend as
an associative list.
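
As a rough illustration of such a plugin-local associative list (not
the code in the attached patch), the sketch below keys each entry on
the image pointer.  All names (example_remember_image,
example_find_image, example_forget_image) are hypothetical, an int
stands in for the "CUmodule" handle, and locking is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per loaded image, keyed by the host-side image pointer.  */
struct example_image_node
{
  const void *image_ptr;
  int module_id;		/* Stand-in for a CUmodule handle.  */
  struct example_image_node *next;
};

static struct example_image_node *example_images;

/* Record MODULE_ID for IMAGE_PTR.  Returns 0 on success, -1 if out of
   memory.  */
int
example_remember_image (const void *image_ptr, int module_id)
{
  struct example_image_node *node = malloc (sizeof (*node));
  if (node == NULL)
    return -1;
  node->image_ptr = image_ptr;
  node->module_id = module_id;
  node->next = example_images;
  example_images = node;
  return 0;
}

/* Look up IMAGE_PTR.  Returns 1 and stores the handle in *MODULE_ID if
   found, 0 otherwise.  */
int
example_find_image (const void *image_ptr, int *module_id)
{
  assert (module_id != NULL);
  for (struct example_image_node *n = example_images; n; n = n->next)
    if (n->image_ptr == image_ptr)
      {
	*module_id = n->module_id;
	return 1;
      }
  return 0;
}

/* Remove and free the entry for IMAGE_PTR, as an unload hook would.
   Returns 1 if an entry was removed, 0 otherwise.  */
int
example_forget_image (const void *image_ptr)
{
  struct example_image_node **link = &example_images;
  while (*link != NULL)
    {
      struct example_image_node *node = *link;
      if (node->image_ptr == image_ptr)
	{
	  *link = node->next;
	  free (node);
	  return 1;
	}
      link = &node->next;
    }
  return 0;
}
```

A linear list is probably adequate while the number of images per
process stays small; a real plugin would also need to serialise access
to the list.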

This (WIP) patch is based on top of a version of your patch that I
merged to our internal branch: that's still the easiest way for me to
test the PTX backend (with unloading support) at present, and it passes
libgomp testing that way. Trunk should be fairly close, but I haven't
tried applying it there yet.

The major changes are:

* The removal of the OpenACC-specific plugin hooks open_device,
  close_device, set_device_num and get_device_num. The functionality
  has been moved into the init/fini hooks (for the first two) or moved
  into the target-independent OpenACC parts, respectively.

* The PTX mkoffload utility has been extended to support variables as
  well as function mapping, to fill out support for the load/unload
  image hooks. (Not really tested so far!)

* The plugin hooks that are shared between OpenMP and OpenACC now
  support the "device number" argument properly: that should help with
  (eventually) unifying the plugin interface for the two APIs. (With
  set_device_num and get_device_num removed, the plugin is "stateless"
  with respect to which device is currently active. The rest of the
  OpenACC hooks -- async functions, etc. -- should probably be changed
  to take a device number argument too, but that could be a follow-on
  patch.)

* The limitation of having only one type of device active simultaneously
  in the OpenACC runtime has (theoretically!) been removed.
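
To make the "stateless" point above concrete, here is a small sketch
contrasting a hook that relies on a hidden current device with one
that takes the device number explicitly.  Everything in it is
hypothetical (the example_* names, the fixed device count and the
byte-counting bookkeeping); it is not the plugin interface itself.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_DEVICES 4

/* Bytes accounted to each device; purely illustrative bookkeeping.  */
static size_t example_bytes[EXAMPLE_MAX_DEVICES];

/* Stateful style: hooks act on a hidden "current device" that has to
   be selected beforehand.  */
static int example_current_device;

void
example_set_device_num (int device)
{
  assert (device >= 0 && device < EXAMPLE_MAX_DEVICES);
  example_current_device = device;
}

void
example_alloc_stateful (size_t size)
{
  example_bytes[example_current_device] += size;
}

/* Stateless style: the device is an explicit argument, so callers for
   different devices cannot disturb each other's selection.  Returns 0
   on success, -1 for an invalid device number.  */
int
example_alloc_stateless (int device, size_t size)
{
  if (device < 0 || device >= EXAMPLE_MAX_DEVICES)
    return -1;
  example_bytes[device] += size;
  return 0;
}
```

With the explicit argument, which device a host thread is attached to
can be tracked entirely in target-independent code.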

Thoughts?

Thanks,

Julian

ChangeLog

    gcc/
    * config/nvptx/mkoffload.c (process): Support variable mapping.

    libgomp/
    * libgomp.h (acc_dispatch_t): Remove open_device_func,
    close_device_func, get_device_num_func, set_device_num_func,
    target_data members. Change create_thread_data_func argument to
    device number instead of generic pointer.
    * oacc-async.c (assert.h): Include.
    (acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
    (acc_wait_all, acc_wait_all_async): Use current host thread's
    active device, not base_dev.
    * oacc-cuda.c (acc_get_current_cuda_device)
    (acc_get_current_cuda_context, acc_get_cuda_stream)
    (acc_set_cuda_stream): Likewise.
    * oacc-host.c (host_dispatch): Don't set open_device_func,
    close_device_func, get_device_num_func or set_device_num_func.
    * oacc-init.c (base_dev, init_key): Remove.
    (cached_base_dev): New.
    (name_of_acc_device_t): New.
    (acc_init_1): Initialise default-numbered device, not zeroth.
    (acc_shutdown_1): Close all devices of a given type.
    (goacc_destroy_thread): Don't use base_dev.
    (lazy_open, lazy_init, lazy_init_and_open): Remove.
    (goacc_attach_host_thread_to_device): New.
    (acc_init): Reimplement with goacc_attach_host_thread_to_device.
    (acc_get_num_devices): Don't use base_dev.
    (acc_set_device_type): Reimplement.
    (acc_get_device_type): Don't use base_dev.
    (acc_get_device_num): Tweak logic.
    (acc_set_device_num): Likewise.
    (goacc_runtime_initialize): Initialize cached_base_dev not base_dev.
    (goacc_lazy_initialize): Reimplement with acc_init and
    goacc_attach_host_thread_to_device.
    * oacc-int.h (goacc_thread): Add base_dev field.
    (base_dev): Remove extern declaration.
    (goacc_attach_host_thread_to_device): Add prototype.
    * oacc-mem.c (acc_malloc): Use current thread's device instead of
    base_dev.
    (acc_free): Likewise.
    (acc_memcpy_to_device): Likewise.
    (acc_memcpy_from_device): Likewise.
    * oacc-parallel.c (select_acc_device): Remove. Replace calls with
    goacc_lazy_initialize (throughout).
    * target.c (gomp_load_plugin_for_device): Don't initialise openacc
    open_device, close_device, get_device_num or set_device_num hooks.
    Don't initialise target_data.
    * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device)
    (GOMP_OFFLOAD_openacc_close_device)
    (GOMP_OFFLOAD_openacc_get_device_num)
    (GOMP_OFFLOAD_openacc_set_device_num): Remove.
    (GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument
    to int.
    * plugin/plugin-nvptx.c (pthread.h): Include.
    (ptx_inited): Remove.
    (instantiated_devices, ptx_dev_lock): New.
    (struct ptx_image_data): New.
    (ptx_devices, ptx_images, ptx_image_lock): New.
    (nvptx_get_num_devices): Remove forward declaration.
    (nvptx_init): Change return type to bool.
    (nvptx_fini): Remove.
    (nvptx_attach_host_thread_to_device): New.
    (nvptx_open_device): Return struct ptx_device* instead of void*.
    (nvptx_close_device): Change argument type to struct ptx_device*,
    return type to void.
    (nvptx_get_num_devices): Use instantiated_devices not ptx_inited.
    (kernel_target_data, kernel_host_table): Remove static globals.
    (GOMP_OFFLOAD_register_image, GOMP_OFFLOAD_get_table): Remove.
    (GOMP_OFFLOAD_init_device): Reimplement.
    (GOMP_OFFLOAD_fini_device): Likewise.
    (GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New.
    (GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free, GOMP_OFFLOAD_dev2host)
    (GOMP_OFFLOAD_host2dev): Use ORD argument.
    (GOMP_OFFLOAD_openacc_open_device)
    (GOMP_OFFLOAD_openacc_close_device)
    (GOMP_OFFLOAD_openacc_set_device_num)
    (GOMP_OFFLOAD_openacc_get_device_num): Remove.
    (GOMP_OFFLOAD_openacc_create_thread_data): Change argument to int
    (device number).

[-- Attachment #2: nvptx-load-unload-1.diff --]
[-- Type: text/x-patch, Size: 43954 bytes --]

commit 70cf9f51e9a2581385a4f07ff848b5dc4ac588d3
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Jan 29 06:04:10 2015 -0800

    Adapt nvptx backend to new load/unload mechanism.

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 1cd92d1..640d6a3 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -748,6 +748,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
     tok = parse_file (tok);
@@ -759,16 +760,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "#ifdef __cplusplus\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index e901bc8..7e8884c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -690,18 +690,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		     unsigned short *, int, int, int, int, size_t, void *);
@@ -719,7 +707,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index ec711f1..00d536a 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "libgomp_target.h"
@@ -35,44 +35,68 @@
 int
 acc_async_test (int async)
 {
+  struct goacc_thread *thr = goacc_thread ();
+
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  assert (thr->dev);
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (thr->dev);
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
 acc_wait (int async)
 {
+  struct goacc_thread *thr = goacc_thread ();
+
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_func (async);
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_func (async);
 }
 
 void
 acc_wait_async (int async1, int async2)
 {
-  base_dev->openacc.async_wait_async_func (async1, async2);
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_async_func (async1, async2);
 }
 
 void
 acc_wait_all (void)
 {
-  base_dev->openacc.async_wait_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_all_func ();
 }
 
 void
 acc_wait_all_async (int async)
 {
+  struct goacc_thread *thr = goacc_thread ();
+
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_all_async_func (async);
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_all_async_func (async);
 }
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index 8a81f03..12a35f3 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -35,51 +35,53 @@
 void *
 acc_get_current_cuda_device (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
-    p = base_dev->openacc.cuda.get_current_device_func ();
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
+    return thr->dev->openacc.cuda.get_current_device_func ();
 
-  return p;
+  return NULL;
 }
 
 void *
 acc_get_current_cuda_context (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
-    p = base_dev->openacc.cuda.get_current_context_func ();
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_context_func)
+    return thr->dev->openacc.cuda.get_current_context_func ();
 
-  return p;
+  return NULL;
 }
 
 void *
 acc_get_cuda_stream (int async)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (async < 0)
-    return p;
+    return NULL;
 
-  if (base_dev && base_dev->openacc.cuda.get_stream_func)
-    p = base_dev->openacc.cuda.get_stream_func (async);
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
+    return thr->dev->openacc.cuda.get_stream_func (async);
 
-  return p;
+  return NULL;
 }
 
 int
 acc_set_cuda_stream (int async, void *stream)
 {
-  int s = -1;
+  struct goacc_thread *thr;
 
   if (async < 0 || stream == NULL)
     return 0;
   
   goacc_lazy_initialize ();
 
-  if (base_dev && base_dev->openacc.cuda.set_stream_func)
-    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+  thr = goacc_thread ();
 
-  return s;
+  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
+    return thr->dev->openacc.cuda.set_stream_func (async, stream);
+
+  return -1;
 }
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 8fbbb37..fb605ed 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -58,12 +58,6 @@ static struct gomp_device_descr host_dispatch =
     .run_func = GOMP_OFFLOAD_run,
 
     .openacc = {
-      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
-      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
-
-      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
-      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
-
       .exec_func = GOMP_OFFLOAD_openacc_parallel,
 
       .register_async_cleanup_func
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index c07ff66..6206da6 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -39,14 +39,13 @@
 
 static gomp_mutex_t acc_device_lock;
 
-/* The dispatch table for the current accelerator device.  This is global, so
-   you can only have one type of device open at any given time in a program. 
-   This is the "base" device in that several devices that use the same
-   dispatch table may be active concurrently: this one (the "zeroth") is used
-   for overall initialisation/shutdown, and other instances -- not necessarily
-   including this one -- may be opened and closed once the base device has
-   been initialized.  */
-struct gomp_device_descr *base_dev;
+/* A cached version of the dispatcher for the global "current" accelerator type,
+   e.g. used as the default when creating new host threads.  This is the
+   device-type equivalent of goacc_device_num (which specifies which device to
+   use out of potentially several of the same type).  If there are several
+   devices of a given type, this points at the first one.  */
+
+static struct gomp_device_descr *cached_base_dev = NULL;
 
 #if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct goacc_thread *goacc_tls_data;
@@ -55,9 +54,6 @@ pthread_key_t goacc_tls_key;
 #endif
 static pthread_key_t goacc_cleanup_key;
 
-/* Current dispatcher, and how it was initialized */
-static acc_device_t init_key = _ACC_device_hwm;
-
 static struct goacc_thread *goacc_threads;
 static gomp_mutex_t goacc_thread_lock;
 
@@ -96,6 +92,21 @@ get_openacc_name (const char *name)
     return name;
 }
 
+static const char *
+name_of_acc_device_t (enum acc_device_t type)
+{
+  switch (type)
+    {
+    case acc_device_none: return "none";
+    case acc_device_default: return "default";
+    case acc_device_host: return "host";
+    case acc_device_host_nonshm: return "host_nonshm";
+    case acc_device_not_host: return "not_host";
+    case acc_device_nvidia: return "nvidia";
+    default: return "<unknown>";
+    }
+}
+
 static struct gomp_device_descr *
 resolve_device (acc_device_t d)
 {
@@ -161,22 +172,89 @@ resolve_device (acc_device_t d)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d)
 {
-  struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
+  int ndevs;
 
-  acc_dev = resolve_device (d);
+  base_dev = resolve_device (d);
 
-  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
-    gomp_fatal ("device %u not supported", (unsigned)d);
+  ndevs = base_dev ? base_dev->get_num_devices_func () : 0;
+
+  if (!base_dev || ndevs <= 0 || goacc_device_num >= ndevs)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
+
+  acc_dev = &base_dev[goacc_device_num];
 
   if (acc_dev->is_initialized)
     gomp_fatal ("device already active");
 
-  /* We need to remember what we were intialized as, to check shutdown etc.  */
-  init_key = d;  
-
   gomp_init_device (acc_dev);
 
-  return acc_dev;
+  return base_dev;
+}
+
+static void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct gomp_device_descr *base_dev;
+  struct goacc_thread *walk;
+  int ndevs, i;
+  bool devices_active = false;
+
+  /* Get the base device for this device type.  */
+  base_dev = resolve_device (d);
+
+  if (!base_dev)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      /* Similarly, if this happens then user code has done something weird.  */
+      if (walk->saved_bound_dev)
+        gomp_fatal ("shutdown during host fallback");
+
+      if (walk->dev)
+	{
+	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
+
+	  gomp_mutex_lock (&mem_map->lock);
+	  gomp_free_memmap (mem_map);
+	  gomp_mutex_unlock (&mem_map->lock);
+
+	  walk->dev = NULL;
+	  walk->base_dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  ndevs = base_dev->get_num_devices_func ();
+
+  /* Close all the devices of this type that have been opened.  */
+  for (i = 0; i < ndevs; i++)
+    {
+      struct gomp_device_descr *acc_dev = &base_dev[i];
+      if (acc_dev->is_initialized)
+        {
+	  devices_active = true;
+	  gomp_fini_device (acc_dev);
+	}
+    }
+  
+  if (!devices_active)
+    gomp_fatal ("no device initialized");
 }
 
 static struct goacc_thread *
@@ -209,9 +287,11 @@ goacc_destroy_thread (void *data)
   
   if (thr)
     {
-      if (base_dev && thr->target_tls)
+      struct gomp_device_descr *acc_dev = thr->dev;
+
+      if (acc_dev && thr->target_tls)
 	{
-	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  acc_dev->openacc.destroy_thread_data_func (thr->target_tls);
 	  thr->target_tls = NULL;
 	}
 
@@ -238,53 +318,49 @@ goacc_destroy_thread (void *data)
   gomp_mutex_unlock (&goacc_thread_lock);
 }
 
-/* Open the ORD'th device of the currently-active type (base_dev must be
-   initialised before calling).  If ORD is < 0, open the default-numbered
-   device (set by the ACC_DEVICE_NUM environment variable or a call to
-   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
-   consists of calling the device's open_device_func hook, and setting up
-   thread-local data (maybe allocating, then initializing with information
-   pertaining to the newly-opened or previously-opened device).  */
+/* Use the ORD'th device instance for the current host thread (or -1 for the
+   current global default).  The device (and the runtime) must be initialised
+   before calling this function.  */
 
-static void
-lazy_open (int ord)
+void
+goacc_attach_host_thread_to_device (int ord)
 {
   struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev;
-
-  if (thr && thr->dev)
-    {
-      assert (ord < 0 || ord == thr->dev->target_id);
-      return;
-    }
-
-  assert (base_dev);
-
+  struct gomp_device_descr *acc_dev = NULL, *base_dev = NULL;
+  int num_devices;
+  
+  if (thr && thr->dev && (thr->dev->target_id == ord || ord < 0))
+    return;
+  
   if (ord < 0)
     ord = goacc_device_num;
-
-  /* The OpenACC 2.0 spec leaves the runtime's behaviour when an out-of-range
-     device is requested as implementation-defined (4.2 ACC_DEVICE_NUM).
-     We choose to raise an error in such a case.  */
-  if (ord >= base_dev->get_num_devices_func ())
-    gomp_fatal ("device %u does not exist", ord);
-
+  
+  /* Decide which type of device to use.  If the current thread has a device
+     type already (e.g. set by acc_set_device_type), use that, else use the
+     global default.  */
+  if (thr && thr->base_dev)
+    base_dev = thr->base_dev;
+  else
+    {
+      assert (cached_base_dev);
+      base_dev = cached_base_dev;
+    }
+  
+  num_devices = base_dev->get_num_devices_func ();
+  if (num_devices <= 0 || ord >= num_devices)
+    gomp_fatal ("device %u out of range", ord);
+  
   if (!thr)
     thr = goacc_new_thread ();
-
-  acc_dev = thr->dev = &base_dev[ord];
-
-  assert (acc_dev->target_id == ord);
-
+  
+  thr->base_dev = base_dev;
+  thr->dev = acc_dev = &base_dev[ord];
   thr->saved_bound_dev = NULL;
   thr->mapped_data = NULL;
-
-  if (!acc_dev->openacc.target_data)
-    acc_dev->openacc.target_data = acc_dev->openacc.open_device_func (ord);
-
+  
   thr->target_tls
-    = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
-
+    = acc_dev->openacc.create_thread_data_func (ord);
+  
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
@@ -294,75 +370,20 @@ lazy_open (int ord)
 void
 acc_init (acc_device_t d)
 {
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
 
-  base_dev = acc_init_1 (d);
-
-  lazy_open (-1);
+  cached_base_dev = acc_init_1 (d);
 
   gomp_mutex_unlock (&acc_device_lock);
+  
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_init)
 
-static void
-acc_shutdown_1 (acc_device_t d)
-{
-  struct goacc_thread *walk;
-
-  /* We don't check whether d matches the actual device found, because
-     OpenACC 2.0 (3.2.12) says the parameters to the init and this
-     call must match (for the shutdown call anyway, it's silent on
-     others).  */
-
-  if (!base_dev)
-    gomp_fatal ("no device initialized");
-  if (d != init_key)
-    gomp_fatal ("device %u(%u) is initialized",
-		(unsigned) init_key, (unsigned) base_dev->type);
-
-  gomp_mutex_lock (&goacc_thread_lock);
-
-  /* Free target-specific TLS data and close all devices.  */
-  for (walk = goacc_threads; walk != NULL; walk = walk->next)
-    {
-      if (walk->target_tls)
-	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
-
-      walk->target_tls = NULL;
-
-      /* This would mean the user is shutting down OpenACC in the middle of an
-         "acc data" pragma.  Likely not intentional.  */
-      if (walk->mapped_data)
-	gomp_fatal ("shutdown in 'acc data' region");
-
-      if (walk->dev)
-	{
-	  void *target_data = walk->dev->openacc.target_data;
-	  if (walk->dev->openacc.close_device_func (target_data) < 0)
-	    gomp_fatal ("failed to close device");
-
-	  walk->dev->openacc.target_data = target_data = NULL;
-
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
-
-	  walk->dev = NULL;
-	}
-    }
-
-  gomp_mutex_unlock (&goacc_thread_lock);
-
-  gomp_fini_device (base_dev);
-
-  base_dev = NULL;
-}
-
 void
 acc_shutdown (acc_device_t d)
 {
@@ -375,59 +396,16 @@ acc_shutdown (acc_device_t d)
 
 ialias (acc_shutdown)
 
-/* This function is called after plugins have been initialized.  It deals with
-   the "base" device, and is used to prepare the runtime for dealing with a
-   number of such devices (as implemented by some particular plugin).  If the
-   argument device type D matches a previous call to the function, return the
-   current base device, else shut the old device down and re-initialize with
-   the new device type.  */
-
-static struct gomp_device_descr *
-lazy_init (acc_device_t d)
-{
-  if (base_dev)
-    {
-      /* Re-initializing the same device, do nothing.  */
-      if (d == init_key)
-	return base_dev;
-
-      acc_shutdown_1 (init_key);
-    }
-
-  assert (!base_dev);
-
-  return acc_init_1 (d);
-}
-
-/* Ensure that plugins are loaded, initialize and open the (default-numbered)
-   device.  */
-
-static void
-lazy_init_and_open (acc_device_t d)
-{
-  if (!base_dev)
-    gomp_init_targets_once ();
-
-  gomp_mutex_lock (&acc_device_lock);
-
-  base_dev = lazy_init (d);
-
-  lazy_open (-1);
-
-  gomp_mutex_unlock (&acc_device_lock);
-}
-
 int
 acc_get_num_devices (acc_device_t d)
 {
   int n = 0;
-  const struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *acc_dev;
 
   if (d == acc_device_none)
     return 0;
 
-  if (!base_dev)
-    gomp_init_targets_once ();
+  gomp_init_targets_once ();
 
   acc_dev = resolve_device (d);
   if (!acc_dev)
@@ -442,10 +420,39 @@ acc_get_num_devices (acc_device_t d)
 
 ialias (acc_get_num_devices)
 
+/* Set the device type for the current thread only (using the current global
+   default device number), initialising that device if necessary.  Also set the
+   default device type for new threads to D.  */
+
 void
 acc_set_device_type (acc_device_t d)
 {
-  lazy_init_and_open (d);
+  struct gomp_device_descr *base_dev, *acc_dev;
+  struct goacc_thread *thr = goacc_thread ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  if (!cached_base_dev)
+    gomp_init_targets_once ();
+
+  cached_base_dev = base_dev = resolve_device (d);
+  acc_dev = &base_dev[goacc_device_num];
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device (acc_dev);
+
+  gomp_mutex_unlock (&acc_device_lock);
+
+  /* We're changing device type: invalidate the current thread's dev and
+     base_dev pointers.  */
+  if (thr && thr->base_dev != base_dev)
+    {
+      thr->base_dev = thr->dev = NULL;
+      if (thr->mapped_data)
+        gomp_fatal ("acc_set_device_type in 'acc data' region");
+    }
+
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_set_device_type)
@@ -454,10 +461,11 @@ acc_device_t
 acc_get_device_type (void)
 {
   acc_device_t res = acc_device_none;
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *dev;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev)
-    res = acc_device_type (base_dev->type);
+  if (thr && thr->base_dev)
+    res = acc_device_type (thr->base_dev->type);
   else
     {
       gomp_init_targets_once ();
@@ -478,78 +486,64 @@ int
 acc_get_device_num (acc_device_t d)
 {
   const struct gomp_device_descr *dev;
-  int num;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (d >= _ACC_device_hwm)
     gomp_fatal ("device %u out of range", (unsigned)d);
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   dev = resolve_device (d);
   if (!dev)
-    gomp_fatal ("no devices of type %u", d);
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  /* We might not have called lazy_open for this host thread yet, in which case
-     the get_device_num_func hook will return -1.  */
-  num = dev->openacc.get_device_num_func ();
-  if (num < 0)
-    num = goacc_device_num;
+  if (thr && thr->base_dev == dev && thr->dev)
+    return thr->dev->target_id;
   
-  return num;
+  return goacc_device_num;
 }
 
 ialias (acc_get_device_num)
 
 void
-acc_set_device_num (int n, acc_device_t d)
+acc_set_device_num (int ord, acc_device_t d)
 {
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
   int num_devices;
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
-  
-  if ((int) d == 0)
-    {
-      int i;
-      
-      /* A device setting of zero sets all device types on the system to use
-         the Nth instance of that device type.  Only attempt it for initialized
-	 devices though.  */
-      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
-        {
-	  dev = resolve_device (d);
-	  if (dev && dev->is_initialized)
-	    dev->openacc.set_device_num_func (n);
-	}
 
-      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
-      goacc_device_num = n;
-    }
+  if (ord < 0)
+    ord = goacc_device_num;
+
+  if ((int) d == 0)
+    /* Set whatever device is being used by the current host thread to use
+       device instance ORD.  It's unclear if this is supposed to affect other
+       host threads too (OpenACC 2.0 (3.2.4) acc_set_device_num).  */
+    goacc_attach_host_thread_to_device (ord);
   else
     {
-      struct goacc_thread *thr = goacc_thread ();
-
       gomp_mutex_lock (&acc_device_lock);
-
-      base_dev = lazy_init (d);
+      cached_base_dev = base_dev = resolve_device (d);
 
       num_devices = base_dev->get_num_devices_func ();
 
-      if (n >= num_devices)
-        gomp_fatal ("device %u out of range", n);
+      if (ord >= num_devices)
+        gomp_fatal ("device %u out of range", ord);
 
-      /* If we're changing the device number, de-associate this thread with
-	 the device (but don't close the device, since it may be in use by
-	 other threads).  */
-      if (thr && thr->dev && n != thr->dev->target_id)
-	thr->dev = NULL;
+      acc_dev = &base_dev[ord];
 
-      lazy_open (n);
+      if (!acc_dev->is_initialized)
+        gomp_init_device (acc_dev);
 
       gomp_mutex_unlock (&acc_device_lock);
+
+      goacc_attach_host_thread_to_device (ord);
     }
+
+  goacc_device_num = ord;
 }
 
 ialias (acc_set_device_num)
@@ -579,7 +573,7 @@ goacc_runtime_initialize (void)
 
   pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
 
-  base_dev = NULL;
+  cached_base_dev = NULL;
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
@@ -608,9 +602,8 @@ goacc_restore_bind (void)
 }
 
 /* This is called from any OpenACC support function that may need to implicitly
-   initialize the libgomp runtime.  On exit all such initialization will have
-   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
-   pointers will be valid.  */
+   initialize the libgomp runtime, either globally or from a new host thread. 
+   On exit "goacc_thread" will return a valid & populated thread block.  */
 
 attribute_hidden void
 goacc_lazy_initialize (void)
@@ -620,12 +613,8 @@ goacc_lazy_initialize (void)
   if (thr && thr->dev)
     return;
 
-  if (!base_dev)
-    lazy_init_and_open (acc_device_default);
+  if (!cached_base_dev)
+    acc_init (acc_device_default);
   else
-    {
-      gomp_mutex_lock (&acc_device_lock);
-      lazy_open (-1);
-      gomp_mutex_unlock (&acc_device_lock);
-    }
+    goacc_attach_host_thread_to_device (-1);
 }
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index e03cd8d..049b4c4 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -56,6 +56,9 @@ acc_device_type (enum offload_target_type type)
 
 struct goacc_thread
 {
+  /* The base device for the current thread.  */
+  struct gomp_device_descr *base_dev;
+
   /* The device for the current thread.  */
   struct gomp_device_descr *dev;
   
@@ -91,10 +94,7 @@ goacc_thread (void)
 struct gomp_device_descr;
 
 void goacc_register (struct gomp_device_descr *) __GOACC_NOTHROW;
-
-/* Current dispatcher.  */
-extern struct gomp_device_descr *base_dev;
-
+void goacc_attach_host_thread_to_device (int);
 void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 7cbb139..186a395 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -115,7 +115,9 @@ acc_malloc (size_t s)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  return base_dev->alloc_func (thr->dev->target_id, s);
+  assert (thr->dev);
+
+  return thr->dev->alloc_func (thr->dev->target_id, s);
 }
 
 /* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
@@ -130,6 +132,8 @@ acc_free (void *d)
   if (!d)
     return;
 
+  assert (thr->dev);
+
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
@@ -142,7 +146,7 @@ acc_free (void *d)
      acc_unmap_data ((void *)(k->host_start + offset));
    }
 
-  base_dev->free_func (thr->dev->target_id, d);
+  thr->dev->free_func (thr->dev->target_id, d);
 }
 
 void
@@ -152,7 +156,9 @@ acc_memcpy_to_device (void *d, void *h, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+  assert (thr->dev);
+
+  thr->dev->host2dev_func (thr->dev->target_id, d, h, s);
 }
 
 void
@@ -162,7 +168,9 @@ acc_memcpy_from_device (void *h, void *d, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+  assert (thr->dev);
+
+  thr->dev->dev2host_func (thr->dev->target_id, h, d, s);
 }
 
 /* Return the device pointer that corresponds to host data H.  Or NULL
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index e40314e..a683c07 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -49,32 +49,6 @@ find_pset (int pos, size_t mapnum, unsigned short *kinds)
   return kind == GOMP_MAP_TO_PSET;
 }
 
-
-/* Ensure that the target device for DEVICE_TYPE is initialised (and that
-   plugins have been loaded if appropriate).  The ACC_dev variable for the
-   current thread will be set appropriately for the given device type on
-   return.  */
-
-attribute_hidden void
-select_acc_device (int device_type)
-{
-  goacc_lazy_initialize ();
-
-  if (device_type == GOMP_DEVICE_HOST_FALLBACK)
-    return;
-
-  if (device_type == acc_device_none)
-    device_type = acc_device_host;
-
-  if (device_type >= 0)
-    {
-      /* NOTE: this will go badly if the surrounding data environment is set up
-         to use a different device type.  We'll just have to trust that users
-	 know what they're doing...  */
-      acc_set_device_type (device_type);
-    }
-}
-
 static void *__goacc_host_ganglocal_ptr;
 
 void *
@@ -128,7 +102,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void *offload_table,
   gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p, shared_size=%zu async=%d\n",
 	      __FUNCTION__, mapnum, hostaddrs, sizes, kinds, shared_size, async);
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -211,7 +185,7 @@ GOACC_data_start (int device, const void *offload_table, size_t mapnum,
   gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
 	      __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
@@ -258,7 +232,7 @@ GOACC_enter_exit_data (int device, const void *offload_table, size_t mapnum,
   bool data_enter = false;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -399,7 +373,7 @@ GOACC_kernels (int device, void (*fn) (void *), const void *offload_table,
 
   va_list ap;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   va_start (ap, num_waits);
 
@@ -471,7 +445,7 @@ GOACC_update (int device, const void *offload_table, size_t mapnum,
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index f0f3102..8d1059d 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -124,31 +124,6 @@ GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return (void *) (intptr_t) n;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_close_device (void *hnd)
-{
-  return 0;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  return 0;
-}
-
-STATIC void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  if (n > 0)
-    GOMP(fatal) ("device number %u out of range for host execution", n);
-}
-
-STATIC void *
 GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t s)
 {
   return GOMP(malloc) (s);
@@ -260,7 +235,7 @@ GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__((unused)))
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__((unused)))
 {
   return NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 7a3d4ab..3f627eb 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -46,6 +46,7 @@
 #include <dlfcn.h>
 #include <unistd.h>
 #include <assert.h>
+#include <pthread.h>
 
 #define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
 
@@ -133,7 +134,8 @@ struct targ_fn_descriptor
   const char *name;
 };
 
-static bool ptx_inited = false;
+static unsigned int instantiated_devices = 0;
+static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
 struct ptx_stream
 {
@@ -331,9 +333,21 @@ struct ptx_event
   struct ptx_event *next;
 };
 
+struct ptx_image_data
+{
+  void *target_data;
+  CUmodule module;
+  struct ptx_image_data *next;
+};
+
 static gomp_mutex_t ptx_event_lock;
 static struct ptx_event *ptx_events;
 
+static struct ptx_device **ptx_devices;
+
+static struct ptx_image_data *ptx_images = NULL;
+static pthread_mutex_t ptx_image_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
 
@@ -575,21 +589,21 @@ select_stream_for_async (int async, pthread_t thread, bool create,
   return stream;
 }
 
-static int nvptx_get_num_devices (void);
-
-/* Initialize the device.  */
-static int
+/* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
+   should be locked on entry and remains locked on exit.  */
+static bool
 nvptx_init (void)
 {
   CUresult r;
   int rc;
+  int ndevs;
 
-  if (ptx_inited)
-    return nvptx_get_num_devices ();
+  if (instantiated_devices != 0)
+    return true;
 
   rc = verify_device_library ();
   if (rc < 0)
-    return -1;
+    return false;
 
   r = cuInit (0);
   if (r != CUDA_SUCCESS)
@@ -599,18 +613,58 @@ nvptx_init (void)
 
   GOMP_PLUGIN_mutex_init (&ptx_event_lock);
 
-  ptx_inited = true;
+  r = cuDeviceGetCount (&ndevs);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
+
+  ptx_devices = GOMP_PLUGIN_malloc (sizeof (struct ptx_device *) * ndevs);
 
-  return nvptx_get_num_devices ();
+  return true;
 }
 
+/* Select the N'th PTX device for the current host thread.  The device must
+   have been previously opened before calling this function.  */
+
 static void
-nvptx_fini (void)
+nvptx_attach_host_thread_to_device (int n)
 {
-  ptx_inited = false;
+  CUdevice dev;
+  CUresult r;
+  struct ptx_device *ptx_dev;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetDevice (&dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+
+  if (dev == n)
+    return;
+  else
+    {
+      CUcontext old_ctx;
+
+      ptx_dev = ptx_devices[n];
+      assert (ptx_dev);
+
+      r = cuCtxGetCurrent (&thd_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+      /* The host thread must already have a non-NULL context when this function
+         is called because we must have previously called nvptx_open_device.  */
+      assert (thd_ctx);
+
+      r = cuCtxPopCurrent (&old_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
 }
 
-static void *
+static struct ptx_device *
 nvptx_open_device (int n)
 {
   struct ptx_device *ptx_dev;
@@ -678,17 +732,16 @@ nvptx_open_device (int n)
 
   init_streams_for_device (ptx_dev, async_engines);
 
-  return (void *) ptx_dev;
+  return ptx_dev;
 }
 
-static int
-nvptx_close_device (void *targ_data)
+static void
+nvptx_close_device (struct ptx_device *ptx_dev)
 {
   CUresult r;
-  struct ptx_device *ptx_dev = targ_data;
 
   if (!ptx_dev)
-    return 0;
+    return;
 
   fini_streams_for_device (ptx_dev);
 
@@ -700,8 +753,6 @@ nvptx_close_device (void *targ_data)
     }
 
   free (ptx_dev);
-
-  return 0;
 }
 
 static int
@@ -714,7 +765,7 @@ nvptx_get_num_devices (void)
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
      further initialization).  */
-  if (!ptx_inited)
+  if (instantiated_devices == 0)
     cuInit (0);
 
   r = cuDeviceGetCount (&n);
@@ -1508,64 +1559,84 @@ GOMP_OFFLOAD_get_num_devices (void)
   return nvptx_get_num_devices ();
 }
 
-static void **kernel_target_data;
-static void **kernel_host_table;
-
 void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+GOMP_OFFLOAD_init_device (int n)
 {
-  kernel_target_data = target_data;
-  kernel_host_table = host_table;
-}
+  pthread_mutex_lock (&ptx_dev_lock);
 
-void
-GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
-{
-  (void) nvptx_init ();
+  if (!nvptx_init ()
+      || (instantiated_devices & (1 << n)) != 0)
+    {
+      pthread_mutex_unlock (&ptx_dev_lock);
+      return;
+    }
+
+  ptx_devices[n] = nvptx_open_device (n);
+  instantiated_devices |= 1 << n;
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 void
-GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+GOMP_OFFLOAD_fini_device (int n)
 {
-  nvptx_fini ();
+  pthread_mutex_lock (&ptx_dev_lock);
+
+  if (instantiated_devices & (1 << n))
+    {
+      nvptx_close_device (ptx_devices[n]);
+      instantiated_devices &= ~(1 << n);
+    }
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 int
-GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
-			struct mapping_table **tablep)
+GOMP_OFFLOAD_load_image (int ord, void *target_data,
+			 struct addr_pair **target_table)
 {
   CUmodule module;
-  void **fn_table;
-  char **fn_names;
-  int fn_entries, i;
+  char **fn_names, **var_names;
+  unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
+  void **img_header = (void **) target_data;
+  struct ptx_image_data *new_image;
+
+  nvptx_attach_host_thread_to_device (ord);
 
   if (nvptx_init () <= 0)
     return 0;
 
-  /* This isn't an error, because an image may legitimately have no offloaded
-     regions and so will not call GOMP_offload_register.  */
-  if (kernel_target_data == NULL)
-    return 0;
+  link_ptx (&module, img_header[0]);
 
-  link_ptx (&module, kernel_target_data[0]);
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
-  /* kernel_target_data[0] -> ptx code
-     kernel_target_data[1] -> variable mappings
-     kernel_target_data[2] -> array of kernel names in ascii
+  /* The mkoffload utility emits a table of pointers/integers at the start of
+     each offload image:
 
-     kernel_host_table[0] -> start of function addresses (__offload_func_table)
-     kernel_host_table[1] -> end of function addresses (__offload_funcs_end)
+     img_header[0] -> ptx code
+     img_header[1] -> number of variables
+     img_header[2] -> array of variable names (pointers to strings)
+     img_header[3] -> number of kernels
+     img_header[4] -> array of kernel names (pointers to strings)
 
      The array of kernel names and the functions addresses form a
      one-to-one correspondence.  */
 
-  fn_table = kernel_host_table[0];
-  fn_names = (char **) kernel_target_data[2];
-  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+  var_entries = (uintptr_t) img_header[1];
+  var_names = (char **) img_header[2];
+  fn_entries = (uintptr_t) img_header[3];
+  fn_names = (char **) img_header[4];
 
-  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
+				      * (fn_entries + var_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1580,38 +1651,86 @@ GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
       targ_fns[i].fn = function;
       targ_fns[i].name = (const char *) fn_names[i];
 
-      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
-      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
-      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
-      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+      (*target_table)[i].start = (uintptr_t) &targ_fns[i];
+      (*target_table)[i].end = (*target_table)[i].start + 1;
     }
 
-  return fn_entries;
+  for (j = 0; j < var_entries; j++, i++)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+
+      r = cuModuleGetGlobal (&var, &bytes, module, var_names[j]);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+
+      (*target_table)[i].start = (uintptr_t) var;
+      (*target_table)[i].end = (*target_table)[i].start + bytes;
+    }
+
+  return i;
+}
+
+void
+GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+{
+  void **img_header = (void **) target_data;
+  struct targ_fn_descriptor *targ_fns
+    = (struct targ_fn_descriptor *) img_header[0];
+  struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
+
+  free (targ_fns);
+
+  pthread_mutex_lock (&ptx_image_lock);
+  for (image = ptx_images; image != NULL;)
+    {
+      struct ptx_image_data *next = image->next;
+
+      if (image->target_data == target_data)
+	{
+	  cuModuleUnload (image->module);
+	  free (image);
+	  if (prev)
+	    prev->next = next;
+	}
+      else
+	{
+	  prev = image;
+	  if (!newhd)
+	    newhd = image;
+	}
+
+      image = next;
+    }
+  ptx_images = newhd;
+  pthread_mutex_unlock (&ptx_image_lock);
 }
 
 void *
-GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t size)
+GOMP_OFFLOAD_alloc (int ord, size_t size)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_alloc (size);
 }
 
 void
-GOMP_OFFLOAD_free (int n __attribute__((unused)), void *ptr)
+GOMP_OFFLOAD_free (int ord, void *ptr)
 {
+  nvptx_attach_host_thread_to_device (ord);
   nvptx_free (ptr);
 }
 
 void *
-GOMP_OFFLOAD_dev2host (int ord __attribute__((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_dev2host (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_dev2host (dst, src, n);
 }
 
 void *
-GOMP_OFFLOAD_host2dev (int ord __attribute__((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_host2dev (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_host2dev (dst, src, n);
 }
 
@@ -1628,45 +1747,6 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
 	      num_workers, vector_length, async, shared_size, targ_mem_desc);
 }
 
-void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return nvptx_open_device (n);
-}
-
-int
-GOMP_OFFLOAD_openacc_close_device (void *h)
-{
-  return nvptx_close_device (h);
-}
-
-void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  assert (n >= 0);
-
-  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
-    (void) nvptx_open_device (n);
-}
-
-/* This can be called before the device is "opened" for the current thread, in
-   which case we can't tell which device number should be returned.  We don't
-   actually want to open the device here, so just return -1 and let the caller
-   (oacc-init.c:acc_get_device_num) handle it.  */
-
-int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  if (nvthd && nvthd->ptx_dev)
-    return nvthd->ptx_dev->ord;
-  else
-    return -1;
-}
-
 void
 GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
 {
@@ -1730,14 +1810,18 @@ GOMP_OFFLOAD_openacc_async_set_async (int async)
 }
 
 void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+GOMP_OFFLOAD_openacc_create_thread_data (int ord)
 {
-  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct ptx_device *ptx_dev;
   struct nvptx_thread *nvthd
     = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
   CUresult r;
   CUcontext thd_ctx;
 
+  ptx_dev = ptx_devices[ord];
+
+  assert (ptx_dev);
+
   r = cuCtxGetCurrent (&thd_ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
diff --git a/libgomp/target.c b/libgomp/target.c
index d4ff100..c6e4c9c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1085,10 +1085,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
     {
       optional_present = optional_total = 0;
       DLSYM_OPT (openacc.exec, openacc_parallel);
-      DLSYM_OPT (openacc.open_device, openacc_open_device);
-      DLSYM_OPT (openacc.close_device, openacc_close_device);
-      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
-      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
       DLSYM_OPT (openacc.register_async_cleanup,
 		 openacc_register_async_cleanup);
       DLSYM_OPT (openacc.async_test, openacc_async_test);
@@ -1197,7 +1193,6 @@ gomp_target_init (void)
 		current_device.mem_map.splay_tree.root = NULL;
 		current_device.mem_map.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
-		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
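
For readers following the GOMP_OFFLOAD_init_device and
GOMP_OFFLOAD_fini_device hunks in plugin-nvptx.c above: the plugin now
records which devices have been opened as one bit per device ordinal in
instantiated_devices, guarded by ptx_dev_lock.  Here is a minimal,
self-contained sketch of just that bookkeeping; it is not the plugin code.
The names opened_mask, mask_lock, sketch_init_device and
sketch_fini_device are made up for illustration, and the limit of 32
devices in the assertions is an assumption that follows from using a
32-bit mask.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for instantiated_devices and ptx_dev_lock.  */
static unsigned int opened_mask = 0;
static pthread_mutex_t mask_lock = PTHREAD_MUTEX_INITIALIZER;

/* Mark device N as opened.  Return true if this call opened it, false if
   it was already open.  */
bool
sketch_init_device (int n)
{
  bool opened_now = false;

  /* One bit per device: a 32-bit mask can only track ordinals 0..31.  */
  assert (n >= 0 && n < 32);

  pthread_mutex_lock (&mask_lock);
  if ((opened_mask & (1u << n)) == 0)
    {
      /* The real plugin would open the device here.  */
      opened_mask |= 1u << n;
      opened_now = true;
    }
  pthread_mutex_unlock (&mask_lock);

  return opened_now;
}

/* Mark device N as closed.  Return true if this call closed it, false if
   it was not open.  */
bool
sketch_fini_device (int n)
{
  bool closed_now = false;

  assert (n >= 0 && n < 32);

  pthread_mutex_lock (&mask_lock);
  if ((opened_mask & (1u << n)) != 0)
    {
      /* The real plugin would close the device here.  */
      opened_mask &= ~(1u << n);
      closed_now = true;
    }
  pthread_mutex_unlock (&mask_lock);

  return closed_now;
}
```

With this sketch, a second sketch_init_device (3) reports that nothing
was opened, and a sketch_fini_device (3) on a device that is not open is
a no-op.  Both functions take the lock before looking at the mask.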


* Re: [RFC testsuite] Fix PR64850, tweak acc_on_device* tests
  2015-02-04  9:41   ` [RFC testsuite] Fix PR64850, tweak acc_on_device* tests Thomas Schwinge
@ 2015-02-10 12:02     ` Thomas Schwinge
  2015-02-12  0:23       ` Kaz Kojima
  0 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-02-10 12:02 UTC (permalink / raw)
  To: Kaz Kojima, Jakub Jelinek; +Cc: gcc-patches, Andreas Schwab, danglin

[-- Attachment #1: Type: text/plain, Size: 7704 bytes --]

Hi!

On Wed, 4 Feb 2015 10:40:40 +0100, I wrote:
> On Wed, 04 Feb 2015 08:41:28 +0900 (JST), Kaz Kojima <kkojima@rr.iij4u.or.jp> wrote:
> > Several goacc/acc_on_device tests fail for a few targets:
> > 
> > hppa2.0w-hp-hpux11.11 (PR testsuite/64850)
> > https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02659.html
> > 
> > m68k-unknown-linux-gnu
> > https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02960.html
> > 
> > sh4-unknown-linux-gnu
> > https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg02930.html
> > 
> > Also they fail with special options
> > x86_64-unknown-linux-gnu -fpic -mcmodel=large
> > https://gcc.gnu.org/ml/gcc-testresults/2015-02/msg00198.html
> 
> Thanks for looking into this -- incidentally, I also started looking into
> it yesterday...  :-)
> 
> > Those tests scan .expand rtl dumps to get the number of calls for
> > acc_on_device function.  For almost all targets, the call rtx looks
> > something like
> > 
> >   (call (mem:QI (symbol_ref:SI ("acc_on_device") [flags 0x41]  <function_decl 0xb7614100 acc_on_device>) [0 acc_on_device S1 A8])
> > 
> > and tests use the regular expression "\\\(call \[^\\n\]*\\\"acc_on_device"
> > to detect it.
> > This expression doesn't match with the corresponding call rtx
> > 
> >   (call (mem:SI (symbol_ref/v:SI ("@acc_on_device") [flags 0x41]  <function_decl 0xb764d900 acc_on_device>) [0 acc_on_device S4 A32])
> > 
> > for hppa and something like
> > 
> >   (call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])
> > 
> > for m68k and sh.  All call rtxes have the function name in
> > the alias set of its mem rtx and it seems that the regular
> > expression "\\\(call \[^\\n\]* acc_on_device" works for all
> > cases.  The attached patch is tested on i686-pc-linux-gnu and
> > sh4-unknown-linux-gnu.
> 
> > 	PR testsuite/64850
> > 	* gcc.dg/goacc/acc_on_device-1.c: Use a space instead of \\\" in
> > 	the expression to find calls.
> > 	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
> > 	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
> > 	* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
> > 	* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
> > 	* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.
> 
> The other idea that I had is to separately scan/count the symbol_ref and
> the call (or call_insn?), but I'm not sure if that is "better".  So, your
> patch looks good to me, thanks!
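
To make the difference between the two expressions concrete, here is a
small sketch that applies both of them to the three call rtxes quoted
above.  It uses POSIX extended regular expressions via <regex.h>, which is
an assumption: the testsuite itself uses Tcl regexps, but for these two
patterns the syntax should behave the same way.  The helper name
rtl_line_matches and the variable names are invented for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex PATTERN matches somewhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
rtl_line_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* The expression used so far, and the one from the proposed patch.  */
const char old_pattern[] = "\\(call [^\n]*\"acc_on_device";
const char new_pattern[] = "\\(call [^\n]* acc_on_device";

/* The three call rtxes quoted in this thread.  */
const char rtl_generic[] =
  "(call (mem:QI (symbol_ref:SI (\"acc_on_device\") [flags 0x41]  "
  "<function_decl 0xb7614100 acc_on_device>) [0 acc_on_device S1 A8])";
const char rtl_hppa[] =
  "(call (mem:SI (symbol_ref/v:SI (\"@acc_on_device\") [flags 0x41]  "
  "<function_decl 0xb764d900 acc_on_device>) [0 acc_on_device S4 A32])";
const char rtl_m68k_sh[] =
  "(call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])";
```

The old expression needs a double quote directly in front of the function
name, so it only finds the first form.  The new one keys on the
space-separated name that appears in the alias set of the mem rtx, which
is present in all three.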

To resolve the immediate problem: is my approval "enough" for Kaz to
commit the patch, or does that need a "more authoritative approval"?

Jakub then suggested on IRC:

    <jakub> tschwinge: I don't understand why do you want to scan rtl dumps at all; if you want to verify the library function is not called, I'd say best just scan-assembler-not acc_on_device

I think I copied this approach from some other test case (but don't
remember which).  In the current set of tests, we need to verify that the
acc_on_device library function is called 0, 1, or 4 times (see below).

For example, for gcc.dg/goacc/acc_on_device-1.c we've got:

    $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/goacc/acc_on_device-1.c -fopenacc -O -std=c89 -Wno-implicit-function-declaration -S -fpic -mcmodel=large -o acc_on_device-1.s
    $ grep acc_on_device < acc_on_device-1.s
            .file   "acc_on_device-1.c"
            movabsq $acc_on_device@PLTOFF, %rdx
            movabsq $acc_on_device@PLTOFF, %rdx
            movabsq $acc_on_device@PLTOFF, %rdx
            movabsq $acc_on_device@PLTOFF, %rdx

Isn't it even more fragile to scan here for acc_on_device being called
four times compared to the -fdump-rtl-expand dump?  Or should I split up
the four tests into four separate files?  (I guess I lack knowledge of
the best approach for doing such a thing in the GCC testsuite.)

For reference:

> > diff --git a/c-c++-common/goacc/acc_on_device-2-off.c b/c-c++-common/goacc/acc_on_device-2-off.c
> > index 25d21ad..ea31047 100644
> > --- a/c-c++-common/goacc/acc_on_device-2-off.c
> > +++ b/c-c++-common/goacc/acc_on_device-2-off.c
> > @@ -20,6 +20,6 @@ f (void)
> >  }
> >  
> >  /* Without -fopenacc, we're expecting one call.
> > -   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } } */
> > +   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 1 "expand" } } */
> >  
> >  /* { dg-final { cleanup-rtl-dump "expand" } } */
> > diff --git a/c-c++-common/goacc/acc_on_device-2.c b/c-c++-common/goacc/acc_on_device-2.c
> > index d5389a9..2f4ee2b 100644
> > --- a/c-c++-common/goacc/acc_on_device-2.c
> > +++ b/c-c++-common/goacc/acc_on_device-2.c
> > @@ -24,6 +24,6 @@ f (void)
> >     perturbs expansion as a builtin, which expects an int parameter.  It's fine
> >     when changing acc_device_t to plain int, but that's not what we're doing in
> >     <openacc.h>.
> > -   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail c++ } } } */
> > +   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail c++ } } } */
> >  
> >  /* { dg-final { cleanup-rtl-dump "expand" } } */
> > diff --git a/gcc.dg/goacc/acc_on_device-1.c b/gcc.dg/goacc/acc_on_device-1.c
> > index 1a0276e..d0dbc82 100644
> > --- a/gcc.dg/goacc/acc_on_device-1.c
> > +++ b/gcc.dg/goacc/acc_on_device-1.c
> > @@ -15,6 +15,6 @@ f (void)
> >  }
> >  
> >  /* Unsuitable to be handled as a builtin, so we're expecting four calls.
> > -   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } } */
> > +   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 4 "expand" } } */
> >  
> >  /* { dg-final { cleanup-rtl-dump "expand" } } */
> > diff --git a/gfortran.dg/goacc/acc_on_device-1.f95 b/gfortran.dg/goacc/acc_on_device-1.f95
> > index 9dfde26..0126d9c 100644
> > --- a/gfortran.dg/goacc/acc_on_device-1.f95
> > +++ b/gfortran.dg/goacc/acc_on_device-1.f95
> > @@ -17,6 +17,6 @@ logical function f ()
> >  end function f
> >  
> >  ! Unsuitable to be handled as a builtin, so we're expecting four calls.
> > -! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } }
> > +! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 4 "expand" } }
> >  
> >  ! { dg-final { cleanup-rtl-dump "expand" } }
> > diff --git a/gfortran.dg/goacc/acc_on_device-2-off.f95 b/gfortran.dg/goacc/acc_on_device-2-off.f95
> > index cf28264..0a4978e 100644
> > --- a/gfortran.dg/goacc/acc_on_device-2-off.f95
> > +++ b/gfortran.dg/goacc/acc_on_device-2-off.f95
> > @@ -34,6 +34,6 @@ logical (4) function f ()
> >  end function f
> >  
> >  ! Without -fopenacc, we're expecting one call.
> > -! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } }
> > +! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 1 "expand" } }
> >  
> >  ! { dg-final { cleanup-rtl-dump "expand" } }
> > diff --git a/gfortran.dg/goacc/acc_on_device-2.f95 b/gfortran.dg/goacc/acc_on_device-2.f95
> > index 7730a60..43ad022 100644
> > --- a/gfortran.dg/goacc/acc_on_device-2.f95
> > +++ b/gfortran.dg/goacc/acc_on_device-2.f95
> > @@ -35,6 +35,6 @@ end function f
> >  
> >  ! With -fopenacc, we're expecting the builtin to be expanded, so no calls.
> >  ! TODO: not working.
> > -! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail *-*-* } } }
> > +! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail *-*-* } } }
> >  
> >  ! { dg-final { cleanup-rtl-dump "expand" } }


Regards,
 Thomas



* Re: [RFC testsuite] Fix PR64850, tweak acc_on_device* tests
  2015-02-10 12:02     ` Thomas Schwinge
@ 2015-02-12  0:23       ` Kaz Kojima
  0 siblings, 0 replies; 92+ messages in thread
From: Kaz Kojima @ 2015-02-12  0:23 UTC (permalink / raw)
  To: thomas; +Cc: jakub, gcc-patches, schwab, danglin

Thomas Schwinge <thomas@codesourcery.com> wrote:
> To resolve the immediate problem: is my approval "enough" for Kaz to
> commit the patch, or does that need a "more authoritative approval"?

I'd like to commit my patch as a "quick fix" in a few days if
no one objects.

> I think I copied this approach from some other test case (but don't
> remember which).  In the current set of tests, we need to verify that the
> acc_on_device library function is called 0, 1, or 4 times (see below).
> 
> For example, for gcc.dg/goacc/acc_on_device-1.c we've got:
> 
>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/goacc/acc_on_device-1.c -fopenacc -O -std=c89 -Wno-implicit-function-declaration -S -fpic -mcmodel=large -o acc_on_device-1.s
>     $ grep acc_on_device < acc_on_device-1.s
>             .file   "acc_on_device-1.c"
>             movabsq $acc_on_device@PLTOFF, %rdx
>             movabsq $acc_on_device@PLTOFF, %rdx
>             movabsq $acc_on_device@PLTOFF, %rdx
>             movabsq $acc_on_device@PLTOFF, %rdx
> 
> Isn't it even more fragile to scan here for acc_on_device being called
> four times compared to the -fdump-rtl-expand dump?  Or should I split up
> the four tests into four separate files?  (I guess I lack knowledge of
> the best approach for doing such a thing in the GCC testsuite.)

Another example with the asm output of m68k compiler:

	lea acc_on_device,%a2
	jsr (%a2)
	...
	jsr (%a2)
	...
	jsr (%a2)
	...
	jsr (%a2)

for -fopenacc -O -fno-openacc acc_on_device-1.c.  It seems that
getting the number of calls for the specific function isn't easy
with the asm output on some targets.
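
As a rough illustration of why counting symbol mentions in the assembly is
unreliable, the sketch below counts plain substring occurrences in the two
outputs quoted in this thread.  The function count_mentions and the two
string constants are invented for the example, and the "..." lines of the
m68k output are kept as literal placeholders, not real instructions.

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of SYMBOL in TEXT.  */
size_t
count_mentions (const char *text, const char *symbol)
{
  size_t count = 0;
  size_t len = strlen (symbol);
  const char *p = text;

  assert (len > 0);

  while ((p = strstr (p, symbol)) != NULL)
    {
      count++;
      p += len;
    }
  return count;
}

/* x86_64 with -fpic -mcmodel=large, as shown by grep earlier.  */
const char asm_x86_64[] =
  "\t.file\t\"acc_on_device-1.c\"\n"
  "\tmovabsq $acc_on_device@PLTOFF, %rdx\n"
  "\tmovabsq $acc_on_device@PLTOFF, %rdx\n"
  "\tmovabsq $acc_on_device@PLTOFF, %rdx\n"
  "\tmovabsq $acc_on_device@PLTOFF, %rdx\n";

/* m68k: the address is loaded once and reused for all four calls.  */
const char asm_m68k[] =
  "\tlea acc_on_device,%a2\n"
  "\tjsr (%a2)\n"
  "\t...\n"
  "\tjsr (%a2)\n"
  "\t...\n"
  "\tjsr (%a2)\n"
  "\t...\n"
  "\tjsr (%a2)\n";
```

For the same four calls, the x86_64 text mentions the name five times
(the .file directive counts too), while the m68k text mentions it once.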

Regards,
	kaz


* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (7 preceding siblings ...)
  2015-01-16 23:22 ` Merge current set of OpenACC changes from gomp-4_0-branch Ilya Verbin
@ 2015-02-17 18:06 ` Thomas Schwinge
  2015-02-23 10:31 ` Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_* (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
  2015-04-20 14:24 ` Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
  10 siblings, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-02-17 18:06 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 9425 bytes --]

Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.

This whole file is scheduled to go away: the routines are to be replaced
by builtins which are expanded in the nvptx backend, but until we're
there, here's a patch to make it at least work; committed to trunk in
r220768:

commit f816d7a63c8bc11c81080a0b34bf587d46b6f4c6
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 17 18:05:24 2015 +0000

    libgomp: Make nvptx helper routines self-contained.
    
    	libgomp/
    	* oacc-ptx.h (GOACC_INTERNAL_PTX): Add GOACC_tid, GOACC_ntid,
    	GOACC_ctaid, and GOACC_nctaid routines.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220768 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |    6 ++
 libgomp/oacc-ptx.h |  224 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 230 insertions(+)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 6c24531..2c32d9e 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,9 @@
+2015-02-17  Thomas Schwinge  <thomas@codesourcery.com>
+	    Cesar Philippidis  <cesar@codesourcery.com>
+
+	* oacc-ptx.h (GOACC_INTERNAL_PTX): Add GOACC_tid, GOACC_ntid,
+	GOACC_ctaid, and GOACC_nctaid routines.
+
 2015-02-11  Jakub Jelinek  <jakub@redhat.com>
 
 	PR c/64824
diff --git libgomp/oacc-ptx.h libgomp/oacc-ptx.h
index 13ff86f..2419a46 100644
--- libgomp/oacc-ptx.h
+++ libgomp/oacc-ptx.h
@@ -101,9 +101,233 @@
   ".version 3.1\n" \
   ".target sm_30\n" \
   ".address_size 64\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1);\n" \
   ".visible .func (.param .u32 %out_retval) GOACC_get_num_threads;\n" \
   ".visible .func (.param .u32 %out_retval) GOACC_get_thread_num;\n" \
   ".extern .func abort;\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1)\n" \
+  "{\n" \
+  ".reg .u32 %ar1;\n" \
+  ".reg .u32 %retval;\n" \
+  ".reg .u64 %hr10;\n" \
+  ".reg .u32 %r22;\n" \
+  ".reg .u32 %r23;\n" \
+  ".reg .u32 %r24;\n" \
+  ".reg .u32 %r25;\n" \
+  ".reg .u32 %r26;\n" \
+  ".reg .u32 %r27;\n" \
+  ".reg .u32 %r28;\n" \
+  ".reg .u32 %r29;\n" \
+  ".reg .pred %r30;\n" \
+  ".reg .u32 %r31;\n" \
+  ".reg .pred %r32;\n" \
+  ".reg .u32 %r33;\n" \
+  ".reg .pred %r34;\n" \
+  ".local .align 8 .b8 %frame[4];\n" \
+  "ld.param.u32 %ar1,[%in_ar1];\n" \
+  "mov.u32 %r27,%ar1;\n" \
+  "st.local.u32 [%frame],%r27;\n" \
+  "ld.local.u32 %r28,[%frame];\n" \
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L4;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L5;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L8;\n"							\
+  "mov.u32 %r23,%tid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L7;\n"								\
+  "$L4:\n"								\
+  "mov.u32 %r24,%tid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L7;\n"								\
+  "$L5:\n"								\
+  "mov.u32 %r25,%tid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L7;\n"								\
+  "$L8:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L7:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1)\n" \
+  "{\n"									\
+  ".reg .u32 %ar1;\n"							\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .pred %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .pred %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  ".reg .pred %r34;\n"							\
+  ".local .align 8 .b8 %frame[4];\n"					\
+  "ld.param.u32 %ar1,[%in_ar1];\n"					\
+  "mov.u32 %r27,%ar1;\n"						\
+  "st.local.u32 [%frame],%r27;\n"					\
+  "ld.local.u32 %r28,[%frame];\n"					\
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L11;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L12;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L15;\n"							\
+  "mov.u32 %r23,%ntid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L14;\n"								\
+  "$L11:\n"								\
+  "mov.u32 %r24,%ntid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L14;\n"								\
+  "$L12:\n"								\
+  "mov.u32 %r25,%ntid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L14;\n"								\
+  "$L15:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L14:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1)\n" \
+  "{\n"									\
+  ".reg .u32 %ar1;\n"							\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .pred %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .pred %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  ".reg .pred %r34;\n"							\
+  ".local .align 8 .b8 %frame[4];\n"					\
+  "ld.param.u32 %ar1,[%in_ar1];\n"					\
+  "mov.u32 %r27,%ar1;\n"						\
+  "st.local.u32 [%frame],%r27;\n"					\
+  "ld.local.u32 %r28,[%frame];\n"					\
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L18;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L19;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L22;\n"							\
+  "mov.u32 %r23,%ctaid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L21;\n"								\
+  "$L18:\n"								\
+  "mov.u32 %r24,%ctaid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L21;\n"								\
+  "$L19:\n"								\
+  "mov.u32 %r25,%ctaid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L21;\n"								\
+  "$L22:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L21:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1)\n" \
+  "{\n"									\
+  ".reg .u32 %ar1;\n"							\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .pred %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .pred %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  ".reg .pred %r34;\n"							\
+  ".local .align 8 .b8 %frame[4];\n"					\
+  "ld.param.u32 %ar1,[%in_ar1];\n"					\
+  "mov.u32 %r27,%ar1;\n"						\
+  "st.local.u32 [%frame],%r27;\n"					\
+  "ld.local.u32 %r28,[%frame];\n"					\
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L25;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L26;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L29;\n"							\
+  "mov.u32 %r23,%nctaid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L28;\n"								\
+  "$L25:\n"								\
+  "mov.u32 %r24,%nctaid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L28;\n"								\
+  "$L26:\n"								\
+  "mov.u32 %r25,%nctaid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L28;\n"								\
+  "$L29:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L28:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
   ".visible .func (.param .u32 %out_retval) GOACC_get_num_threads\n"	\
   "{\n"									\
   ".reg .u32 %retval;\n"						\


Regards,
 Thomas


* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-02-04 15:06                 ` Julian Brown
@ 2015-02-18 12:25                   ` Ilya Verbin
  2015-02-24 12:49                   ` Julian Brown
  1 sibling, 0 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-02-18 12:25 UTC (permalink / raw)
  To: Julian Brown; +Cc: Thomas Schwinge, gcc-patches, Jakub Jelinek, Kirill Yukhin

On Wed, Feb 04, 2015 at 15:05:45 +0000, Julian Brown wrote:
> This (WIP) patch is based on top of a version of your patch that I
> merged to our internal branch: that's still the easiest way for me to
> test the PTX backend (with unloading support) at present, and it passes
> libgomp testing that way. Trunk should be fairly close, but I haven't
> tried applying it there yet.
> 
> The major changes are:
> 
> * The removal of the OpenACC-specific plugin hooks open_device,
>   close_device, set_device_num and get_device_num. The functionality
>   has been moved into the init/fini hooks (for the first two) or moved
>   into the target-independent OpenACC parts, respectively.
> 
> * The PTX mkoffload utility has been extended to support variables as
>   well as function mapping, to fill out support for the load/unload
>   image hooks. (Not really tested so far!)
> 
> * The plugin hooks that are shared between OpenMP and OpenACC now
>   support the "device number" argument properly: that should help with
>   (eventually) unifying the plugin interface for the two APIs. (With
>   set_device_num and get_device_num removed, the plugin is "stateless"
>   with respect to which device is currently active. The rest of the
>   OpenACC hooks -- async functions, etc. -- should probably be changed
>   to take a device number argument too, but that could be a follow-on
>   patch.)
> 
> * The limitation of having only one type of device active simultaneously
>   in the OpenACC runtime has (theoretically!) been removed.
> 
> Thoughts?

Up.

I have no comments here since I'm not familiar with OpenACC and PTX, but I hope
that Thomas and Jakub will review this and my corresponding patches [1], [2]
before the final closure of the trunk.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html
[2] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01912.html

  -- Ilya


* Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_* (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (8 preceding siblings ...)
  2015-02-17 18:06 ` Thomas Schwinge
@ 2015-02-23 10:31 ` Thomas Schwinge
  2015-04-20 14:24 ` Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
  10 siblings, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-02-23 10:31 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2468 bytes --]

Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  [...]

>     	gcc/ada/
>     	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
>     	(DEF_FUNCTION_TYPE_VAR_12): New macros.

Committed in r220910:

commit 11d2c7638e26ab69df7167474c9aa8f5d4114703
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Feb 23 10:06:49 2015 +0000

    Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_*.
    
    	gcc/ada/
    	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
    	(DEF_FUNCTION_TYPE_VAR_12): Fix number of arguments parameter.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220910 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ada/ChangeLog             |    5 +++++
 gcc/ada/gcc-interface/utils.c |    4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git gcc/ada/ChangeLog gcc/ada/ChangeLog
index ddd6e10..06a51ac 100644
--- gcc/ada/ChangeLog
+++ gcc/ada/ChangeLog
@@ -1,3 +1,8 @@
+2015-02-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
+	(DEF_FUNCTION_TYPE_VAR_12): Fix number of arguments parameter.
+
 2015-02-22  Arnaud Charlet  <charlet@adacore.com>
 
 	* doc/Makefile: postprocess texinfo files to update @dircategory
diff --git gcc/ada/gcc-interface/utils.c gcc/ada/gcc-interface/utils.c
index 44dad7b..d076da7 100644
--- gcc/ada/gcc-interface/utils.c
+++ gcc/ada/gcc-interface/utils.c
@@ -5477,11 +5477,11 @@ install_builtin_function_types (void)
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
 #define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7, ARG8)			\
-  def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
+  def_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
 	       ARG7, ARG8);
 #define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
+  def_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
 	       ARG7, ARG8, ARG9, ARG10, ARG11, ARG12);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
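
To show why the count argument matters, here is a minimal sketch.  It
assumes only that the helper reads exactly as many variadic arguments as
the count says; count_nonzero_args and the COUNT_8_* macros are
hypothetical stand-ins, not the real def_fn_type or DEF_FUNCTION_TYPE_VAR_8:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical stand-in for a count-driven variadic helper: it reads
   exactly N int arguments, so a too-small N silently ignores the rest.  */
int
count_nonzero_args (int n, ...)
{
  va_list ap;
  int seen = 0;

  va_start (ap, n);
  for (int i = 0; i < n; i++)
    if (va_arg (ap, int) != 0)
      seen++;
  va_end (ap);

  return seen;
}

/* Wrapper macros in the style of the fixed ones: the literal count has
   to match the number of arguments that are forwarded.  */
#define COUNT_8_WRONG(A1, A2, A3, A4, A5, A6, A7, A8) \
  count_nonzero_args (5, A1, A2, A3, A4, A5, A6, A7, A8)
#define COUNT_8_FIXED(A1, A2, A3, A4, A5, A6, A7, A8) \
  count_nonzero_args (8, A1, A2, A3, A4, A5, A6, A7, A8)
```

With eight non-zero arguments, the "wrong" variant only looks at the
first five, which mirrors the kind of truncation the fix above avoids.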


Regards,
 Thomas


* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-02-04 15:06                 ` Julian Brown
  2015-02-18 12:25                   ` Ilya Verbin
@ 2015-02-24 12:49                   ` Julian Brown
  2015-02-25  9:54                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
  1 sibling, 1 reply; 92+ messages in thread
From: Julian Brown @ 2015-02-24 12:49 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Jakub Jelinek, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 5813 bytes --]

Hi,

On Wed, 4 Feb 2015 15:05:45 +0000
Julian Brown <julian@codesourcery.com> wrote:

> The major changes are:
> 
> * The removal of the OpenACC-specific plugin hooks open_device,
>   close_device, set_device_num and get_device_num. The functionality
>   has been moved into the init/fini hooks (for the first two) or moved
>   into the target-independent OpenACC parts, respectively.
> 
> * The PTX mkoffload utility has been extended to support variables as
>   well as function mapping, to fill out support for the load/unload
>   image hooks. (Not really tested so far!)
> 
> * The plugin hooks that are shared between OpenMP and OpenACC now
>   support the "device number" argument properly: that should help with
>   (eventually) unifying the plugin interface for the two APIs. (With
>   set_device_num and get_device_num removed, the plugin is "stateless"
>   with respect to which device is currently active. The rest of the
>   OpenACC hooks -- async functions, etc. -- should probably be changed
>   to take a device number argument too, but that could be a follow-on
>   patch.)
> 
> * The limitation of having only one type of device active
> simultaneously in the OpenACC runtime has (theoretically!) been
> removed.
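
As a rough illustration of the "stateless" point in the quoted list,
compare a hook that relies on a previously selected device with one that
takes the device number as an argument.  This is a sketch only: every
sketch_* name and MAX_DEVICES is hypothetical, and none of it is libgomp
code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 4  /* Assumed limit, for this sketch only.  */

/* Hypothetical per-device bookkeeping, indexed by device number.  */
struct sketch_device
{
  size_t bytes_allocated;
};

struct sketch_device sketch_devices[MAX_DEVICES];

/* "Stateful" style: hooks act on whichever device was selected last
   through a set_device_num-like call.  */
static int sketch_current_device;

void
sketch_set_device_num (int ord)
{
  sketch_current_device = ord;
}

void
sketch_alloc_stateful (size_t size)
{
  sketch_devices[sketch_current_device].bytes_allocated += size;
}

/* "Stateless" style: every hook names its device explicitly.
   Returns 0 on success, -1 if ORD is out of range.  */
int
sketch_alloc_stateless (int ord, size_t size)
{
  if (ord < 0 || ord >= MAX_DEVICES)
    return -1;

  sketch_devices[ord].bytes_allocated += size;
  return 0;
}
```

In the second form there is no hidden "current device" inside the
plugin, so the choice of device can live entirely in the caller.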

This is a version of the previously-posted patch to rework
initialisation and support the proposed load/unload hooks, merged to
gomp4 branch and tested alongside the two patches (from
https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading):

http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E

http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E

As well as Ilya Verbin's patch:

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html

Test results look OK, barring a suspected harness issue (lib-83
failing with a timeout for nvptx, though it works fine from the command
line).

OK for gomp4 branch? I could commit Ilya's patch there too if so.

Thanks,

Julian

ChangeLog

    gcc/
    * config/nvptx/mkoffload.c (process): Support variable mapping.

    libgomp/
    * libgomp.h (acc_dispatch_t): Remove open_device_func,
    close_device_func, get_device_num_func, set_device_num_func,
    target_data members. Change create_thread_data_func argument to
    device number instead of generic pointer.
    * oacc-async.c (assert.h): Include.
    (acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
    (acc_wait_all, acc_wait_all_async): Use current host thread's
    active device, not base_dev.
    * oacc-cuda.c (acc_get_current_cuda_device)
    (acc_get_current_cuda_context, acc_get_cuda_stream)
    (acc_set_cuda_stream): Likewise.
    * oacc-host.c (host_dispatch): Don't set open_device_func,
    close_device_func, get_device_num_func or set_device_num_func.
    * oacc-init.c (base_dev, init_key): Remove.
    (cached_base_dev): New.
    (name_of_acc_device_t): New.
    (acc_init_1): Initialise default-numbered device, not zeroth.
    (acc_shutdown_1): Close all devices of a given type.
    (goacc_destroy_thread): Don't use base_dev.
    (lazy_open, lazy_init, lazy_init_and_open): Remove.
    (goacc_attach_host_thread_to_device): New.
    (acc_init): Reimplement with goacc_attach_host_thread_to_device.
    (acc_get_num_devices): Don't use base_dev.
    (acc_set_device_type): Reimplement.
    (acc_get_device_type): Don't use base_dev.
    (acc_get_device_num): Tweak logic.
    (acc_set_device_num): Likewise.
    (goacc_runtime_initialize): Initialize cached_base_dev not base_dev.
    (goacc_lazy_initialize): Reimplement with acc_init and
    goacc_attach_host_thread_to_device.
    * oacc-int.h (goacc_thread): Add base_dev field.
    (base_dev): Remove extern declaration.
    (goacc_attach_host_thread_to_device): Add prototype.
    * oacc-mem.c (acc_malloc): Use current thread's device instead of
    base_dev.
    (acc_free): Likewise.
    (acc_memcpy_to_device): Likewise.
    (acc_memcpy_from_device): Likewise.
    * oacc-parallel.c (select_acc_device): Remove. Replace calls with
    goacc_lazy_initialize (throughout).
    * target.c (gomp_load_plugin_for_device): Don't initialise openacc
    open_device, close_device, get_device_num or set_device_num hooks.
    Don't initialise target_data.
    * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device)
    (GOMP_OFFLOAD_openacc_close_device)
    (GOMP_OFFLOAD_openacc_get_device_num)
    (GOMP_OFFLOAD_openacc_set_device_num): Remove.
    (GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument
    to int.
    * plugin/plugin-nvptx.c (pthread.h): Include.
    (ptx_inited): Remove.
    (instantiated_devices, ptx_dev_lock): New.
    (struct ptx_image_data): New.
    (ptx_devices, ptx_images, ptx_image_lock): New.
    (nvptx_get_num_devices): Remove forward declaration.
    (nvptx_init): Change return type to bool.
    (nvptx_fini): Remove.
    (nvptx_attach_host_thread_to_device): New.
    (nvptx_open_device): Return struct ptx_device* instead of void*.
    (nvptx_close_device): Change argument type to struct ptx_device*,
    return type to void.
    (nvptx_get_num_devices): Use instantiated_devices not ptx_inited.
    (kernel_target_data, kernel_host_table): Remove static globals.
    (GOMP_OFFLOAD_register_image, GOMP_OFFLOAD_get_table): Remove.
    (GOMP_OFFLOAD_init_device): Reimplement.
    (GOMP_OFFLOAD_fini_device): Likewise.
    (GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New.
    (GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free, GOMP_OFFLOAD_dev2host)
    (GOMP_OFFLOAD_host2dev): Use ORD argument.
    (GOMP_OFFLOAD_openacc_open_device)
    (GOMP_OFFLOAD_openacc_close_device)
    (GOMP_OFFLOAD_openacc_set_device_num)
    (GOMP_OFFLOAD_openacc_get_device_num): Remove.
    (GOMP_OFFLOAD_openacc_create_thread_data): Change argument to int
    (device number).
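
For readers skimming the mkoffload.c hunk in the attached patch: the
generated target_data array now carries the variable and function counts
next to the mapping tables.  A minimal sketch of that layout, and of one
way a consumer could decode it, follows.  The sketch_* names, the sample
strings and the decoding helper are assumptions for illustration, not
the plugin's actual code.

```c
#include <assert.h>
#include <stdint.h>

static const char sketch_ptx_code[] = "// PTX text would go here";
static const char *sketch_var_mappings[] = { "var_a", "var_b" };
static const char *sketch_func_mappings[] = { "fn_a" };

/* Same element order as the fprintf calls in the mkoffload.c hunk:
   code, variable count, variable names, function count, function names.  */
static const void *sketch_target_data[] = {
  sketch_ptx_code, (void *) 2, sketch_var_mappings, (void *) 1,
  sketch_func_mappings
};

/* Hypothetical decoded view of the five-element array.  */
struct sketch_image
{
  const char *ptx;
  unsigned int nvars;
  const char *const *vars;
  unsigned int nfuncs;
  const char *const *funcs;
};

struct sketch_image
sketch_decode (const void *const *data)
{
  struct sketch_image img;

  img.ptx = (const char *) data[0];
  /* The counts travel as integers cast to pointers; convert them back.  */
  img.nvars = (unsigned int) (uintptr_t) data[1];
  img.vars = (const char *const *) data[2];
  img.nfuncs = (unsigned int) (uintptr_t) data[3];
  img.funcs = (const char *const *) data[4];

  return img;
}
```

Storing the counts in the array means a consumer does not need a
sentinel entry to know where each mapping table ends.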

[-- Attachment #2: nvptx-load-unload-3.diff --]
[-- Type: text/x-patch, Size: 43789 bytes --]

commit 2a723cdaeddbc6834cd4ed4f194937f36981ca1d
Author: Julian Brown <julian@codesourcery.com>
Date:   Mon Feb 23 11:55:41 2015 -0800

    nvptx load/unload image support, init rework

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
     tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3fc9aa9..c8e031a 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -706,18 +706,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		     unsigned short *, int, int, int, int, void *);
@@ -735,7 +723,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..149577e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
@@ -34,44 +34,68 @@
 int
 acc_async_test (int async)
 {
+  struct goacc_thread *thr = goacc_thread ();
+
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  assert (thr->dev);
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (thr->dev);
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
 acc_wait (int async)
 {
+  struct goacc_thread *thr = goacc_thread ();
+
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_func (async);
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_func (async);
 }
 
 void
 acc_wait_async (int async1, int async2)
 {
-  base_dev->openacc.async_wait_async_func (async1, async2);
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_async_func (async1, async2);
 }
 
 void
 acc_wait_all (void)
 {
-  base_dev->openacc.async_wait_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_all_func ();
 }
 
 void
 acc_wait_all_async (int async)
 {
+  struct goacc_thread *thr = goacc_thread ();
+
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_all_async_func (async);
+  assert (thr->dev);
+
+  thr->dev->openacc.async_wait_all_async_func (async);
 }
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index c8ef376..4aab422 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -34,51 +34,53 @@
 void *
 acc_get_current_cuda_device (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
-    p = base_dev->openacc.cuda.get_current_device_func ();
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
+    return thr->dev->openacc.cuda.get_current_device_func ();
 
-  return p;
+  return NULL;
 }
 
 void *
 acc_get_current_cuda_context (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
-    p = base_dev->openacc.cuda.get_current_context_func ();
-
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_context_func)
+    return thr->dev->openacc.cuda.get_current_context_func ();
+ 
+  return NULL;
 }
 
 void *
 acc_get_cuda_stream (int async)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (async < 0)
-    return p;
-
-  if (base_dev && base_dev->openacc.cuda.get_stream_func)
-    p = base_dev->openacc.cuda.get_stream_func (async);
+    return NULL;
 
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
+    return thr->dev->openacc.cuda.get_stream_func (async);
+ 
+  return NULL;
 }
 
 int
 acc_set_cuda_stream (int async, void *stream)
 {
-  int s = -1;
+  struct goacc_thread *thr;
 
   if (async < 0 || stream == NULL)
     return 0;
 
   goacc_lazy_initialize ();
 
-  if (base_dev && base_dev->openacc.cuda.set_stream_func)
-    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+  thr = goacc_thread ();
+
+  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
+    return thr->dev->openacc.cuda.set_stream_func (async, stream);
 
-  return s;
+  return -1;
 }
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 5d67c6c..c3b8b73 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -58,12 +58,6 @@ static struct gomp_device_descr host_dispatch =
     .is_initialized = false,
 
     .openacc = {
-      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
-      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
-
-      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
-      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
-
       .exec_func = GOMP_OFFLOAD_openacc_parallel,
 
       .register_async_cleanup_func
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 19b937a..2c5c7db 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -37,14 +37,13 @@
 
 static gomp_mutex_t acc_device_lock;
 
-/* The dispatch table for the current accelerator device.  This is global, so
-   you can only have one type of device open at any given time in a program.
-   This is the "base" device in that several devices that use the same
-   dispatch table may be active concurrently: this one (the "zeroth") is used
-   for overall initialisation/shutdown, and other instances -- not necessarily
-   including this one -- may be opened and closed once the base device has
-   been initialized.  */
-struct gomp_device_descr *base_dev;
+/* A cached version of the dispatcher for the global "current" accelerator type,
+   e.g. used as the default when creating new host threads.  This is the
+   device-type equivalent of goacc_device_num (which specifies which device to
+   use out of potentially several of the same type).  If there are several
+   devices of a given type, this points at the first one.  */
+
+static struct gomp_device_descr *cached_base_dev = NULL;
 
 #if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct goacc_thread *goacc_tls_data;
@@ -53,9 +52,6 @@ pthread_key_t goacc_tls_key;
 #endif
 static pthread_key_t goacc_cleanup_key;
 
-/* Current dispatcher, and how it was initialized */
-static acc_device_t init_key = _ACC_device_hwm;
-
 static struct goacc_thread *goacc_threads;
 static gomp_mutex_t goacc_thread_lock;
 
@@ -94,6 +90,21 @@ get_openacc_name (const char *name)
     return name;
 }
 
+static const char *
+name_of_acc_device_t (enum acc_device_t type)
+{
+  switch (type)
+    {
+    case acc_device_none: return "none";
+    case acc_device_default: return "default";
+    case acc_device_host: return "host";
+    case acc_device_host_nonshm: return "host_nonshm";
+    case acc_device_not_host: return "not_host";
+    case acc_device_nvidia: return "nvidia";
+    default: return "<unknown>";
+    }
+}
+
 static struct gomp_device_descr *
 resolve_device (acc_device_t d)
 {
@@ -159,22 +170,89 @@ resolve_device (acc_device_t d)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d)
 {
-  struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
+  int ndevs;
 
-  acc_dev = resolve_device (d);
+  base_dev = resolve_device (d);
+
+  ndevs = base_dev ? base_dev->get_num_devices_func () : 0;
+
+  if (!base_dev || ndevs <= 0 || goacc_device_num >= ndevs)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
-    gomp_fatal ("device %u not supported", (unsigned)d);
+  acc_dev = &base_dev[goacc_device_num];
 
   if (acc_dev->is_initialized)
     gomp_fatal ("device already active");
 
-  /* We need to remember what we were intialized as, to check shutdown etc.  */
-  init_key = d;
-
   gomp_init_device (acc_dev);
 
-  return acc_dev;
+  return base_dev;
+}
+
+static void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct gomp_device_descr *base_dev;
+  struct goacc_thread *walk;
+  int ndevs, i;
+  bool devices_active = false;
+
+  /* Get the base device for this device type.  */
+  base_dev = resolve_device (d);
+
+  if (!base_dev)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      /* Similarly, if this happens then user code has done something weird.  */
+      if (walk->saved_bound_dev)
+        gomp_fatal ("shutdown during host fallback");
+
+      if (walk->dev)
+	{
+	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
+
+	  gomp_mutex_lock (&mem_map->lock);
+	  gomp_free_memmap (mem_map);
+	  gomp_mutex_unlock (&mem_map->lock);
+
+	  walk->dev = NULL;
+	  walk->base_dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  ndevs = base_dev->get_num_devices_func ();
+
+  /* Close all the devices of this type that have been opened.  */
+  for (i = 0; i < ndevs; i++)
+    {
+      struct gomp_device_descr *acc_dev = &base_dev[i];
+      if (acc_dev->is_initialized)
+        {
+	  devices_active = true;
+	  gomp_fini_device (acc_dev);
+	}
+    }
+
+  if (!devices_active)
+    gomp_fatal ("no device initialized");
 }
 
 static struct goacc_thread *
@@ -207,9 +285,11 @@ goacc_destroy_thread (void *data)
 
   if (thr)
     {
-      if (base_dev && thr->target_tls)
+      struct gomp_device_descr *acc_dev = thr->dev;
+
+      if (acc_dev && thr->target_tls)
 	{
-	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  acc_dev->openacc.destroy_thread_data_func (thr->target_tls);
 	  thr->target_tls = NULL;
 	}
 
@@ -236,53 +316,49 @@ goacc_destroy_thread (void *data)
   gomp_mutex_unlock (&goacc_thread_lock);
 }
 
-/* Open the ORD'th device of the currently-active type (base_dev must be
-   initialised before calling).  If ORD is < 0, open the default-numbered
-   device (set by the ACC_DEVICE_NUM environment variable or a call to
-   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
-   consists of calling the device's open_device_func hook, and setting up
-   thread-local data (maybe allocating, then initializing with information
-   pertaining to the newly-opened or previously-opened device).  */
+/* Use the ORD'th device instance for the current host thread (or -1 for the
+   current global default).  The device (and the runtime) must be initialised
+   before calling this function.  */
 
-static void
-lazy_open (int ord)
+void
+goacc_attach_host_thread_to_device (int ord)
 {
   struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev;
-
-  if (thr && thr->dev)
-    {
-      assert (ord < 0 || ord == thr->dev->target_id);
-      return;
-    }
-
-  assert (base_dev);
-
+  struct gomp_device_descr *acc_dev = NULL, *base_dev = NULL;
+  int num_devices;
+  
+  if (thr && thr->dev && (thr->dev->target_id == ord || ord < 0))
+    return;
+  
   if (ord < 0)
     ord = goacc_device_num;
-
-  /* The OpenACC 2.0 spec leaves the runtime's behaviour when an out-of-range
-     device is requested as implementation-defined (4.2 ACC_DEVICE_NUM).
-     We choose to raise an error in such a case.  */
-  if (ord >= base_dev->get_num_devices_func ())
-    gomp_fatal ("device %u does not exist", ord);
-
+  
+  /* Decide which type of device to use.  If the current thread has a device
+     type already (e.g. set by acc_set_device_type), use that, else use the
+     global default.  */
+  if (thr && thr->base_dev)
+    base_dev = thr->base_dev;
+  else
+    {
+      assert (cached_base_dev);
+      base_dev = cached_base_dev;
+    }
+  
+  num_devices = base_dev->get_num_devices_func ();
+  if (num_devices <= 0 || ord >= num_devices)
+    gomp_fatal ("device %u out of range", ord);
+  
   if (!thr)
     thr = goacc_new_thread ();
-
-  acc_dev = thr->dev = &base_dev[ord];
-
-  assert (acc_dev->target_id == ord);
-
+  
+  thr->base_dev = base_dev;
+  thr->dev = acc_dev = &base_dev[ord];
   thr->saved_bound_dev = NULL;
   thr->mapped_data = NULL;
-
-  if (!acc_dev->openacc.target_data)
-    acc_dev->openacc.target_data = acc_dev->openacc.open_device_func (ord);
-
+  
   thr->target_tls
-    = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
-
+    = acc_dev->openacc.create_thread_data_func (ord);
+  
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
@@ -292,75 +368,20 @@ lazy_open (int ord)
 void
 acc_init (acc_device_t d)
 {
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
 
-  base_dev = acc_init_1 (d);
-
-  lazy_open (-1);
+  cached_base_dev = acc_init_1 (d);
 
   gomp_mutex_unlock (&acc_device_lock);
+  
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_init)
 
-static void
-acc_shutdown_1 (acc_device_t d)
-{
-  struct goacc_thread *walk;
-
-  /* We don't check whether d matches the actual device found, because
-     OpenACC 2.0 (3.2.12) says the parameters to the init and this
-     call must match (for the shutdown call anyway, it's silent on
-     others).  */
-
-  if (!base_dev)
-    gomp_fatal ("no device initialized");
-  if (d != init_key)
-    gomp_fatal ("device %u(%u) is initialized",
-		(unsigned) init_key, (unsigned) base_dev->type);
-
-  gomp_mutex_lock (&goacc_thread_lock);
-
-  /* Free target-specific TLS data and close all devices.  */
-  for (walk = goacc_threads; walk != NULL; walk = walk->next)
-    {
-      if (walk->target_tls)
-	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
-
-      walk->target_tls = NULL;
-
-      /* This would mean the user is shutting down OpenACC in the middle of an
-         "acc data" pragma.  Likely not intentional.  */
-      if (walk->mapped_data)
-	gomp_fatal ("shutdown in 'acc data' region");
-
-      if (walk->dev)
-	{
-	  void *target_data = walk->dev->openacc.target_data;
-	  if (walk->dev->openacc.close_device_func (target_data) < 0)
-	    gomp_fatal ("failed to close device");
-
-	  walk->dev->openacc.target_data = target_data = NULL;
-
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
-
-	  walk->dev = NULL;
-	}
-    }
-
-  gomp_mutex_unlock (&goacc_thread_lock);
-
-  gomp_fini_device (base_dev);
-
-  base_dev = NULL;
-}
-
 void
 acc_shutdown (acc_device_t d)
 {
@@ -373,59 +394,16 @@ acc_shutdown (acc_device_t d)
 
 ialias (acc_shutdown)
 
-/* This function is called after plugins have been initialized.  It deals with
-   the "base" device, and is used to prepare the runtime for dealing with a
-   number of such devices (as implemented by some particular plugin).  If the
-   argument device type D matches a previous call to the function, return the
-   current base device, else shut the old device down and re-initialize with
-   the new device type.  */
-
-static struct gomp_device_descr *
-lazy_init (acc_device_t d)
-{
-  if (base_dev)
-    {
-      /* Re-initializing the same device, do nothing.  */
-      if (d == init_key)
-	return base_dev;
-
-      acc_shutdown_1 (init_key);
-    }
-
-  assert (!base_dev);
-
-  return acc_init_1 (d);
-}
-
-/* Ensure that plugins are loaded, initialize and open the (default-numbered)
-   device.  */
-
-static void
-lazy_init_and_open (acc_device_t d)
-{
-  if (!base_dev)
-    gomp_init_targets_once ();
-
-  gomp_mutex_lock (&acc_device_lock);
-
-  base_dev = lazy_init (d);
-
-  lazy_open (-1);
-
-  gomp_mutex_unlock (&acc_device_lock);
-}
-
 int
 acc_get_num_devices (acc_device_t d)
 {
   int n = 0;
-  const struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *acc_dev;
 
   if (d == acc_device_none)
     return 0;
 
-  if (!base_dev)
-    gomp_init_targets_once ();
+  gomp_init_targets_once ();
 
   acc_dev = resolve_device (d);
   if (!acc_dev)
@@ -440,10 +418,39 @@ acc_get_num_devices (acc_device_t d)
 
 ialias (acc_get_num_devices)
 
+/* Set the device type for the current thread only (using the current global
+   default device number), initialising that device if necessary.  Also set the
+   default device type for new threads to D.  */
+
 void
 acc_set_device_type (acc_device_t d)
 {
-  lazy_init_and_open (d);
+  struct gomp_device_descr *base_dev, *acc_dev;
+  struct goacc_thread *thr = goacc_thread ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  if (!cached_base_dev)
+    gomp_init_targets_once ();
+
+  cached_base_dev = base_dev = resolve_device (d);
+  acc_dev = &base_dev[goacc_device_num];
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device (acc_dev);
+
+  gomp_mutex_unlock (&acc_device_lock);
+
+  /* We're changing device type: invalidate the current thread's dev and
+     base_dev pointers.  */
+  if (thr && thr->base_dev != base_dev)
+    {
+      thr->base_dev = thr->dev = NULL;
+      if (thr->mapped_data)
+        gomp_fatal ("acc_set_device_type in 'acc data' region");
+    }
+
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_set_device_type)
@@ -452,10 +459,11 @@ acc_device_t
 acc_get_device_type (void)
 {
   acc_device_t res = acc_device_none;
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *dev;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev)
-    res = acc_device_type (base_dev->type);
+  if (thr && thr->base_dev)
+    res = acc_device_type (thr->base_dev->type);
   else
     {
       gomp_init_targets_once ();
@@ -476,78 +484,65 @@ int
 acc_get_device_num (acc_device_t d)
 {
   const struct gomp_device_descr *dev;
-  int num;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (d >= _ACC_device_hwm)
     gomp_fatal ("device %u out of range", (unsigned)d);
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   dev = resolve_device (d);
   if (!dev)
-    gomp_fatal ("no devices of type %u", d);
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  /* We might not have called lazy_open for this host thread yet, in which case
-     the get_device_num_func hook will return -1.  */
-  num = dev->openacc.get_device_num_func ();
-  if (num < 0)
-    num = goacc_device_num;
+  if (thr && thr->base_dev == dev && thr->dev)
+    return thr->dev->target_id;
 
-  return num;
+  return goacc_device_num;
 }
 
 ialias (acc_get_device_num)
 
 void
-acc_set_device_num (int n, acc_device_t d)
+acc_set_device_num (int ord, acc_device_t d)
 {
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
   int num_devices;
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
-  if ((int) d == 0)
-    {
-      int i;
-
-      /* A device setting of zero sets all device types on the system to use
-         the Nth instance of that device type.  Only attempt it for initialized
-	 devices though.  */
-      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
-        {
-	  dev = resolve_device (d);
-	  if (dev && dev->is_initialized)
-	    dev->openacc.set_device_num_func (n);
-	}
+  if (ord < 0)
+    ord = goacc_device_num;
 
-      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
-      goacc_device_num = n;
-    }
+  if ((int) d == 0)
+    /* Set whatever device is being used by the current host thread to use
+       device instance ORD.  It's unclear if this is supposed to affect other
+       host threads too (OpenACC 2.0 (3.2.4) acc_set_device_num).  */
+    goacc_attach_host_thread_to_device (ord);
   else
     {
-      struct goacc_thread *thr = goacc_thread ();
-
       gomp_mutex_lock (&acc_device_lock);
 
-      base_dev = lazy_init (d);
+      cached_base_dev = base_dev = resolve_device (d);
 
       num_devices = base_dev->get_num_devices_func ();
 
-      if (n >= num_devices)
-        gomp_fatal ("device %u out of range", n);
+      if (ord >= num_devices)
+        gomp_fatal ("device %u out of range", ord);
 
-      /* If we're changing the device number, de-associate this thread with
-	 the device (but don't close the device, since it may be in use by
-	 other threads).  */
-      if (thr && thr->dev && n != thr->dev->target_id)
-	thr->dev = NULL;
+      acc_dev = &base_dev[ord];
 
-      lazy_open (n);
+      if (!acc_dev->is_initialized)
+        gomp_init_device (acc_dev);
 
       gomp_mutex_unlock (&acc_device_lock);
+
+      goacc_attach_host_thread_to_device (ord);
     }
+  
+  goacc_device_num = ord;
 }
 
 ialias (acc_set_device_num)
@@ -578,7 +573,7 @@ goacc_runtime_initialize (void)
 
   pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
 
-  base_dev = NULL;
+  cached_base_dev = NULL;
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
@@ -607,9 +602,8 @@ goacc_restore_bind (void)
 }
 
 /* This is called from any OpenACC support function that may need to implicitly
-   initialize the libgomp runtime.  On exit all such initialization will have
-   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
-   pointers will be valid.  */
+   initialize the libgomp runtime, either globally or from a new host thread. 
+   On exit "goacc_thread" will return a valid & populated thread block.  */
 
 attribute_hidden void
 goacc_lazy_initialize (void)
@@ -619,12 +613,8 @@ goacc_lazy_initialize (void)
   if (thr && thr->dev)
     return;
 
-  if (!base_dev)
-    lazy_init_and_open (acc_device_default);
+  if (!cached_base_dev)
+    acc_init (acc_device_default);
   else
-    {
-      gomp_mutex_lock (&acc_device_lock);
-      lazy_open (-1);
-      gomp_mutex_unlock (&acc_device_lock);
-    }
+    goacc_attach_host_thread_to_device (-1);
 }
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index 85619c8..0ace737 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -56,6 +56,9 @@ acc_device_type (enum offload_target_type type)
 
 struct goacc_thread
 {
+  /* The base device for the current thread.  */
+  struct gomp_device_descr *base_dev;
+
   /* The device for the current thread.  */
   struct gomp_device_descr *dev;
 
@@ -89,10 +92,7 @@ goacc_thread (void)
 #endif
 
 void goacc_register (struct gomp_device_descr *) __GOACC_NOTHROW;
-
-/* Current dispatcher.  */
-extern struct gomp_device_descr *base_dev;
-
+void goacc_attach_host_thread_to_device (int);
 void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..0e09837 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -112,7 +112,9 @@ acc_malloc (size_t s)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  return base_dev->alloc_func (thr->dev->target_id, s);
+  assert (thr->dev);
+
+  return thr->dev->alloc_func (thr->dev->target_id, s);
 }
 
 /* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
@@ -127,6 +129,8 @@ acc_free (void *d)
   if (!d)
     return;
 
+  assert (thr->dev);
+
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
@@ -139,7 +143,7 @@ acc_free (void *d)
      acc_unmap_data ((void *)(k->host_start + offset));
    }
 
-  base_dev->free_func (thr->dev->target_id, d);
+  thr->dev->free_func (thr->dev->target_id, d);
 }
 
 void
@@ -149,7 +153,9 @@ acc_memcpy_to_device (void *d, void *h, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+  assert (thr->dev);
+
+  thr->dev->host2dev_func (thr->dev->target_id, d, h, s);
 }
 
 void
@@ -159,7 +165,9 @@ acc_memcpy_from_device (void *h, void *d, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+  assert (thr->dev);
+
+  thr->dev->dev2host_func (thr->dev->target_id, h, d, s);
 }
 
 /* Return the device pointer that corresponds to host data H.  Or NULL
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 727fced..0526cf6 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -46,32 +46,6 @@ find_pset (int pos, size_t mapnum, unsigned short *kinds)
   return kind == GOMP_MAP_TO_PSET;
 }
 
-
-/* Ensure that the target device for DEVICE_TYPE is initialised (and that
-   plugins have been loaded if appropriate).  The ACC_dev variable for the
-   current thread will be set appropriately for the given device type on
-   return.  */
-
-attribute_hidden void
-select_acc_device (int device_type)
-{
-  goacc_lazy_initialize ();
-
-  if (device_type == GOMP_DEVICE_HOST_FALLBACK)
-    return;
-
-  if (device_type == acc_device_none)
-    device_type = acc_device_host;
-
-  if (device_type >= 0)
-    {
-      /* NOTE: this will go badly if the surrounding data environment is set up
-         to use a different device type.  We'll just have to trust that users
-	 know what they're doing...  */
-      acc_set_device_type (device_type);
-    }
-}
-
 static void goacc_wait (int async, int num_waits, va_list ap);
 
 void
@@ -99,7 +73,7 @@ GOACC_parallel (int device, void (*fn) (void *),
   gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
 	      __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -178,7 +152,7 @@ GOACC_data_start (int device, size_t mapnum,
   gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
 	      __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
@@ -225,7 +199,7 @@ GOACC_enter_exit_data (int device, size_t mapnum,
   bool data_enter = false;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -366,7 +340,7 @@ GOACC_kernels (int device, void (*fn) (void *),
 
   va_list ap;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   va_start (ap, num_waits);
 
@@ -437,7 +411,7 @@ GOACC_update (int device, size_t mapnum,
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index bc60f72..1faf5bc 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -119,31 +119,6 @@ GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return (void *) (intptr_t) n;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_close_device (void *hnd)
-{
-  return 0;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  return 0;
-}
-
-STATIC void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  if (n > 0)
-    GOMP (fatal) ("device number %u out of range for host execution", n);
-}
-
-STATIC void *
 GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t s)
 {
   return GOMP (malloc) (s);
@@ -254,7 +229,7 @@ GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__ ((unused)))
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
   return NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 483cb75..0dbde05 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -46,6 +46,7 @@
 #include <dlfcn.h>
 #include <unistd.h>
 #include <assert.h>
+#include <pthread.h>
 
 #define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
 
@@ -133,7 +134,8 @@ struct targ_fn_descriptor
   const char *name;
 };
 
-static bool ptx_inited = false;
+static unsigned int instantiated_devices = 0;
+static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
 struct ptx_stream
 {
@@ -331,9 +333,21 @@ struct ptx_event
   struct ptx_event *next;
 };
 
+struct ptx_image_data
+{
+  void *target_data;
+  CUmodule module;
+  struct ptx_image_data *next;
+};
+
 static pthread_mutex_t ptx_event_lock;
 static struct ptx_event *ptx_events;
 
+static struct ptx_device **ptx_devices;
+
+static struct ptx_image_data *ptx_images = NULL;
+static pthread_mutex_t ptx_image_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
 
@@ -575,21 +589,21 @@ select_stream_for_async (int async, pthread_t thread, bool create,
   return stream;
 }
 
-static int nvptx_get_num_devices (void);
-
-/* Initialize the device.  */
-static int
+/* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
+   should be locked on entry and remains locked on exit.  */
+static bool
 nvptx_init (void)
 {
   CUresult r;
   int rc;
+  int ndevs;
 
-  if (ptx_inited)
-    return nvptx_get_num_devices ();
+  if (instantiated_devices != 0)
+    return true;
 
   rc = verify_device_library ();
   if (rc < 0)
-    return -1;
+    return false;
 
   r = cuInit (0);
   if (r != CUDA_SUCCESS)
@@ -599,18 +613,58 @@ nvptx_init (void)
 
   pthread_mutex_init (&ptx_event_lock, NULL);
 
-  ptx_inited = true;
+  r = cuDeviceGetCount (&ndevs);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
 
-  return nvptx_get_num_devices ();
+  ptx_devices = GOMP_PLUGIN_malloc (sizeof (struct ptx_device *) * ndevs);
+
+  return true;
 }
 
+/* Select the N'th PTX device for the current host thread.  The device must
+   have been previously opened before calling this function.  */
+
 static void
-nvptx_fini (void)
+nvptx_attach_host_thread_to_device (int n)
 {
-  ptx_inited = false;
+  CUdevice dev;
+  CUresult r;
+  struct ptx_device *ptx_dev;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetDevice (&dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+
+  if (dev == n)
+    return;
+  else
+    {
+      CUcontext old_ctx;
+
+      ptx_dev = ptx_devices[n];
+      assert (ptx_dev);
+
+      r = cuCtxGetCurrent (&thd_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+      /* The host thread must already have a non-NULL context when this function
+         is called because we must have previously called nvptx_open_device.  */
+      assert (thd_ctx);
+
+      r = cuCtxPopCurrent (&old_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
 }
 
-static void *
+static struct ptx_device *
 nvptx_open_device (int n)
 {
   struct ptx_device *ptx_dev;
@@ -678,17 +732,16 @@ nvptx_open_device (int n)
 
   init_streams_for_device (ptx_dev, async_engines);
 
-  return (void *) ptx_dev;
+  return ptx_dev;
 }
 
-static int
-nvptx_close_device (void *targ_data)
+static void
+nvptx_close_device (struct ptx_device *ptx_dev)
 {
   CUresult r;
-  struct ptx_device *ptx_dev = targ_data;
 
   if (!ptx_dev)
-    return 0;
+    return;
 
   fini_streams_for_device (ptx_dev);
 
@@ -700,8 +753,6 @@ nvptx_close_device (void *targ_data)
     }
 
   free (ptx_dev);
-
-  return 0;
 }
 
 static int
@@ -714,7 +765,7 @@ nvptx_get_num_devices (void)
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
      further initialization).  */
-  if (!ptx_inited)
+  if (instantiated_devices == 0)
     cuInit (0);
 
   r = cuDeviceGetCount (&n);
@@ -1507,64 +1558,84 @@ GOMP_OFFLOAD_get_num_devices (void)
   return nvptx_get_num_devices ();
 }
 
-static void **kernel_target_data;
-static void **kernel_host_table;
-
 void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+GOMP_OFFLOAD_init_device (int n)
 {
-  kernel_target_data = target_data;
-  kernel_host_table = host_table;
-}
+  pthread_mutex_lock (&ptx_dev_lock);
 
-void
-GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
-{
-  (void) nvptx_init ();
+  if (!nvptx_init ()
+      || (instantiated_devices & (1 << n)) != 0)
+    {
+      pthread_mutex_unlock (&ptx_dev_lock);
+      return;
+    }
+
+  ptx_devices[n] = nvptx_open_device (n);
+  instantiated_devices |= 1 << n;
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 void
-GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
+GOMP_OFFLOAD_fini_device (int n)
 {
-  nvptx_fini ();
+  pthread_mutex_lock (&ptx_dev_lock);
+
+  if (instantiated_devices & (1 << n))
+    {
+      nvptx_close_device (ptx_devices[n]);
+      instantiated_devices &= ~(1 << n);
+    }
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **tablep)
+GOMP_OFFLOAD_load_image (int ord, void *target_data,
+			 struct addr_pair **target_table)
 {
   CUmodule module;
-  void **fn_table;
-  char **fn_names;
-  int fn_entries, i;
+  char **fn_names, **var_names;
+  unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
+  void **img_header = (void **) target_data;
+  struct ptx_image_data *new_image;
+
+  nvptx_attach_host_thread_to_device (ord);
 
   if (nvptx_init () <= 0)
     return 0;
 
-  /* This isn't an error, because an image may legitimately have no offloaded
-     regions and so will not call GOMP_offload_register.  */
-  if (kernel_target_data == NULL)
-    return 0;
+  link_ptx (&module, img_header[0]);
 
-  link_ptx (&module, kernel_target_data[0]);
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
-  /* kernel_target_data[0] -> ptx code
-     kernel_target_data[1] -> variable mappings
-     kernel_target_data[2] -> array of kernel names in ascii
+  /* The mkoffload utility emits a table of pointers/integers at the start of
+     each offload image:
 
-     kernel_host_table[0] -> start of function addresses (__offload_func_table)
-     kernel_host_table[1] -> end of function addresses (__offload_funcs_end)
+     img_header[0] -> ptx code
+     img_header[1] -> number of variables
+     img_header[2] -> array of variable names (pointers to strings)
+     img_header[3] -> number of kernels
+     img_header[4] -> array of kernel names (pointers to strings)
 
      The array of kernel names and the functions addresses form a
      one-to-one correspondence.  */
 
-  fn_table = kernel_host_table[0];
-  fn_names = (char **) kernel_target_data[2];
-  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+  var_entries = (uintptr_t) img_header[1];
+  var_names = (char **) img_header[2];
+  fn_entries = (uintptr_t) img_header[3];
+  fn_names = (char **) img_header[4];
 
-  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
+				      * (fn_entries + var_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1579,38 +1650,86 @@ GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
       targ_fns[i].fn = function;
       targ_fns[i].name = (const char *) fn_names[i];
 
-      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
-      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
-      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
-      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+      (*target_table)[i].start = (uintptr_t) &targ_fns[i];
+      (*target_table)[i].end = (*target_table)[i].start + 1;
     }
 
-  return fn_entries;
+  for (j = 0; j < var_entries; j++, i++)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+
+      r = cuModuleGetGlobal (&var, &bytes, module, var_names[j]);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+
+      (*target_table)[i].start = (uintptr_t) var;
+      (*target_table)[i].end = (*target_table)[i].start + bytes;
+    }
+
+  return i;
+}
+
+void
+GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+{
+  void **img_header = (void **) target_data;
+  struct targ_fn_descriptor *targ_fns
+    = (struct targ_fn_descriptor *) img_header[0];
+  struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
+
+  free (targ_fns);
+
+  pthread_mutex_lock (&ptx_image_lock);
+  for (image = ptx_images; image != NULL;)
+    {
+      struct ptx_image_data *next = image->next;
+
+      if (image->target_data == target_data)
+	{
+	  cuModuleUnload (image->module);
+	  free (image);
+	  if (prev)
+	    prev->next = next;
+	}
+      else
+	{
+	  prev = image;
+	  if (!newhd)
+	    newhd = image;
+	}
+
+      image = next;
+    }
+  ptx_images = newhd;
+  pthread_mutex_unlock (&ptx_image_lock);
 }
 
 void *
-GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t size)
+GOMP_OFFLOAD_alloc (int ord, size_t size)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_alloc (size);
 }
 
 void
-GOMP_OFFLOAD_free (int n __attribute__ ((unused)), void *ptr)
+GOMP_OFFLOAD_free (int ord, void *ptr)
 {
+  nvptx_attach_host_thread_to_device (ord);
   nvptx_free (ptr);
 }
 
 void *
-GOMP_OFFLOAD_dev2host (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_dev2host (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_dev2host (dst, src, n);
 }
 
 void *
-GOMP_OFFLOAD_host2dev (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_host2dev (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_host2dev (dst, src, n);
 }
 
@@ -1627,45 +1746,6 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
 	    num_workers, vector_length, async, targ_mem_desc);
 }
 
-void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return nvptx_open_device (n);
-}
-
-int
-GOMP_OFFLOAD_openacc_close_device (void *h)
-{
-  return nvptx_close_device (h);
-}
-
-void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  assert (n >= 0);
-
-  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
-    (void) nvptx_open_device (n);
-}
-
-/* This can be called before the device is "opened" for the current thread, in
-   which case we can't tell which device number should be returned.  We don't
-   actually want to open the device here, so just return -1 and let the caller
-   (oacc-init.c:acc_get_device_num) handle it.  */
-
-int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  if (nvthd && nvthd->ptx_dev)
-    return nvthd->ptx_dev->ord;
-  else
-    return -1;
-}
-
 void
 GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
 {
@@ -1729,14 +1809,18 @@ GOMP_OFFLOAD_openacc_async_set_async (int async)
 }
 
 void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+GOMP_OFFLOAD_openacc_create_thread_data (int ord)
 {
-  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct ptx_device *ptx_dev;
   struct nvptx_thread *nvthd
     = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
   CUresult r;
   CUcontext thd_ctx;
 
+  ptx_dev = ptx_devices[ord];
+
+  assert (ptx_dev);
+
   r = cuCtxGetCurrent (&thd_ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
diff --git a/libgomp/target.c b/libgomp/target.c
index f443cff..0e04440 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1074,10 +1074,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
     {
       optional_present = optional_total = 0;
       DLSYM_OPT (openacc.exec, openacc_parallel);
-      DLSYM_OPT (openacc.open_device, openacc_open_device);
-      DLSYM_OPT (openacc.close_device, openacc_close_device);
-      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
-      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
       DLSYM_OPT (openacc.register_async_cleanup,
 		 openacc_register_async_cleanup);
       DLSYM_OPT (openacc.async_test, openacc_async_test);
@@ -1187,7 +1183,6 @@ gomp_target_init (void)
 		current_device.mem_map.splay_tree.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
-		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;


* Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter
  2015-01-16 21:13 ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
  2015-01-16 23:19   ` Ilya Verbin
  2015-01-17  3:09   ` Jack Howarth
@ 2015-02-24 17:23   ` Thomas Schwinge
  2 siblings, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-02-24 17:23 UTC (permalink / raw)
  To: jakub, gcc-patches
  Cc: howarth, dominiq, andrey.turetskiy, bernds, iverbin, kyukhin

[-- Attachment #1: Type: text/plain, Size: 32691 bytes --]

Hi!

On Fri, 16 Jan 2015 21:34:15 +0100, I wrote:
> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> > In r219682, I have committed to trunk our current set of OpenACC changes,
> 
> Here is a patch to remove the __OFFLOAD_TABLE__ variable/formal
> parameter, as discussed in <https://gcc.gnu.org/PR64625>.

> commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Fri Jan 16 20:12:12 2015 +0100
> 
>     [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.
>     
>     	gcc/
>     	* omp-low.c (offload_symbol_decl): Remove variable.
>     	(get_offload_symbol_decl): Remove function.
>     	(expand_omp_target): For BUILT_IN_GOMP_TARGET,
>     	BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
>     	instead of &__OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
>     	BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
>     	BUILT_IN_GOACC_UPDATE don't pass it at all.
>     	libgomp/
>     	* libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
>     	(GOACC_parallel, GOACC_update): Remove const_void *offload_table
>     	formal parameter.  Update all users.
>     	* target.c (GOMP_target, GOMP_target_data, GOMP_target_update):
>     	Document unused formal parameter.

I committed this in r219836.  It turns out that the patch was incomplete,
but nobody noticed, not even GCC itself.  I have now committed the
following fixup to trunk in r220944:

commit 6349b8cc3e359740c942717ca463f88b91bc034c
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 24 17:00:36 2015 +0000

    [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.
    
    Fixup for r219836: adjust builtin function prototypes.
    
    	PR libgomp/64625
    	gcc/
    	* omp-builtins.def (BUILT_IN_GOACC_DATA_START): Specify as
    	BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, not
    	BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR.
    	(BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_UPDATE): Specify as
    	BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR, not
    	BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR.
    	(BUILT_IN_GOACC_PARALLEL): Specify as
    	BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR, not
    	BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR.
    	* builtin-types.def
    	(BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
    	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	Remove function types.
    	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR)
    	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR)
    	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	New function types.
    	gcc/ada/
    	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
    	(DEF_FUNCTION_TYPE_VAR_12): Remove macros.
    	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
    	gcc/c-family/
    	* c-common.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
    	Remove macros.
    	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
    	gcc/fortran/
    	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
    	Remove macros.
    	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
    	* types.def (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
    	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	Remove function types.
    	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR)
    	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR)
    	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	New function types.
    	gcc/jit/
    	* jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8)
    	(DEF_FUNCTION_TYPE_VAR_12): Remove macros.
    	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
    	* jit-builtins.h (DEF_FUNCTION_TYPE_VAR_8)
    	(DEF_FUNCTION_TYPE_VAR_12): Remove macros.
    	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
    	gcc/lto/
    	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
    	Remove macros.
    	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220944 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                 |   21 ++++++++++++++++++++
 gcc/ada/ChangeLog             |    7 +++++++
 gcc/ada/gcc-interface/utils.c |   35 +++++++++++++++-----------------
 gcc/builtin-types.def         |   10 ++++++----
 gcc/c-family/ChangeLog        |    7 +++++++
 gcc/c-family/c-common.c       |   34 +++++++++++++++----------------
 gcc/fortran/ChangeLog         |   14 +++++++++++++
 gcc/fortran/f95-lang.c        |   26 +++++++++++-------------
 gcc/fortran/types.def         |   10 ++++++----
 gcc/jit/ChangeLog             |   10 ++++++++++
 gcc/jit/jit-builtins.c        |   44 +++++++++++++++++++++++------------------
 gcc/jit/jit-builtins.h        |   24 +++++++++++-----------
 gcc/lto/ChangeLog             |    7 +++++++
 gcc/lto/lto-lang.c            |   34 +++++++++++++++----------------
 gcc/omp-builtins.def          |    8 ++++----
 15 files changed, 180 insertions(+), 111 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index bcae92c..38ed447 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,24 @@
+2015-02-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/64625
+	* omp-builtins.def (BUILT_IN_GOACC_DATA_START): Specify as
+	BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, not
+	BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR.
+	(BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_UPDATE): Specify as
+	BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR, not
+	BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR.
+	(BUILT_IN_GOACC_PARALLEL): Specify as
+	BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR, not
+	BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR.
+	* builtin-types.def
+	(BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
+	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
+	Remove function types.
+	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR)
+	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR)
+	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
+	New function types.
+
 2015-02-24  Georg-Johann Lay  <avr@gjlay.de>
 
 	* config/avr/stdfix.h [__WITH_AVRLIBC__]: Include <stdfix-avrlibc.h>.
diff --git gcc/ada/ChangeLog gcc/ada/ChangeLog
index 6fdaaf7..9472434 100644
--- gcc/ada/ChangeLog
+++ gcc/ada/ChangeLog
@@ -1,3 +1,10 @@
+2015-02-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/64625
+	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
+	(DEF_FUNCTION_TYPE_VAR_12): Remove macros.
+	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
+
 2015-02-23  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8): Fix number of
diff --git gcc/ada/gcc-interface/utils.c gcc/ada/gcc-interface/utils.c
index d076da7..4fa3d32 100644
--- gcc/ada/gcc-interface/utils.c
+++ gcc/ada/gcc-interface/utils.c
@@ -5343,13 +5343,11 @@ enum c_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
-  NAME,
-#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8)			\
-  NAME,
-#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  NAME,
+				NAME,
+#define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7) NAME,
+#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -5368,8 +5366,8 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -5475,14 +5473,13 @@ install_builtin_function_types (void)
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
-#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8)			\
-  def_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
-	       ARG7, ARG8);
-#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  def_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
-	       ARG7, ARG8, ARG9, ARG10, ARG11, ARG12);
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
+  def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
+#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
+  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
+	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -5504,8 +5501,8 @@ install_builtin_function_types (void)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 }
diff --git gcc/builtin-types.def gcc/builtin-types.def
index 3412677..0e34531 100644
--- gcc/builtin-types.def
+++ gcc/builtin-types.def
@@ -492,6 +492,8 @@ DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_VPTR_PTR_I8_INT_INT,
 		     BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I8, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_VPTR_PTR_I16_INT_INT,
 		     BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I16, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_5 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR,
+		     BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
 		     BT_UINT)
@@ -588,12 +590,12 @@ DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_STRING_SIZE_INT_SIZE_CONST_STRING_VAR,
 DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_INT_INT_INT_INT_INT_VAR,
 			 BT_INT, BT_INT, BT_INT, BT_INT, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_8 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR,
-			 BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR,
+DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
+			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_12 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
-			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
+DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
 			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
 			  BT_INT, BT_INT)
 
diff --git gcc/c-family/ChangeLog gcc/c-family/ChangeLog
index 45261cd..ffa01c6 100644
--- gcc/c-family/ChangeLog
+++ gcc/c-family/ChangeLog
@@ -1,3 +1,10 @@
+2015-02-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/64625
+	* c-common.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
+	Remove macros.
+	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
+
 2015-02-16  Marek Polacek  <polacek@redhat.com>
 
 	PR c/65066
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 3c18d1c..8c23e09 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -5236,12 +5236,11 @@ enum c_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
-  NAME,
-#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8) NAME,
-#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11,       \
-				 ARG12) NAME,
+				NAME,
+#define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7) NAME,
+#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -5260,8 +5259,8 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -5354,14 +5353,13 @@ c_define_builtins (tree va_list_ref_type_node, tree va_list_arg_type_node)
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
-#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8)			    \
-  def_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8);
-#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  def_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8, ARG9, ARG10, ARG11, ARG12);
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
+  def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
+#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
+  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
+	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -5383,8 +5381,8 @@ c_define_builtins (tree va_list_ref_type_node, tree va_list_arg_type_node)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog
index d80c59b..100e04d 100644
--- gcc/fortran/ChangeLog
+++ gcc/fortran/ChangeLog
@@ -1,3 +1,17 @@
+2015-02-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/64625
+	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
+	Remove macros.
+	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
+	* types.def (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
+	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
+	Remove function types.
+	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR)
+	(BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR)
+	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
+	New function types.
+
 2015-02-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>
 
 	PR fortran/64980
diff --git gcc/fortran/f95-lang.c gcc/fortran/f95-lang.c
index 94f7479..de9c813 100644
--- gcc/fortran/f95-lang.c
+++ gcc/fortran/f95-lang.c
@@ -673,10 +673,10 @@ gfc_init_builtin_functions (void)
 			    ARG6, ARG7, ARG8) NAME,
 #define DEF_FUNCTION_TYPE_VAR_0(NAME, RETURN) NAME,
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
-#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8) NAME,
-#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) NAME,
+#define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7) NAME,
+#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -691,8 +691,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
     BT_LAST
   };
@@ -1133,8 +1133,8 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG1],     	\
 					builtin_types[(int) ARG2],     	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8)			\
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1144,10 +1144,9 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
 					builtin_types[(int) ARG7],	\
-					builtin_types[(int) ARG8],	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
+#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1161,7 +1160,6 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG9],	\
 					builtin_types[(int) ARG10],	\
 					builtin_types[(int) ARG11],	\
-					builtin_types[(int) ARG12],	\
 					NULL_TREE);
 #define DEF_POINTER_TYPE(ENUM, TYPE)			\
   builtin_types[(int) ENUM]				\
@@ -1179,8 +1177,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
diff --git gcc/fortran/types.def gcc/fortran/types.def
index fdae28d..62cac49 100644
--- gcc/fortran/types.def
+++ gcc/fortran/types.def
@@ -163,6 +163,8 @@ DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
 		     BT_PTR_LONG, BT_PTR_LONG)
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_SIZE_VPTR_PTR_PTR_INT, BT_VOID, BT_SIZE,
 		     BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT)
+DEF_FUNCTION_TYPE_5 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR,
+		     BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
                      BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
@@ -213,11 +215,11 @@ DEF_FUNCTION_TYPE_VAR_0 (BT_FN_VOID_VAR, BT_VOID)
 
 DEF_FUNCTION_TYPE_VAR_2 (BT_FN_VOID_INT_INT_VAR, BT_VOID, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_8 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR,
-			 BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR,
+DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
+			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_12 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
-			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
+DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
 			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
 			  BT_INT, BT_INT)
diff --git gcc/jit/ChangeLog gcc/jit/ChangeLog
index fa138a9..8bf6751 100644
--- gcc/jit/ChangeLog
+++ gcc/jit/ChangeLog
@@ -1,3 +1,13 @@
+2015-02-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/64625
+	* jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8)
+	(DEF_FUNCTION_TYPE_VAR_12): Remove macros.
+	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
+	* jit-builtins.h (DEF_FUNCTION_TYPE_VAR_8)
+	(DEF_FUNCTION_TYPE_VAR_12): Remove macros.
+	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
+
 2015-02-04  David Malcolm  <dmalcolm@redhat.com>
 
 	PR jit/64257
diff --git gcc/jit/jit-builtins.c gcc/jit/jit-builtins.c
index 47b0198..5bf4a67 100644
--- gcc/jit/jit-builtins.c
+++ gcc/jit/jit-builtins.c
@@ -288,19 +288,23 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #define DEF_FUNCTION_TYPE_3(ENUM, RETURN, ARG1, ARG2, ARG3) \
       case ENUM: return make_fn_type (ENUM, RETURN, 0, 3, ARG1, ARG2, ARG3);
 #define DEF_FUNCTION_TYPE_4(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4) \
-      case ENUM: return make_fn_type (ENUM, RETURN, 0, 4, ARG1, ARG2, ARG3, ARG4);
-#define DEF_FUNCTION_TYPE_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5)	\
-      case ENUM: return make_fn_type (ENUM, RETURN, 0, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+      case ENUM: return make_fn_type (ENUM, RETURN, 0, 4, ARG1, ARG2, ARG3, \
+				      ARG4);
+#define DEF_FUNCTION_TYPE_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
+      case ENUM: return make_fn_type (ENUM, RETURN, 0, 5, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5);
 #define DEF_FUNCTION_TYPE_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 			    ARG6)					\
-      case ENUM: return make_fn_type (ENUM, RETURN, 0, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
+      case ENUM: return make_fn_type (ENUM, RETURN, 0, 6, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 			    ARG6, ARG7)					\
-      case ENUM: return make_fn_type (ENUM, RETURN, 0, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
+      case ENUM: return make_fn_type (ENUM, RETURN, 0, 7, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6, ARG7);
 #define DEF_FUNCTION_TYPE_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 			    ARG6, ARG7, ARG8)				\
-      case ENUM: return make_fn_type (ENUM, RETURN, 0, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \
-				      ARG7, ARG8);
+      case ENUM: return make_fn_type (ENUM, RETURN, 0, 8, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6, ARG7, ARG8);
 #define DEF_FUNCTION_TYPE_VAR_0(ENUM, RETURN) \
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 0);
 #define DEF_FUNCTION_TYPE_VAR_1(ENUM, RETURN, ARG1) \
@@ -310,18 +314,20 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #define DEF_FUNCTION_TYPE_VAR_3(ENUM, RETURN, ARG1, ARG2, ARG3) \
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 3, ARG1, ARG2, ARG3);
 #define DEF_FUNCTION_TYPE_VAR_4(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4) \
-      case ENUM: return make_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, \
+				      ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
-      case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
-#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8) \
-      case ENUM: return make_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, \
-				      ARG4, ARG5, ARG6, ARG7, ARG8);
-#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-      case ENUM: return make_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, \
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6, ARG7);
+#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, \
 				      ARG4, ARG5, ARG6, ARG7, ARG8, ARG9, \
-				      ARG10, ARG11, ARG12);
+				      ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
       case ENUM: return make_ptr_type (ENUM, TYPE);
 
@@ -343,8 +349,8 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
 
     default:
diff --git gcc/jit/jit-builtins.h gcc/jit/jit-builtins.h
index 9101aaf..fdf1323 100644
--- gcc/jit/jit-builtins.h
+++ gcc/jit/jit-builtins.h
@@ -37,21 +37,23 @@ enum jit_builtin_type
 #define DEF_FUNCTION_TYPE_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) NAME,
-#define DEF_FUNCTION_TYPE_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6) NAME,
-#define DEF_FUNCTION_TYPE_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7, ARG8) NAME,
+#define DEF_FUNCTION_TYPE_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+			    ARG6) NAME,
+#define DEF_FUNCTION_TYPE_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+			    ARG6, ARG7) NAME,
+#define DEF_FUNCTION_TYPE_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+			    ARG6, ARG7, ARG8) NAME,
 #define DEF_FUNCTION_TYPE_VAR_0(NAME, RETURN) NAME,
 #define DEF_FUNCTION_TYPE_VAR_1(NAME, RETURN, ARG1) NAME,
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
 #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
-  NAME,
-#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8) NAME,
-#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  NAME,
+				NAME,
+#define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7) NAME,
+#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -70,8 +72,8 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 }; /* enum jit_builtin_type */
diff --git gcc/lto/ChangeLog gcc/lto/ChangeLog
index 87f1988..65d5160 100644
--- gcc/lto/ChangeLog
+++ gcc/lto/ChangeLog
@@ -1,3 +1,10 @@
+2015-02-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/64625
+	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
+	Remove macros.
+	(DEF_FUNCTION_TYPE_VAR_7, DEF_FUNCTION_TYPE_VAR_11): New macros.
+
 2015-02-03  Jan Hubicka  <hubicka@ucw.cz>
 
 	* lto-symtab.c (lto_cgraph_replace_node): Maintain merged flag.
diff --git gcc/lto/lto-lang.c gcc/lto/lto-lang.c
index aa474e0..073bf17 100644
--- gcc/lto/lto-lang.c
+++ gcc/lto/lto-lang.c
@@ -176,12 +176,11 @@ enum lto_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
-  NAME,
-#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8) NAME,
-#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11,       \
-				 ARG12) NAME,
+				NAME,
+#define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7) NAME,
+#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -200,8 +199,8 @@ enum lto_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -686,14 +685,13 @@ lto_define_builtins (tree va_list_ref_type_node ATTRIBUTE_UNUSED,
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
-#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7, ARG8)			    \
-  def_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8);
-#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  def_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8, ARG9, ARG10, ARG11, ARG12);
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
+  def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
+#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
+  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
+	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -715,8 +713,8 @@ lto_define_builtins (tree va_list_ref_type_node ATTRIBUTE_UNUSED,
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
-#undef DEF_FUNCTION_TYPE_VAR_8
-#undef DEF_FUNCTION_TYPE_VAR_12
+#undef DEF_FUNCTION_TYPE_VAR_7
+#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
diff --git gcc/omp-builtins.def gcc/omp-builtins.def
index 6aea7b7..50f1321 100644
--- gcc/omp-builtins.def
+++ gcc/omp-builtins.def
@@ -32,17 +32,17 @@ along with GCC; see the file COPYING3.  If not see
 DEF_GOACC_BUILTIN (BUILT_IN_ACC_GET_DEVICE_TYPE, "acc_get_device_type",
 		   BT_FN_INT, ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_START, "GOACC_data_start",
-		   BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, "GOACC_data_end",
 		   BT_FN_VOID, ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
-		   BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR,
+		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
-		   BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
-		   BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR,
+		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,


Regards,
 Thomas


* libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-02-24 12:49                   ` Julian Brown
@ 2015-02-25  9:54                     ` Thomas Schwinge
  2015-02-25 12:17                       ` Julian Brown
                                         ` (2 more replies)
  0 siblings, 3 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-02-25  9:54 UTC (permalink / raw)
  To: Julian Brown, Ilya Verbin, Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

Hi!

On Tue, 24 Feb 2015 11:29:51 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Wed, 4 Feb 2015 15:05:45 +0000
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > The major changes are: [...]

Thanks for looking into this!

> This is a version of the previously-posted patch to rework
> initialisation and support the proposed load/unload hooks, merged to
> gomp4 branch and tested alongside the two patches (from
> https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading):
> 
> http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E
> 
> http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E
> 
> As well as Ilya Verbin's patch:
> 
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html

(I also added
<http://news.gmane.org/find-root.php?message_id=%3C20141115000346.GF40445%40msticlxl57.ims.intel.com%3E>
to the mix.)

> Test results look OK, barring a suspected harness issue (lib-83
> failing with a timeout for nvptx

Yes; Jim's rewriting the timing code.

However, I'm seeing a class of testsuite regressions: all variants of
libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90 FAIL:
»libgomp: cuMemFreeHost error: invalid value«.  I see these two test
cases contain a lot of acc_get_num_devices and similar calls -- I've been
testing this on our nvidiak20-2 system, which contains two Nvidia K20
cards, so maybe there's something wrong in that regard.  (But why is this
failing only for Fortran -- are we missing C/C++ tests in that area?)
Can you have a look, or do you want me to?

> OK for gomp4 branch? I could commit Ilya's patch there too if so.

I'll leave the decision to Jakub, but, what about trunk?  As Ilya
indicated in
<http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E>,
(at least part of) these patches are fixing a regression with offloading
from shared libraries.  (And maybe the rest qualifies as fixes and
extensions to new code (offloading), so no danger to cause any
regressions compared to the last GCC release?)


I have not reviewed all your changes; just a few comments:

> --- a/gcc/config/nvptx/mkoffload.c
> +++ b/gcc/config/nvptx/mkoffload.c
> @@ -850,16 +851,17 @@ process (FILE *in, FILE *out)

>    fprintf (out, "static const void *target_data[] = {\n");
> -  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
> +  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
> +		"func_mappings\n", nvars, nfuncs);
>    fprintf (out, "};\n\n");

I wondered whether it would be more elegant to just separate those with
NULL delimiters instead of the size integers cast to void * (spaces
missing)?  But then, that'd need "double scanning" in the consumer,
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we need to
allocate an appropriately sized array, so maybe your more expressive
approach is better indeed.
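
To illustrate the trade-off, here is a minimal sketch of the two layouts
as a consumer would see them.  It is not the actual mkoffload output or
the plugin code: the names counted, terminated, copy_counted and
copy_terminated are made up for this example, and the tables hold plain
strings instead of the real mapping data.  With an explicit count the
consumer can allocate right away; with a NULL delimiter it has to scan
once to count and then again to copy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of names, standing in for the mapping data.  */
static const char *const names[] = { "f0", "f1", "f2" };

/* Layout A (assumed): element count stored as an integer cast to a
   pointer, followed by a pointer to the table itself.  */
static const void *const counted[] = { (void *) (uintptr_t) 3, names };

/* Layout B (assumed): the same table terminated by a NULL delimiter.  */
static const char *const terminated[] = { "f0", "f1", "f2", NULL };

/* Layout A: the size is known up front, so a single allocation and a
   single copy suffice.  */
static const char **
copy_counted (const void *const *data, size_t *n)
{
  *n = (size_t) (uintptr_t) data[0];
  const char *const *src = data[1];
  const char **dst = malloc (*n * sizeof *dst);
  if (dst != NULL)
    memcpy (dst, src, *n * sizeof *dst);
  return dst;
}

/* Layout B: one scan to find the delimiter, then a second scan to copy
   into the newly allocated array.  */
static const char **
copy_terminated (const char *const *src, size_t *n)
{
  size_t count = 0;
  while (src[count] != NULL)
    count++;
  const char **dst = malloc (count * sizeof *dst);
  if (dst != NULL)
    for (size_t i = 0; i < count; i++)
      dst[i] = src[i];
  *n = count;
  return dst;
}
```

Both functions produce the same array; the difference is only in how the
consumer learns the size before allocating.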

> --- a/libgomp/oacc-async.c
> +++ b/libgomp/oacc-async.c
> @@ -34,44 +34,68 @@
>  int
>  acc_async_test (int async)
>  {
> +  struct goacc_thread *thr = goacc_thread ();
> +
>    if (async < acc_async_sync)
>      gomp_fatal ("invalid async argument: %d", async);
>  
> -  return base_dev->openacc.async_test_func (async);
> +  assert (thr->dev);
> +
> +  return thr->dev->openacc.async_test_func (async);
>  }

(Here, and in several other places: I would have placed the declaration
of thr and its initialization just before its first use, but then, no
need to change that now.)

Here, and in several other places: is this code conforming to the OpenACC
specification?  Do we need to (lazily) initialize in all these places, or
in goacc_thread, or gracefully fail (see below) if not initialized
(basically in all places where you currently assert (thr->dev))?

    #include <openacc.h>
    
    int main(int argc, char *argv[])
    {
      return acc_async_test(0);
    }

    $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/ -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Ibuild-gcc/x86_64-unknown-linux-gnu/./libgomp -Isource-gcc/libgomp -Binstall/offload-nvptx-none/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0 -Binstall/offload-nvptx-none/bin -Binstall/offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0 -Binstall/offload-x86_64-intelmicemul-linux-gnu/bin -Lbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Wl,-rpath,build-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Wall ../a.c -fopenacc -g
    $ gdb -q a.out 
    Reading symbols from a.out...done.
    (gdb) r
    Starting program: [...]/a.out 
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    
    Program received signal SIGSEGV, Segmentation fault.
    acc_async_test (async=0) at [...]/source-gcc/libgomp/oacc-async.c:42
    42        assert (thr->dev);

Also, I'm not sure what the expected outcome of this code sequence is:

    acc_init(acc_device_nvidia);
    acc_shutdown(acc_device_nvidia);
    acc_async_test(0);

    a.out: [...]/source-gcc/libgomp/oacc-async.c:42: acc_async_test: Assertion `thr->dev' failed.
    Aborted (core dumped)

If the OpenACC specification can be read such that all this indeed is
"undefined behavior", then aborting/crashing is OK, of course.

> --- a/libgomp/oacc-cuda.c
> +++ b/libgomp/oacc-cuda.c
> @@ -34,51 +34,53 @@
>  void *
>  acc_get_current_cuda_device (void)
>  {
> -  void *p = NULL;
> +  struct goacc_thread *thr = goacc_thread ();
>  
> -  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
> -    p = base_dev->openacc.cuda.get_current_device_func ();
> +  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
> +    return thr->dev->openacc.cuda.get_current_device_func ();
>  
> -  return p;
> +  return NULL;
>  }

Here, and in other places, it looks as if we'd fail gracefully.

>  int
>  acc_set_cuda_stream (int async, void *stream)
>  {
> -  int s = -1;
> +  struct goacc_thread *thr;
>  
>    if (async < 0 || stream == NULL)
>      return 0;
>  
>    goacc_lazy_initialize ();
>  
> -  if (base_dev && base_dev->openacc.cuda.set_stream_func)
> -    s = base_dev->openacc.cuda.set_stream_func (async, stream);
> +  thr = goacc_thread ();
> +
> +  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
> +    return thr->dev->openacc.cuda.set_stream_func (async, stream);
>  
> -  return s;
> +  return -1;
>  }

This one does have a goacc_lazy_initialize call.

> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c

> +static const char *
> +name_of_acc_device_t (enum acc_device_t type)
> +{
> +  switch (type)
> +    {
> +    case acc_device_none: return "none";
> +    case acc_device_default: return "default";
> +    case acc_device_host: return "host";
> +    case acc_device_host_nonshm: return "host_nonshm";
> +    case acc_device_not_host: return "not_host";
> +    case acc_device_nvidia: return "nvidia";
> +    default: return "<unknown>";
> +    }
> +}

I'd have made the default case abort.  (Does a missing case actually
trigger a compile-time error?)
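
For reference, a sketch of that alternative.  With GCC, -Wswitch (part
of -Wall) warns when a switch over an enum lacks a case for some
enumerator, but only if the switch has no default label; it is a
warning rather than an error unless -Werror is in effect.  The enum
below is a made-up stand-in, not the real acc_device_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for acc_device_t.  */
enum demo_device_t
{
  demo_device_none,
  demo_device_host,
  demo_device_nvidia
};

static const char *
name_of_demo_device_t (enum demo_device_t type)
{
  switch (type)
    {
    case demo_device_none: return "none";
    case demo_device_host: return "host";
    case demo_device_nvidia: return "nvidia";
    /* Deliberately no default label: with -Wall, GCC can then flag an
       enumerator that is added later but not handled here.  */
    }

  /* Only reached for a value outside the enumeration.  */
  abort ();
}
```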

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -46,6 +46,7 @@
>  #include <dlfcn.h>
>  #include <unistd.h>
>  #include <assert.h>
> +#include <pthread.h>

That's already being included.

> -/* Initialize the device.  */
> -static int
> +/* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
> +   should be locked on entry and remains locked on exit.  */
> +static bool
>  nvptx_init (void)
>  {
>    CUresult r;
>    int rc;
> +  int ndevs;
>  
> -  if (ptx_inited)
> -    return nvptx_get_num_devices ();
> +  if (instantiated_devices != 0)
> +    return true;

>  void
> -GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
> +GOMP_OFFLOAD_init_device (int n)
>  {

> +  if (!nvptx_init ()
> +      || (instantiated_devices & (1 << n)) != 0)
> +    {
> +      pthread_mutex_unlock (&ptx_dev_lock);
> +      return NULL;

GOMP_OFFLOAD_init_device has a void return type.  (Why doesn't this cause
a compile warning/error?)

> +  ptx_devices[n] = nvptx_open_device (n);
> +  instantiated_devices |= 1 << n;

Here, and also in several other places: do we have to care about big
values of n?

>  int
> -GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
> -			struct mapping_table **tablep)
> +GOMP_OFFLOAD_load_image (int ord, void *target_data,
> +			 struct addr_pair **target_table)
>  {
>    CUmodule module;
> -  void **fn_table;
> -  char **fn_names;
> -  int fn_entries, i;
> +  char **fn_names, **var_names;
> +  unsigned int fn_entries, var_entries, i, j;
>    CUresult r;
>    struct targ_fn_descriptor *targ_fns;
> +  void **img_header = (void **) target_data;
> +  struct ptx_image_data *new_image;
> +
> +  nvptx_attach_host_thread_to_device (ord);
>  
>    if (nvptx_init () <= 0)
>      return 0;

Need to adapt to the interface change of nvptx_init.  Also, missing
locking of ptx_dev_lock.


Grüße,
 Thomas

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-02-25  9:54                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
@ 2015-02-25 12:17                       ` Julian Brown
  2015-02-25 12:23                       ` Ilya Verbin
  2015-02-26 17:31                       ` Ilya Verbin
  2 siblings, 0 replies; 92+ messages in thread
From: Julian Brown @ 2015-02-25 12:17 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Ilya Verbin, Jakub Jelinek, gcc-patches, Kirill Yukhin

On Wed, 25 Feb 2015 10:36:08 +0100
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi!
> 
> On Tue, 24 Feb 2015 11:29:51 +0000, Julian Brown
> <julian@codesourcery.com> wrote:
> > Test results look OK, barring a suspected harness issue (lib-83
> > failing with a timeout for nvptx
> 
> However, I'm seeing a class of testsuite regressions: all variants of
> libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90
> FAIL: »libgomp: cuMemFreeHost error: invalid value«.  I see these two
> test cases contain a lot of acc_get_num_devices and similar calls --
> I've been testing this on our nvidiak20-2 system, which contains two
> Nvidia K20 cards, so maybe there's something wrong in that regard.
> (But why is this failing only for Fortran -- are we missing C/C++
> tests in that area?) Can you have a look, or want me to?

I can have a look at that.

> > --- a/gcc/config/nvptx/mkoffload.c
> > +++ b/gcc/config/nvptx/mkoffload.c
> > @@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
> 
> >    fprintf (out, "static const void *target_data[] = {\n");
> > -  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
> > +  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
> > +		"func_mappings\n", nvars, nfuncs);
> >    fprintf (out, "};\n\n");
> 
> I wondered if it's maybe more elegant to just separate those by NULL
> delimiters instead of the size integers cast to void * (spaces
> missing)?  But then, that'd need "double scanning" in the consumer,
> libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we
> need to allocate an appropriately sized array, so maybe your more
> expressive approach is better indeed.

Yeah, I considered both: there's probably not much to choose between
the approaches. They use the same amount of space.

> > --- a/libgomp/oacc-async.c
> > +++ b/libgomp/oacc-async.c
> > @@ -34,44 +34,68 @@
> >  int
> >  acc_async_test (int async)
> >  {
> > +  struct goacc_thread *thr = goacc_thread ();
> > +
> >    if (async < acc_async_sync)
> >      gomp_fatal ("invalid async argument: %d", async);
> >  
> > -  return base_dev->openacc.async_test_func (async);
> > +  assert (thr->dev);
> > +
> > +  return thr->dev->openacc.async_test_func (async);
> >  }

> Here, and in several other places: is this code conforming to the
> OpenACC specification?  Do we need to (lazily) initialize in all
> these places, or in goacc_thread, or gracefully fail (see below) if
> not initialized (basically in all places where you currently assert
> (thr->dev))?
> 
>     #include <openacc.h>
>     
>     int main(int argc, char *argv[])
>     {
>       return acc_async_test(0);
>     }
> 
> [sigsegv]

Whether it conforms to the spec or not is a hard question to answer,
because a lot of behaviour is left undefined. But here are two
possibly-useful made-up guidelines:

1. Does the program work the same with OpenACC disabled?

2. Does some strange use of OpenACC functionality (including library
   calls, etc.) probably indicate user error?

Much of the lazy initialisation code is there so that (1) can be true
-- i.e., a program can use OpenACC directives without making an
explicit call to "acc_init" or other API-specific initialisation code.

But this case is an explicit call to the OpenACC runtime library, so the
program can't work without -fopenacc enabled, so we can follow
guideline (2) instead. And in this case, it's meaningless to test for
completion of async operation when no device is active.

Of course though, this should be an actual error rather than a crash.
But, I don't think we want to lazily-initialise here.
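
A rough sketch of what "an actual error" might look like.  All demo_*
names are hypothetical stand-ins (demo_current_thread for the result of
goacc_thread, demo_fatal for gomp_fatal); this is not the libgomp
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-thread and per-device data.  */
struct demo_device
{
  int (*async_test_func) (int);
};

struct demo_thread
{
  struct demo_device *dev;
};

/* Stand-in for the value goacc_thread would return.  */
static struct demo_thread *demo_current_thread;

/* True only if there is thread data and it has an active device.  */
static bool
demo_device_active (const struct demo_thread *thr)
{
  return thr != NULL && thr->dev != NULL;
}

/* Stand-in for gomp_fatal: report the problem and exit.  */
static void
demo_fatal (const char *msg)
{
  fprintf (stderr, "\nlibgomp: %s\n", msg);
  exit (EXIT_FAILURE);
}

/* Trivial device hook used for illustration.  */
static int
demo_always_done (int async)
{
  (void) async;
  return 1;
}

static int
demo_acc_async_test (int async)
{
  struct demo_thread *thr = demo_current_thread;

  /* Check thr before thr->dev, so the uninitialized case is reported
     as an error instead of dereferencing a null pointer.  */
  if (!demo_device_active (thr))
    demo_fatal ("no device active");

  return thr->dev->async_test_func (async);
}
```

This covers both the never-initialised case and the case where the
thread data exists but no device is attached to it.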

> Also, I'm not sure what the expected outcome of this code sequence is:
> 
>     acc_init(acc_device_nvidia);
>     acc_shutdown(acc_device_nvidia);
>     acc_async_test(0);
> 
>     a.out: [...]/source-gcc/libgomp/oacc-async.c:42: acc_async_test: Assertion `thr->dev' failed.
>     Aborted (core dumped)
> 
> If the OpenACC specification can be read such that all this indeed is
> "undefined behavior", then aborting/crashing is OK, of course.

Again, this would probably indicate user error in a real program, so it
should raise a (real) error message.

> > --- a/libgomp/oacc-cuda.c
> > +++ b/libgomp/oacc-cuda.c
> > @@ -34,51 +34,53 @@
> >  void *
> >  acc_get_current_cuda_device (void)
> >  {
> > -  void *p = NULL;
> > +  struct goacc_thread *thr = goacc_thread ();
> >  
> > -  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
> > -    p = base_dev->openacc.cuda.get_current_device_func ();
> > +  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
> > +    return thr->dev->openacc.cuda.get_current_device_func ();
> >  
> > -  return p;
> > +  return NULL;
> >  }
> 
> Here, and in other places, it looks as if we'd fail gracefully.

Not sure about this (maybe it should be an error too?), but...

> >  int
> >  acc_set_cuda_stream (int async, void *stream)
> >  {
> > -  int s = -1;
> > +  struct goacc_thread *thr;
> >  
> >    if (async < 0 || stream == NULL)
> >      return 0;
> >  
> >    goacc_lazy_initialize ();
> >  
> > -  if (base_dev && base_dev->openacc.cuda.set_stream_func)
> > -    s = base_dev->openacc.cuda.set_stream_func (async, stream);
> > +  thr = goacc_thread ();
> > +
> > +  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
> > +    return thr->dev->openacc.cuda.set_stream_func (async, stream);
> >  
> > -  return s;
> > +  return -1;
> >  }
> 
> This one does have a goacc_lazy_initialize call.

This one might indeed be a reasonable way of initialising the OpenACC
runtime: the user is already using CUDA or some other CUDA consumer,
and wishes to use OpenACC also. So this can reasonably be the first
"OpenACC" call in a program.

> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> 
> > +static const char *
> > +name_of_acc_device_t (enum acc_device_t type)
> > +{
> > +  switch (type)
> > +    {
> > +    case acc_device_none: return "none";
> > +    case acc_device_default: return "default";
> > +    case acc_device_host: return "host";
> > +    case acc_device_host_nonshm: return "host_nonshm";
> > +    case acc_device_not_host: return "not_host";
> > +    case acc_device_nvidia: return "nvidia";
> > +    default: return "<unknown>";
> > +    }
> > +}
> 
> I'd have made the default case abort.  (Does a missing case actually
> trigger a compile-time error?)

We're fixing the list of available offloading plugins at build time,
yes? In that case failing on the default case is reasonable.

> > +  ptx_devices[n] = nvptx_open_device (n);
> > +  instantiated_devices |= 1 << n;
> 
> Here, and also in several other places: do we have to care about big
> values of n?

I wondered that: we could equally just use the ptx_devices array and
remove the bitmask. I'll fix that.
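
Something along these lines, as a sketch only.  Note that `1 << n` is
undefined once n reaches the width of int, so the bitmask cannot track
arbitrarily many devices.  The demo_* names and the fixed limit are
assumptions for illustration, not the plugin's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on the number of devices.  */
#define DEMO_MAX_DEVICES 64

/* Hypothetical stand-in for the per-device structure.  */
struct demo_ptx_device
{
  int ord;
};

/* A non-NULL slot means "instantiated"; no separate bitmask needed.  */
static struct demo_ptx_device *demo_devices[DEMO_MAX_DEVICES];
static struct demo_ptx_device demo_storage[DEMO_MAX_DEVICES];

static bool
demo_device_instantiated (int n)
{
  return n >= 0 && n < DEMO_MAX_DEVICES && demo_devices[n] != NULL;
}

/* Return false for an out-of-range ordinal or a repeated init.  */
static bool
demo_init_device (int n)
{
  if (n < 0 || n >= DEMO_MAX_DEVICES || demo_devices[n] != NULL)
    return false;

  demo_storage[n].ord = n;
  demo_devices[n] = &demo_storage[n];
  return true;
}
```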

> >  int
> > -GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
> > -			struct mapping_table **tablep)
> > +GOMP_OFFLOAD_load_image (int ord, void *target_data,
> > +			 struct addr_pair **target_table)
> >  {
> >    CUmodule module;
> > -  void **fn_table;
> > -  char **fn_names;
> > -  int fn_entries, i;
> > +  char **fn_names, **var_names;
> > +  unsigned int fn_entries, var_entries, i, j;
> >    CUresult r;
> >    struct targ_fn_descriptor *targ_fns;
> > +  void **img_header = (void **) target_data;
> > +  struct ptx_image_data *new_image;
> > +
> > +  nvptx_attach_host_thread_to_device (ord);
> >  
> >    if (nvptx_init () <= 0)
> >      return 0;
> 
> Need to adapt to the interface change of nvptx_init.  Also, missing
> locking of ptx_dev_lock.

and these other couple of bits too.

Thanks,

Julian

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-02-25  9:54                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
  2015-02-25 12:17                       ` Julian Brown
@ 2015-02-25 12:23                       ` Ilya Verbin
  2015-02-26 17:31                       ` Ilya Verbin
  2 siblings, 0 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-02-25 12:23 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Julian Brown, Jakub Jelinek, gcc-patches, Kirill Yukhin

On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > Julian Brown <julian@codesourcery.com> wrote:
> > OK for gomp4 branch? I could commit Ilya's patch there too if so.
> 
> I'll leave the decision to Jakub, but, what about trunk?  As Ilya
> indicated in
> <http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E>,
> (at least part of) these patches are fixing a regression with offloading
> from shared libraries.  (And maybe the rest qualifies as fixes and
> extensions to new code (offloading), so no danger to cause any
> regressions compared to the last GCC release?)

BTW, when I removed the calls to gomp_init_tables in
<https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html>, I may have
accidentally removed some necessary gomp_mutex_lock/unlock calls.
Also GOMP_offload_[un]register require some mutexes, as noted by Jakub.
I'm going to fix this.  So, I think we should commit all dependent patches to
gomp4 branch, and I will post a fix for mutexes on top of them.

Thanks,
  -- Ilya

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-02-25  9:54                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
  2015-02-25 12:17                       ` Julian Brown
  2015-02-25 12:23                       ` Ilya Verbin
@ 2015-02-26 17:31                       ` Ilya Verbin
  2015-03-06 14:01                         ` Ilya Verbin
  2 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-02-26 17:31 UTC (permalink / raw)
  To: Thomas Schwinge, Julian Brown; +Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin

Hi,

On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > Julian Brown <julian@codesourcery.com> wrote:
> > This is a version of the previously-posted patch to rework
> > initialisation and support the proposed load/unload hooks, merged to
> > gomp4 branch and tested alongside the two patches (from

Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'.
Do you still need them?  Or can we use gomp_device_descr::lock and
is_initialized instead?  If yes, then we can replace the gomp_memory_mapping
structure with a splay_tree, as it was before the OpenACC merge.

Thanks,
  -- Ilya

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-02-26 17:31                       ` Ilya Verbin
@ 2015-03-06 14:01                         ` Ilya Verbin
  2015-03-09 14:46                           ` Julian Brown
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-03-06 14:01 UTC (permalink / raw)
  To: Thomas Schwinge, Julian Brown; +Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin

On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
> On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > > Julian Brown <julian@codesourcery.com> wrote:
> > > This is a version of the previously-posted patch to rework
> > > initialisation and support the proposed load/unload hooks, merged to
> > > gomp4 branch and tested alongside the two patches (from
> 
> Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'.
> Do you still need them?  Or we can use gomp_device_descr::lock and
> is_initialized instead?  If yes, then we can replace the gomp_memory_mapping
> structure with a splay_tree, as it was before the OpenACC merge.

Ping?

Thanks,
  -- Ilya

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-06 14:01                         ` Ilya Verbin
@ 2015-03-09 14:46                           ` Julian Brown
  2015-03-23 19:44                             ` Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Julian Brown @ 2015-03-09 14:46 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, Jakub Jelinek, gcc-patches, Kirill Yukhin

On Fri, 6 Mar 2015 17:01:13 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
> > On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > > > Julian Brown <julian@codesourcery.com> wrote:
> > > > This is a version of the previously-posted patch to rework
> > > > initialisation and support the proposed load/unload hooks,
> > > > merged to gomp4 branch and tested alongside the two patches
> > > > (from
> > 
> > Currently the 'struct gomp_memory_mapping' contains 'lock' and
> > 'is_initialized'. Do you still need them?  Or we can use
> > gomp_device_descr::lock and is_initialized instead?  If yes, then
> > we can replace the gomp_memory_mapping structure with a splay_tree,
> > as it was before the OpenACC merge.
> 
> Ping?

Apologies, I've been distracted with travel and other things. I
suspect, as you suggest, that the gomp_memory_mapping
lock/is_initialized fields may no longer be required. I haven't yet had
time to address that nor all of Thomas's comments on the patch (mostly
breakage with multiple devices), and I'm unlikely to have time this
week either due to vacation...

Thanks,

Julian

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-09 14:46                           ` Julian Brown
@ 2015-03-23 19:44                             ` Ilya Verbin
  2015-03-26 10:07                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks Thomas Schwinge
  2015-03-26 12:09                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Jakub Jelinek
  0 siblings, 2 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-03-23 19:44 UTC (permalink / raw)
  To: Julian Brown, Thomas Schwinge, Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Mon, Mar 09, 2015 at 14:45:55 +0000, Julian Brown wrote:
> On Fri, 6 Mar 2015 17:01:13 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> 
> > On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
> > > On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > > > > Julian Brown <julian@codesourcery.com> wrote:
> > > > > This is a version of the previously-posted patch to rework
> > > > > initialisation and support the proposed load/unload hooks,
> > > > > merged to gomp4 branch and tested alongside the two patches
> > > > > (from
> > > 
> > > Currently the 'struct gomp_memory_mapping' contains 'lock' and
> > > 'is_initialized'. Do you still need them?  Or we can use
> > > gomp_device_descr::lock and is_initialized instead?  If yes, then
> > > we can replace the gomp_memory_mapping structure with a splay_tree,
> > > as it was before the OpenACC merge.
> > 
> > Ping?
> 
> Apologies, I've been distracted with travel and other things. I
> suspect, as you suggest, that the gomp_memory_mapping
> lock/is_initialized fields may no longer be required. I haven't yet had
> time to address that nor all of Thomas's comments on the patch (mostly
> breakage with multiple devices), and I'm unlikely to have time this
> week either due to vacation...

If it is too late for such global changes (rework initialization in libgomp,
change mic and ptx plugins), then here is a small workaround patch to fix
offloading from libraries.  Likely, it will not affect OpenACC programs with one
image.  make check-target-libgomp passed.


	PR libgomp/65338
libgomp/
	* libgomp.h (struct gomp_device_descr): Remove
	offload_regions_registered.
	* oacc-host.c (host_dispatch): Do not initialize
	offload_regions_registered.
	* target.c (gomp_register_image_for_device): Do not check for
	offload_regions_registered.
	(gomp_target_init): Do not initialize offload_regions_registered.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..f45fdba 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -793,9 +793,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..2763f44 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch =
     .mem_map.is_initialized = false,
     .mem_map.splay_tree.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
diff --git a/libgomp/target.c b/libgomp/target.c
index 50baa4d..db1f509 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1035,13 +1035,8 @@ static void
 gomp_register_image_for_device (struct gomp_device_descr *device,
 				struct offload_image_descr *image)
 {
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
+  if (device->type == image->type || device->type == OFFLOAD_TARGET_TYPE_HOST)
+    device->register_image_func (image->host_table, image->target_data);
 }
 
 /* This function initializes the runtime needed for offloading.
@@ -1105,7 +1100,6 @@ gomp_target_init (void)
 		current_device.mem_map.is_initialized = false;
 		current_device.mem_map.splay_tree.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)


  -- Ilya

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks
  2015-03-23 19:44                             ` Ilya Verbin
@ 2015-03-26 10:07                               ` Thomas Schwinge
  2015-03-26 12:09                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Jakub Jelinek
  1 sibling, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-03-26 10:07 UTC (permalink / raw)
  To: Ilya Verbin, Julian Brown, Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

Hi!

On Mon, 23 Mar 2015 22:44:39 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Mon, Mar 09, 2015 at 14:45:55 +0000, Julian Brown wrote:
> > On Fri, 6 Mar 2015 17:01:13 +0300
> > Ilya Verbin <iverbin@gmail.com> wrote:
> > 
> > > On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
> > > > On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > > > > > Julian Brown <julian@codesourcery.com> wrote:
> > > > > > This is a version of the previously-posted patch to rework
> > > > > > initialisation and support the proposed load/unload hooks,
> > > > > > merged to gomp4 branch and tested alongside the two patches
> > > > > > (from
> > > > 
> > > > Currently the 'struct gomp_memory_mapping' contains 'lock' and
> > > > 'is_initialized'. Do you still need them?  Or we can use
> > > > gomp_device_descr::lock and is_initialized instead?  If yes, then
> > > > we can replace the gomp_memory_mapping structure with a splay_tree,
> > > > as it was before the OpenACC merge.
> > > 
> > > Ping?
> > 
> > Apologies, I've been distracted with travel and other things. I
> > suspect, as you suggest, that the gomp_memory_mapping
> > lock/is_initialized fields may no longer be required. I haven't yet had
> > time to address that nor all of Thomas's comments on the patch (mostly
> > breakage with multiple devices), and I'm unlikely to have time this
> > week either due to vacation...
> 
> If it is too late for such global changes (rework initialization in libgomp,
> change mic and ptx plugins), then here is a small workaround patch to fix
> offloading from libraries.  Likely, it will not affect OpenACC programs with one
> image.  make check-target-libgomp passed.

Thanks!  Confirming that the nvptx offloading test cases are not affected
by this patch.  (But I can't formally approve the patch.)

> 	PR libgomp/65338
> libgomp/
> 	* libgomp.h (struct gomp_device_descr): Remove
> 	offload_regions_registered.
> 	* oacc-host.c (host_dispatch): Do not initialize
> 	offload_regions_registered.
> 	* target.c (gomp_register_image_for_device): Do not check for
> 	offload_regions_registered.
> 	(gomp_target_init): Do not initialize offload_regions_registered.
> 
> 
> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index 3089401..f45fdba 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -793,9 +793,6 @@ struct gomp_device_descr
>    /* Set to true when device is initialized.  */
>    bool is_initialized;
>  
> -  /* True when offload regions have been registered with this device.  */
> -  bool offload_regions_registered;
> -
>    /* OpenACC-specific data and functions.  */
>    /* This is mutable because of its mutable data_environ and target_data
>       members.  */
> diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
> index 6aeb1e7..2763f44 100644
> --- a/libgomp/oacc-host.c
> +++ b/libgomp/oacc-host.c
> @@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch =
>      .mem_map.is_initialized = false,
>      .mem_map.splay_tree.root = NULL,
>      .is_initialized = false,
> -    .offload_regions_registered = false,
>  
>      .openacc = {
>        .open_device_func = GOMP_OFFLOAD_openacc_open_device,
> diff --git a/libgomp/target.c b/libgomp/target.c
> index 50baa4d..db1f509 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1035,13 +1035,8 @@ static void
>  gomp_register_image_for_device (struct gomp_device_descr *device,
>  				struct offload_image_descr *image)
>  {
> -  if (!device->offload_regions_registered
> -      && (device->type == image->type
> -	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
> -    {
> -      device->register_image_func (image->host_table, image->target_data);
> -      device->offload_regions_registered = true;
> -    }
> +  if (device->type == image->type || device->type == OFFLOAD_TARGET_TYPE_HOST)
> +    device->register_image_func (image->host_table, image->target_data);
>  }
>  
>  /* This function initializes the runtime needed for offloading.
> @@ -1105,7 +1100,6 @@ gomp_target_init (void)
>  		current_device.mem_map.is_initialized = false;
>  		current_device.mem_map.splay_tree.root = NULL;
>  		current_device.is_initialized = false;
> -		current_device.offload_regions_registered = false;
>  		current_device.openacc.data_environ = NULL;
>  		current_device.openacc.target_data = NULL;
>  		for (i = 0; i < new_num_devices; i++)


Grüße,
 Thomas

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-23 19:44                             ` Ilya Verbin
  2015-03-26 10:07                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks Thomas Schwinge
@ 2015-03-26 12:09                               ` Jakub Jelinek
  2015-03-26 20:41                                 ` Ilya Verbin
  2015-03-27 15:21                                 ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Julian Brown
  1 sibling, 2 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-03-26 12:09 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Julian Brown, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote:
> If it is too late for such global changes (rework initialization in libgomp,
> change mic and ptx plugins), then here is a small workaround patch to fix
> offloading from libraries.  Likely, it will not affect OpenACC programs with one
> image.  make check-target-libgomp passed.

Sorry for not getting to this earlier; I have been really busy fixing
severe regressions lately.

Anyway, IMHO it is not too late to fix it properly; after all, the
current code is majorly broken.  As I've said earlier, there is no mutex
guarding gomp_target_init (which is run through pthread_once, so it is
guaranteed to run just once) against concurrent GOMP_offload_register
calls when they access/reallocate offload_images and num_offload_images.
(If those calls are run from ctors, then I guess something like
dl_load_lock ensures, at least on glibc, that multiple
GOMP_offload_register calls aren't performed at the same time.)  There
is also no support for registering further images after the
gomp_target_init call (if you dlopen further shared libraries).  Both
are really bad.  And it would be really nice to support the unloading.
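
To make the two problems concrete, here is a minimal sketch of one way to
close the race, assuming a single lock shared by image registration and
device initialisation.  All names below (register_image, init_device,
loaded_image_count, struct image_descr) are hypothetical stand-ins and not
the real libgomp symbols; "loading" an image is reduced to bumping a counter.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for libgomp's offload image descriptor.  */
struct image_descr { int type; void *host_table; void *target_data; };

/* One lock guards the image array and the "device initialised" flag.  */
static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;
static struct image_descr *images;
static int num_images;
static int device_initialized;
static int images_loaded;	/* Number of images that reached the device.  */

/* Pretend to load one image onto the device.  Caller holds register_lock.  */
static void
load_image_locked (const struct image_descr *img)
{
  (void) img;
  images_loaded++;
}

/* Register an image.  Works both before and after device initialisation:
   a late image (for example from a dlopen'ed library) is loaded at once.
   Returns 0 on success, -1 if the array could not be grown.  */
int
register_image (int type, void *host_table, void *target_data)
{
  pthread_mutex_lock (&register_lock);
  struct image_descr *p = realloc (images, (num_images + 1) * sizeof *p);
  if (p == NULL)
    {
      pthread_mutex_unlock (&register_lock);
      return -1;
    }
  images = p;
  images[num_images] = (struct image_descr) { type, host_table, target_data };
  if (device_initialized)
    load_image_locked (&images[num_images]);
  num_images++;
  pthread_mutex_unlock (&register_lock);
  return 0;
}

/* Initialise the device once and load every image registered so far.
   Takes the same lock, so it cannot interleave with register_image.  */
void
init_device (void)
{
  pthread_mutex_lock (&register_lock);
  if (!device_initialized)
    {
      for (int i = 0; i < num_images; i++)
	load_image_locked (&images[i]);
      device_initialized = 1;
    }
  pthread_mutex_unlock (&register_lock);
}

/* Read the load counter under the lock.  */
int
loaded_image_count (void)
{
  pthread_mutex_lock (&register_lock);
  int n = images_loaded;
  pthread_mutex_unlock (&register_lock);
  return n;
}
```

Because both paths take the same lock, an image is loaded exactly once,
whether it is registered before or after the first use of the device.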

But I'm afraid I've lost track of which is the latest posted patch for
that, and how it has been tested (whether just on MIC or MIC emul, or
also for nvptx).

So can you please post a link to the latest full patch and say how it
has been tested?  And if it is still error prone when, say, one thread
executes GOMP_target for the first time while another dlopens some
shared library that has offloading regions in it, please fix that too.

We still have a week or so to get this sorted out.

	Jakub


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-26 12:09                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Jakub Jelinek
@ 2015-03-26 20:41                                 ` Ilya Verbin
  2015-03-30 16:42                                   ` Jakub Jelinek
                                                     ` (2 more replies)
  2015-03-27 15:21                                 ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Julian Brown
  1 sibling, 3 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-03-26 20:41 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: Julian Brown, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 51153 bytes --]

On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote:
> > If it is too late for such global changes (rework initialization in libgomp,
> > change mic and ptx plugins), then here is a small workaround patch to fix
> > offloading from libraries.  Likely, it will not affect OpenACC programs with one
> > image.  make check-target-libgomp passed.
> 
> Sorry for not getting to this earlier; I have been really busy fixing
> severe regressions lately.
> 
> Anyway, IMHO it is not too late to fix it properly; after all, the
> current code is majorly broken.  As I've said earlier, there is no mutex
> guarding gomp_target_init (which is run through pthread_once, so it is
> guaranteed to run just once) against concurrent GOMP_offload_register
> calls when they access/reallocate offload_images and num_offload_images.
> (If those calls are run from ctors, then I guess something like
> dl_load_lock ensures, at least on glibc, that multiple
> GOMP_offload_register calls aren't performed at the same time.)  There
> is also no support for registering further images after the
> gomp_target_init call (if you dlopen further shared libraries).  Both
> are really bad.  And it would be really nice to support the unloading.
> 
> But I'm afraid I've lost track of which is the latest posted patch for
> that, and how it has been tested (whether just on MIC or MIC emul, or
> also for nvptx).
> 
> So can you please post a link to the latest full patch and say how it
> has been tested?  And if it is still error prone when, say, one thread
> executes GOMP_target for the first time while another dlopens some
> shared library that has offloading regions in it, please fix that too.
> 
> We still have a week or so to get this sorted out.

Here is the latest patch for libgomp and the mic plugin.
make check-target-libgomp passed using the intelmic emulator.
I also used the testcase from the attachment.

Latest ptx part is here, I guess:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html

Thomas, could you please test these 2 patches together on nvptx?


gcc/
	* config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call
	GOMP_offload_unregister from the destructor.
libgomp/
	* libgomp-plugin.h (struct mapping_table): Replace with addr_pair.
	* libgomp.h (struct gomp_memory_mapping): Remove.
	(struct target_mem_desc): Change type of mem_map from
	gomp_memory_mapping * to splay_tree_s *.
	(struct gomp_device_descr): Remove register_image_func, get_table_func.
	Add load_image_func, unload_image_func.
	Change type of mem_map from gomp_memory_mapping to splay_tree_s.
	Remove offload_regions_registered.
	(gomp_init_tables): Remove.
	(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
	to splay_tree_s *.
	* libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister.
	* oacc-host.c (host_dispatch): Do not initialize register_image_func,
	get_table_func, mem_map.is_initialized, mem_map.splay_tree.root,
	offload_regions_registered.
	Initialize load_image_func, unload_image_func, mem_map.root.
	(goacc_host_init): Do not initialize host_dispatch.mem_map.lock.
	* oacc-init.c (lazy_open): Don't call gomp_init_tables.
	(acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's.
	* oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of
	gomp_memory_mapping *.  Use dev's lock and splay_tree.
	(lookup_dev): Use dev's lock.
	(acc_deviceptr): Pass dev to lookup_host instead of mem_map.
	(acc_is_present): Likewise.
	(acc_map_data): Likewise.
	(acc_unmap_data): Likewise.  Use dev's lock.
	(present_create_copy): Likewise.
	(delete_copyout): Pass dev to lookup_host instead of mem_map.
	(update_dev_host): Likewise.
	(gomp_acc_remove_pointer): Likewise.  Use dev's lock.
	* oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree.
	* plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove.
	(GOMP_OFFLOAD_get_table): Remove.
	(GOMP_OFFLOAD_load_image): New function.
	(GOMP_OFFLOAD_unload_image): New function.
	* target.c (register_lock): New mutex for offload image registration.
	(gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's.
	(gomp_copy_from_async): Likewise.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Remove gomp_memory_mapping argument.  Use dev's lock and
	splay_tree instead of mm's.
	(gomp_splay_tree_insert_mapping): New static function.
	(gomp_offload_image_to_device): Ditto.
	(GOMP_offload_register): Add mutex lock.
	Call gomp_offload_image_to_device for all initialized devices.
	(GOMP_offload_unregister): New function.
	(gomp_init_tables): Replace with gomp_init_device.  Replace a call to
	get_table_func from the plugin with calls to init_device_func and
	gomp_offload_image_to_device.
	(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
	to splay_tree_s *.
	(GOMP_target): Do not call gomp_init_tables.  Use dev's lock and
	splay_tree instead of mm's.
	(GOMP_target_data): Do not call gomp_init_tables.
	(GOMP_target_update): Likewise.  Remove argument from gomp_update.
	(gomp_load_plugin_for_device): Replace register_image and get_table
	with load_image and unload_image in DLSYM ().
	(gomp_register_images_for_device): Remove function.
	(gomp_target_init): Do not initialize current_device.mem_map.*,
	current_device.offload_regions_registered.
	Remove call to gomp_register_images_for_device.
	Do not free offload_images and num_offload_images.
liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp: Include map.
	(AddrVect, DevAddrVect, ImgDevAddrMap): New typedefs.
	(num_devices, num_images, address_table): New static vars.
	(num_libraries, lib_descrs): Remove static vars.
	(set_mic_lib_path): Rename to ...
	(init): ... this.  Allocate address_table and get num_devices.
	(GOMP_OFFLOAD_get_num_devices): Return num_devices.
	(load_lib_and_get_table): Remove static function.
	(offload_image): New static function.
	(GOMP_OFFLOAD_get_table): Remove function.
	(GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New functions.
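
For readers who want the gist of the new load_image contract before reading
the diff, here is a standalone sketch of how host table entries get paired
with the addr_pair entries a plugin returns.  It mirrors the layout used by
gomp_offload_image_to_device in the patch (four pointers: funcs begin/end,
vars begin/end; each var entry is an address followed by a size), but
struct mapping, build_mappings, demo and demo_out are hypothetical names,
and a flat array stands in for the splay tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the addr_pair introduced in libgomp-plugin.h.  */
struct addr_pair { uintptr_t start; uintptr_t end; };

/* Hypothetical: one host range paired with its target range.  */
struct mapping { struct addr_pair host; struct addr_pair tgt; };

/* Pair each host function/variable with the matching target entry.
   Returns the number of mappings written to OUT, or -1 if the target
   table does not have one entry per host function and variable.  */
int
build_mappings (void ***host_table, const struct addr_pair *target_table,
		int num_target_entries, struct mapping *out)
{
  void **funcs = host_table[0], **funcs_end = host_table[1];
  void **vars = host_table[2], **vars_end = host_table[3];

  /* Functions are bare addresses; variables are (address, size) pairs.  */
  int num_funcs = (int) (funcs_end - funcs);
  int num_vars = (int) ((vars_end - vars) / 2);

  if (num_target_entries != num_funcs + num_vars)
    return -1;

  for (int i = 0; i < num_funcs; i++)
    {
      out[i].host.start = (uintptr_t) funcs[i];
      out[i].host.end = out[i].host.start + 1;
      out[i].tgt = target_table[i];
    }

  for (int i = 0; i < num_vars; i++)
    {
      struct mapping *m = &out[num_funcs + i];
      m->host.start = (uintptr_t) vars[2 * i];
      m->host.end = m->host.start + (uintptr_t) vars[2 * i + 1];
      m->tgt = target_table[num_funcs + i];
    }

  return num_funcs + num_vars;
}

/* Result buffer filled by demo below.  */
struct mapping demo_out[3];

/* Build a tiny host table (two "functions", one int[4] variable) and pair
   it with a made-up target table claiming NUM_TARGET_ENTRIES entries.  */
int
demo (int num_target_entries)
{
  static char fn_a, fn_b;	/* Stand-ins for function addresses.  */
  static int var[4];
  void *funcs[] = { &fn_a, &fn_b };
  void *vars[] = { var, (void *) (uintptr_t) sizeof var };
  void **host_table[4] = { funcs, funcs + 2, vars, vars + 2 };
  struct addr_pair target[3]
    = { { 0x1000, 0x1001 }, { 0x2000, 0x2001 }, { 0x3000, 0x3010 } };
  return build_mappings (host_table, target, num_target_entries, demo_out);
}
```

The count check is the only validation: host and target entries are matched
purely by position, so the plugin must return them in host table order.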


diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (void *, int, void *);\n\n"
+	   "void GOMP_offload_register (void *, int, void *);\n"
+	   "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
 	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
+	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+	   "__attribute__((destructor))\n"
+	   "static void\n"
+	   "fini (void)\n"
+	   "{\n"
+	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
 	   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s mem_map;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ -793,9 +777,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
@@ -811,9 +792,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_init_tables (struct gomp_device_descr *,
-			      struct gomp_memory_mapping *);
-extern void gomp_free_memmap (struct gomp_memory_mapping *);
+extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
+	GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..e4756b6 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,20 +43,18 @@ static struct gomp_device_descr host_dispatch =
     .get_caps_func = GOMP_OFFLOAD_get_caps,
     .get_type_func = GOMP_OFFLOAD_get_type,
     .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-    .register_image_func = GOMP_OFFLOAD_register_image,
     .init_device_func = GOMP_OFFLOAD_init_device,
     .fini_device_func = GOMP_OFFLOAD_fini_device,
-    .get_table_func = GOMP_OFFLOAD_get_table,
+    .load_image_func = GOMP_OFFLOAD_load_image,
+    .unload_image_func = GOMP_OFFLOAD_unload_image,
     .alloc_func = GOMP_OFFLOAD_alloc,
     .free_func = GOMP_OFFLOAD_free,
     .dev2host_func = GOMP_OFFLOAD_dev2host,
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.is_initialized = false,
-    .mem_map.splay_tree.root = NULL,
+    .mem_map.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
@@ -94,7 +92,6 @@ static struct gomp_device_descr host_dispatch =
 static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
-  gomp_mutex_init (&host_dispatch.mem_map.lock);
   gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 166eb55..1e0243e 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -284,12 +284,6 @@ lazy_open (int ord)
     = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
-
-  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
-  gomp_mutex_lock (&mem_map->lock);
-  if (!mem_map->is_initialized)
-    gomp_init_tables (acc_dev, mem_map);
-  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -351,10 +345,9 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
 
 	  walk->dev = NULL;
 	}
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..fdc82e6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -38,7 +38,7 @@
 /* Return block containing [H->S), or NULL if not contained.  */
 
 static splay_tree_key
-lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 {
   struct splay_tree_key_s node;
   splay_tree_key key;
@@ -46,11 +46,9 @@ lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&mem_map->lock);
-
-  key = splay_tree_lookup (&mem_map->splay_tree, &node);
-
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_lock (&dev->lock);
+  key = splay_tree_lookup (&dev->mem_map, &node);
+  gomp_mutex_unlock (&dev->lock);
 
   return key;
 }
@@ -65,14 +63,11 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
 {
   int i;
   struct target_mem_desc *t;
-  struct gomp_memory_mapping *mem_map;
 
   if (!tgt)
     return NULL;
 
-  mem_map = tgt->mem_map;
-
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&tgt->device_descr->lock);
 
   for (t = tgt; t != NULL; t = t->prev)
     {
@@ -80,7 +75,7 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
         break;
     }
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&tgt->device_descr->lock);
 
   if (!t)
     return NULL;
@@ -176,7 +171,7 @@ acc_deviceptr (void *h)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  n = lookup_host (&thr->dev->mem_map, h, 1);
+  n = lookup_host (thr->dev, h, 1);
 
   if (!n)
     return NULL;
@@ -229,7 +224,7 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   if (n && ((uintptr_t)h < n->host_start
 	    || (uintptr_t)h + s > n->host_end
@@ -271,7 +266,7 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
                     (void *)h, (int)s, (void *)d, (int)s);
 
-      if (lookup_host (&acc_dev->mem_map, h, s))
+      if (lookup_host (acc_dev, h, s))
 	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
 		    (int)s);
 
@@ -296,7 +291,7 @@ acc_unmap_data (void *h)
   /* No need to call lazy open, as the address must have been mapped.  */
 
   size_t host_size;
-  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  splay_tree_key n = lookup_host (acc_dev, h, 1);
   struct target_mem_desc *t;
 
   if (!n)
@@ -320,7 +315,7 @@ acc_unmap_data (void *h)
       t->tgt_end = 0;
       t->to_free = 0;
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
 	   tp = t, t = t->prev)
@@ -334,7 +329,7 @@ acc_unmap_data (void *h)
 	    break;
 	  }
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   gomp_unmap_vars (t, true);
@@ -358,7 +353,7 @@ present_create_copy (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
   if (n)
     {
       /* Present. */
@@ -389,13 +384,13 @@ present_create_copy (unsigned f, void *h, size_t s)
       tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
 			   false);
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       d = tgt->to_free;
       tgt->prev = acc_dev->openacc.data_environ;
       acc_dev->openacc.data_environ = tgt;
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   return d;
@@ -436,7 +431,7 @@ delete_copyout (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -479,7 +474,7 @@ update_dev_host (int is_dev, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -532,7 +527,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   struct target_mem_desc *t;
   int minrefs = (mapnum == 1) ? 2 : 3;
 
-  n = lookup_host (&acc_dev->mem_map, h, 1);
+  n = lookup_host (acc_dev, h, 1);
 
   if (!n)
     gomp_fatal ("%p is not a mapped block", (void *)h);
@@ -543,7 +538,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 
   struct target_mem_desc *tp;
 
-  gomp_mutex_lock (&acc_dev->mem_map.lock);
+  gomp_mutex_lock (&acc_dev->lock);
 
   if (t->refcount == minrefs)
     {
@@ -570,7 +565,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   if (force_copyfrom)
     t->list[0]->copy_from = 1;
 
-  gomp_mutex_unlock (&acc_dev->mem_map.lock);
+  gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 0c74f54..563f9bb 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -144,9 +144,9 @@ GOACC_parallel (int device, void (*fn) (void *),
     {
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
-      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map, &k);
+      gomp_mutex_unlock (&acc_dev->lock);
 
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index ebf7f11..bc60f72 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -95,12 +95,6 @@ GOMP_OFFLOAD_get_num_devices (void)
 }
 
 STATIC void
-GOMP_OFFLOAD_register_image (void *host_table __attribute__ ((unused)),
-			     void *target_data __attribute__ ((unused)))
-{
-}
-
-STATIC void
 GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
 {
 }
@@ -111,12 +105,19 @@ GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
 }
 
 STATIC int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **table __attribute__ ((unused)))
+GOMP_OFFLOAD_load_image (int n __attribute__ ((unused)),
+			 void *i __attribute__ ((unused)),
+			 struct addr_pair **r __attribute__ ((unused)))
 {
   return 0;
 }
 
+STATIC void
+GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
+			   void *i __attribute__ ((unused)))
+{
+}
+
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index c5dda3f..ba2d231 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -49,6 +49,9 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
+/* Mutex for offload image registration.  */
+static gomp_mutex_t register_lock;
+
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -153,14 +156,14 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mm;
+  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -174,7 +177,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < mapnum; i++)
     {
@@ -189,7 +192,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
@@ -274,7 +277,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
@@ -294,7 +297,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&mm->splay_tree, array);
+		splay_tree_insert (mem_map, array);
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -332,16 +335,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+		    n = splay_tree_lookup (mem_map, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			n = splay_tree_lookup (mem_map, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    n = splay_tree_lookup (mem_map, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -400,18 +403,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  n = splay_tree_lookup (mem_map, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&mm->splay_tree,
-						     &cur_node);
+			      n = splay_tree_lookup (mem_map, &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup (&mm->splay_tree,
-							 &cur_node);
+				  n = splay_tree_lookup (mem_map, &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -489,7 +490,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
   return tgt;
 }
 
@@ -514,10 +515,9 @@ attribute_hidden void
 gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
   size_t i;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
@@ -536,7 +536,7 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
 				  k->host_end - k->host_start);
       }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -547,7 +547,6 @@ attribute_hidden void
 gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -555,7 +554,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -572,7 +571,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&mm->splay_tree, k);
+	splay_tree_remove (tgt->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -584,13 +583,12 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
   else
     gomp_unmap_tgt (tgt);
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
-	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
-	     bool is_openacc)
+gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
+	     size_t *sizes, void *kinds, bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
@@ -602,14 +600,13 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
-					      &cur_node);
+	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
 	    int kind = get_kind (is_openacc, kinds, i);
@@ -643,10 +640,86 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
+}
+
+
+/* Insert mapping of host -> target address pairs to splay tree.  */
+
+static void
+gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
+				struct addr_pair *host_addr,
+				struct addr_pair *tgt_addr)
+{
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->refcount = 1;
+  tgt->array = gomp_malloc (sizeof (*tgt->array));
+  tgt->tgt_start = tgt_addr->start;
+  tgt->tgt_end = tgt_addr->end;
+  tgt->to_free = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+  splay_tree_node node = tgt->array;
+  splay_tree_key k = &node->key;
+  k->host_start = host_addr->start;
+  k->host_end = host_addr->end;
+  k->tgt_offset = 0;
+  k->refcount = 1;
+  k->copy_from = false;
+  k->tgt = tgt;
+  node->left = NULL;
+  node->right = NULL;
+  splay_tree_insert (&devicep->mem_map, node);
+}
+
+/* Load the image pointed to by TARGET_DATA onto the device specified by
+   DEVICEP, and insert into the splay tree the mapping between addresses from
+   HOST_TABLE and from the loaded target image.  */
+
+static void
+gomp_offload_image_to_device (struct gomp_device_descr *devicep,
+			      void *host_table, void *target_data)
+{
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Load image to device and get target addresses for the image.  */
+  struct addr_pair *target_table = NULL;
+  int i, num_target_entries
+    = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+
+  if (num_target_entries != num_funcs + num_vars)
+    gomp_fatal ("Can't map target functions or variables");
+
+  /* Insert host-target address mapping into devicep->mem_map.  */
+  for (i = 0; i < num_funcs; i++)
+    {
+      struct addr_pair host_addr;
+      host_addr.start = (uintptr_t) host_func_table[i];
+      host_addr.end = host_addr.start + 1;
+      gomp_splay_tree_insert_mapping (devicep, &host_addr, &target_table[i]);
+    }
+
+  for (i = 0; i < num_vars; i++)
+    {
+      struct addr_pair host_addr;
+      host_addr.start = (uintptr_t) host_var_table[i*2];
+      host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1];
+      gomp_splay_tree_insert_mapping (devicep, &host_addr,
+				      &target_table[num_funcs+i]);
+    }
+
+  free (target_table);
 }
 
-/* This function should be called from every offload image.
+/* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
 
@@ -654,6 +727,18 @@ void
 GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 		       void *target_data)
 {
+  int i;
+  gomp_mutex_lock (&register_lock);
+
+  /* Load image to all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      if (devicep->type == target_type && devicep->is_initialized)
+	gomp_offload_image_to_device (devicep, host_table, target_data);
+    }
+
+  /* Insert image to array of pending images.  */
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
@@ -663,74 +748,105 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   offload_images[num_offload_images].target_data = target_data;
 
   num_offload_images++;
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* This function initializes the target device, specified by DEVICEP.  DEVICEP
-   must be locked on entry, and remains locked on return.  */
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
 
-attribute_hidden void
-gomp_init_device (struct gomp_device_descr *devicep)
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
 {
-  devicep->init_device_func (devicep->target_id);
-  devicep->is_initialized = true;
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+  int i;
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  gomp_mutex_lock (&register_lock);
+
+  /* Unload image from all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+      struct gomp_device_descr *devicep = &devices[i];
+
+      if (devicep->type != target_type || !devicep->is_initialized)
+	continue;
+
+      devicep->unload_image_func (devicep->target_id, target_data);
+
+      /* Remove mapping from splay tree.  */
+      for (j = 0; j < num_funcs; j++)
+	{
+	  struct splay_tree_key_s k;
+	  k.host_start = (uintptr_t) host_func_table[j];
+	  k.host_end = k.host_start + 1;
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_vars; j++)
+	{
+	  struct splay_tree_key_s k;
+	  k.host_start = (uintptr_t) host_var_table[j*2];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+    }
+
+  /* Remove image from array of pending images.  */
+  for (i = 0; i < num_offload_images; i++)
+    if (offload_images[i].target_data == target_data)
+      {
+	offload_images[i] = offload_images[--num_offload_images];
+	break;
+      }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* Initialize address mapping tables.  MM must be locked on entry, and remains
-   locked on return.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
-gomp_init_tables (struct gomp_device_descr *devicep,
-		  struct gomp_memory_mapping *mm)
+gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Get address mapping table for device.  */
-  struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
-
-  /* Insert host-target address mapping into dev_splay_tree.  */
   int i;
-  for (i = 0; i < num_entries; i++)
+  devicep->init_device_func (devicep->target_id);
+
+  /* Load all images registered so far to the device.  */
+  for (i = 0; i < num_offload_images; i++)
     {
-      struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-      tgt->refcount = 1;
-      tgt->array = gomp_malloc (sizeof (*tgt->array));
-      tgt->tgt_start = table[i].tgt_start;
-      tgt->tgt_end = table[i].tgt_end;
-      tgt->to_free = NULL;
-      tgt->list_count = 0;
-      tgt->device_descr = devicep;
-      splay_tree_node node = tgt->array;
-      splay_tree_key k = &node->key;
-      k->host_start = table[i].host_start;
-      k->host_end = table[i].host_end;
-      k->tgt_offset = 0;
-      k->refcount = 1;
-      k->copy_from = false;
-      k->tgt = tgt;
-      node->left = NULL;
-      node->right = NULL;
-      splay_tree_insert (&mm->splay_tree, node);
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+	gomp_offload_image_to_device (devicep, image->host_table,
+				      image->target_data);
     }
 
-  free (table);
-  mm->is_initialized = true;
+  devicep->is_initialized = true;
 }
 
 /* Free address mapping tables.  MM must be locked on entry, and remains locked
    on return.  */
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_memory_mapping *mm)
+gomp_free_memmap (struct splay_tree_s *mem_map)
 {
-  while (mm->splay_tree.root)
+  while (mem_map->root)
     {
-      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
-      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      splay_tree_remove (mem_map, &mem_map->root->key);
       free (tgt->array);
       free (tgt);
     }
-
-  mm->is_initialized = false;
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
@@ -791,20 +907,15 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     fn_addr = (void *) fn;
   else
     {
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
-      gomp_mutex_lock (&mm->lock);
-
-      if (!mm->is_initialized)
-	gomp_init_tables (devicep, mm);
-
+      gomp_mutex_lock (&devicep->lock);
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       if (tgt_fn == NULL)
 	gomp_fatal ("Target function wasn't mapped");
 
-      gomp_mutex_unlock (&mm->lock);
+      gomp_mutex_unlock (&devicep->lock);
 
       fn_addr = (void *) tgt_fn->tgt->tgt_start;
     }
@@ -856,12 +967,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     false);
@@ -897,13 +1002,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
-  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -972,10 +1071,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
-  DLSYM (register_image);
   DLSYM (init_device);
   DLSYM (fini_device);
-  DLSYM (get_table);
+  DLSYM (load_image);
+  DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
@@ -1038,22 +1137,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return err == NULL;
 }
 
-/* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
-
-static void
-gomp_register_image_for_device (struct gomp_device_descr *device,
-				struct offload_image_descr *image)
-{
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
-}
-
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1112,17 +1195,14 @@ gomp_target_init (void)
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
-		current_device.mem_map.is_initialized = false;
-		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
@@ -1157,21 +1237,12 @@ gomp_target_init (void)
 
   for (i = 0; i < num_devices; i++)
     {
-      int j;
-
-      for (j = 0; j < num_offload_images; j++)
-	gomp_register_image_for_device (&devices[i], &offload_images[j]);
-
       /* The 'devices' array can be moved (by the realloc call) until we have
 	 found all the plugins, so registering with the OpenACC runtime (which
 	 takes a copy of the pointer argument) must be delayed until now.  */
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
-
-  free (offload_images);
-  offload_images = NULL;
-  num_offload_images = 0;
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 3e7a958..a2d61b1 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -34,6 +34,7 @@
 #include <string.h>
 #include <utility>
 #include <vector>
+#include <map>
 #include "libgomp-plugin.h"
 #include "compiler_if_host.h"
 #include "main_target_image.h"
@@ -53,6 +54,29 @@ fprintf (stderr, "\n");					    \
 #endif
 
 
+/* Start/end addresses of functions and global variables on a device.  */
+typedef std::vector<addr_pair> AddrVect;
+
+/* Addresses for one image and all devices.  */
+typedef std::vector<AddrVect> DevAddrVect;
+
+/* Addresses for all images and all devices.  */
+typedef std::map<void *, DevAddrVect> ImgDevAddrMap;
+
+
+/* Total number of available devices.  */
+static int num_devices;
+
+/* Total number of shared libraries with offloading to Intel MIC.  */
+static int num_images;
+
+/* Two dimensional array: one key is a pointer to image,
+   second key is number of device.  Contains a vector of pointer pairs.  */
+static ImgDevAddrMap *address_table;
+
+/* Thread-safe registration of the main image.  */
+static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
+
 static VarDesc vd_host2tgt = {
   { 1, 1 },		      /* dst, src			      */
   { 1, 0 },		      /* in, out			      */
@@ -90,28 +114,17 @@ static VarDesc vd_tgt2host = {
 };
 
 
-/* Total number of shared libraries with offloading to Intel MIC.  */
-static int num_libraries;
-
-/* Pointers to the descriptors, containing pointers to host-side tables and to
-   target images.  */
-static std::vector< std::pair<void *, void *> > lib_descrs;
-
-/* Thread-safe registration of the main image.  */
-static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
-
-
 /* Add path specified in LD_LIBRARY_PATH to MIC_LD_LIBRARY_PATH, which is
    required by liboffloadmic.  */
 __attribute__((constructor))
 static void
-set_mic_lib_path (void)
+init (void)
 {
   const char *ld_lib_path = getenv (LD_LIBRARY_PATH_ENV);
   const char *mic_lib_path = getenv (MIC_LD_LIBRARY_PATH_ENV);
 
   if (!ld_lib_path)
-    return;
+    goto out;
 
   if (!mic_lib_path)
     setenv (MIC_LD_LIBRARY_PATH_ENV, ld_lib_path, 1);
@@ -133,6 +146,10 @@ set_mic_lib_path (void)
       if (!use_alloca)
 	free (mic_lib_path_new);
     }
+
+out:
+  address_table = new ImgDevAddrMap;
+  num_devices = _Offload_number_of_devices ();
 }
 
 extern "C" const char *
@@ -162,18 +179,8 @@ GOMP_OFFLOAD_get_type (void)
 extern "C" int
 GOMP_OFFLOAD_get_num_devices (void)
 {
-  int res = _Offload_number_of_devices ();
-  TRACE ("(): return %d", res);
-  return res;
-}
-
-/* This should be called from every shared library with offloading.  */
-extern "C" void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_image)
-{
-  TRACE ("(host_table = %p, target_image = %p)", host_table, target_image);
-  lib_descrs.push_back (std::make_pair (host_table, target_image));
-  num_libraries++;
+  TRACE ("(): return %d", num_devices);
+  return num_devices;
 }
 
 static void
@@ -196,7 +203,8 @@ register_main_image ()
   __offload_register_image (&main_target_image);
 }
 
-/* Load offload_target_main on target.  */
+/* liboffloadmic loads and runs offload_target_main on all available devices
+   during a first call to offload ().  */
 extern "C" void
 GOMP_OFFLOAD_init_device (int device)
 {
@@ -243,9 +251,11 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
     }
 }
 
+/* Offload TARGET_IMAGE to all available devices and fill address_table with
+   corresponding target addresses.  */
+
 static void
-load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
-			int &table_size)
+offload_image (void *target_image)
 {
   struct TargetImage {
     int64_t size;
@@ -254,19 +264,11 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     char data[];
   } __attribute__ ((packed));
 
-  void ***host_table_descr = (void ***) lib_descrs[lib_num].first;
-  void **host_func_start = host_table_descr[0];
-  void **host_func_end   = host_table_descr[1];
-  void **host_var_start  = host_table_descr[2];
-  void **host_var_end    = host_table_descr[3];
+  void *image_start = ((void **) target_image)[0];
+  void *image_end   = ((void **) target_image)[1];
 
-  void **target_image_descr = (void **) lib_descrs[lib_num].second;
-  void *image_start = target_image_descr[0];
-  void *image_end   = target_image_descr[1];
-
-  TRACE ("() host_table_descr { %p, %p, %p, %p }", host_func_start,
-	 host_func_end, host_var_start, host_var_end);
-  TRACE ("() target_image_descr { %p, %p }", image_start, image_end);
+  TRACE ("(target_image = %p { %p, %p })",
+	 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
   TargetImage *image
@@ -279,94 +281,87 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     }
 
   image->size = image_size;
-  sprintf (image->name, "lib%010d.so", lib_num);
+  sprintf (image->name, "lib%010d.so", num_images++);
   memcpy (image->data, image_start, image->size);
 
   TRACE ("() __offload_register_image %s { %p, %d }",
 	 image->name, image_start, image->size);
   __offload_register_image (image);
 
-  int tgt_num_funcs = 0;
-  int tgt_num_vars = 0;
-  void **tgt_table = NULL;
-  get_target_table (device, tgt_num_funcs, tgt_num_vars, tgt_table);
-  free (image);
-
-  /* The func table contains only addresses, the var table contains addresses
-     and corresponding sizes.  */
-  int host_num_funcs = host_func_end - host_func_start;
-  int host_num_vars  = (host_var_end - host_var_start) / 2;
-  TRACE ("() host_num_funcs = %d, tgt_num_funcs = %d",
-	 host_num_funcs, tgt_num_funcs);
-  TRACE ("() host_num_vars = %d, tgt_num_vars = %d",
-	 host_num_vars, tgt_num_vars);
-  if (host_num_funcs != tgt_num_funcs)
+  /* Receive tables for target_image from all devices.  */
+  DevAddrVect dev_table;
+  for (int dev = 0; dev < num_devices; dev++)
     {
-      fprintf (stderr, "%s: Can't map target functions\n", __FILE__);
-      exit (1);
-    }
-  if (host_num_vars != tgt_num_vars)
-    {
-      fprintf (stderr, "%s: Can't map target variables\n", __FILE__);
-      exit (1);
-    }
+      int num_funcs = 0;
+      int num_vars = 0;
+      void **table = NULL;
 
-  table = (mapping_table *) realloc (table, (table_size + host_num_funcs
-					     + host_num_vars)
-					    * sizeof (mapping_table));
-  if (table == NULL)
-    {
-      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
-      exit (1);
-    }
+      get_target_table (dev, num_funcs, num_vars, table);
 
-  for (int i = 0; i < host_num_funcs; i++)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_func_start[i];
-      t.host_end = t.host_start + 1;
-      t.tgt_start = (uintptr_t) tgt_table[i];
-      t.tgt_end = t.tgt_start + 1;
-
-      TRACE ("() lib %d, func %d:\t0x%llx -- 0x%llx",
-	     lib_num, i, t.host_start, t.tgt_start);
-
-      table[table_size++] = t;
-    }
+      AddrVect curr_dev_table;
 
-  for (int i = 0; i < host_num_vars * 2; i += 2)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_var_start[i];
-      t.host_end = t.host_start + (uintptr_t) host_var_start[i+1];
-      t.tgt_start = (uintptr_t) tgt_table[tgt_num_funcs+i];
-      t.tgt_end = t.tgt_start + (uintptr_t) tgt_table[tgt_num_funcs+i+1];
+      for (int i = 0; i < num_funcs; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[i];
+	  tgt_addr.end = tgt_addr.start + 1;
+	  TRACE ("() func %d:\t0x%llx..0x%llx", i,
+		 tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      TRACE ("() lib %d, var %d:\t0x%llx (%d) -- 0x%llx (%d)", lib_num, i/2,
-	     t.host_start, t.host_end - t.host_start,
-	     t.tgt_start, t.tgt_end - t.tgt_start);
+      for (int i = 0; i < num_vars; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[num_funcs+i*2];
+	  tgt_addr.end = tgt_addr.start + (uintptr_t) table[num_funcs+i*2+1];
+	  TRACE ("() var %d:\t0x%llx..0x%llx", i, tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      table[table_size++] = t;
+      dev_table.push_back (curr_dev_table);
     }
 
-  delete [] tgt_table;
+  address_table->insert (std::make_pair (target_image, dev_table));
+
+  free (image);
 }
 
 extern "C" int
-GOMP_OFFLOAD_get_table (int device, void *result)
+GOMP_OFFLOAD_load_image (int device, void *target_image, addr_pair **result)
 {
-  TRACE ("(num_libraries = %d)", num_libraries);
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
 
-  mapping_table *table = NULL;
-  int table_size = 0;
+  /* If target_image is already present in address_table, then there is no need
+     to offload it.  */
+  if (address_table->count (target_image) == 0)
+    offload_image (target_image);
 
-  for (int i = 0; i < num_libraries; i++)
-    load_lib_and_get_table (device, i, table, table_size);
+  AddrVect *curr_dev_table = &(*address_table)[target_image][device];
+  int table_size = curr_dev_table->size ();
+  addr_pair *table = (addr_pair *) malloc (table_size * sizeof (addr_pair));
+  if (table == NULL)
+    {
+      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
+      exit (1);
+    }
 
-  *(void **) result = table;
+  std::copy (curr_dev_table->begin (), curr_dev_table->end (), table);
+  *result = table;
   return table_size;
 }
 
+extern "C" void
+GOMP_OFFLOAD_unload_image (int device, void *target_image)
+{
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
+
+  /* TODO: Currently liboffloadmic doesn't support __offload_unregister_image
+     for libraries.  */
+
+  address_table->erase (target_image);
+}
+
 extern "C" void *
 GOMP_OFFLOAD_alloc (int device, size_t size)
 {
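
For readers following the host-side part of the patch, here is a minimal
standalone sketch of how the HOST_TABLE descriptor is decoded into address
ranges: four pointers (func start, func end, var start, var end), where the
func table holds only addresses and the var table holds address/size pairs.
The struct and function names below are illustrative stand-ins, not the
libgomp definitions, and the fixed-size output buffer is an assumption made
to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the plugin interface's addr_pair.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* Decode HOST_TABLE into at most MAX address pairs written to OUT.
   Functions come first (one-byte ranges), then variables (address plus
   size).  Returns the number of pairs written.  */
static size_t
decode_host_table (void *host_table, struct addr_pair *out, size_t max)
{
  assert (host_table != NULL && out != NULL);

  void **host_func_table = ((void ***) host_table)[0];
  void **host_funcs_end  = ((void ***) host_table)[1];
  void **host_var_table  = ((void ***) host_table)[2];
  void **host_vars_end   = ((void ***) host_table)[3];

  /* The var table has two slots per variable: address, then size.  */
  size_t num_funcs = host_funcs_end - host_func_table;
  size_t num_vars  = (host_vars_end - host_var_table) / 2;
  size_t n = 0;

  for (size_t i = 0; i < num_funcs && n < max; i++, n++)
    {
      out[n].start = (uintptr_t) host_func_table[i];
      out[n].end = out[n].start + 1;
    }

  for (size_t i = 0; i < num_vars && n < max; i++, n++)
    {
      out[n].start = (uintptr_t) host_var_table[i * 2];
      out[n].end = out[n].start + (uintptr_t) host_var_table[i * 2 + 1];
    }

  return n;
}
```

The one-byte range for functions mirrors the "host_start + 1" convention in
the patch: a function only needs a unique key, not a size.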


Thanks,
  -- Ilya

[-- Attachment #2: dlopen.tgz --]
[-- Type: application/gzip, Size: 1031 bytes --]


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-26 12:09                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Jakub Jelinek
  2015-03-26 20:41                                 ` Ilya Verbin
@ 2015-03-27 15:21                                 ` Julian Brown
  1 sibling, 0 replies; 92+ messages in thread
From: Julian Brown @ 2015-03-27 15:21 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Verbin, Thomas Schwinge, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 7685 bytes --]

On Thu, 26 Mar 2015 13:09:19 +0100
Jakub Jelinek <jakub@redhat.com> wrote:

> On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote:
> > If it is too late for such global changes (rework initialization in
> > libgomp, change mic and ptx plugins), then here is a small
> > workaround patch to fix offloading from libraries.  Likely, it will
> > not affect OpenACC programs with one image.  make
> > check-target-libgomp passed.
> 
> Sorry for not getting to this earlier, really busy with severe
> regressions bugfixing lately.
> 
> Anyway, IMHO it is not too late to fix it properly; after all,
> the current code is majorly broken.  As I've said earlier, e.g. the
> lack of mutex guarding gomp_target_init (which is using pthread_once
> guaranteed to be run just once) vs. concurrent GOMP_offload_register
> calls (if those are run from ctors, then I guess something like
> dl_load_lock ensures at least on glibc that multiple
> GOMP_offload_register calls aren't performed at the same time) in
> accessing/reallocating offload_images and num_offload_images and the
> lack of support to register further images after the gomp_target_init
> call (if you dlopen further shared libraries) is really bad.  And it
> would be really nice to support the unloading.
> 
> But I'm afraid I'm lost in what is the latest posted patch for that,
> and how has it been tested (whether just on MIC or MIC emul, or also
> for nvptx).
> 
> So can you please post a link to the latest full patch and how it has
> been tested, and if it is still error prone if say one thread executes
> GOMP_target the first time and another at the same time dlopens some
> shared library that has offloading regions in it, fix that too?

I couldn't say about that -- I don't have all the state on the locking
problems at the moment.

> We still have a week or so to get this sorted out.

Apologies again for the delay in getting this out. Here's a "current"
version of the patch against the gomp4 branch (on top of Ilya's
load/unload patch) which passes testing for nvptx/openacc/libgomp,
modulo the usual (timing-related) noise in the lib-83.c test.

Thomas, can we get this tested on mainline and with MIC emulation?

This version fixes some, if not all, of the regressions with multiple
NVIDIA devices. It also removes the memory-map lock and is_initialized
fields, reverting to just a splay tree in gomp_device_descr (as the
code was before the OpenACC merge). Our multi-GPU machine is
temporarily out of action, so I can't easily test that setup at the
moment.
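
To illustrate the resulting shape, here is a simplified standalone sketch
(not the code in the patch): one lock per device descriptor guards the
host-to-target mapping directly, with no separate memory-mapping struct or
second lock. The names are hypothetical, a pthread mutex stands in for
gomp_mutex_t, and a small fixed array stands in for the splay tree.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

/* One host range and the target address it maps to.  */
struct mapping
{
  uintptr_t host_start;
  uintptr_t host_end;
  uintptr_t tgt_start;
};

/* Hypothetical device descriptor: a single lock guards the map.  */
struct device_descr
{
  pthread_mutex_t lock;
  struct mapping map[MAX_MAPPINGS];	/* Stand-in for the splay tree.  */
  size_t num_mappings;
};

static void
device_init (struct device_descr *dev)
{
  pthread_mutex_init (&dev->lock, NULL);
  dev->num_mappings = 0;
}

/* Insert a mapping under the device lock.  Returns 0 on success, -1 if
   the table is full.  */
static int
device_insert (struct device_descr *dev, uintptr_t host_start,
	       uintptr_t host_end, uintptr_t tgt_start)
{
  int ret = -1;
  pthread_mutex_lock (&dev->lock);
  if (dev->num_mappings < MAX_MAPPINGS)
    {
      struct mapping *m = &dev->map[dev->num_mappings++];
      m->host_start = host_start;
      m->host_end = host_end;
      m->tgt_start = tgt_start;
      ret = 0;
    }
  pthread_mutex_unlock (&dev->lock);
  return ret;
}

/* Translate HOST_ADDR to a target address under the device lock.
   Returns 0 if the address is not mapped.  */
static uintptr_t
device_lookup (struct device_descr *dev, uintptr_t host_addr)
{
  uintptr_t tgt = 0;
  pthread_mutex_lock (&dev->lock);
  for (size_t i = 0; i < dev->num_mappings; i++)
    {
      const struct mapping *m = &dev->map[i];
      if (host_addr >= m->host_start && host_addr < m->host_end)
	{
	  tgt = m->tgt_start + (host_addr - m->host_start);
	  break;
	}
    }
  pthread_mutex_unlock (&dev->lock);
  return tgt;
}
```

The point of the single lock is that device initialisation, image loading
and lookups all serialise on the same mutex, so there is no window in which
the device is initialised but its map is not.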

HTH,

Julian

ChangeLog

    gcc/
    * config/nvptx/mkoffload.c (process): Support variable mapping.

    libgomp/
    * libgomp.h (target_mem_desc): Remove mem_map field.
    (struct gomp_memory_mapping): Remove.
    (acc_dispatch_t): Remove open_device_func, close_device_func,
    get_device_num_func, set_device_num_func, target_data members.
    Change create_thread_data_func argument to device number instead of
    generic pointer.
    (struct gomp_device_descr): Replace mem_map field with splay tree
    directly.
    (gomp_free_memmap): Update prototype.
    * oacc-async.c (assert.h): Include.
    (acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
    (acc_wait_all, acc_wait_all_async): Use current host thread's
    active device, not base_dev.
    * oacc-cuda.c (acc_get_current_cuda_device)
    (acc_get_current_cuda_context, acc_get_cuda_stream)
    (acc_set_cuda_stream): Likewise.
    * oacc-host.c (host_dispatch): Don't set open_device_func,
    close_device_func, get_device_num_func or set_device_num_func.
    (goacc_host_init): Don't initialise host_dispatch.mem_map.lock.
    * oacc-init.c (base_dev, init_key): Remove.
    (cached_base_dev): New.
    (name_of_acc_device_t): New.
    (acc_init_1): Initialise default-numbered device, not zeroth.
    (acc_shutdown_1): Close all devices of a given type.
    (goacc_destroy_thread): Don't use base_dev.
    (lazy_open, lazy_init, lazy_init_and_open): Remove.
    (goacc_attach_host_thread_to_device): New.
    (acc_init): Reimplement with goacc_attach_host_thread_to_device.
    (acc_get_num_devices): Don't use base_dev.
    (acc_set_device_type): Reimplement.
    (acc_get_device_type): Don't use base_dev.
    (acc_get_device_num): Tweak logic.
    (acc_set_device_num): Likewise.
    (goacc_runtime_initialize): Initialize cached_base_dev not base_dev.
    (goacc_lazy_initialize): Reimplement with acc_init and
    goacc_attach_host_thread_to_device.
    * oacc-int.h (goacc_thread): Add base_dev field.
    (base_dev): Remove extern declaration.
    (goacc_attach_host_thread_to_device): Add prototype.
    * oacc-mem.c (lookup_host): Change first argument to
    gomp_device_descr. Use lock/splay tree from gomp_device_descr.
    (lookup_dev): Use lock from devicep not mem_map.
    (acc_malloc): Use current thread's device instead of base_dev.
    (acc_free): Likewise.
    (acc_memcpy_to_device): Likewise.
    (acc_memcpy_from_device): Likewise.
    (acc_deviceptr, acc_map_data, acc_unmap_data, present_create_copy)
    (delete_copyout, update_dev_host, gomp_acc_remove_pointer): Tweak
    lookup_host calls.
    * oacc-parallel.c (select_acc_device): Remove. Replace calls with
    goacc_lazy_initialize (throughout).
    (GOACC_parallel): Use lock and splay tree from gomp_device_descr not
    gomp_memory_mapping.
    * target.c (gomp_map_vars, gomp_copy_from_async, gomp_unmap_vars)
    (gomp_splay_tree_insert_mapping, GOMP_offload_unregister)
    (GOMP_target): Use splay tree and lock directly in
    gomp_device_descr, not gomp_memory_mapping.
    (gomp_update): Remove mm argument. Use splay tree and lock directly
    in gomp_device_descr.
    (gomp_free_memmap): Change argument to struct splay_tree_s.
    (gomp_load_plugin_for_device): Don't initialise openacc
    open_device, close_device, get_device_num or set_device_num hooks.
    Don't initialise target_data or deleted mem_map is_initialized,
    splay_tree.root fields.
    * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device)
    (GOMP_OFFLOAD_openacc_close_device)
    (GOMP_OFFLOAD_openacc_get_device_num)
    (GOMP_OFFLOAD_openacc_set_device_num): Remove.
    (GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument
    to int.
    * plugin/plugin-nvptx.c (pthread.h): Include.
    (ptx_inited): Remove.
    (instantiated_devices, ptx_dev_lock): New.
    (struct ptx_image_data): New.
    (ptx_devices, ptx_images, ptx_image_lock): New.
    (fini_streams_for_device): Reorder cuStreamDestroy call.
    (nvptx_get_num_devices): Remove forward declaration.
    (nvptx_init): Change return type to bool.
    (nvptx_fini): Remove.
    (nvptx_attach_host_thread_to_device): New.
    (nvptx_open_device): Return struct ptx_device* instead of void*.
    (nvptx_close_device): Change argument type to struct ptx_device*,
    return type to void.
    (nvptx_get_num_devices): Use instantiated_devices not ptx_inited.
    (kernel_target_data, kernel_host_table): Remove static globals.
    (GOMP_OFFLOAD_register_image, GOMP_OFFLOAD_get_table): Remove.
    (GOMP_OFFLOAD_init_device): Reimplement.
    (GOMP_OFFLOAD_fini_device): Likewise.
    (GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New.
    (GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free, GOMP_OFFLOAD_dev2host)
    (GOMP_OFFLOAD_host2dev): Use ORD argument.
    (GOMP_OFFLOAD_openacc_open_device)
    (GOMP_OFFLOAD_openacc_close_device)
    (GOMP_OFFLOAD_openacc_set_device_num)
    (GOMP_OFFLOAD_openacc_get_device_num): Remove.
    (GOMP_OFFLOAD_openacc_create_thread_data): Change argument to int
    (device number).

    libgomp/testsuite/
    * libgomp.oacc-c-c++-common/lib-9.c: Fix devnum check in test.

[-- Attachment #2: nvptx-load-unload-4.diff --]
[-- Type: text/x-patch, Size: 65298 bytes --]

commit 63091061f227f124d8d496fd3064982935178f3a
Author: Julian Brown <julian@codesourcery.com>
Date:   Mon Feb 23 11:55:41 2015 -0800

    nvptx load/unload image support, init rework
    
    fix multi-device tests
    
    more load/unload patch cleanups
    
    misc fixes

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
     tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3fc9aa9..822d2fe 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -656,9 +656,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
   splay_tree_key list[];
@@ -683,20 +680,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -706,18 +689,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		     unsigned short *, int, int, int, int, void *);
@@ -735,7 +706,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
@@ -783,9 +754,9 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions for this
+     device instance.  */
+  struct splay_tree_s splay_tree;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ -810,7 +781,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_init_tables (struct gomp_device_descr *,
 			      struct gomp_memory_mapping *);
-extern void gomp_free_memmap (struct gomp_memory_mapping *);
+extern void gomp_free_memmap (struct splay_tree_s);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_func (async);
 }
 
 void
 acc_wait_async (int async1, int async2)
 {
-  base_dev->openacc.async_wait_async_func (async1, async2);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_async_func (async1, async2);
 }
 
 void
 acc_wait_all (void)
 {
-  base_dev->openacc.async_wait_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_all_func ();
 }
 
 void
@@ -73,5 +98,10 @@ acc_wait_all_async (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_all_async_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_all_async_func (async);
 }
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index c8ef376..4aab422 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -34,51 +34,53 @@
 void *
 acc_get_current_cuda_device (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
-    p = base_dev->openacc.cuda.get_current_device_func ();
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
+    return thr->dev->openacc.cuda.get_current_device_func ();
 
-  return p;
+  return NULL;
 }
 
 void *
 acc_get_current_cuda_context (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
-    p = base_dev->openacc.cuda.get_current_context_func ();
-
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_context_func)
+    return thr->dev->openacc.cuda.get_current_context_func ();
+
+  return NULL;
 }
 
 void *
 acc_get_cuda_stream (int async)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (async < 0)
-    return p;
-
-  if (base_dev && base_dev->openacc.cuda.get_stream_func)
-    p = base_dev->openacc.cuda.get_stream_func (async);
+    return NULL;
 
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
+    return thr->dev->openacc.cuda.get_stream_func (async);
+
+  return NULL;
 }
 
 int
 acc_set_cuda_stream (int async, void *stream)
 {
-  int s = -1;
+  struct goacc_thread *thr;
 
   if (async < 0 || stream == NULL)
     return 0;
 
   goacc_lazy_initialize ();
 
-  if (base_dev && base_dev->openacc.cuda.set_stream_func)
-    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+  thr = goacc_thread ();
+
+  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
+    return thr->dev->openacc.cuda.set_stream_func (async, stream);
 
-  return s;
+  return -1;
 }
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 5d67c6c..6dcdbf3 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -53,17 +53,9 @@ static struct gomp_device_descr host_dispatch =
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.is_initialized = false,
-    .mem_map.splay_tree.root = NULL,
     .is_initialized = false,
 
     .openacc = {
-      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
-      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
-
-      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
-      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
-
       .exec_func = GOMP_OFFLOAD_openacc_parallel,
 
       .register_async_cleanup_func
@@ -93,7 +85,6 @@ static struct gomp_device_descr host_dispatch =
 static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
-  gomp_mutex_init (&host_dispatch.mem_map.lock);
   gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 19b937a..091dbc9 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -37,14 +37,13 @@
 
 static gomp_mutex_t acc_device_lock;
 
-/* The dispatch table for the current accelerator device.  This is global, so
-   you can only have one type of device open at any given time in a program.
-   This is the "base" device in that several devices that use the same
-   dispatch table may be active concurrently: this one (the "zeroth") is used
-   for overall initialisation/shutdown, and other instances -- not necessarily
-   including this one -- may be opened and closed once the base device has
-   been initialized.  */
-struct gomp_device_descr *base_dev;
+/* A cached version of the dispatcher for the global "current" accelerator type,
+   e.g. used as the default when creating new host threads.  This is the
+   device-type equivalent of goacc_device_num (which specifies which device to
+   use out of potentially several of the same type).  If there are several
+   devices of a given type, this points at the first one.  */
+
+static struct gomp_device_descr *cached_base_dev = NULL;
 
 #if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct goacc_thread *goacc_tls_data;
@@ -53,9 +52,6 @@ pthread_key_t goacc_tls_key;
 #endif
 static pthread_key_t goacc_cleanup_key;
 
-/* Current dispatcher, and how it was initialized */
-static acc_device_t init_key = _ACC_device_hwm;
-
 static struct goacc_thread *goacc_threads;
 static gomp_mutex_t goacc_thread_lock;
 
@@ -94,6 +90,21 @@ get_openacc_name (const char *name)
     return name;
 }
 
+static const char *
+name_of_acc_device_t (enum acc_device_t type)
+{
+  switch (type)
+    {
+    case acc_device_none: return "none";
+    case acc_device_default: return "default";
+    case acc_device_host: return "host";
+    case acc_device_host_nonshm: return "host_nonshm";
+    case acc_device_not_host: return "not_host";
+    case acc_device_nvidia: return "nvidia";
+    default: gomp_fatal ("unknown device type %u", (unsigned) type);
+    }
+}
+
 static struct gomp_device_descr *
 resolve_device (acc_device_t d)
 {
@@ -159,22 +170,87 @@ resolve_device (acc_device_t d)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d)
 {
-  struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
+  int ndevs;
 
-  acc_dev = resolve_device (d);
+  base_dev = resolve_device (d);
+
+  ndevs = base_dev ? base_dev->get_num_devices_func () : 0;
+
+  if (!base_dev || ndevs <= 0 || goacc_device_num >= ndevs)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
-    gomp_fatal ("device %u not supported", (unsigned)d);
+  acc_dev = &base_dev[goacc_device_num];
 
   if (acc_dev->is_initialized)
     gomp_fatal ("device already active");
 
-  /* We need to remember what we were intialized as, to check shutdown etc.  */
-  init_key = d;
-
   gomp_init_device (acc_dev);
 
-  return acc_dev;
+  return base_dev;
+}
+
+static void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct gomp_device_descr *base_dev;
+  struct goacc_thread *walk;
+  int ndevs, i;
+  bool devices_active = false;
+
+  /* Get the base device for this device type.  */
+  base_dev = resolve_device (d);
+
+  if (!base_dev)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      /* Similarly, if this happens then user code has done something weird.  */
+      if (walk->saved_bound_dev)
+        gomp_fatal ("shutdown during host fallback");
+
+      if (walk->dev)
+	{
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (walk->dev->splay_tree);
+	  gomp_mutex_unlock (&walk->dev->lock);
+
+	  walk->dev = NULL;
+	  walk->base_dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  ndevs = base_dev->get_num_devices_func ();
+
+  /* Close all the devices of this type that have been opened.  */
+  for (i = 0; i < ndevs; i++)
+    {
+      struct gomp_device_descr *acc_dev = &base_dev[i];
+      if (acc_dev->is_initialized)
+        {
+	  devices_active = true;
+	  gomp_fini_device (acc_dev);
+	}
+    }
+
+  if (!devices_active)
+    gomp_fatal ("no device initialized");
 }
 
 static struct goacc_thread *
@@ -207,9 +283,11 @@ goacc_destroy_thread (void *data)
 
   if (thr)
     {
-      if (base_dev && thr->target_tls)
+      struct gomp_device_descr *acc_dev = thr->dev;
+
+      if (acc_dev && thr->target_tls)
 	{
-	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  acc_dev->openacc.destroy_thread_data_func (thr->target_tls);
 	  thr->target_tls = NULL;
 	}
 
@@ -236,53 +314,49 @@ goacc_destroy_thread (void *data)
   gomp_mutex_unlock (&goacc_thread_lock);
 }
 
-/* Open the ORD'th device of the currently-active type (base_dev must be
-   initialised before calling).  If ORD is < 0, open the default-numbered
-   device (set by the ACC_DEVICE_NUM environment variable or a call to
-   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
-   consists of calling the device's open_device_func hook, and setting up
-   thread-local data (maybe allocating, then initializing with information
-   pertaining to the newly-opened or previously-opened device).  */
+/* Use the ORD'th device instance for the current host thread (or -1 for the
+   current global default).  The device (and the runtime) must be initialised
+   before calling this function.  */
 
-static void
-lazy_open (int ord)
+void
+goacc_attach_host_thread_to_device (int ord)
 {
   struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev;
-
-  if (thr && thr->dev)
-    {
-      assert (ord < 0 || ord == thr->dev->target_id);
-      return;
-    }
-
-  assert (base_dev);
-
+  struct gomp_device_descr *acc_dev = NULL, *base_dev = NULL;
+  int num_devices;
+
+  if (thr && thr->dev && (thr->dev->target_id == ord || ord < 0))
+    return;
+
   if (ord < 0)
     ord = goacc_device_num;
-
-  /* The OpenACC 2.0 spec leaves the runtime's behaviour when an out-of-range
-     device is requested as implementation-defined (4.2 ACC_DEVICE_NUM).
-     We choose to raise an error in such a case.  */
-  if (ord >= base_dev->get_num_devices_func ())
-    gomp_fatal ("device %u does not exist", ord);
-
+
+  /* Decide which type of device to use.  If the current thread has a device
+     type already (e.g. set by acc_set_device_type), use that, else use the
+     global default.  */
+  if (thr && thr->base_dev)
+    base_dev = thr->base_dev;
+  else
+    {
+      assert (cached_base_dev);
+      base_dev = cached_base_dev;
+    }
+
+  num_devices = base_dev->get_num_devices_func ();
+  if (num_devices <= 0 || ord >= num_devices)
+    gomp_fatal ("device %u out of range", ord);
+
   if (!thr)
     thr = goacc_new_thread ();
-
-  acc_dev = thr->dev = &base_dev[ord];
-
-  assert (acc_dev->target_id == ord);
-
+
+  thr->base_dev = base_dev;
+  thr->dev = acc_dev = &base_dev[ord];
   thr->saved_bound_dev = NULL;
   thr->mapped_data = NULL;
-
-  if (!acc_dev->openacc.target_data)
-    acc_dev->openacc.target_data = acc_dev->openacc.open_device_func (ord);
-
+
   thr->target_tls
-    = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
-
+    = acc_dev->openacc.create_thread_data_func (ord);
+
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
@@ -292,75 +366,20 @@ lazy_open (int ord)
 void
 acc_init (acc_device_t d)
 {
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
 
-  base_dev = acc_init_1 (d);
-
-  lazy_open (-1);
+  cached_base_dev = acc_init_1 (d);
 
   gomp_mutex_unlock (&acc_device_lock);
+
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_init)
 
-static void
-acc_shutdown_1 (acc_device_t d)
-{
-  struct goacc_thread *walk;
-
-  /* We don't check whether d matches the actual device found, because
-     OpenACC 2.0 (3.2.12) says the parameters to the init and this
-     call must match (for the shutdown call anyway, it's silent on
-     others).  */
-
-  if (!base_dev)
-    gomp_fatal ("no device initialized");
-  if (d != init_key)
-    gomp_fatal ("device %u(%u) is initialized",
-		(unsigned) init_key, (unsigned) base_dev->type);
-
-  gomp_mutex_lock (&goacc_thread_lock);
-
-  /* Free target-specific TLS data and close all devices.  */
-  for (walk = goacc_threads; walk != NULL; walk = walk->next)
-    {
-      if (walk->target_tls)
-	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
-
-      walk->target_tls = NULL;
-
-      /* This would mean the user is shutting down OpenACC in the middle of an
-         "acc data" pragma.  Likely not intentional.  */
-      if (walk->mapped_data)
-	gomp_fatal ("shutdown in 'acc data' region");
-
-      if (walk->dev)
-	{
-	  void *target_data = walk->dev->openacc.target_data;
-	  if (walk->dev->openacc.close_device_func (target_data) < 0)
-	    gomp_fatal ("failed to close device");
-
-	  walk->dev->openacc.target_data = target_data = NULL;
-
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
-
-	  walk->dev = NULL;
-	}
-    }
-
-  gomp_mutex_unlock (&goacc_thread_lock);
-
-  gomp_fini_device (base_dev);
-
-  base_dev = NULL;
-}
-
 void
 acc_shutdown (acc_device_t d)
 {
@@ -373,59 +392,16 @@ acc_shutdown (acc_device_t d)
 
 ialias (acc_shutdown)
 
-/* This function is called after plugins have been initialized.  It deals with
-   the "base" device, and is used to prepare the runtime for dealing with a
-   number of such devices (as implemented by some particular plugin).  If the
-   argument device type D matches a previous call to the function, return the
-   current base device, else shut the old device down and re-initialize with
-   the new device type.  */
-
-static struct gomp_device_descr *
-lazy_init (acc_device_t d)
-{
-  if (base_dev)
-    {
-      /* Re-initializing the same device, do nothing.  */
-      if (d == init_key)
-	return base_dev;
-
-      acc_shutdown_1 (init_key);
-    }
-
-  assert (!base_dev);
-
-  return acc_init_1 (d);
-}
-
-/* Ensure that plugins are loaded, initialize and open the (default-numbered)
-   device.  */
-
-static void
-lazy_init_and_open (acc_device_t d)
-{
-  if (!base_dev)
-    gomp_init_targets_once ();
-
-  gomp_mutex_lock (&acc_device_lock);
-
-  base_dev = lazy_init (d);
-
-  lazy_open (-1);
-
-  gomp_mutex_unlock (&acc_device_lock);
-}
-
 int
 acc_get_num_devices (acc_device_t d)
 {
   int n = 0;
-  const struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *acc_dev;
 
   if (d == acc_device_none)
     return 0;
 
-  if (!base_dev)
-    gomp_init_targets_once ();
+  gomp_init_targets_once ();
 
   acc_dev = resolve_device (d);
   if (!acc_dev)
@@ -440,10 +416,39 @@ acc_get_num_devices (acc_device_t d)
 
 ialias (acc_get_num_devices)
 
+/* Set the device type for the current thread only (using the current global
+   default device number), initialising that device if necessary.  Also set the
+   default device type for new threads to D.  */
+
 void
 acc_set_device_type (acc_device_t d)
 {
-  lazy_init_and_open (d);
+  struct gomp_device_descr *base_dev, *acc_dev;
+  struct goacc_thread *thr = goacc_thread ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  if (!cached_base_dev)
+    gomp_init_targets_once ();
+
+  cached_base_dev = base_dev = resolve_device (d);
+  acc_dev = &base_dev[goacc_device_num];
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device (acc_dev);
+
+  gomp_mutex_unlock (&acc_device_lock);
+
+  /* We're changing device type: invalidate the current thread's dev and
+     base_dev pointers.  */
+  if (thr && thr->base_dev != base_dev)
+    {
+      thr->base_dev = thr->dev = NULL;
+      if (thr->mapped_data)
+        gomp_fatal ("acc_set_device_type in 'acc data' region");
+    }
+
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_set_device_type)
@@ -452,10 +457,11 @@ acc_device_t
 acc_get_device_type (void)
 {
   acc_device_t res = acc_device_none;
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *dev;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev)
-    res = acc_device_type (base_dev->type);
+  if (thr && thr->base_dev)
+    res = acc_device_type (thr->base_dev->type);
   else
     {
       gomp_init_targets_once ();
@@ -476,78 +482,65 @@ int
 acc_get_device_num (acc_device_t d)
 {
   const struct gomp_device_descr *dev;
-  int num;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (d >= _ACC_device_hwm)
     gomp_fatal ("device %u out of range", (unsigned)d);
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   dev = resolve_device (d);
   if (!dev)
-    gomp_fatal ("no devices of type %u", d);
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  /* We might not have called lazy_open for this host thread yet, in which case
-     the get_device_num_func hook will return -1.  */
-  num = dev->openacc.get_device_num_func ();
-  if (num < 0)
-    num = goacc_device_num;
+  if (thr && thr->base_dev == dev && thr->dev)
+    return thr->dev->target_id;
 
-  return num;
+  return goacc_device_num;
 }
 
 ialias (acc_get_device_num)
 
 void
-acc_set_device_num (int n, acc_device_t d)
+acc_set_device_num (int ord, acc_device_t d)
 {
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
   int num_devices;
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
-  if ((int) d == 0)
-    {
-      int i;
-
-      /* A device setting of zero sets all device types on the system to use
-         the Nth instance of that device type.  Only attempt it for initialized
-	 devices though.  */
-      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
-        {
-	  dev = resolve_device (d);
-	  if (dev && dev->is_initialized)
-	    dev->openacc.set_device_num_func (n);
-	}
+  if (ord < 0)
+    ord = goacc_device_num;
 
-      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
-      goacc_device_num = n;
-    }
+  if ((int) d == 0)
+    /* Set whatever device is being used by the current host thread to use
+       device instance ORD.  It's unclear if this is supposed to affect other
+       host threads too (OpenACC 2.0 (3.2.4) acc_set_device_num).  */
+    goacc_attach_host_thread_to_device (ord);
   else
     {
-      struct goacc_thread *thr = goacc_thread ();
-
       gomp_mutex_lock (&acc_device_lock);
 
-      base_dev = lazy_init (d);
+      cached_base_dev = base_dev = resolve_device (d);
 
       num_devices = base_dev->get_num_devices_func ();
 
-      if (n >= num_devices)
-        gomp_fatal ("device %u out of range", n);
+      if (ord >= num_devices)
+        gomp_fatal ("device %u out of range", ord);
 
-      /* If we're changing the device number, de-associate this thread with
-	 the device (but don't close the device, since it may be in use by
-	 other threads).  */
-      if (thr && thr->dev && n != thr->dev->target_id)
-	thr->dev = NULL;
+      acc_dev = &base_dev[ord];
 
-      lazy_open (n);
+      if (!acc_dev->is_initialized)
+        gomp_init_device (acc_dev);
 
       gomp_mutex_unlock (&acc_device_lock);
+
+      goacc_attach_host_thread_to_device (ord);
     }
+
+  goacc_device_num = ord;
 }
 
 ialias (acc_set_device_num)
@@ -555,10 +548,7 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  struct goacc_thread *thr = goacc_thread ();
-
-  if (thr && thr->dev
-      && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
+  if (acc_get_device_type () == acc_device_host_nonshm)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
@@ -578,7 +568,7 @@ goacc_runtime_initialize (void)
 
   pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
 
-  base_dev = NULL;
+  cached_base_dev = NULL;
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
@@ -607,9 +597,8 @@ goacc_restore_bind (void)
 }
 
 /* This is called from any OpenACC support function that may need to implicitly
-   initialize the libgomp runtime.  On exit all such initialization will have
-   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
-   pointers will be valid.  */
+   initialize the libgomp runtime, either globally or from a new host thread.
+   On exit "goacc_thread" will return a valid & populated thread block.  */
 
 attribute_hidden void
 goacc_lazy_initialize (void)
@@ -619,12 +608,8 @@ goacc_lazy_initialize (void)
   if (thr && thr->dev)
     return;
 
-  if (!base_dev)
-    lazy_init_and_open (acc_device_default);
+  if (!cached_base_dev)
+    acc_init (acc_device_default);
   else
-    {
-      gomp_mutex_lock (&acc_device_lock);
-      lazy_open (-1);
-      gomp_mutex_unlock (&acc_device_lock);
-    }
+    goacc_attach_host_thread_to_device (-1);
 }
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index 85619c8..0ace737 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -56,6 +56,9 @@ acc_device_type (enum offload_target_type type)
 
 struct goacc_thread
 {
+  /* The base device for the current thread.  */
+  struct gomp_device_descr *base_dev;
+
   /* The device for the current thread.  */
   struct gomp_device_descr *dev;
 
@@ -89,10 +92,7 @@ goacc_thread (void)
 #endif
 
 void goacc_register (struct gomp_device_descr *) __GOACC_NOTHROW;
-
-/* Current dispatcher.  */
-extern struct gomp_device_descr *base_dev;
-
+void goacc_attach_host_thread_to_device (int);
 void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..1135be5 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -38,7 +38,7 @@
 /* Return block containing [H->S), or NULL if not contained.  */
 
 static splay_tree_key
-lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+lookup_host (struct gomp_device_descr *devicep, void *h, size_t s)
 {
   struct splay_tree_key_s node;
   splay_tree_key key;
@@ -46,11 +46,11 @@ lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&devicep->lock);
 
-  key = splay_tree_lookup (&mem_map->splay_tree, &node);
+  key = splay_tree_lookup (&devicep->splay_tree, &node);
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&devicep->lock);
 
   return key;
 }
@@ -65,14 +65,14 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
 {
   int i;
   struct target_mem_desc *t;
-  struct gomp_memory_mapping *mem_map;
+  struct gomp_device_descr *devicep;
 
   if (!tgt)
     return NULL;
 
-  mem_map = tgt->mem_map;
+  devicep = tgt->device_descr;
 
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (t = tgt; t != NULL; t = t->prev)
     {
@@ -80,7 +80,7 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
         break;
     }
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&devicep->lock);
 
   if (!t)
     return NULL;
@@ -112,7 +112,9 @@ acc_malloc (size_t s)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  return base_dev->alloc_func (thr->dev->target_id, s);
+  assert (thr->dev);
+
+  return thr->dev->alloc_func (thr->dev->target_id, s);
 }
 
 /* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
@@ -127,6 +129,8 @@ acc_free (void *d)
   if (!d)
     return;
 
+  assert (thr && thr->dev);
+
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
@@ -139,7 +143,7 @@ acc_free (void *d)
      acc_unmap_data ((void *)(k->host_start + offset));
    }
 
-  base_dev->free_func (thr->dev->target_id, d);
+  thr->dev->free_func (thr->dev->target_id, d);
 }
 
 void
@@ -149,7 +153,9 @@ acc_memcpy_to_device (void *d, void *h, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+  assert (thr && thr->dev);
+
+  thr->dev->host2dev_func (thr->dev->target_id, d, h, s);
 }
 
 void
@@ -159,7 +165,9 @@ acc_memcpy_from_device (void *h, void *d, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+  assert (thr && thr->dev);
+
+  thr->dev->dev2host_func (thr->dev->target_id, h, d, s);
 }
 
 /* Return the device pointer that corresponds to host data H.  Or NULL
@@ -176,7 +184,7 @@ acc_deviceptr (void *h)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  n = lookup_host (&thr->dev->mem_map, h, 1);
+  n = lookup_host (thr->dev, h, 1);
 
   if (!n)
     return NULL;
@@ -229,7 +237,7 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   if (n && ((uintptr_t)h < n->host_start
 	    || (uintptr_t)h + s > n->host_end
@@ -271,7 +279,7 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
                     (void *)h, (int)s, (void *)d, (int)s);
 
-      if (lookup_host (&acc_dev->mem_map, h, s))
+      if (lookup_host (acc_dev, h, s))
 	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
 		    (int)s);
 
@@ -296,7 +304,7 @@ acc_unmap_data (void *h)
   /* No need to call lazy open, as the address must have been mapped.  */
 
   size_t host_size;
-  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  splay_tree_key n = lookup_host (acc_dev, h, 1);
   struct target_mem_desc *t;
 
   if (!n)
@@ -320,7 +328,7 @@ acc_unmap_data (void *h)
       t->tgt_end = 0;
       t->to_free = 0;
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
 	   tp = t, t = t->prev)
@@ -334,7 +342,7 @@ acc_unmap_data (void *h)
 	    break;
 	  }
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   gomp_unmap_vars (t, true);
@@ -358,7 +366,7 @@ present_create_copy (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
   if (n)
     {
       /* Present. */
@@ -389,13 +397,13 @@ present_create_copy (unsigned f, void *h, size_t s)
       tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
 			   false);
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       d = tgt->to_free;
       tgt->prev = acc_dev->openacc.data_environ;
       acc_dev->openacc.data_environ = tgt;
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   return d;
@@ -436,7 +444,7 @@ delete_copyout (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -479,7 +487,7 @@ update_dev_host (int is_dev, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -532,7 +540,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   struct target_mem_desc *t;
   int minrefs = (mapnum == 1) ? 2 : 3;
 
-  n = lookup_host (&acc_dev->mem_map, h, 1);
+  n = lookup_host (acc_dev, h, 1);
 
   if (!n)
     gomp_fatal ("%p is not a mapped block", (void *)h);
@@ -543,7 +551,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 
   struct target_mem_desc *tp;
 
-  gomp_mutex_lock (&acc_dev->mem_map.lock);
+  gomp_mutex_lock (&acc_dev->lock);
 
   if (t->refcount == minrefs)
     {
@@ -570,7 +578,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   if (force_copyfrom)
     t->list[0]->copy_from = 1;
 
-  gomp_mutex_unlock (&acc_dev->mem_map.lock);
+  gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 727fced..ed1b965 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -46,32 +46,6 @@ find_pset (int pos, size_t mapnum, unsigned short *kinds)
   return kind == GOMP_MAP_TO_PSET;
 }
 
-
-/* Ensure that the target device for DEVICE_TYPE is initialised (and that
-   plugins have been loaded if appropriate).  The ACC_dev variable for the
-   current thread will be set appropriately for the given device type on
-   return.  */
-
-attribute_hidden void
-select_acc_device (int device_type)
-{
-  goacc_lazy_initialize ();
-
-  if (device_type == GOMP_DEVICE_HOST_FALLBACK)
-    return;
-
-  if (device_type == acc_device_none)
-    device_type = acc_device_host;
-
-  if (device_type >= 0)
-    {
-      /* NOTE: this will go badly if the surrounding data environment is set up
-         to use a different device type.  We'll just have to trust that users
-	 know what they're doing...  */
-      acc_set_device_type (device_type);
-    }
-}
-
 static void goacc_wait (int async, int num_waits, va_list ap);
 
 void
@@ -99,7 +73,7 @@ GOACC_parallel (int device, void (*fn) (void *),
   gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
 	      __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -132,9 +106,9 @@ GOACC_parallel (int device, void (*fn) (void *),
     {
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
-      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->splay_tree, &k);
+      gomp_mutex_unlock (&acc_dev->lock);
 
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
@@ -178,7 +152,7 @@ GOACC_data_start (int device, size_t mapnum,
   gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
 	      __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
@@ -225,7 +199,7 @@ GOACC_enter_exit_data (int device, size_t mapnum,
   bool data_enter = false;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -366,7 +340,7 @@ GOACC_kernels (int device, void (*fn) (void *),
 
   va_list ap;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   va_start (ap, num_waits);
 
@@ -437,7 +411,7 @@ GOACC_update (int device, size_t mapnum,
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index bc60f72..1faf5bc 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -119,31 +119,6 @@ GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return (void *) (intptr_t) n;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_close_device (void *hnd)
-{
-  return 0;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  return 0;
-}
-
-STATIC void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  if (n > 0)
-    GOMP (fatal) ("device number %u out of range for host execution", n);
-}
-
-STATIC void *
 GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t s)
 {
   return GOMP (malloc) (s);
@@ -254,7 +229,7 @@ GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__ ((unused)))
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
   return NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 483cb75..583ec87 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -133,7 +133,8 @@ struct targ_fn_descriptor
   const char *name;
 };
 
-static bool ptx_inited = false;
+static unsigned int instantiated_devices = 0;
+static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
 struct ptx_stream
 {
@@ -331,9 +332,21 @@ struct ptx_event
   struct ptx_event *next;
 };
 
+struct ptx_image_data
+{
+  void *target_data;
+  CUmodule module;
+  struct ptx_image_data *next;
+};
+
 static pthread_mutex_t ptx_event_lock;
 static struct ptx_event *ptx_events;
 
+static struct ptx_device **ptx_devices;
+
+static struct ptx_image_data *ptx_images = NULL;
+static pthread_mutex_t ptx_image_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
 
@@ -450,8 +463,8 @@ fini_streams_for_device (struct ptx_device *ptx_dev)
       struct ptx_stream *s = ptx_dev->active_streams;
       ptx_dev->active_streams = ptx_dev->active_streams->next;
 
-      cuStreamDestroy (s->stream);
       map_fini (s);
+      cuStreamDestroy (s->stream);
       free (s);
     }
 
@@ -575,21 +588,21 @@ select_stream_for_async (int async, pthread_t thread, bool create,
   return stream;
 }
 
-static int nvptx_get_num_devices (void);
-
-/* Initialize the device.  */
-static int
+/* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
+   should be locked on entry and remains locked on exit.  */
+static bool
 nvptx_init (void)
 {
   CUresult r;
   int rc;
+  int ndevs;
 
-  if (ptx_inited)
-    return nvptx_get_num_devices ();
+  if (instantiated_devices != 0)
+    return true;
 
   rc = verify_device_library ();
   if (rc < 0)
-    return -1;
+    return false;
 
   r = cuInit (0);
   if (r != CUDA_SUCCESS)
@@ -599,22 +612,64 @@ nvptx_init (void)
 
   pthread_mutex_init (&ptx_event_lock, NULL);
 
-  ptx_inited = true;
+  r = cuDeviceGetCount (&ndevs);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
 
-  return nvptx_get_num_devices ();
+  ptx_devices = GOMP_PLUGIN_malloc_cleared (sizeof (struct ptx_device *)
+					    * ndevs);
+
+  return true;
 }
 
+/* Select the N'th PTX device for the current host thread.  The device must
+   have been opened before calling this function.  */
+
 static void
-nvptx_fini (void)
+nvptx_attach_host_thread_to_device (int n)
 {
-  ptx_inited = false;
+  CUdevice dev;
+  CUresult r;
+  struct ptx_device *ptx_dev;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetDevice (&dev);
+  if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+
+  if (r != CUDA_ERROR_INVALID_CONTEXT && dev == n)
+    return;
+  else
+    {
+      CUcontext old_ctx;
+
+      ptx_dev = ptx_devices[n];
+      assert (ptx_dev);
+
+      r = cuCtxGetCurrent (&thd_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+      /* We don't necessarily have a current context (e.g. if it has been
+         destroyed).  Pop it if we do though.  */
+      if (thd_ctx != NULL)
+	{
+	  r = cuCtxPopCurrent (&old_ctx);
+	  if (r != CUDA_SUCCESS)
+            GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+	}
+
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
 }
 
-static void *
+static struct ptx_device *
 nvptx_open_device (int n)
 {
   struct ptx_device *ptx_dev;
-  CUdevice dev;
+  CUdevice dev, ctx_dev;
   CUresult r;
   int async_engines, pi;
 
@@ -628,6 +683,21 @@ nvptx_open_device (int n)
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = false;
 
+  r = cuCtxGetDevice (&ctx_dev);
+  if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+
+  if (r != CUDA_ERROR_INVALID_CONTEXT && ctx_dev != dev)
+    {
+      /* The current host thread has an active context for a different device.
+         Detach it.  */
+      CUcontext old_ctx;
+
+      r = cuCtxPopCurrent (&old_ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+    }
+
   r = cuCtxGetCurrent (&ptx_dev->ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
@@ -678,17 +748,16 @@ nvptx_open_device (int n)
 
   init_streams_for_device (ptx_dev, async_engines);
 
-  return (void *) ptx_dev;
+  return ptx_dev;
 }
 
-static int
-nvptx_close_device (void *targ_data)
+static void
+nvptx_close_device (struct ptx_device *ptx_dev)
 {
   CUresult r;
-  struct ptx_device *ptx_dev = targ_data;
 
   if (!ptx_dev)
-    return 0;
+    return;
 
   fini_streams_for_device (ptx_dev);
 
@@ -700,8 +769,6 @@ nvptx_close_device (void *targ_data)
     }
 
   free (ptx_dev);
-
-  return 0;
 }
 
 static int
@@ -714,7 +781,7 @@ nvptx_get_num_devices (void)
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
      further initialization).  */
-  if (!ptx_inited)
+  if (instantiated_devices == 0)
     cuInit (0);
 
   r = cuDeviceGetCount (&n);
@@ -1507,64 +1574,84 @@ GOMP_OFFLOAD_get_num_devices (void)
   return nvptx_get_num_devices ();
 }
 
-static void **kernel_target_data;
-static void **kernel_host_table;
-
 void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+GOMP_OFFLOAD_init_device (int n)
 {
-  kernel_target_data = target_data;
-  kernel_host_table = host_table;
-}
+  pthread_mutex_lock (&ptx_dev_lock);
 
-void
-GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
-{
-  (void) nvptx_init ();
+  if (!nvptx_init () || ptx_devices[n] != NULL)
+    {
+      pthread_mutex_unlock (&ptx_dev_lock);
+      return;
+    }
+
+  ptx_devices[n] = nvptx_open_device (n);
+  instantiated_devices++;
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 void
-GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
+GOMP_OFFLOAD_fini_device (int n)
 {
-  nvptx_fini ();
+  pthread_mutex_lock (&ptx_dev_lock);
+
+  if (ptx_devices[n] != NULL)
+    {
+      nvptx_attach_host_thread_to_device (n);
+      nvptx_close_device (ptx_devices[n]);
+      ptx_devices[n] = NULL;
+      instantiated_devices--;
+    }
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **tablep)
+GOMP_OFFLOAD_load_image (int ord, void *target_data,
+			 struct addr_pair **target_table)
 {
   CUmodule module;
-  void **fn_table;
-  char **fn_names;
-  int fn_entries, i;
+  char **fn_names, **var_names;
+  unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
+  void **img_header = (void **) target_data;
+  struct ptx_image_data *new_image;
 
-  if (nvptx_init () <= 0)
-    return 0;
+  GOMP_OFFLOAD_init_device (ord);
 
-  /* This isn't an error, because an image may legitimately have no offloaded
-     regions and so will not call GOMP_offload_register.  */
-  if (kernel_target_data == NULL)
-    return 0;
+  nvptx_attach_host_thread_to_device (ord);
+
+  link_ptx (&module, img_header[0]);
 
-  link_ptx (&module, kernel_target_data[0]);
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
-  /* kernel_target_data[0] -> ptx code
-     kernel_target_data[1] -> variable mappings
-     kernel_target_data[2] -> array of kernel names in ascii
+  /* The mkoffload utility emits a table of pointers/integers at the start of
+     each offload image:
 
-     kernel_host_table[0] -> start of function addresses (__offload_func_table)
-     kernel_host_table[1] -> end of function addresses (__offload_funcs_end)
+     img_header[0] -> ptx code
+     img_header[1] -> number of variables
+     img_header[2] -> array of variable names (pointers to strings)
+     img_header[3] -> number of kernels
+     img_header[4] -> array of kernel names (pointers to strings)
 
      The array of kernel names and the functions addresses form a
      one-to-one correspondence.  */
 
-  fn_table = kernel_host_table[0];
-  fn_names = (char **) kernel_target_data[2];
-  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+  var_entries = (uintptr_t) img_header[1];
+  var_names = (char **) img_header[2];
+  fn_entries = (uintptr_t) img_header[3];
+  fn_names = (char **) img_header[4];
 
-  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
+				      * (fn_entries + var_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1579,38 +1666,86 @@ GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
       targ_fns[i].fn = function;
       targ_fns[i].name = (const char *) fn_names[i];
 
-      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
-      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
-      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
-      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+      (*target_table)[i].start = (uintptr_t) &targ_fns[i];
+      (*target_table)[i].end = (*target_table)[i].start + 1;
     }
 
-  return fn_entries;
+  for (j = 0; j < var_entries; j++, i++)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+
+      r = cuModuleGetGlobal (&var, &bytes, module, var_names[j]);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+
+      (*target_table)[i].start = (uintptr_t) var;
+      (*target_table)[i].end = (*target_table)[i].start + bytes;
+    }
+
+  return i;
+}
+
+void
+GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+{
+  void **img_header = (void **) target_data;
+  struct targ_fn_descriptor *targ_fns
+    = (struct targ_fn_descriptor *) img_header[0];
+  struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
+
+  free (targ_fns);
+
+  pthread_mutex_lock (&ptx_image_lock);
+  for (image = ptx_images; image != NULL;)
+    {
+      struct ptx_image_data *next = image->next;
+
+      if (image->target_data == target_data)
+	{
+	  cuModuleUnload (image->module);
+	  free (image);
+	  if (prev)
+	    prev->next = next;
+	}
+      else
+	{
+	  prev = image;
+	  if (!newhd)
+	    newhd = image;
+	}
+
+      image = next;
+    }
+  ptx_images = newhd;
+  pthread_mutex_unlock (&ptx_image_lock);
 }
 
 void *
-GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t size)
+GOMP_OFFLOAD_alloc (int ord, size_t size)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_alloc (size);
 }
 
 void
-GOMP_OFFLOAD_free (int n __attribute__ ((unused)), void *ptr)
+GOMP_OFFLOAD_free (int ord, void *ptr)
 {
+  nvptx_attach_host_thread_to_device (ord);
   nvptx_free (ptr);
 }
 
 void *
-GOMP_OFFLOAD_dev2host (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_dev2host (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_dev2host (dst, src, n);
 }
 
 void *
-GOMP_OFFLOAD_host2dev (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_host2dev (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_host2dev (dst, src, n);
 }
 
@@ -1627,45 +1762,6 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
 	    num_workers, vector_length, async, targ_mem_desc);
 }
 
-void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return nvptx_open_device (n);
-}
-
-int
-GOMP_OFFLOAD_openacc_close_device (void *h)
-{
-  return nvptx_close_device (h);
-}
-
-void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  assert (n >= 0);
-
-  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
-    (void) nvptx_open_device (n);
-}
-
-/* This can be called before the device is "opened" for the current thread, in
-   which case we can't tell which device number should be returned.  We don't
-   actually want to open the device here, so just return -1 and let the caller
-   (oacc-init.c:acc_get_device_num) handle it.  */
-
-int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  if (nvthd && nvthd->ptx_dev)
-    return nvthd->ptx_dev->ord;
-  else
-    return -1;
-}
-
 void
 GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
 {
@@ -1729,14 +1825,18 @@ GOMP_OFFLOAD_openacc_async_set_async (int async)
 }
 
 void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+GOMP_OFFLOAD_openacc_create_thread_data (int ord)
 {
-  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct ptx_device *ptx_dev;
   struct nvptx_thread *nvthd
     = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
   CUresult r;
   CUcontext thd_ctx;
 
+  ptx_dev = ptx_devices[ord];
+
+  assert (ptx_dev);
+
   r = cuCtxGetCurrent (&thd_ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
diff --git a/libgomp/target.c b/libgomp/target.c
index f443cff..fabfc02 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -150,14 +150,12 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mm;
 
   if (mapnum == 0)
     return tgt;
@@ -171,7 +169,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < mapnum; i++)
     {
@@ -186,7 +184,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+      splay_tree_key n = splay_tree_lookup (&devicep->splay_tree, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
@@ -271,7 +269,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (&devicep->splay_tree, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
@@ -291,7 +289,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&mm->splay_tree, array);
+		splay_tree_insert (&devicep->splay_tree, array);
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -329,16 +327,17 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+		    n = splay_tree_lookup (&devicep->splay_tree, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			n = splay_tree_lookup (&devicep->splay_tree, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    n = splay_tree_lookup (&devicep->splay_tree,
+						   &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -397,17 +396,18 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  n = splay_tree_lookup (&devicep->splay_tree,
+						 &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&mm->splay_tree,
+			      n = splay_tree_lookup (&devicep->splay_tree,
 						     &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup (&mm->splay_tree,
+				  n = splay_tree_lookup (&devicep->splay_tree,
 							 &cur_node);
 				  cur_node.host_start++;
 				}
@@ -479,7 +479,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
   return tgt;
 }
 
@@ -504,10 +504,9 @@ attribute_hidden void
 gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
   size_t i;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
@@ -526,7 +525,7 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
 				  k->host_end - k->host_start);
       }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -537,7 +536,6 @@ attribute_hidden void
 gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -545,7 +543,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -562,7 +560,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&mm->splay_tree, k);
+	splay_tree_remove (&devicep->splay_tree, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -574,13 +572,12 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
   else
     gomp_unmap_tgt (tgt);
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
-	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
-	     bool is_openacc)
+gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
+	     void **hostaddrs, size_t *sizes, void *kinds, bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
@@ -592,13 +589,13 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
+	splay_tree_key n = splay_tree_lookup (&devicep->splay_tree,
 					      &cur_node);
 	if (n)
 	  {
@@ -633,7 +630,7 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 
@@ -644,7 +641,6 @@ gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
 				struct addr_pair *host_addr,
 				struct addr_pair *tgt_addr)
 {
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
   struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
   tgt->refcount = 1;
   tgt->array = gomp_malloc (sizeof (*tgt->array));
@@ -663,7 +659,7 @@ gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
   k->tgt = tgt;
   node->left = NULL;
   node->right = NULL;
-  splay_tree_insert (&mm->splay_tree, node);
+  splay_tree_insert (&devicep->splay_tree, node);
 }
 
 /* Load image pointed by TARGET_DATA to the device, specified by DEVICEP.
@@ -767,7 +763,6 @@ GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
     {
       int j;
       struct gomp_device_descr *devicep = &devices[i];
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
 
       if (devicep->type != target_type || !devicep->is_initialized)
 	continue;
@@ -780,7 +775,7 @@ GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
 	  struct splay_tree_key_s k;
 	  k.host_start = (uintptr_t) host_func_table[j];
 	  k.host_end = k.host_start + 1;
-	  splay_tree_remove (&mm->splay_tree, &k);
+	  splay_tree_remove (&devicep->splay_tree, &k);
 	}
 
       for (j = 0; j < num_vars; j++)
@@ -788,7 +783,7 @@ GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
 	  struct splay_tree_key_s k;
 	  k.host_start = (uintptr_t) host_var_table[j*2];
 	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
-	  splay_tree_remove (&mm->splay_tree, &k);
+	  splay_tree_remove (&devicep->splay_tree, &k);
 	}
     }
 
@@ -822,22 +817,20 @@ gomp_init_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = true;
 }
 
-/* Free address mapping tables.  MM must be locked on entry, and remains locked
-   on return.  */
+/* Free address mapping tables.  The owning device must be locked on entry, and
+   remains locked on return.  */
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_memory_mapping *mm)
+gomp_free_memmap (struct splay_tree_s splay_tree)
 {
-  while (mm->splay_tree.root)
+  while (splay_tree.root)
     {
-      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      struct target_mem_desc *tgt = splay_tree.root->key.tgt;
 
-      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      splay_tree_remove (&splay_tree, &splay_tree.root->key);
       free (tgt->array);
       free (tgt);
     }
-
-  mm->is_initialized = false;
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
@@ -868,7 +861,6 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
 
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
@@ -902,7 +894,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->splay_tree, &k);
       if (tgt_fn == NULL)
 	gomp_fatal ("Target function wasn't mapped");
       fn_addr = (void *) tgt_fn->tgt->tgt_start;
@@ -990,7 +982,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -1074,10 +1066,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
     {
       optional_present = optional_total = 0;
       DLSYM_OPT (openacc.exec, openacc_parallel);
-      DLSYM_OPT (openacc.open_device, openacc_open_device);
-      DLSYM_OPT (openacc.close_device, openacc_close_device);
-      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
-      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
       DLSYM_OPT (openacc.register_async_cleanup,
 		 openacc_register_async_cleanup);
       DLSYM_OPT (openacc.async_test, openacc_async_test);
@@ -1183,16 +1171,13 @@ gomp_target_init (void)
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
-		current_device.mem_map.is_initialized = false;
-		current_device.mem_map.splay_tree.root = NULL;
+		current_device.splay_tree.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
-		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
index 84045db..a4cf7f2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
@@ -58,7 +58,7 @@ main (int argc, char **argv)
       acc_set_device_num (1, (acc_device_t) 0);
 
       devnum = acc_get_device_num (devtype);
-      if (devnum != 0)
+      if (devnum != 1)
 	abort ();
   }
 


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-26 20:41                                 ` Ilya Verbin
@ 2015-03-30 16:42                                   ` Jakub Jelinek
  2015-03-30 21:43                                     ` Julian Brown
                                                       ` (2 more replies)
  2015-09-25 15:10                                   ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
  2015-09-25 16:56                                   ` libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
  2 siblings, 3 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-03-30 16:42 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
> Here is the latest patch for libgomp and mic plugin.
> make check-target-libgomp using intelmic emul passed.
> Also I used a testcase from the attachment.

This applies cleanly.

> Latest ptx part is here, I guess:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html

But the one Julian posted doesn't apply on top of your patch.
If there is any interdiff needed on top of your patch, can it be
posted against trunk + your patch?

> +/* Insert mapping of host -> target address pairs to splay tree.  */
> +
> +static void
> +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
> +				struct addr_pair *host_addr,
> +				struct addr_pair *tgt_addr)
> +{
> +  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
> +  tgt->refcount = 1;
> +  tgt->array = gomp_malloc (sizeof (*tgt->array));
> +  tgt->tgt_start = tgt_addr->start;
> +  tgt->tgt_end = tgt_addr->end;
> +  tgt->to_free = NULL;
> +  tgt->list_count = 0;
> +  tgt->device_descr = devicep;
> +  splay_tree_node node = tgt->array;
> +  splay_tree_key k = &node->key;
> +  k->host_start = host_addr->start;
> +  k->host_end = host_addr->end;
> +  k->tgt_offset = 0;
> +  k->refcount = 1;
> +  k->copy_from = false;
> +  k->tgt = tgt;
> +  node->left = NULL;
> +  node->right = NULL;
> +  splay_tree_insert (&devicep->mem_map, node);
> +}

What is the reason to register and allocate these one at a time, rather than
using one struct target_mem_desc with one tgt->array for all splay tree
nodes registered from one image?
Perhaps you could just use a tgt_start of 0 and a tgt_end of 0 too (to make it
clear it is special) and make tgt_offset relative to that (i.e.
absolute); having to malloc each node individually and having to malloc
a target_mem_desc for each one sounds expensive.
Everything is freed just once anyway, isn't it?
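
For illustration, here is a minimal sketch of the single-allocation idea,
assuming simplified stand-in types rather than the real libgomp structures.
All demo_* names are hypothetical, and the splay tree insertion itself is
left out; the point is only that one descriptor plus one node array per image
means two allocations and a single teardown path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for libgomp's splay tree node and
   struct target_mem_desc; the real structures carry more fields.  */
struct demo_desc;

struct demo_key
{
  uintptr_t host_start, host_end;
  uintptr_t tgt_offset;
  struct demo_desc *tgt;
};

struct demo_node
{
  struct demo_key key;
  struct demo_node *left, *right;
};

struct demo_desc
{
  int refcount;
  uintptr_t tgt_start, tgt_end;
  size_t count;
  struct demo_node *array;
};

struct demo_pair
{
  uintptr_t start, end;
};

/* Build ONE descriptor and ONE node array covering all N host -> target
   pairs of an image.  Returns NULL on allocation failure.  */
static struct demo_desc *
demo_map_image (const struct demo_pair *host, const struct demo_pair *tgt,
		size_t n)
{
  struct demo_desc *d = malloc (sizeof *d);
  if (!d)
    return NULL;
  d->array = calloc (n ? n : 1, sizeof *d->array);
  if (!d->array)
    {
      free (d);
      return NULL;
    }
  d->refcount = 1;
  /* tgt_start == tgt_end == 0 marks the descriptor as special, so each
     tgt_offset below is effectively an absolute target address.  */
  d->tgt_start = 0;
  d->tgt_end = 0;
  d->count = n;
  for (size_t i = 0; i < n; i++)
    {
      struct demo_node *node = &d->array[i];
      node->key.host_start = host[i].start;
      node->key.host_end = host[i].end;
      node->key.tgt_offset = tgt[i].start;
      node->key.tgt = d;
      node->left = NULL;
      node->right = NULL;
      /* A real implementation would insert NODE into the device's splay
	 tree at this point.  */
    }
  return d;
}

/* Teardown is two frees, however many nodes were registered.  */
static void
demo_unmap_image (struct demo_desc *d)
{
  if (!d)
    return;
  free (d->array);
  free (d);
}
```

With this layout the unload path only has to remove the keys from the tree
and then release the descriptor once, rather than tracking one allocation
pair per function or variable.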

> @@ -654,6 +727,18 @@ void
>  GOMP_offload_register (void *host_table, enum offload_target_type target_type,
>  		       void *target_data)
>  {
> +  int i;
> +  gomp_mutex_lock (&register_lock);
> +
> +  /* Load image to all initialized devices.  */
> +  for (i = 0; i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      if (devicep->type == target_type && devicep->is_initialized)
> +	gomp_offload_image_to_device (devicep, host_table, target_data);

Shouldn't either this function or gomp_offload_image_to_device also lock
the devicep->lock mutex, and unlock it at the end?
Where exactly, I guess, depends on whether the devicep->* hook calls should be
guarded by the mutex or not.  If yes, it should be this function and
gomp_init_device.
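
As a rough sketch of the locking order being asked about, assuming the hook
calls are meant to be guarded: the registration lock is taken first, then each
device's own lock around the check and the load.  This uses plain pthread
mutexes and hypothetical demo_* stand-ins, not libgomp's gomp_mutex_t or
struct gomp_device_descr.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct gomp_device_descr.  */
struct demo_device
{
  int type;
  bool is_initialized;
  int images_loaded;
  pthread_mutex_t lock;
};

/* Hypothetical stand-in for libgomp's register_lock.  */
static pthread_mutex_t demo_register_lock = PTHREAD_MUTEX_INITIALIZER;

static void
demo_device_init (struct demo_device *dev, int type, bool initialized)
{
  dev->type = type;
  dev->is_initialized = initialized;
  dev->images_loaded = 0;
  pthread_mutex_init (&dev->lock, NULL);
}

/* Stand-in for gomp_offload_image_to_device.  The caller must already hold
   DEV->lock, so the (imaginary) plugin hooks run under the mutex.  */
static void
demo_load_image (struct demo_device *dev)
{
  dev->images_loaded++;
}

/* Load an image on every initialized device of TARGET_TYPE and return how
   many devices received it.  */
static int
demo_offload_register (struct demo_device *devs, int ndevs, int target_type)
{
  int loaded = 0;

  pthread_mutex_lock (&demo_register_lock);
  for (int i = 0; i < ndevs; i++)
    {
      struct demo_device *dev = &devs[i];

      /* Hold the per-device lock across both the is_initialized check and
	 the load, so a concurrent device initialization cannot interleave.  */
      pthread_mutex_lock (&dev->lock);
      if (dev->type == target_type && dev->is_initialized)
	{
	  demo_load_image (dev);
	  loaded++;
	}
      pthread_mutex_unlock (&dev->lock);
    }
  pthread_mutex_unlock (&demo_register_lock);

  return loaded;
}
```

The same pattern would apply to the unregister path: take the device lock
before calling the unload hook and removing the mappings, and release it
afterwards.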

> +      if (devicep->type != target_type || !devicep->is_initialized)
> +	continue;
> +

Similarly.

> +      devicep->unload_image_func (devicep->target_id, target_data);
> +
> +      /* Remove mapping from splay tree.  */
> +      for (j = 0; j < num_funcs; j++)
> +	{
> +	  struct splay_tree_key_s k;
> +	  k.host_start = (uintptr_t) host_func_table[j];
> +	  k.host_end = k.host_start + 1;
> +	  splay_tree_remove (&devicep->mem_map, &k);
> +	}
> +
> +      for (j = 0; j < num_vars; j++)
> +	{
> +	  struct splay_tree_key_s k;
> +	  k.host_start = (uintptr_t) host_var_table[j*2];
> +	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
> +	  splay_tree_remove (&devicep->mem_map, &k);
> +	}
> +    }

Aren't you leaking all the tgt->array and tgt allocations here?
Though, if you change it to just two allocations (one tgt, one array),
you'd need to free just once.

	Jakub


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-30 16:42                                   ` Jakub Jelinek
@ 2015-03-30 21:43                                     ` Julian Brown
  2015-03-31 12:52                                     ` Ilya Verbin
  2015-03-31 18:25                                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Ilya Verbin
  2 siblings, 0 replies; 92+ messages in thread
From: Julian Brown @ 2015-03-30 21:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Verbin, Thomas Schwinge, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On Mon, 30 Mar 2015 18:42:02 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
> > Here is the latest patch for libgomp and mic plugin.
> > make check-target-libgomp using intelmic emul passed.
> > Also I used a testcase from the attachment.
> 
> This applies cleanly.
> 
> > Latest ptx part is here, I guess:
> > https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html
> 
> But the one Julian posted doesn't apply on top of your patch.
> If there is any interdiff needed on top of your patch, can it be
> posted against trunk + your patch?

Here's a version of my patch against trunk and Ilya's latest patch
(hopefully!). Tests look OK (libgomp + PTX).

HTH,

Julian

[-- Attachment #2: nvptx-load-unload-5.diff --]
[-- Type: text/x-patch, Size: 46820 bytes --]

commit f203634ace786b5bb2fdce56f123f3fba236dda3
Author: Julian Brown <julian@codesourcery.com>
Date:   Mon Mar 30 14:37:53 2015 -0700

    nvptx load/unload support, init rework

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
     tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1d42c5..5272f01 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -655,9 +655,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct splay_tree_s *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
   splay_tree_key list[];
@@ -691,18 +688,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		     unsigned short *, int, int, int, int, void *);
@@ -720,7 +705,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_func (async);
 }
 
 void
 acc_wait_async (int async1, int async2)
 {
-  base_dev->openacc.async_wait_async_func (async1, async2);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_async_func (async1, async2);
 }
 
 void
 acc_wait_all (void)
 {
-  base_dev->openacc.async_wait_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_all_func ();
 }
 
 void
@@ -73,5 +98,10 @@ acc_wait_all_async (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_all_async_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_all_async_func (async);
 }
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index c8ef376..4aab422 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -34,51 +34,53 @@
 void *
 acc_get_current_cuda_device (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
-    p = base_dev->openacc.cuda.get_current_device_func ();
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
+    return thr->dev->openacc.cuda.get_current_device_func ();
 
-  return p;
+  return NULL;
 }
 
 void *
 acc_get_current_cuda_context (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
-    p = base_dev->openacc.cuda.get_current_context_func ();
-
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_context_func)
+    return thr->dev->openacc.cuda.get_current_context_func ();
+ 
+  return NULL;
 }
 
 void *
 acc_get_cuda_stream (int async)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (async < 0)
-    return p;
-
-  if (base_dev && base_dev->openacc.cuda.get_stream_func)
-    p = base_dev->openacc.cuda.get_stream_func (async);
+    return NULL;
 
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
+    return thr->dev->openacc.cuda.get_stream_func (async);
+ 
+  return NULL;
 }
 
 int
 acc_set_cuda_stream (int async, void *stream)
 {
-  int s = -1;
+  struct goacc_thread *thr;
 
   if (async < 0 || stream == NULL)
     return 0;
 
   goacc_lazy_initialize ();
 
-  if (base_dev && base_dev->openacc.cuda.set_stream_func)
-    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+  thr = goacc_thread ();
+
+  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
+    return thr->dev->openacc.cuda.set_stream_func (async, stream);
 
-  return s;
+  return -1;
 }
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index e4756b6..6dcdbf3 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -53,16 +53,9 @@ static struct gomp_device_descr host_dispatch =
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.root = NULL,
     .is_initialized = false,
 
     .openacc = {
-      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
-      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
-
-      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
-      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
-
       .exec_func = GOMP_OFFLOAD_openacc_parallel,
 
       .register_async_cleanup_func
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 1e0243e..dc40fb6 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -37,14 +37,13 @@
 
 static gomp_mutex_t acc_device_lock;
 
-/* The dispatch table for the current accelerator device.  This is global, so
-   you can only have one type of device open at any given time in a program.
-   This is the "base" device in that several devices that use the same
-   dispatch table may be active concurrently: this one (the "zeroth") is used
-   for overall initialisation/shutdown, and other instances -- not necessarily
-   including this one -- may be opened and closed once the base device has
-   been initialized.  */
-struct gomp_device_descr *base_dev;
+/* A cached version of the dispatcher for the global "current" accelerator type,
+   e.g. used as the default when creating new host threads.  This is the
+   device-type equivalent of goacc_device_num (which specifies which device to
+   use out of potentially several of the same type).  If there are several
+   devices of a given type, this points at the first one.  */
+
+static struct gomp_device_descr *cached_base_dev = NULL;
 
 #if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct goacc_thread *goacc_tls_data;
@@ -53,9 +52,6 @@ pthread_key_t goacc_tls_key;
 #endif
 static pthread_key_t goacc_cleanup_key;
 
-/* Current dispatcher, and how it was initialized */
-static acc_device_t init_key = _ACC_device_hwm;
-
 static struct goacc_thread *goacc_threads;
 static gomp_mutex_t goacc_thread_lock;
 
@@ -94,6 +90,21 @@ get_openacc_name (const char *name)
     return name;
 }
 
+static const char *
+name_of_acc_device_t (enum acc_device_t type)
+{
+  switch (type)
+    {
+    case acc_device_none: return "none";
+    case acc_device_default: return "default";
+    case acc_device_host: return "host";
+    case acc_device_host_nonshm: return "host_nonshm";
+    case acc_device_not_host: return "not_host";
+    case acc_device_nvidia: return "nvidia";
+    default: gomp_fatal ("unknown device type %u", (unsigned) type);
+    }
+}
+
 static struct gomp_device_descr *
 resolve_device (acc_device_t d)
 {
@@ -159,22 +170,87 @@ resolve_device (acc_device_t d)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d)
 {
-  struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
+  int ndevs;
 
-  acc_dev = resolve_device (d);
+  base_dev = resolve_device (d);
+
+  ndevs = base_dev ? base_dev->get_num_devices_func () : 0;
+
+  if (!base_dev || ndevs <= 0 || goacc_device_num >= ndevs)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
-    gomp_fatal ("device %u not supported", (unsigned)d);
+  acc_dev = &base_dev[goacc_device_num];
 
   if (acc_dev->is_initialized)
     gomp_fatal ("device already active");
 
-  /* We need to remember what we were intialized as, to check shutdown etc.  */
-  init_key = d;
-
   gomp_init_device (acc_dev);
 
-  return acc_dev;
+  return base_dev;
+}
+
+static void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct gomp_device_descr *base_dev;
+  struct goacc_thread *walk;
+  int ndevs, i;
+  bool devices_active = false;
+
+  /* Get the base device for this device type.  */
+  base_dev = resolve_device (d);
+
+  if (!base_dev)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      /* Similarly, if this happens then user code has done something weird.  */
+      if (walk->saved_bound_dev)
+        gomp_fatal ("shutdown during host fallback");
+
+      if (walk->dev)
+	{
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
+
+	  walk->dev = NULL;
+	  walk->base_dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  ndevs = base_dev->get_num_devices_func ();
+
+  /* Close all the devices of this type that have been opened.  */
+  for (i = 0; i < ndevs; i++)
+    {
+      struct gomp_device_descr *acc_dev = &base_dev[i];
+      if (acc_dev->is_initialized)
+        {
+	  devices_active = true;
+	  gomp_fini_device (acc_dev);
+	}
+    }
+
+  if (!devices_active)
+    gomp_fatal ("no device initialized");
 }
 
 static struct goacc_thread *
@@ -207,9 +283,11 @@ goacc_destroy_thread (void *data)
 
   if (thr)
     {
-      if (base_dev && thr->target_tls)
+      struct gomp_device_descr *acc_dev = thr->dev;
+
+      if (acc_dev && thr->target_tls)
 	{
-	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  acc_dev->openacc.destroy_thread_data_func (thr->target_tls);
 	  thr->target_tls = NULL;
 	}
 
@@ -236,53 +314,49 @@ goacc_destroy_thread (void *data)
   gomp_mutex_unlock (&goacc_thread_lock);
 }
 
-/* Open the ORD'th device of the currently-active type (base_dev must be
-   initialised before calling).  If ORD is < 0, open the default-numbered
-   device (set by the ACC_DEVICE_NUM environment variable or a call to
-   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
-   consists of calling the device's open_device_func hook, and setting up
-   thread-local data (maybe allocating, then initializing with information
-   pertaining to the newly-opened or previously-opened device).  */
+/* Use the ORD'th device instance for the current host thread (or -1 for the
+   current global default).  The device (and the runtime) must be initialised
+   before calling this function.  */
 
-static void
-lazy_open (int ord)
+void
+goacc_attach_host_thread_to_device (int ord)
 {
   struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev;
-
-  if (thr && thr->dev)
-    {
-      assert (ord < 0 || ord == thr->dev->target_id);
-      return;
-    }
-
-  assert (base_dev);
-
+  struct gomp_device_descr *acc_dev = NULL, *base_dev = NULL;
+  int num_devices;
+  
+  if (thr && thr->dev && (thr->dev->target_id == ord || ord < 0))
+    return;
+  
   if (ord < 0)
     ord = goacc_device_num;
-
-  /* The OpenACC 2.0 spec leaves the runtime's behaviour when an out-of-range
-     device is requested as implementation-defined (4.2 ACC_DEVICE_NUM).
-     We choose to raise an error in such a case.  */
-  if (ord >= base_dev->get_num_devices_func ())
-    gomp_fatal ("device %u does not exist", ord);
-
+  
+  /* Decide which type of device to use.  If the current thread has a device
+     type already (e.g. set by acc_set_device_type), use that, else use the
+     global default.  */
+  if (thr && thr->base_dev)
+    base_dev = thr->base_dev;
+  else
+    {
+      assert (cached_base_dev);
+      base_dev = cached_base_dev;
+    }
+  
+  num_devices = base_dev->get_num_devices_func ();
+  if (num_devices <= 0 || ord >= num_devices)
+    gomp_fatal ("device %u out of range", ord);
+  
   if (!thr)
     thr = goacc_new_thread ();
-
-  acc_dev = thr->dev = &base_dev[ord];
-
-  assert (acc_dev->target_id == ord);
-
+  
+  thr->base_dev = base_dev;
+  thr->dev = acc_dev = &base_dev[ord];
   thr->saved_bound_dev = NULL;
   thr->mapped_data = NULL;
-
-  if (!acc_dev->openacc.target_data)
-    acc_dev->openacc.target_data = acc_dev->openacc.open_device_func (ord);
-
+  
   thr->target_tls
-    = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
-
+    = acc_dev->openacc.create_thread_data_func (ord);
+  
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
@@ -292,74 +366,20 @@ lazy_open (int ord)
 void
 acc_init (acc_device_t d)
 {
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
 
-  base_dev = acc_init_1 (d);
-
-  lazy_open (-1);
+  cached_base_dev = acc_init_1 (d);
 
   gomp_mutex_unlock (&acc_device_lock);
+  
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_init)
 
-static void
-acc_shutdown_1 (acc_device_t d)
-{
-  struct goacc_thread *walk;
-
-  /* We don't check whether d matches the actual device found, because
-     OpenACC 2.0 (3.2.12) says the parameters to the init and this
-     call must match (for the shutdown call anyway, it's silent on
-     others).  */
-
-  if (!base_dev)
-    gomp_fatal ("no device initialized");
-  if (d != init_key)
-    gomp_fatal ("device %u(%u) is initialized",
-		(unsigned) init_key, (unsigned) base_dev->type);
-
-  gomp_mutex_lock (&goacc_thread_lock);
-
-  /* Free target-specific TLS data and close all devices.  */
-  for (walk = goacc_threads; walk != NULL; walk = walk->next)
-    {
-      if (walk->target_tls)
-	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
-
-      walk->target_tls = NULL;
-
-      /* This would mean the user is shutting down OpenACC in the middle of an
-         "acc data" pragma.  Likely not intentional.  */
-      if (walk->mapped_data)
-	gomp_fatal ("shutdown in 'acc data' region");
-
-      if (walk->dev)
-	{
-	  void *target_data = walk->dev->openacc.target_data;
-	  if (walk->dev->openacc.close_device_func (target_data) < 0)
-	    gomp_fatal ("failed to close device");
-
-	  walk->dev->openacc.target_data = target_data = NULL;
-
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
-
-	  walk->dev = NULL;
-	}
-    }
-
-  gomp_mutex_unlock (&goacc_thread_lock);
-
-  gomp_fini_device (base_dev);
-
-  base_dev = NULL;
-}
-
 void
 acc_shutdown (acc_device_t d)
 {
@@ -372,59 +392,16 @@ acc_shutdown (acc_device_t d)
 
 ialias (acc_shutdown)
 
-/* This function is called after plugins have been initialized.  It deals with
-   the "base" device, and is used to prepare the runtime for dealing with a
-   number of such devices (as implemented by some particular plugin).  If the
-   argument device type D matches a previous call to the function, return the
-   current base device, else shut the old device down and re-initialize with
-   the new device type.  */
-
-static struct gomp_device_descr *
-lazy_init (acc_device_t d)
-{
-  if (base_dev)
-    {
-      /* Re-initializing the same device, do nothing.  */
-      if (d == init_key)
-	return base_dev;
-
-      acc_shutdown_1 (init_key);
-    }
-
-  assert (!base_dev);
-
-  return acc_init_1 (d);
-}
-
-/* Ensure that plugins are loaded, initialize and open the (default-numbered)
-   device.  */
-
-static void
-lazy_init_and_open (acc_device_t d)
-{
-  if (!base_dev)
-    gomp_init_targets_once ();
-
-  gomp_mutex_lock (&acc_device_lock);
-
-  base_dev = lazy_init (d);
-
-  lazy_open (-1);
-
-  gomp_mutex_unlock (&acc_device_lock);
-}
-
 int
 acc_get_num_devices (acc_device_t d)
 {
   int n = 0;
-  const struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *acc_dev;
 
   if (d == acc_device_none)
     return 0;
 
-  if (!base_dev)
-    gomp_init_targets_once ();
+  gomp_init_targets_once ();
 
   acc_dev = resolve_device (d);
   if (!acc_dev)
@@ -439,10 +416,39 @@ acc_get_num_devices (acc_device_t d)
 
 ialias (acc_get_num_devices)
 
+/* Set the device type for the current thread only (using the current global
+   default device number), initialising that device if necessary.  Also set the
+   default device type for new threads to D.  */
+
 void
 acc_set_device_type (acc_device_t d)
 {
-  lazy_init_and_open (d);
+  struct gomp_device_descr *base_dev, *acc_dev;
+  struct goacc_thread *thr = goacc_thread ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  if (!cached_base_dev)
+    gomp_init_targets_once ();
+
+  cached_base_dev = base_dev = resolve_device (d);
+  acc_dev = &base_dev[goacc_device_num];
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device (acc_dev);
+
+  gomp_mutex_unlock (&acc_device_lock);
+
+  /* We're changing device type: invalidate the current thread's dev and
+     base_dev pointers.  */
+  if (thr && thr->base_dev != base_dev)
+    {
+      thr->base_dev = thr->dev = NULL;
+      if (thr->mapped_data)
+        gomp_fatal ("acc_set_device_type in 'acc data' region");
+    }
+
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_set_device_type)
@@ -451,10 +457,11 @@ acc_device_t
 acc_get_device_type (void)
 {
   acc_device_t res = acc_device_none;
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *dev;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev)
-    res = acc_device_type (base_dev->type);
+  if (thr && thr->base_dev)
+    res = acc_device_type (thr->base_dev->type);
   else
     {
       gomp_init_targets_once ();
@@ -475,78 +482,65 @@ int
 acc_get_device_num (acc_device_t d)
 {
   const struct gomp_device_descr *dev;
-  int num;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (d >= _ACC_device_hwm)
     gomp_fatal ("device %u out of range", (unsigned)d);
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   dev = resolve_device (d);
   if (!dev)
-    gomp_fatal ("no devices of type %u", d);
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  /* We might not have called lazy_open for this host thread yet, in which case
-     the get_device_num_func hook will return -1.  */
-  num = dev->openacc.get_device_num_func ();
-  if (num < 0)
-    num = goacc_device_num;
+  if (thr && thr->base_dev == dev && thr->dev)
+    return thr->dev->target_id;
 
-  return num;
+  return goacc_device_num;
 }
 
 ialias (acc_get_device_num)
 
 void
-acc_set_device_num (int n, acc_device_t d)
+acc_set_device_num (int ord, acc_device_t d)
 {
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
   int num_devices;
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
-  if ((int) d == 0)
-    {
-      int i;
-
-      /* A device setting of zero sets all device types on the system to use
-         the Nth instance of that device type.  Only attempt it for initialized
-	 devices though.  */
-      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
-        {
-	  dev = resolve_device (d);
-	  if (dev && dev->is_initialized)
-	    dev->openacc.set_device_num_func (n);
-	}
+  if (ord < 0)
+    ord = goacc_device_num;
 
-      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
-      goacc_device_num = n;
-    }
+  if ((int) d == 0)
+    /* Set whatever device is being used by the current host thread to use
+       device instance ORD.  It's unclear if this is supposed to affect other
+       host threads too (OpenACC 2.0 (3.2.4) acc_set_device_num).  */
+    goacc_attach_host_thread_to_device (ord);
   else
     {
-      struct goacc_thread *thr = goacc_thread ();
-
       gomp_mutex_lock (&acc_device_lock);
 
-      base_dev = lazy_init (d);
+      cached_base_dev = base_dev = resolve_device (d);
 
       num_devices = base_dev->get_num_devices_func ();
 
-      if (n >= num_devices)
-        gomp_fatal ("device %u out of range", n);
+      if (ord >= num_devices)
+        gomp_fatal ("device %u out of range", ord);
 
-      /* If we're changing the device number, de-associate this thread with
-	 the device (but don't close the device, since it may be in use by
-	 other threads).  */
-      if (thr && thr->dev && n != thr->dev->target_id)
-	thr->dev = NULL;
+      acc_dev = &base_dev[ord];
 
-      lazy_open (n);
+      if (!acc_dev->is_initialized)
+        gomp_init_device (acc_dev);
 
       gomp_mutex_unlock (&acc_device_lock);
+
+      goacc_attach_host_thread_to_device (ord);
     }
+  
+  goacc_device_num = ord;
 }
 
 ialias (acc_set_device_num)
@@ -554,10 +548,7 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  struct goacc_thread *thr = goacc_thread ();
-
-  if (thr && thr->dev
-      && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
+  if (acc_get_device_type () == acc_device_host_nonshm)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
@@ -577,7 +568,7 @@ goacc_runtime_initialize (void)
 
   pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
 
-  base_dev = NULL;
+  cached_base_dev = NULL;
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
@@ -606,9 +597,8 @@ goacc_restore_bind (void)
 }
 
 /* This is called from any OpenACC support function that may need to implicitly
-   initialize the libgomp runtime.  On exit all such initialization will have
-   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
-   pointers will be valid.  */
+   initialize the libgomp runtime, either globally or from a new host thread. 
+   On exit "goacc_thread" will return a valid & populated thread block.  */
 
 attribute_hidden void
 goacc_lazy_initialize (void)
@@ -618,12 +608,8 @@ goacc_lazy_initialize (void)
   if (thr && thr->dev)
     return;
 
-  if (!base_dev)
-    lazy_init_and_open (acc_device_default);
+  if (!cached_base_dev)
+    acc_init (acc_device_default);
   else
-    {
-      gomp_mutex_lock (&acc_device_lock);
-      lazy_open (-1);
-      gomp_mutex_unlock (&acc_device_lock);
-    }
+    goacc_attach_host_thread_to_device (-1);
 }
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index 85619c8..0ace737 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -56,6 +56,9 @@ acc_device_type (enum offload_target_type type)
 
 struct goacc_thread
 {
+  /* The base device for the current thread.  */
+  struct gomp_device_descr *base_dev;
+
   /* The device for the current thread.  */
   struct gomp_device_descr *dev;
 
@@ -89,10 +92,7 @@ goacc_thread (void)
 #endif
 
 void goacc_register (struct gomp_device_descr *) __GOACC_NOTHROW;
-
-/* Current dispatcher.  */
-extern struct gomp_device_descr *base_dev;
-
+void goacc_attach_host_thread_to_device (int);
 void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index fdc82e6..89ef5fc 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -107,7 +107,9 @@ acc_malloc (size_t s)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  return base_dev->alloc_func (thr->dev->target_id, s);
+  assert (thr->dev);
+
+  return thr->dev->alloc_func (thr->dev->target_id, s);
 }
 
 /* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
@@ -122,6 +124,8 @@ acc_free (void *d)
   if (!d)
     return;
 
+  assert (thr && thr->dev);
+
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
@@ -134,7 +138,7 @@ acc_free (void *d)
      acc_unmap_data ((void *)(k->host_start + offset));
    }
 
-  base_dev->free_func (thr->dev->target_id, d);
+  thr->dev->free_func (thr->dev->target_id, d);
 }
 
 void
@@ -144,7 +148,9 @@ acc_memcpy_to_device (void *d, void *h, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+  assert (thr && thr->dev);
+
+  thr->dev->host2dev_func (thr->dev->target_id, d, h, s);
 }
 
 void
@@ -154,7 +160,9 @@ acc_memcpy_from_device (void *h, void *d, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+  assert (thr && thr->dev);
+
+  thr->dev->dev2host_func (thr->dev->target_id, h, d, s);
 }
 
 /* Return the device pointer that corresponds to host data H.  Or NULL
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 563f9bb..9729e12 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -49,32 +49,6 @@ find_pset (int pos, size_t mapnum, unsigned short *kinds)
   return kind == GOMP_MAP_TO_PSET;
 }
 
-
-/* Ensure that the target device for DEVICE_TYPE is initialised (and that
-   plugins have been loaded if appropriate).  The ACC_dev variable for the
-   current thread will be set appropriately for the given device type on
-   return.  */
-
-attribute_hidden void
-select_acc_device (int device_type)
-{
-  goacc_lazy_initialize ();
-
-  if (device_type == GOMP_DEVICE_HOST_FALLBACK)
-    return;
-
-  if (device_type == acc_device_none)
-    device_type = acc_device_host;
-
-  if (device_type >= 0)
-    {
-      /* NOTE: this will go badly if the surrounding data environment is set up
-         to use a different device type.  We'll just have to trust that users
-	 know what they're doing...  */
-      acc_set_device_type (device_type);
-    }
-}
-
 static void goacc_wait (int async, int num_waits, va_list ap);
 
 void
@@ -111,7 +85,7 @@ GOACC_parallel (int device, void (*fn) (void *),
 	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
 	      async);
 #endif
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -195,7 +169,7 @@ GOACC_data_start (int device, size_t mapnum,
 	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
 #endif
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
@@ -242,7 +216,7 @@ GOACC_enter_exit_data (int device, size_t mapnum,
   bool data_enter = false;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -429,7 +403,7 @@ GOACC_update (int device, size_t mapnum,
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index bc60f72..1faf5bc 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -119,31 +119,6 @@ GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return (void *) (intptr_t) n;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_close_device (void *hnd)
-{
-  return 0;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  return 0;
-}
-
-STATIC void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  if (n > 0)
-    GOMP (fatal) ("device number %u out of range for host execution", n);
-}
-
-STATIC void *
 GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t s)
 {
   return GOMP (malloc) (s);
@@ -254,7 +229,7 @@ GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__ ((unused)))
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
   return NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 483cb75..583ec87 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -133,7 +133,8 @@ struct targ_fn_descriptor
   const char *name;
 };
 
-static bool ptx_inited = false;
+static unsigned int instantiated_devices = 0;
+static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
 struct ptx_stream
 {
@@ -331,9 +332,21 @@ struct ptx_event
   struct ptx_event *next;
 };
 
+struct ptx_image_data
+{
+  void *target_data;
+  CUmodule module;
+  struct ptx_image_data *next;
+};
+
 static pthread_mutex_t ptx_event_lock;
 static struct ptx_event *ptx_events;
 
+static struct ptx_device **ptx_devices;
+
+static struct ptx_image_data *ptx_images = NULL;
+static pthread_mutex_t ptx_image_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
 
@@ -450,8 +463,8 @@ fini_streams_for_device (struct ptx_device *ptx_dev)
       struct ptx_stream *s = ptx_dev->active_streams;
       ptx_dev->active_streams = ptx_dev->active_streams->next;
 
-      cuStreamDestroy (s->stream);
       map_fini (s);
+      cuStreamDestroy (s->stream);
       free (s);
     }
 
@@ -575,21 +588,21 @@ select_stream_for_async (int async, pthread_t thread, bool create,
   return stream;
 }
 
-static int nvptx_get_num_devices (void);
-
-/* Initialize the device.  */
-static int
+/* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
+   should be locked on entry and remains locked on exit.  */
+static bool
 nvptx_init (void)
 {
   CUresult r;
   int rc;
+  int ndevs;
 
-  if (ptx_inited)
-    return nvptx_get_num_devices ();
+  if (instantiated_devices != 0)
+    return true;
 
   rc = verify_device_library ();
   if (rc < 0)
-    return -1;
+    return false;
 
   r = cuInit (0);
   if (r != CUDA_SUCCESS)
@@ -599,22 +612,64 @@ nvptx_init (void)
 
   pthread_mutex_init (&ptx_event_lock, NULL);
 
-  ptx_inited = true;
+  r = cuDeviceGetCount (&ndevs);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
 
-  return nvptx_get_num_devices ();
+  ptx_devices = GOMP_PLUGIN_malloc_cleared (sizeof (struct ptx_device *)
+					    * ndevs);
+
+  return true;
 }
 
+/* Select the N'th PTX device for the current host thread.  The device must
+   have been opened before calling this function.  */
+
 static void
-nvptx_fini (void)
+nvptx_attach_host_thread_to_device (int n)
 {
-  ptx_inited = false;
+  CUdevice dev;
+  CUresult r;
+  struct ptx_device *ptx_dev;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetDevice (&dev);
+  if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+
+  if (r != CUDA_ERROR_INVALID_CONTEXT && dev == n)
+    return;
+  else
+    {
+      CUcontext old_ctx;
+
+      ptx_dev = ptx_devices[n];
+      assert (ptx_dev);
+
+      r = cuCtxGetCurrent (&thd_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+      /* We don't necessarily have a current context (e.g. if it has been
+         destroyed).  Pop it if we do though.  */
+      if (thd_ctx != NULL)
+	{
+	  r = cuCtxPopCurrent (&old_ctx);
+	  if (r != CUDA_SUCCESS)
+            GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+	}
+
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
 }
 
-static void *
+static struct ptx_device *
 nvptx_open_device (int n)
 {
   struct ptx_device *ptx_dev;
-  CUdevice dev;
+  CUdevice dev, ctx_dev;
   CUresult r;
   int async_engines, pi;
 
@@ -628,6 +683,21 @@ nvptx_open_device (int n)
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = false;
 
+  r = cuCtxGetDevice (&ctx_dev);
+  if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+  
+  if (r != CUDA_ERROR_INVALID_CONTEXT && ctx_dev != dev)
+    {
+      /* The current host thread has an active context for a different device.
+         Detach it.  */
+      CUcontext old_ctx;
+      
+      r = cuCtxPopCurrent (&old_ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+    }
+
   r = cuCtxGetCurrent (&ptx_dev->ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
@@ -678,17 +748,16 @@ nvptx_open_device (int n)
 
   init_streams_for_device (ptx_dev, async_engines);
 
-  return (void *) ptx_dev;
+  return ptx_dev;
 }
 
-static int
-nvptx_close_device (void *targ_data)
+static void
+nvptx_close_device (struct ptx_device *ptx_dev)
 {
   CUresult r;
-  struct ptx_device *ptx_dev = targ_data;
 
   if (!ptx_dev)
-    return 0;
+    return;
 
   fini_streams_for_device (ptx_dev);
 
@@ -700,8 +769,6 @@ nvptx_close_device (void *targ_data)
     }
 
   free (ptx_dev);
-
-  return 0;
 }
 
 static int
@@ -714,7 +781,7 @@ nvptx_get_num_devices (void)
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
      further initialization).  */
-  if (!ptx_inited)
+  if (instantiated_devices == 0)
     cuInit (0);
 
   r = cuDeviceGetCount (&n);
@@ -1507,64 +1574,84 @@ GOMP_OFFLOAD_get_num_devices (void)
   return nvptx_get_num_devices ();
 }
 
-static void **kernel_target_data;
-static void **kernel_host_table;
-
 void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+GOMP_OFFLOAD_init_device (int n)
 {
-  kernel_target_data = target_data;
-  kernel_host_table = host_table;
-}
+  pthread_mutex_lock (&ptx_dev_lock);
 
-void
-GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
-{
-  (void) nvptx_init ();
+  if (!nvptx_init () || ptx_devices[n] != NULL)
+    {
+      pthread_mutex_unlock (&ptx_dev_lock);
+      return;
+    }
+
+  ptx_devices[n] = nvptx_open_device (n);
+  instantiated_devices++;
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 void
-GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
+GOMP_OFFLOAD_fini_device (int n)
 {
-  nvptx_fini ();
+  pthread_mutex_lock (&ptx_dev_lock);
+
+  if (ptx_devices[n] != NULL)
+    {
+      nvptx_attach_host_thread_to_device (n);
+      nvptx_close_device (ptx_devices[n]);
+      ptx_devices[n] = NULL;
+      instantiated_devices--;
+    }
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **tablep)
+GOMP_OFFLOAD_load_image (int ord, void *target_data,
+			 struct addr_pair **target_table)
 {
   CUmodule module;
-  void **fn_table;
-  char **fn_names;
-  int fn_entries, i;
+  char **fn_names, **var_names;
+  unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
+  void **img_header = (void **) target_data;
+  struct ptx_image_data *new_image;
 
-  if (nvptx_init () <= 0)
-    return 0;
+  GOMP_OFFLOAD_init_device (ord);
 
-  /* This isn't an error, because an image may legitimately have no offloaded
-     regions and so will not call GOMP_offload_register.  */
-  if (kernel_target_data == NULL)
-    return 0;
+  nvptx_attach_host_thread_to_device (ord);
+
+  link_ptx (&module, img_header[0]);
 
-  link_ptx (&module, kernel_target_data[0]);
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
-  /* kernel_target_data[0] -> ptx code
-     kernel_target_data[1] -> variable mappings
-     kernel_target_data[2] -> array of kernel names in ascii
+  /* The mkoffload utility emits a table of pointers/integers at the start of
+     each offload image:
 
-     kernel_host_table[0] -> start of function addresses (__offload_func_table)
-     kernel_host_table[1] -> end of function addresses (__offload_funcs_end)
+     img_header[0] -> ptx code
+     img_header[1] -> number of variables
+     img_header[2] -> array of variable names (pointers to strings)
+     img_header[3] -> number of kernels
+     img_header[4] -> array of kernel names (pointers to strings)
 
      The array of kernel names and the functions addresses form a
      one-to-one correspondence.  */
 
-  fn_table = kernel_host_table[0];
-  fn_names = (char **) kernel_target_data[2];
-  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+  var_entries = (uintptr_t) img_header[1];
+  var_names = (char **) img_header[2];
+  fn_entries = (uintptr_t) img_header[3];
+  fn_names = (char **) img_header[4];
 
-  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
+				      * (fn_entries + var_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1579,38 +1666,86 @@ GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
       targ_fns[i].fn = function;
       targ_fns[i].name = (const char *) fn_names[i];
 
-      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
-      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
-      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
-      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+      (*target_table)[i].start = (uintptr_t) &targ_fns[i];
+      (*target_table)[i].end = (*target_table)[i].start + 1;
     }
 
-  return fn_entries;
+  for (j = 0; j < var_entries; j++, i++)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+
+      r = cuModuleGetGlobal (&var, &bytes, module, var_names[j]);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+
+      (*target_table)[i].start = (uintptr_t) var;
+      (*target_table)[i].end = (*target_table)[i].start + bytes;
+    }
+
+  return i;
+}
+
+void
+GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+{
+  void **img_header = (void **) target_data;
+  struct targ_fn_descriptor *targ_fns
+    = (struct targ_fn_descriptor *) img_header[0];
+  struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
+
+  free (targ_fns);
+
+  pthread_mutex_lock (&ptx_image_lock);
+  for (image = ptx_images; image != NULL;)
+    {
+      struct ptx_image_data *next = image->next;
+
+      if (image->target_data == target_data)
+	{
+	  cuModuleUnload (image->module);
+	  free (image);
+	  if (prev)
+	    prev->next = next;
+	}
+      else
+	{
+	  prev = image;
+	  if (!newhd)
+	    newhd = image;
+	}
+
+      image = next;
+    }
+  ptx_images = newhd;
+  pthread_mutex_unlock (&ptx_image_lock);
 }
 
 void *
-GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t size)
+GOMP_OFFLOAD_alloc (int ord, size_t size)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_alloc (size);
 }
 
 void
-GOMP_OFFLOAD_free (int n __attribute__ ((unused)), void *ptr)
+GOMP_OFFLOAD_free (int ord, void *ptr)
 {
+  nvptx_attach_host_thread_to_device (ord);
   nvptx_free (ptr);
 }
 
 void *
-GOMP_OFFLOAD_dev2host (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_dev2host (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_dev2host (dst, src, n);
 }
 
 void *
-GOMP_OFFLOAD_host2dev (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_host2dev (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_host2dev (dst, src, n);
 }
 
@@ -1627,45 +1762,6 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
 	    num_workers, vector_length, async, targ_mem_desc);
 }
 
-void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return nvptx_open_device (n);
-}
-
-int
-GOMP_OFFLOAD_openacc_close_device (void *h)
-{
-  return nvptx_close_device (h);
-}
-
-void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  assert (n >= 0);
-
-  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
-    (void) nvptx_open_device (n);
-}
-
-/* This can be called before the device is "opened" for the current thread, in
-   which case we can't tell which device number should be returned.  We don't
-   actually want to open the device here, so just return -1 and let the caller
-   (oacc-init.c:acc_get_device_num) handle it.  */
-
-int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  if (nvthd && nvthd->ptx_dev)
-    return nvthd->ptx_dev->ord;
-  else
-    return -1;
-}
-
 void
 GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
 {
@@ -1729,14 +1825,18 @@ GOMP_OFFLOAD_openacc_async_set_async (int async)
 }
 
 void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+GOMP_OFFLOAD_openacc_create_thread_data (int ord)
 {
-  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct ptx_device *ptx_dev;
   struct nvptx_thread *nvthd
     = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
   CUresult r;
   CUcontext thd_ctx;
 
+  ptx_dev = ptx_devices[ord];
+
+  assert (ptx_dev);
+
   r = cuCtxGetCurrent (&thd_ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
diff --git a/libgomp/target.c b/libgomp/target.c
index ba2d231..c67502a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -163,7 +163,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -571,7 +570,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (tgt->mem_map, k);
+	splay_tree_remove (&devicep->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -1086,10 +1085,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
     {
       optional_present = optional_total = 0;
       DLSYM_OPT (openacc.exec, openacc_parallel);
-      DLSYM_OPT (openacc.open_device, openacc_open_device);
-      DLSYM_OPT (openacc.close_device, openacc_close_device);
-      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
-      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
       DLSYM_OPT (openacc.register_async_cleanup,
 		 openacc_register_async_cleanup);
       DLSYM_OPT (openacc.async_test, openacc_async_test);
@@ -1198,7 +1193,6 @@ gomp_target_init (void)
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
-		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
index 84045db..a4cf7f2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
@@ -58,7 +58,7 @@ main (int argc, char **argv)
       acc_set_device_num (1, (acc_device_t) 0);
 
       devnum = acc_get_device_num (devtype);
-      if (devnum != 0)
+      if (devnum != 1)
 	abort ();
   }
 


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-30 16:42                                   ` Jakub Jelinek
  2015-03-30 21:43                                     ` Julian Brown
@ 2015-03-31 12:52                                     ` Ilya Verbin
  2015-03-31 13:08                                       ` Jakub Jelinek
  2015-03-31 18:25                                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Ilya Verbin
  2 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-03-31 12:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Mon, Mar 30, 2015 at 22:42:51 +0100, Julian Brown wrote:
> On Mon, 30 Mar 2015 18:42:02 +0200
> Jakub Jelinek <jakub@redhat.com> wrote:
> > But the one Julian posted doesn't apply on top of your patch.
> > If there is any interdiff needed on top of your patch, can it be
> > posted against trunk + your patch?
> 
> Here's a version of my patch against trunk and Ilya's latest patch
> (hopefully!). Tests look OK (libgomp + PTX).

Thanks for rebasing!

On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
> > +/* Insert mapping of host -> target address pairs to splay tree.  */
> > +
> > +static void
> > +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
> > +				struct addr_pair *host_addr,
> > +				struct addr_pair *tgt_addr)
> > +{
> > +  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
> > +  tgt->refcount = 1;
> > +  tgt->array = gomp_malloc (sizeof (*tgt->array));
> > +  tgt->tgt_start = tgt_addr->start;
> > +  tgt->tgt_end = tgt_addr->end;
> > +  tgt->to_free = NULL;
> > +  tgt->list_count = 0;
> > +  tgt->device_descr = devicep;
> > +  splay_tree_node node = tgt->array;
> > +  splay_tree_key k = &node->key;
> > +  k->host_start = host_addr->start;
> > +  k->host_end = host_addr->end;
> > +  k->tgt_offset = 0;
> > +  k->refcount = 1;
> > +  k->copy_from = false;
> > +  k->tgt = tgt;
> > +  node->left = NULL;
> > +  node->right = NULL;
> > +  splay_tree_insert (&devicep->mem_map, node);
> > +}
> 
> What is the reason to register and allocate these one at a time, rather than
> using one struct target_mem_desc with one tgt->array for all splay tree
> nodes registered from one image?
> Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
> clear it is special) and just use tgt_offset relative to that (i.e.
> absolute), but having to malloc each node individually and having to malloc
> a target_mem_desc for each one sounds expensive.
> Everything is freed just once anyway, isn't it?

Here is a WIP patch; does this look like what you suggested?  It works fine with
functions, but I'm not sure what to do with variables.  Will gomp_map_vars
work when tgt_start and tgt_end are equal to 0?
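
For illustration only, the batching idea discussed above (one descriptor and
one node array per image, with tgt_start left at 0 so that tgt_offset is an
absolute target address) could be sketched roughly as follows.  The types and
function names here (image_block, image_node, image_block_create and so on)
are simplified, hypothetical stand-ins, not the libgomp structures, and the
sketch says nothing about whether gomp_map_vars copes with such entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the plugin's address pair; assumption only.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

struct image_block;

/* Hypothetical node: one host range and its target address.  */
struct image_node
{
  uintptr_t host_start;
  uintptr_t host_end;
  uintptr_t tgt_offset;   /* Absolute, because block->tgt_start is 0.  */
  struct image_block *block;
};

/* Hypothetical descriptor shared by every node of one image.  */
struct image_block
{
  uintptr_t tgt_start;    /* 0 marks the block as special.  */
  uintptr_t tgt_end;      /* Likewise 0.  */
  size_t count;
  struct image_node *array;   /* A single allocation for all nodes.  */
};

/* Describe COUNT host -> target pairs with two allocations in total,
   however many entries the image has.  */
static struct image_block *
image_block_create (const struct addr_pair *host,
                    const struct addr_pair *tgt, size_t count)
{
  struct image_block *block = malloc (sizeof (*block));
  assert (block);
  block->array = malloc (sizeof (*block->array) * (count ? count : 1));
  assert (block->array);
  block->tgt_start = 0;
  block->tgt_end = 0;
  block->count = count;

  for (size_t i = 0; i < count; i++)
    {
      block->array[i].host_start = host[i].start;
      block->array[i].host_end = host[i].end;
      block->array[i].tgt_offset = tgt[i].start;
      block->array[i].block = block;
    }

  return block;
}

/* The usual "start + offset" computation still yields the target address.  */
static uintptr_t
image_node_target_address (const struct image_node *node)
{
  return node->block->tgt_start + node->tgt_offset;
}

/* Everything registered for one image is released in one go.  */
static void
image_block_destroy (struct image_block *block)
{
  free (block->array);
  free (block);
}
```

In this arrangement unloading an image needs only image_block_destroy once
the nodes have been removed from whatever lookup structure holds them, rather
than one free per node.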

> > @@ -654,6 +727,18 @@ void
> >  GOMP_offload_register (void *host_table, enum offload_target_type target_type,
> >  		       void *target_data)
> >  {
> > +  int i;
> > +  gomp_mutex_lock (&register_lock);
> > +
> > +  /* Load image to all initialized devices.  */
> > +  for (i = 0; i < num_devices; i++)
> > +    {
> > +      struct gomp_device_descr *devicep = &devices[i];
> > +      if (devicep->type == target_type && devicep->is_initialized)
> > +	gomp_offload_image_to_device (devicep, host_table, target_data);
> 
> Shouldn't either this function, or gomp_offload_image_to_device lock
> also devicep->lock mutex and unlock at the end?
> Where exactly I guess depends on if the devicep->* hook calls should be
> guarded with the mutex or not.  If yes, it should be this function and
> gomp_init_device.
> 
> > +      if (devicep->type != target_type || !devicep->is_initialized)
> > +	continue;
> > +
> 
> Similarly.

I've added lock/unlock to GOMP_offload_register and GOMP_offload_unregister.
All calls to gomp_init_device were already guarded.
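
As a rough sketch of the locking shape being discussed (a global registration
lock on the outside, the per-device lock around the check of is_initialized
and the hook call), assuming simplified types: struct device,
load_image_hook and offload_register below are hypothetical stand-ins using
plain pthread mutexes, and the exact placement of the locks in the real patch
may differ.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record; not gomp_device_descr.  */
struct device
{
  int type;
  bool is_initialized;
  int images_loaded;      /* Counts hook calls, for illustration.  */
  pthread_mutex_t lock;   /* Guards the mutable fields above.  */
};

static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the plugin's load-image hook.  */
static void
load_image_hook (struct device *dev)
{
  dev->images_loaded++;
}

/* Load an image on every initialized device of TARGET_TYPE.  Lock order is
   always register_lock first, then the device lock, so two registrations
   cannot deadlock against each other.  */
static void
offload_register (struct device *devices, size_t num_devices,
                  int target_type)
{
  pthread_mutex_lock (&register_lock);

  for (size_t i = 0; i < num_devices; i++)
    {
      struct device *dev = &devices[i];

      pthread_mutex_lock (&dev->lock);
      if (dev->type == target_type && dev->is_initialized)
        load_image_hook (dev);
      pthread_mutex_unlock (&dev->lock);
    }

  pthread_mutex_unlock (&register_lock);
}
```

An unregister routine would follow the same pattern, taking the two locks in
the same order before calling the unload hook.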

> > +      devicep->unload_image_func (devicep->target_id, target_data);
> > +
> > +      /* Remove mapping from splay tree.  */
> > +      for (j = 0; j < num_funcs; j++)
> > +	{
> > +	  struct splay_tree_key_s k;
> > +	  k.host_start = (uintptr_t) host_func_table[j];
> > +	  k.host_end = k.host_start + 1;
> > +	  splay_tree_remove (&devicep->mem_map, &k);
> > +	}
> > +
> > +      for (j = 0; j < num_vars; j++)
> > +	{
> > +	  struct splay_tree_key_s k;
> > +	  k.host_start = (uintptr_t) host_var_table[j*2];
> > +	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
> > +	  splay_tree_remove (&devicep->mem_map, &k);
> > +	}
> > +    }
> 
> Aren't you leaking all the tgt->array and tgt allocations here?
> Though, if you change it to just two allocations (one tgt, one array),
> you'd need to free just once.

You're right.  I've fixed this for functions; variables are still WIP.


diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (void *, int, void *);\n\n"
+	   "void GOMP_offload_register (void *, int, void *);\n"
+	   "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
 	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
+	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+	   "__attribute__((destructor))\n"
+	   "static void\n"
+	   "fini (void)\n"
+	   "{\n"
+	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
 	   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s mem_map;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ -793,9 +777,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
@@ -811,9 +792,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_init_tables (struct gomp_device_descr *,
-			      struct gomp_memory_mapping *);
-extern void gomp_free_memmap (struct gomp_memory_mapping *);
+extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
+	GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..e4756b6 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,20 +43,18 @@ static struct gomp_device_descr host_dispatch =
     .get_caps_func = GOMP_OFFLOAD_get_caps,
     .get_type_func = GOMP_OFFLOAD_get_type,
     .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-    .register_image_func = GOMP_OFFLOAD_register_image,
     .init_device_func = GOMP_OFFLOAD_init_device,
     .fini_device_func = GOMP_OFFLOAD_fini_device,
-    .get_table_func = GOMP_OFFLOAD_get_table,
+    .load_image_func = GOMP_OFFLOAD_load_image,
+    .unload_image_func = GOMP_OFFLOAD_unload_image,
     .alloc_func = GOMP_OFFLOAD_alloc,
     .free_func = GOMP_OFFLOAD_free,
     .dev2host_func = GOMP_OFFLOAD_dev2host,
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.is_initialized = false,
-    .mem_map.splay_tree.root = NULL,
+    .mem_map.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
@@ -94,7 +92,6 @@ static struct gomp_device_descr host_dispatch =
 static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
-  gomp_mutex_init (&host_dispatch.mem_map.lock);
   gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 166eb55..1e0243e 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -284,12 +284,6 @@ lazy_open (int ord)
     = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
-
-  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
-  gomp_mutex_lock (&mem_map->lock);
-  if (!mem_map->is_initialized)
-    gomp_init_tables (acc_dev, mem_map);
-  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -351,10 +345,9 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
 
 	  walk->dev = NULL;
 	}
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..fdc82e6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -38,7 +38,7 @@
 /* Return block containing [H->S), or NULL if not contained.  */
 
 static splay_tree_key
-lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 {
   struct splay_tree_key_s node;
   splay_tree_key key;
@@ -46,11 +46,9 @@ lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&mem_map->lock);
-
-  key = splay_tree_lookup (&mem_map->splay_tree, &node);
-
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_lock (&dev->lock);
+  key = splay_tree_lookup (&dev->mem_map, &node);
+  gomp_mutex_unlock (&dev->lock);
 
   return key;
 }
@@ -65,14 +63,11 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
 {
   int i;
   struct target_mem_desc *t;
-  struct gomp_memory_mapping *mem_map;
 
   if (!tgt)
     return NULL;
 
-  mem_map = tgt->mem_map;
-
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&tgt->device_descr->lock);
 
   for (t = tgt; t != NULL; t = t->prev)
     {
@@ -80,7 +75,7 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
         break;
     }
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&tgt->device_descr->lock);
 
   if (!t)
     return NULL;
@@ -176,7 +171,7 @@ acc_deviceptr (void *h)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  n = lookup_host (&thr->dev->mem_map, h, 1);
+  n = lookup_host (thr->dev, h, 1);
 
   if (!n)
     return NULL;
@@ -229,7 +224,7 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   if (n && ((uintptr_t)h < n->host_start
 	    || (uintptr_t)h + s > n->host_end
@@ -271,7 +266,7 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
                     (void *)h, (int)s, (void *)d, (int)s);
 
-      if (lookup_host (&acc_dev->mem_map, h, s))
+      if (lookup_host (acc_dev, h, s))
 	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
 		    (int)s);
 
@@ -296,7 +291,7 @@ acc_unmap_data (void *h)
   /* No need to call lazy open, as the address must have been mapped.  */
 
   size_t host_size;
-  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  splay_tree_key n = lookup_host (acc_dev, h, 1);
   struct target_mem_desc *t;
 
   if (!n)
@@ -320,7 +315,7 @@ acc_unmap_data (void *h)
       t->tgt_end = 0;
       t->to_free = 0;
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
 	   tp = t, t = t->prev)
@@ -334,7 +329,7 @@ acc_unmap_data (void *h)
 	    break;
 	  }
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   gomp_unmap_vars (t, true);
@@ -358,7 +353,7 @@ present_create_copy (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
   if (n)
     {
       /* Present. */
@@ -389,13 +384,13 @@ present_create_copy (unsigned f, void *h, size_t s)
       tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
 			   false);
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       d = tgt->to_free;
       tgt->prev = acc_dev->openacc.data_environ;
       acc_dev->openacc.data_environ = tgt;
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   return d;
@@ -436,7 +431,7 @@ delete_copyout (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -479,7 +474,7 @@ update_dev_host (int is_dev, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -532,7 +527,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   struct target_mem_desc *t;
   int minrefs = (mapnum == 1) ? 2 : 3;
 
-  n = lookup_host (&acc_dev->mem_map, h, 1);
+  n = lookup_host (acc_dev, h, 1);
 
   if (!n)
     gomp_fatal ("%p is not a mapped block", (void *)h);
@@ -543,7 +538,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 
   struct target_mem_desc *tp;
 
-  gomp_mutex_lock (&acc_dev->mem_map.lock);
+  gomp_mutex_lock (&acc_dev->lock);
 
   if (t->refcount == minrefs)
     {
@@ -570,7 +565,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   if (force_copyfrom)
     t->list[0]->copy_from = 1;
 
-  gomp_mutex_unlock (&acc_dev->mem_map.lock);
+  gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 0c74f54..563f9bb 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -144,9 +144,9 @@ GOACC_parallel (int device, void (*fn) (void *),
     {
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
-      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map, &k);
+      gomp_mutex_unlock (&acc_dev->lock);
 
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index ebf7f11..bc60f72 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -95,12 +95,6 @@ GOMP_OFFLOAD_get_num_devices (void)
 }
 
 STATIC void
-GOMP_OFFLOAD_register_image (void *host_table __attribute__ ((unused)),
-			     void *target_data __attribute__ ((unused)))
-{
-}
-
-STATIC void
 GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
 {
 }
@@ -111,12 +105,19 @@ GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
 }
 
 STATIC int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **table __attribute__ ((unused)))
+GOMP_OFFLOAD_load_image (int n __attribute__ ((unused)),
+			 void *i __attribute__ ((unused)),
+			 struct addr_pair **r __attribute__ ((unused)))
 {
   return 0;
 }
 
+STATIC void
+GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
+			   void *i __attribute__ ((unused)))
+{
+}
+
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index c5dda3f..418a2f5 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -49,6 +49,9 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
+/* Mutex for offload image registration.  */
+static gomp_mutex_t register_lock;
+
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -153,14 +156,14 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mm;
+  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -174,7 +177,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < mapnum; i++)
     {
@@ -189,7 +192,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
@@ -274,7 +277,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
@@ -294,7 +297,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&mm->splay_tree, array);
+		splay_tree_insert (mem_map, array);
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -332,16 +335,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+		    n = splay_tree_lookup (mem_map, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			n = splay_tree_lookup (mem_map, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    n = splay_tree_lookup (mem_map, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -400,18 +403,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  n = splay_tree_lookup (mem_map, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&mm->splay_tree,
-						     &cur_node);
+			      n = splay_tree_lookup (mem_map, &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup (&mm->splay_tree,
-							 &cur_node);
+				  n = splay_tree_lookup (mem_map, &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -489,7 +490,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
   return tgt;
 }
 
@@ -514,10 +515,9 @@ attribute_hidden void
 gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
   size_t i;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
@@ -536,7 +536,7 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
 				  k->host_end - k->host_start);
       }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -547,7 +547,6 @@ attribute_hidden void
 gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -555,7 +554,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -572,7 +571,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&mm->splay_tree, k);
+	splay_tree_remove (tgt->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -584,13 +583,12 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
   else
     gomp_unmap_tgt (tgt);
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
-	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
-	     bool is_openacc)
+gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
+	     size_t *sizes, void *kinds, bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
@@ -602,14 +600,13 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
-					      &cur_node);
+	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
 	    int kind = get_kind (is_openacc, kinds, i);
@@ -643,10 +640,105 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
+}
+
+
+/* Insert a mapping of host -> target address pairs into the splay tree.  */
+
+static void
+gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
+				struct addr_pair *host_addr,
+				struct addr_pair *tgt_addr)
+{
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->refcount = 1;
+  tgt->array = gomp_malloc (sizeof (*tgt->array));
+  tgt->tgt_start = tgt_addr->start;
+  tgt->tgt_end = tgt_addr->end;
+  tgt->to_free = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+  splay_tree_node node = tgt->array;
+  splay_tree_key k = &node->key;
+  k->host_start = host_addr->start;
+  k->host_end = host_addr->end;
+  k->tgt_offset = 0;
+  k->refcount = 1;
+  k->copy_from = false;
+  k->tgt = tgt;
+  node->left = NULL;
+  node->right = NULL;
+  splay_tree_insert (&devicep->mem_map, node);
+}
+
+/* Load the image pointed to by TARGET_DATA onto the device specified by
+   DEVICEP, and insert into the splay tree the mapping between addresses from
+   HOST_TABLE and addresses from the loaded target image.  */
+
+static void
+gomp_offload_image_to_device (struct gomp_device_descr *devicep,
+			      void *host_table, void *target_data)
+{
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Load image to device and get target addresses for the image.  */
+  struct addr_pair *target_table = NULL;
+  int i, num_target_entries
+    = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+
+  if (num_target_entries != num_funcs + num_vars)
+    gomp_fatal ("Can't map target functions or variables");
+
+  /* Insert host-target address mapping into splay tree.  */
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->array = gomp_malloc (num_funcs * sizeof (*tgt->array));
+  tgt->refcount = 1;
+  tgt->tgt_start = 0;
+  tgt->tgt_end = 0;
+  tgt->to_free = NULL;
+  tgt->prev = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+
+  splay_tree_node array = tgt->array;
+  for (i = 0; i < num_funcs; i++)
+    {
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_func_table[i];
+      k->host_end = k->host_start + 1;
+      k->tgt = tgt;
+      k->tgt_offset = target_table[i].start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  for (i = 0; i < num_vars; i++)
+    {
+      struct addr_pair host_addr;
+      host_addr.start = (uintptr_t) host_var_table[i*2];
+      host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1];
+      gomp_splay_tree_insert_mapping (devicep, &host_addr,
+				      &target_table[num_funcs+i]);
+    }
+
+  free (target_table);
 }
 
-/* This function should be called from every offload image.
+/* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
 
@@ -654,6 +746,20 @@ void
 GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 		       void *target_data)
 {
+  int i;
+  gomp_mutex_lock (&register_lock);
+
+  /* Load image to all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == target_type && devicep->is_initialized)
+	gomp_offload_image_to_device (devicep, host_table, target_data);
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Insert the image into the array of pending images.  */
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
@@ -663,74 +769,122 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   offload_images[num_offload_images].target_data = target_data;
 
   num_offload_images++;
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* This function initializes the target device, specified by DEVICEP.  DEVICEP
-   must be locked on entry, and remains locked on return.  */
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
 
-attribute_hidden void
-gomp_init_device (struct gomp_device_descr *devicep)
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
 {
-  devicep->init_device_func (devicep->target_id);
-  devicep->is_initialized = true;
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+  int i;
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  gomp_mutex_lock (&register_lock);
+
+  /* Unload image from all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type != target_type || !devicep->is_initialized)
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  continue;
+	}
+
+      devicep->unload_image_func (devicep->target_id, target_data);
+
+      /* Remove mapping from splay tree.  */
+      struct splay_tree_key_s k;
+      splay_tree_key node = NULL;
+      if (num_funcs > 0)
+	{
+	  k.host_start = (uintptr_t) host_func_table[0];
+	  k.host_end = k.host_start + 1;
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+      for (j = 0; j < num_funcs; j++)
+	{
+	  k.host_start = (uintptr_t) host_func_table[j];
+	  k.host_end = k.host_start + 1;
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+      if (node)
+	{
+	  free (node->tgt);
+	  free (node);
+	}
+
+      for (j = 0; j < num_vars; j++)
+	{
+	  k.host_start = (uintptr_t) host_var_table[j*2];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
+	  splay_tree_remove (&devicep->mem_map, &k);
+	  /* FIXME: free tgt->array and tgt.  */
+	}
+
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Remove image from array of pending images.  */
+  for (i = 0; i < num_offload_images; i++)
+    if (offload_images[i].target_data == target_data)
+      {
+	offload_images[i] = offload_images[--num_offload_images];
+	break;
+      }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* Initialize address mapping tables.  MM must be locked on entry, and remains
-   locked on return.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
-gomp_init_tables (struct gomp_device_descr *devicep,
-		  struct gomp_memory_mapping *mm)
+gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Get address mapping table for device.  */
-  struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
-
-  /* Insert host-target address mapping into dev_splay_tree.  */
   int i;
-  for (i = 0; i < num_entries; i++)
+  devicep->init_device_func (devicep->target_id);
+
+  /* Load all images registered so far onto the device.  */
+  for (i = 0; i < num_offload_images; i++)
     {
-      struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-      tgt->refcount = 1;
-      tgt->array = gomp_malloc (sizeof (*tgt->array));
-      tgt->tgt_start = table[i].tgt_start;
-      tgt->tgt_end = table[i].tgt_end;
-      tgt->to_free = NULL;
-      tgt->list_count = 0;
-      tgt->device_descr = devicep;
-      splay_tree_node node = tgt->array;
-      splay_tree_key k = &node->key;
-      k->host_start = table[i].host_start;
-      k->host_end = table[i].host_end;
-      k->tgt_offset = 0;
-      k->refcount = 1;
-      k->copy_from = false;
-      k->tgt = tgt;
-      node->left = NULL;
-      node->right = NULL;
-      splay_tree_insert (&mm->splay_tree, node);
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+	gomp_offload_image_to_device (devicep, image->host_table,
+				      image->target_data);
     }
 
-  free (table);
-  mm->is_initialized = true;
+  devicep->is_initialized = true;
 }
 
 /* Free address mapping tables.  MM must be locked on entry, and remains locked
    on return.  */
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_memory_mapping *mm)
+gomp_free_memmap (struct splay_tree_s *mem_map)
 {
-  while (mm->splay_tree.root)
+  while (mem_map->root)
     {
-      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
-      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      splay_tree_remove (mem_map, &mem_map->root->key);
       free (tgt->array);
       free (tgt);
     }
-
-  mm->is_initialized = false;
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
@@ -791,22 +945,17 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     fn_addr = (void *) fn;
   else
     {
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
-      gomp_mutex_lock (&mm->lock);
-
-      if (!mm->is_initialized)
-	gomp_init_tables (devicep, mm);
-
+      gomp_mutex_lock (&devicep->lock);
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       if (tgt_fn == NULL)
 	gomp_fatal ("Target function wasn't mapped");
 
-      gomp_mutex_unlock (&mm->lock);
+      gomp_mutex_unlock (&devicep->lock);
 
-      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+      fn_addr = (void *) tgt_fn->tgt_offset;
     }
 
   struct target_mem_desc *tgt_vars
@@ -856,12 +1005,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     false);
@@ -897,13 +1040,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
-  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -972,10 +1109,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
-  DLSYM (register_image);
   DLSYM (init_device);
   DLSYM (fini_device);
-  DLSYM (get_table);
+  DLSYM (load_image);
+  DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
@@ -1038,22 +1175,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return err == NULL;
 }
 
-/* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
-
-static void
-gomp_register_image_for_device (struct gomp_device_descr *device,
-				struct offload_image_descr *image)
-{
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
-}
-
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1112,17 +1233,14 @@ gomp_target_init (void)
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
-		current_device.mem_map.is_initialized = false;
-		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
@@ -1157,21 +1275,12 @@ gomp_target_init (void)
 
   for (i = 0; i < num_devices; i++)
     {
-      int j;
-
-      for (j = 0; j < num_offload_images; j++)
-	gomp_register_image_for_device (&devices[i], &offload_images[j]);
-
       /* The 'devices' array can be moved (by the realloc call) until we have
 	 found all the plugins, so registering with the OpenACC runtime (which
 	 takes a copy of the pointer argument) must be delayed until now.  */
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
-
-  free (offload_images);
-  offload_images = NULL;
-  num_offload_images = 0;
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 3e7a958..a2d61b1 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -34,6 +34,7 @@
 #include <string.h>
 #include <utility>
 #include <vector>
+#include <map>
 #include "libgomp-plugin.h"
 #include "compiler_if_host.h"
 #include "main_target_image.h"
@@ -53,6 +54,29 @@ fprintf (stderr, "\n");					    \
 #endif
 
 
+/* Start/end addresses of functions and global variables on a device.  */
+typedef std::vector<addr_pair> AddrVect;
+
+/* Addresses for one image and all devices.  */
+typedef std::vector<AddrVect> DevAddrVect;
+
+/* Addresses for all images and all devices.  */
+typedef std::map<void *, DevAddrVect> ImgDevAddrMap;
+
+
+/* Total number of available devices.  */
+static int num_devices;
+
+/* Total number of shared libraries with offloading to Intel MIC.  */
+static int num_images;
+
+/* Two-dimensional array: the first key is a pointer to the image, the second
+   key is the device number.  Contains a vector of pointer pairs.  */
+static ImgDevAddrMap *address_table;
+
+/* Thread-safe registration of the main image.  */
+static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
+
 static VarDesc vd_host2tgt = {
   { 1, 1 },		      /* dst, src			      */
   { 1, 0 },		      /* in, out			      */
@@ -90,28 +114,17 @@ static VarDesc vd_tgt2host = {
 };
 
 
-/* Total number of shared libraries with offloading to Intel MIC.  */
-static int num_libraries;
-
-/* Pointers to the descriptors, containing pointers to host-side tables and to
-   target images.  */
-static std::vector< std::pair<void *, void *> > lib_descrs;
-
-/* Thread-safe registration of the main image.  */
-static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
-
-
 /* Add path specified in LD_LIBRARY_PATH to MIC_LD_LIBRARY_PATH, which is
    required by liboffloadmic.  */
 __attribute__((constructor))
 static void
-set_mic_lib_path (void)
+init (void)
 {
   const char *ld_lib_path = getenv (LD_LIBRARY_PATH_ENV);
   const char *mic_lib_path = getenv (MIC_LD_LIBRARY_PATH_ENV);
 
   if (!ld_lib_path)
-    return;
+    goto out;
 
   if (!mic_lib_path)
     setenv (MIC_LD_LIBRARY_PATH_ENV, ld_lib_path, 1);
@@ -133,6 +146,10 @@ set_mic_lib_path (void)
       if (!use_alloca)
 	free (mic_lib_path_new);
     }
+
+out:
+  address_table = new ImgDevAddrMap;
+  num_devices = _Offload_number_of_devices ();
 }
 
 extern "C" const char *
@@ -162,18 +179,8 @@ GOMP_OFFLOAD_get_type (void)
 extern "C" int
 GOMP_OFFLOAD_get_num_devices (void)
 {
-  int res = _Offload_number_of_devices ();
-  TRACE ("(): return %d", res);
-  return res;
-}
-
-/* This should be called from every shared library with offloading.  */
-extern "C" void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_image)
-{
-  TRACE ("(host_table = %p, target_image = %p)", host_table, target_image);
-  lib_descrs.push_back (std::make_pair (host_table, target_image));
-  num_libraries++;
+  TRACE ("(): return %d", num_devices);
+  return num_devices;
 }
 
 static void
@@ -196,7 +203,8 @@ register_main_image ()
   __offload_register_image (&main_target_image);
 }
 
-/* Load offload_target_main on target.  */
+/* liboffloadmic loads and runs offload_target_main on all available devices
+   during a first call to offload ().  */
 extern "C" void
 GOMP_OFFLOAD_init_device (int device)
 {
@@ -243,9 +251,11 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
     }
 }
 
+/* Offload TARGET_IMAGE to all available devices and fill address_table with
+   corresponding target addresses.  */
+
 static void
-load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
-			int &table_size)
+offload_image (void *target_image)
 {
   struct TargetImage {
     int64_t size;
@@ -254,19 +264,11 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     char data[];
   } __attribute__ ((packed));
 
-  void ***host_table_descr = (void ***) lib_descrs[lib_num].first;
-  void **host_func_start = host_table_descr[0];
-  void **host_func_end   = host_table_descr[1];
-  void **host_var_start  = host_table_descr[2];
-  void **host_var_end    = host_table_descr[3];
+  void *image_start = ((void **) target_image)[0];
+  void *image_end   = ((void **) target_image)[1];
 
-  void **target_image_descr = (void **) lib_descrs[lib_num].second;
-  void *image_start = target_image_descr[0];
-  void *image_end   = target_image_descr[1];
-
-  TRACE ("() host_table_descr { %p, %p, %p, %p }", host_func_start,
-	 host_func_end, host_var_start, host_var_end);
-  TRACE ("() target_image_descr { %p, %p }", image_start, image_end);
+  TRACE ("(target_image = %p { %p, %p })",
+	 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
   TargetImage *image
@@ -279,94 +281,87 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     }
 
   image->size = image_size;
-  sprintf (image->name, "lib%010d.so", lib_num);
+  sprintf (image->name, "lib%010d.so", num_images++);
   memcpy (image->data, image_start, image->size);
 
   TRACE ("() __offload_register_image %s { %p, %d }",
 	 image->name, image_start, image->size);
   __offload_register_image (image);
 
-  int tgt_num_funcs = 0;
-  int tgt_num_vars = 0;
-  void **tgt_table = NULL;
-  get_target_table (device, tgt_num_funcs, tgt_num_vars, tgt_table);
-  free (image);
-
-  /* The func table contains only addresses, the var table contains addresses
-     and corresponding sizes.  */
-  int host_num_funcs = host_func_end - host_func_start;
-  int host_num_vars  = (host_var_end - host_var_start) / 2;
-  TRACE ("() host_num_funcs = %d, tgt_num_funcs = %d",
-	 host_num_funcs, tgt_num_funcs);
-  TRACE ("() host_num_vars = %d, tgt_num_vars = %d",
-	 host_num_vars, tgt_num_vars);
-  if (host_num_funcs != tgt_num_funcs)
+  /* Receive tables for target_image from all devices.  */
+  DevAddrVect dev_table;
+  for (int dev = 0; dev < num_devices; dev++)
     {
-      fprintf (stderr, "%s: Can't map target functions\n", __FILE__);
-      exit (1);
-    }
-  if (host_num_vars != tgt_num_vars)
-    {
-      fprintf (stderr, "%s: Can't map target variables\n", __FILE__);
-      exit (1);
-    }
+      int num_funcs = 0;
+      int num_vars = 0;
+      void **table = NULL;
 
-  table = (mapping_table *) realloc (table, (table_size + host_num_funcs
-					     + host_num_vars)
-					    * sizeof (mapping_table));
-  if (table == NULL)
-    {
-      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
-      exit (1);
-    }
+      get_target_table (dev, num_funcs, num_vars, table);
 
-  for (int i = 0; i < host_num_funcs; i++)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_func_start[i];
-      t.host_end = t.host_start + 1;
-      t.tgt_start = (uintptr_t) tgt_table[i];
-      t.tgt_end = t.tgt_start + 1;
-
-      TRACE ("() lib %d, func %d:\t0x%llx -- 0x%llx",
-	     lib_num, i, t.host_start, t.tgt_start);
-
-      table[table_size++] = t;
-    }
+      AddrVect curr_dev_table;
 
-  for (int i = 0; i < host_num_vars * 2; i += 2)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_var_start[i];
-      t.host_end = t.host_start + (uintptr_t) host_var_start[i+1];
-      t.tgt_start = (uintptr_t) tgt_table[tgt_num_funcs+i];
-      t.tgt_end = t.tgt_start + (uintptr_t) tgt_table[tgt_num_funcs+i+1];
+      for (int i = 0; i < num_funcs; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[i];
+	  tgt_addr.end = tgt_addr.start + 1;
+	  TRACE ("() func %d:\t0x%llx..0x%llx", i,
+		 tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      TRACE ("() lib %d, var %d:\t0x%llx (%d) -- 0x%llx (%d)", lib_num, i/2,
-	     t.host_start, t.host_end - t.host_start,
-	     t.tgt_start, t.tgt_end - t.tgt_start);
+      for (int i = 0; i < num_vars; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[num_funcs+i*2];
+	  tgt_addr.end = tgt_addr.start + (uintptr_t) table[num_funcs+i*2+1];
+	  TRACE ("() var %d:\t0x%llx..0x%llx", i, tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      table[table_size++] = t;
+      dev_table.push_back (curr_dev_table);
     }
 
-  delete [] tgt_table;
+  address_table->insert (std::make_pair (target_image, dev_table));
+
+  free (image);
 }
 
 extern "C" int
-GOMP_OFFLOAD_get_table (int device, void *result)
+GOMP_OFFLOAD_load_image (int device, void *target_image, addr_pair **result)
 {
-  TRACE ("(num_libraries = %d)", num_libraries);
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
 
-  mapping_table *table = NULL;
-  int table_size = 0;
+  /* If target_image is already present in address_table, then there is no need
+     to offload it.  */
+  if (address_table->count (target_image) == 0)
+    offload_image (target_image);
 
-  for (int i = 0; i < num_libraries; i++)
-    load_lib_and_get_table (device, i, table, table_size);
+  AddrVect *curr_dev_table = &(*address_table)[target_image][device];
+  int table_size = curr_dev_table->size ();
+  addr_pair *table = (addr_pair *) malloc (table_size * sizeof (addr_pair));
+  if (table == NULL)
+    {
+      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
+      exit (1);
+    }
 
-  *(void **) result = table;
+  std::copy (curr_dev_table->begin (), curr_dev_table->end (), table);
+  *result = table;
   return table_size;
 }
 
+extern "C" void
+GOMP_OFFLOAD_unload_image (int device, void *target_image)
+{
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
+
+  /* TODO: Currently liboffloadmic doesn't support __offload_unregister_image
+     for libraries.  */
+
+  address_table->erase (target_image);
+}
+
 extern "C" void *
 GOMP_OFFLOAD_alloc (int device, size_t size)
 {


Thanks,
  -- Ilya


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
From: Jakub Jelinek @ 2015-03-31 13:08 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote:
> > What is the reason to register and allocate these one at a time, rather than
> > using one struct target_mem_desc with one tgt->array for all splay tree
> > nodes registered from one image?
> > Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
> > clear it is special) and just use tgt_offset relative to that (i.e.
> > absolute), but having to malloc each node individually and having to malloc
> > a target_mem_desc for each one sounds expensive.
> > Everything is freed just once anyway, isn't it?
> 
> Here is WIP patch, does this look like what you suggested?  It works fine with
> functions, however I'm not sure what to do with variables.  Will gomp_map_vars
> work when tgt_start and tgt_end are equal to 0?

Can you explain what you are afraid of?  The mapped images (both their
mapping and unmapping) are done in pairs, and in a valid program the
addresses shouldn't be already mapped when the image is mapped in etc.
So, for gomp_map_vars, the var allocations should just be the pre-existing
mappings, i.e.
      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
      if (n)
        {
          tgt->list[i] = n;
          gomp_map_vars_existing (n, &cur_node, kind & typemask);
        }
case and
  if (is_target)
    {
      for (i = 0; i < mapnum; i++)
        {
          if (tgt->list[i] == NULL)
            cur_node.tgt_offset = (uintptr_t) NULL;
          else
            cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
                                  + tgt->list[i]->tgt_offset;
          /* FIXME: see above FIXME comment.  */
          devicep->host2dev_func (devicep->target_id,
                                  (void *) (tgt->tgt_start
                                            + i * sizeof (void *)),
                                  (void *) &cur_node.tgt_offset,
                                  sizeof (void *));
        }
    }
at the end.  tgt->list[i] will be non-NULL, tgt->list[i]->tgt->tgt_start
will be 0, but tgt->list[i]->tgt_offset will be absolute and so should DTRT.
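To illustrate the arithmetic described above, here is a minimal sketch.  It assumes simplified stand-ins for libgomp's descriptors: the sketch_* names are hypothetical and keep only the fields used in the quoted loop.  It is not the actual libgomp code, just the tgt_start + tgt_offset computation in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct target_mem_desc.  */
struct sketch_mem_desc
{
  uintptr_t tgt_start;
  uintptr_t tgt_end;
};

/* Hypothetical, simplified stand-in for struct splay_tree_key_s.  */
struct sketch_key
{
  struct sketch_mem_desc *tgt;
  uintptr_t tgt_offset;
};

/* Same arithmetic as the quoted gomp_map_vars loop: a missing key yields
   a null device address, otherwise the address is tgt_start plus
   tgt_offset.  */
static uintptr_t
sketch_device_addr (const struct sketch_key *k)
{
  if (k == NULL)
    return (uintptr_t) 0;
  assert (k->tgt != NULL);
  return k->tgt->tgt_start + k->tgt_offset;
}
```

For a node registered from an image, tgt_start is 0 and tgt_offset holds the absolute device address, so the sum is that address unchanged.  For an ordinary mapped block, tgt_offset stays relative to a non-zero tgt_start, and the same expression still gives the right result.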

> +  for (i = 0; i < num_vars; i++)
> +    {
> +      struct addr_pair host_addr;
> +      host_addr.start = (uintptr_t) host_var_table[i*2];
> +      host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1];

Formatting: please put spaces around + and *.  But, as said earlier, I don't
see why this wouldn't work for variables too.
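For reference, a sketch of the quoted loop with GNU-style spacing around the operators.  It assumes the host variable table is laid out as (address, size) pairs, as the quoted hunk implies; sketch_fill_var_ranges is a hypothetical helper name, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct addr_pair from libgomp-plugin.h in the patch.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* Hypothetical helper: convert NUM_VARS (address, size) pairs from
   HOST_VAR_TABLE into [start, end) ranges stored in OUT.  */
static void
sketch_fill_var_ranges (void **host_var_table, size_t num_vars,
			struct addr_pair *out)
{
  size_t i;

  assert (host_var_table != NULL && out != NULL);
  for (i = 0; i < num_vars; i++)
    {
      out[i].start = (uintptr_t) host_var_table[i * 2];
      out[i].end = out[i].start + (uintptr_t) host_var_table[i * 2 + 1];
    }
}
```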

	Jakub


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
From: Ilya Verbin @ 2015-03-31 16:10 UTC (permalink / raw)
  To: Jakub Jelinek, Julian Brown; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, Mar 31, 2015 at 15:07:58 +0200, Jakub Jelinek wrote:
> On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote:
> > > What is the reason to register and allocate these one at a time, rather than
> > > using one struct target_mem_desc with one tgt->array for all splay tree
> > > nodes registered from one image?
> > > Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
> > > clear it is special) and just use tgt_offset relative to that (i.e.
> > > absolute), but having to malloc each node individually and having to malloc
> > > a target_mem_desc for each one sounds expensive.
> > > Everything is freed just once anyway, isn't it?
> > 
> > Here is WIP patch, does this look like what you suggested?  It works fine with
> > functions, however I'm not sure what to do with variables.  Will gomp_map_vars
> > work when tgt_start and tgt_end are equal to 0?
> 
> Can you explain what you are afraid of?  The mapped images (both their
> mapping and unmapping) are done in pairs, and in a valid program the
> addresses shouldn't be already mapped when the image is mapped in etc.
> So, for gomp_map_vars, the var allocations should just be the pre-existing
> mappings, i.e.
>       splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
>       if (n)
>         {
>           tgt->list[i] = n;
>           gomp_map_vars_existing (n, &cur_node, kind & typemask);
>         }
> case and
>   if (is_target)
>     {
>       for (i = 0; i < mapnum; i++)
>         {
>           if (tgt->list[i] == NULL)
>             cur_node.tgt_offset = (uintptr_t) NULL;
>           else
>             cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
>                                   + tgt->list[i]->tgt_offset;
>           /* FIXME: see above FIXME comment.  */
>           devicep->host2dev_func (devicep->target_id,
>                                   (void *) (tgt->tgt_start
>                                             + i * sizeof (void *)),
>                                   (void *) &cur_node.tgt_offset,
>                                   sizeof (void *));
>         }
>     }
> at the end.  tgt->list[i] will be non-NULL, tgt->list[i]->tgt->tgt_start
> will be 0, but tgt->list[i]->tgt_offset will be absolute and so should DTRT.

Ok, thanks for the clarification!  Here is the new patch with variables.

Unfortunately, I see 4 failures in make check-target-libgomp with the PTX patch
applied on top, but with offloading to PTX disabled.
Julian, have you seen them?  All other tests passed with intelmic emulation.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

acc_on_device-1.c aborts here:
  /* Offloaded.  */
#pragma acc parallel
  {
    if (acc_on_device (acc_device_none))
      abort ();


diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (void *, int, void *);\n\n"
+	   "void GOMP_offload_register (void *, int, void *);\n"
+	   "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
 	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
+	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+	   "__attribute__((destructor))\n"
+	   "static void\n"
+	   "fini (void)\n"
+	   "{\n"
+	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
 	   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s mem_map;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ -793,9 +777,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
@@ -811,9 +792,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_init_tables (struct gomp_device_descr *,
-			      struct gomp_memory_mapping *);
-extern void gomp_free_memmap (struct gomp_memory_mapping *);
+extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
+	GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..e4756b6 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,20 +43,18 @@ static struct gomp_device_descr host_dispatch =
     .get_caps_func = GOMP_OFFLOAD_get_caps,
     .get_type_func = GOMP_OFFLOAD_get_type,
     .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-    .register_image_func = GOMP_OFFLOAD_register_image,
     .init_device_func = GOMP_OFFLOAD_init_device,
     .fini_device_func = GOMP_OFFLOAD_fini_device,
-    .get_table_func = GOMP_OFFLOAD_get_table,
+    .load_image_func = GOMP_OFFLOAD_load_image,
+    .unload_image_func = GOMP_OFFLOAD_unload_image,
     .alloc_func = GOMP_OFFLOAD_alloc,
     .free_func = GOMP_OFFLOAD_free,
     .dev2host_func = GOMP_OFFLOAD_dev2host,
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.is_initialized = false,
-    .mem_map.splay_tree.root = NULL,
+    .mem_map.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
@@ -94,7 +92,6 @@ static struct gomp_device_descr host_dispatch =
 static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
-  gomp_mutex_init (&host_dispatch.mem_map.lock);
   gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 166eb55..1e0243e 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -284,12 +284,6 @@ lazy_open (int ord)
     = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
-
-  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
-  gomp_mutex_lock (&mem_map->lock);
-  if (!mem_map->is_initialized)
-    gomp_init_tables (acc_dev, mem_map);
-  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -351,10 +345,9 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
 
 	  walk->dev = NULL;
 	}
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..fdc82e6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -38,7 +38,7 @@
 /* Return block containing [H->S), or NULL if not contained.  */
 
 static splay_tree_key
-lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 {
   struct splay_tree_key_s node;
   splay_tree_key key;
@@ -46,11 +46,9 @@ lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&mem_map->lock);
-
-  key = splay_tree_lookup (&mem_map->splay_tree, &node);
-
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_lock (&dev->lock);
+  key = splay_tree_lookup (&dev->mem_map, &node);
+  gomp_mutex_unlock (&dev->lock);
 
   return key;
 }
@@ -65,14 +63,11 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
 {
   int i;
   struct target_mem_desc *t;
-  struct gomp_memory_mapping *mem_map;
 
   if (!tgt)
     return NULL;
 
-  mem_map = tgt->mem_map;
-
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&tgt->device_descr->lock);
 
   for (t = tgt; t != NULL; t = t->prev)
     {
@@ -80,7 +75,7 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
         break;
     }
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&tgt->device_descr->lock);
 
   if (!t)
     return NULL;
@@ -176,7 +171,7 @@ acc_deviceptr (void *h)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  n = lookup_host (&thr->dev->mem_map, h, 1);
+  n = lookup_host (thr->dev, h, 1);
 
   if (!n)
     return NULL;
@@ -229,7 +224,7 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   if (n && ((uintptr_t)h < n->host_start
 	    || (uintptr_t)h + s > n->host_end
@@ -271,7 +266,7 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
                     (void *)h, (int)s, (void *)d, (int)s);
 
-      if (lookup_host (&acc_dev->mem_map, h, s))
+      if (lookup_host (acc_dev, h, s))
 	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
 		    (int)s);
 
@@ -296,7 +291,7 @@ acc_unmap_data (void *h)
   /* No need to call lazy open, as the address must have been mapped.  */
 
   size_t host_size;
-  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  splay_tree_key n = lookup_host (acc_dev, h, 1);
   struct target_mem_desc *t;
 
   if (!n)
@@ -320,7 +315,7 @@ acc_unmap_data (void *h)
       t->tgt_end = 0;
       t->to_free = 0;
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
 	   tp = t, t = t->prev)
@@ -334,7 +329,7 @@ acc_unmap_data (void *h)
 	    break;
 	  }
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   gomp_unmap_vars (t, true);
@@ -358,7 +353,7 @@ present_create_copy (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
   if (n)
     {
       /* Present. */
@@ -389,13 +384,13 @@ present_create_copy (unsigned f, void *h, size_t s)
       tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
 			   false);
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       d = tgt->to_free;
       tgt->prev = acc_dev->openacc.data_environ;
       acc_dev->openacc.data_environ = tgt;
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   return d;
@@ -436,7 +431,7 @@ delete_copyout (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -479,7 +474,7 @@ update_dev_host (int is_dev, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -532,7 +527,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   struct target_mem_desc *t;
   int minrefs = (mapnum == 1) ? 2 : 3;
 
-  n = lookup_host (&acc_dev->mem_map, h, 1);
+  n = lookup_host (acc_dev, h, 1);
 
   if (!n)
     gomp_fatal ("%p is not a mapped block", (void *)h);
@@ -543,7 +538,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 
   struct target_mem_desc *tp;
 
-  gomp_mutex_lock (&acc_dev->mem_map.lock);
+  gomp_mutex_lock (&acc_dev->lock);
 
   if (t->refcount == minrefs)
     {
@@ -570,7 +565,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   if (force_copyfrom)
     t->list[0]->copy_from = 1;
 
-  gomp_mutex_unlock (&acc_dev->mem_map.lock);
+  gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 0c74f54..563f9bb 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -144,9 +144,9 @@ GOACC_parallel (int device, void (*fn) (void *),
     {
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
-      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map, &k);
+      gomp_mutex_unlock (&acc_dev->lock);
 
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index ebf7f11..bc60f72 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -95,12 +95,6 @@ GOMP_OFFLOAD_get_num_devices (void)
 }
 
 STATIC void
-GOMP_OFFLOAD_register_image (void *host_table __attribute__ ((unused)),
-			     void *target_data __attribute__ ((unused)))
-{
-}
-
-STATIC void
 GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
 {
 }
@@ -111,12 +105,19 @@ GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
 }
 
 STATIC int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **table __attribute__ ((unused)))
+GOMP_OFFLOAD_load_image (int n __attribute__ ((unused)),
+			 void *i __attribute__ ((unused)),
+			 struct addr_pair **r __attribute__ ((unused)))
 {
   return 0;
 }
 
+STATIC void
+GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
+			   void *i __attribute__ ((unused)))
+{
+}
+
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index c5dda3f..1d08681 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -49,6 +49,9 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
+/* Mutex for offload image registration.  */
+static gomp_mutex_t register_lock;
+
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -153,14 +156,14 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mm;
+  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -174,7 +177,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < mapnum; i++)
     {
@@ -189,7 +192,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
@@ -274,7 +277,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
@@ -294,7 +297,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&mm->splay_tree, array);
+		splay_tree_insert (mem_map, array);
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -332,16 +335,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+		    n = splay_tree_lookup (mem_map, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			n = splay_tree_lookup (mem_map, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    n = splay_tree_lookup (mem_map, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -400,18 +403,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  n = splay_tree_lookup (mem_map, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&mm->splay_tree,
-						     &cur_node);
+			      n = splay_tree_lookup (mem_map, &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup (&mm->splay_tree,
-							 &cur_node);
+				  n = splay_tree_lookup (mem_map, &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -489,7 +490,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
   return tgt;
 }
 
@@ -514,10 +515,9 @@ attribute_hidden void
 gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
   size_t i;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
@@ -536,7 +536,7 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
 				  k->host_end - k->host_start);
       }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -547,7 +547,6 @@ attribute_hidden void
 gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -555,7 +554,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -572,7 +571,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&mm->splay_tree, k);
+	splay_tree_remove (tgt->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -584,13 +583,12 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
   else
     gomp_unmap_tgt (tgt);
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
-	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
-	     bool is_openacc)
+gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
+	     size_t *sizes, void *kinds, bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
@@ -602,14 +600,13 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
-					      &cur_node);
+	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
 	    int kind = get_kind (is_openacc, kinds, i);
@@ -643,10 +640,88 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
+}
+
+/* Load the image pointed to by TARGET_DATA to the device specified by DEVICEP,
+   and insert into the splay tree the mapping between addresses from HOST_TABLE
+   and from the loaded target image.  */
+
+static void
+gomp_offload_image_to_device (struct gomp_device_descr *devicep,
+			      void *host_table, void *target_data)
+{
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Load image to device and get target addresses for the image.  */
+  struct addr_pair *target_table = NULL;
+  int i, num_target_entries
+    = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+
+  if (num_target_entries != num_funcs + num_vars)
+    gomp_fatal ("Can't map target functions or variables");
+
+  /* Insert host-target address mapping into splay tree.  */
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
+  tgt->refcount = 1;
+  tgt->tgt_start = 0;
+  tgt->tgt_end = 0;
+  tgt->to_free = NULL;
+  tgt->prev = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+  splay_tree_node array = tgt->array;
+
+  for (i = 0; i < num_funcs; i++)
+    {
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_func_table[i];
+      k->host_end = k->host_start + 1;
+      k->tgt = tgt;
+      k->tgt_offset = target_table[i].start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  for (i = 0; i < num_vars; i++)
+    {
+      struct addr_pair *target_var = &target_table[num_funcs + i];
+      if (target_var->end - target_var->start
+	  != (uintptr_t) host_var_table[i * 2 + 1])
+	gomp_fatal ("Can't map target variables (size mismatch)");
+
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_var_table[i * 2];
+      k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
+      k->tgt = tgt;
+      k->tgt_offset = target_var->start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  free (target_table);
 }
 
-/* This function should be called from every offload image.
+/* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
 
@@ -654,6 +729,20 @@ void
 GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 		       void *target_data)
 {
+  int i;
+  gomp_mutex_lock (&register_lock);
+
+  /* Load image to all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == target_type && devicep->is_initialized)
+	gomp_offload_image_to_device (devicep, host_table, target_data);
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Insert image to array of pending images.  */
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
@@ -663,74 +752,129 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   offload_images[num_offload_images].target_data = target_data;
 
   num_offload_images++;
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* This function initializes the target device, specified by DEVICEP.  DEVICEP
-   must be locked on entry, and remains locked on return.  */
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
 
-attribute_hidden void
-gomp_init_device (struct gomp_device_descr *devicep)
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
 {
-  devicep->init_device_func (devicep->target_id);
-  devicep->is_initialized = true;
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+  int i;
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  gomp_mutex_lock (&register_lock);
+
+  /* Unload image from all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type != target_type || !devicep->is_initialized)
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  continue;
+	}
+
+      devicep->unload_image_func (devicep->target_id, target_data);
+
+      /* Remove mapping from splay tree.  */
+      struct splay_tree_key_s k;
+      splay_tree_key node = NULL;
+      if (num_funcs > 0)
+	{
+	  k.host_start = (uintptr_t) host_func_table[0];
+	  k.host_end = k.host_start + 1;
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+      else if (num_vars > 0)
+	{
+	  k.host_start = (uintptr_t) host_var_table[0];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[1];
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_funcs; j++)
+	{
+	  k.host_start = (uintptr_t) host_func_table[j];
+	  k.host_end = k.host_start + 1;
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_vars; j++)
+	{
+	  k.host_start = (uintptr_t) host_var_table[j * 2];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      if (node)
+	{
+	  free (node->tgt);
+	  free (node);
+	}
+
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Remove image from array of pending images.  */
+  for (i = 0; i < num_offload_images; i++)
+    if (offload_images[i].target_data == target_data)
+      {
+	offload_images[i] = offload_images[--num_offload_images];
+	break;
+      }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* Initialize address mapping tables.  MM must be locked on entry, and remains
-   locked on return.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
-gomp_init_tables (struct gomp_device_descr *devicep,
-		  struct gomp_memory_mapping *mm)
+gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Get address mapping table for device.  */
-  struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
-
-  /* Insert host-target address mapping into dev_splay_tree.  */
   int i;
-  for (i = 0; i < num_entries; i++)
+  devicep->init_device_func (devicep->target_id);
+
+  /* Load to the device all images registered so far.  */
+  for (i = 0; i < num_offload_images; i++)
     {
-      struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-      tgt->refcount = 1;
-      tgt->array = gomp_malloc (sizeof (*tgt->array));
-      tgt->tgt_start = table[i].tgt_start;
-      tgt->tgt_end = table[i].tgt_end;
-      tgt->to_free = NULL;
-      tgt->list_count = 0;
-      tgt->device_descr = devicep;
-      splay_tree_node node = tgt->array;
-      splay_tree_key k = &node->key;
-      k->host_start = table[i].host_start;
-      k->host_end = table[i].host_end;
-      k->tgt_offset = 0;
-      k->refcount = 1;
-      k->copy_from = false;
-      k->tgt = tgt;
-      node->left = NULL;
-      node->right = NULL;
-      splay_tree_insert (&mm->splay_tree, node);
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+	gomp_offload_image_to_device (devicep, image->host_table,
+				      image->target_data);
     }
 
-  free (table);
-  mm->is_initialized = true;
+  devicep->is_initialized = true;
 }
 
 /* Free address mapping tables.  MM must be locked on entry, and remains locked
    on return.  */
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_memory_mapping *mm)
+gomp_free_memmap (struct splay_tree_s *mem_map)
 {
-  while (mm->splay_tree.root)
+  while (mem_map->root)
     {
-      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
-      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      splay_tree_remove (mem_map, &mem_map->root->key);
       free (tgt->array);
       free (tgt);
     }
-
-  mm->is_initialized = false;
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
@@ -791,22 +935,17 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     fn_addr = (void *) fn;
   else
     {
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
-      gomp_mutex_lock (&mm->lock);
-
-      if (!mm->is_initialized)
-	gomp_init_tables (devicep, mm);
-
+      gomp_mutex_lock (&devicep->lock);
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       if (tgt_fn == NULL)
 	gomp_fatal ("Target function wasn't mapped");
 
-      gomp_mutex_unlock (&mm->lock);
+      gomp_mutex_unlock (&devicep->lock);
 
-      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+      fn_addr = (void *) tgt_fn->tgt_offset;
     }
 
   struct target_mem_desc *tgt_vars
@@ -856,12 +995,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     false);
@@ -897,13 +1030,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
-  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -972,10 +1099,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
-  DLSYM (register_image);
   DLSYM (init_device);
   DLSYM (fini_device);
-  DLSYM (get_table);
+  DLSYM (load_image);
+  DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
@@ -1038,22 +1165,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return err == NULL;
 }
 
-/* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
-
-static void
-gomp_register_image_for_device (struct gomp_device_descr *device,
-				struct offload_image_descr *image)
-{
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
-}
-
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1112,17 +1223,14 @@ gomp_target_init (void)
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
-		current_device.mem_map.is_initialized = false;
-		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
@@ -1157,21 +1265,12 @@ gomp_target_init (void)
 
   for (i = 0; i < num_devices; i++)
     {
-      int j;
-
-      for (j = 0; j < num_offload_images; j++)
-	gomp_register_image_for_device (&devices[i], &offload_images[j]);
-
       /* The 'devices' array can be moved (by the realloc call) until we have
 	 found all the plugins, so registering with the OpenACC runtime (which
 	 takes a copy of the pointer argument) must be delayed until now.  */
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
-
-  free (offload_images);
-  offload_images = NULL;
-  num_offload_images = 0;
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 3e7a958..a2d61b1 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -34,6 +34,7 @@
 #include <string.h>
 #include <utility>
 #include <vector>
+#include <map>
 #include "libgomp-plugin.h"
 #include "compiler_if_host.h"
 #include "main_target_image.h"
@@ -53,6 +54,29 @@ fprintf (stderr, "\n");					    \
 #endif
 
 
+/* Start/end addresses of functions and global variables on a device.  */
+typedef std::vector<addr_pair> AddrVect;
+
+/* Addresses for one image and all devices.  */
+typedef std::vector<AddrVect> DevAddrVect;
+
+/* Addresses for all images and all devices.  */
+typedef std::map<void *, DevAddrVect> ImgDevAddrMap;
+
+
+/* Total number of available devices.  */
+static int num_devices;
+
+/* Total number of shared libraries with offloading to Intel MIC.  */
+static int num_images;
+
+/* Two-dimensional table: the first key is a pointer to the image, the
+   second key is the device number.  Each entry is a vector of address pairs.  */
+static ImgDevAddrMap *address_table;
+
+/* Thread-safe registration of the main image.  */
+static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
+
 static VarDesc vd_host2tgt = {
   { 1, 1 },		      /* dst, src			      */
   { 1, 0 },		      /* in, out			      */
@@ -90,28 +114,17 @@ static VarDesc vd_tgt2host = {
 };
 
 
-/* Total number of shared libraries with offloading to Intel MIC.  */
-static int num_libraries;
-
-/* Pointers to the descriptors, containing pointers to host-side tables and to
-   target images.  */
-static std::vector< std::pair<void *, void *> > lib_descrs;
-
-/* Thread-safe registration of the main image.  */
-static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
-
-
 /* Add path specified in LD_LIBRARY_PATH to MIC_LD_LIBRARY_PATH, which is
    required by liboffloadmic.  */
 __attribute__((constructor))
 static void
-set_mic_lib_path (void)
+init (void)
 {
   const char *ld_lib_path = getenv (LD_LIBRARY_PATH_ENV);
   const char *mic_lib_path = getenv (MIC_LD_LIBRARY_PATH_ENV);
 
   if (!ld_lib_path)
-    return;
+    goto out;
 
   if (!mic_lib_path)
     setenv (MIC_LD_LIBRARY_PATH_ENV, ld_lib_path, 1);
@@ -133,6 +146,10 @@ set_mic_lib_path (void)
       if (!use_alloca)
 	free (mic_lib_path_new);
     }
+
+out:
+  address_table = new ImgDevAddrMap;
+  num_devices = _Offload_number_of_devices ();
 }
 
 extern "C" const char *
@@ -162,18 +179,8 @@ GOMP_OFFLOAD_get_type (void)
 extern "C" int
 GOMP_OFFLOAD_get_num_devices (void)
 {
-  int res = _Offload_number_of_devices ();
-  TRACE ("(): return %d", res);
-  return res;
-}
-
-/* This should be called from every shared library with offloading.  */
-extern "C" void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_image)
-{
-  TRACE ("(host_table = %p, target_image = %p)", host_table, target_image);
-  lib_descrs.push_back (std::make_pair (host_table, target_image));
-  num_libraries++;
+  TRACE ("(): return %d", num_devices);
+  return num_devices;
 }
 
 static void
@@ -196,7 +203,8 @@ register_main_image ()
   __offload_register_image (&main_target_image);
 }
 
-/* Load offload_target_main on target.  */
+/* liboffloadmic loads and runs offload_target_main on all available devices
+   during the first call to offload ().  */
 extern "C" void
 GOMP_OFFLOAD_init_device (int device)
 {
@@ -243,9 +251,11 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
     }
 }
 
+/* Offload TARGET_IMAGE to all available devices and fill address_table with
+   corresponding target addresses.  */
+
 static void
-load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
-			int &table_size)
+offload_image (void *target_image)
 {
   struct TargetImage {
     int64_t size;
@@ -254,19 +264,11 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     char data[];
   } __attribute__ ((packed));
 
-  void ***host_table_descr = (void ***) lib_descrs[lib_num].first;
-  void **host_func_start = host_table_descr[0];
-  void **host_func_end   = host_table_descr[1];
-  void **host_var_start  = host_table_descr[2];
-  void **host_var_end    = host_table_descr[3];
+  void *image_start = ((void **) target_image)[0];
+  void *image_end   = ((void **) target_image)[1];
 
-  void **target_image_descr = (void **) lib_descrs[lib_num].second;
-  void *image_start = target_image_descr[0];
-  void *image_end   = target_image_descr[1];
-
-  TRACE ("() host_table_descr { %p, %p, %p, %p }", host_func_start,
-	 host_func_end, host_var_start, host_var_end);
-  TRACE ("() target_image_descr { %p, %p }", image_start, image_end);
+  TRACE ("(target_image = %p { %p, %p })",
+	 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
   TargetImage *image
@@ -279,94 +281,87 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     }
 
   image->size = image_size;
-  sprintf (image->name, "lib%010d.so", lib_num);
+  sprintf (image->name, "lib%010d.so", num_images++);
   memcpy (image->data, image_start, image->size);
 
   TRACE ("() __offload_register_image %s { %p, %d }",
 	 image->name, image_start, image->size);
   __offload_register_image (image);
 
-  int tgt_num_funcs = 0;
-  int tgt_num_vars = 0;
-  void **tgt_table = NULL;
-  get_target_table (device, tgt_num_funcs, tgt_num_vars, tgt_table);
-  free (image);
-
-  /* The func table contains only addresses, the var table contains addresses
-     and corresponding sizes.  */
-  int host_num_funcs = host_func_end - host_func_start;
-  int host_num_vars  = (host_var_end - host_var_start) / 2;
-  TRACE ("() host_num_funcs = %d, tgt_num_funcs = %d",
-	 host_num_funcs, tgt_num_funcs);
-  TRACE ("() host_num_vars = %d, tgt_num_vars = %d",
-	 host_num_vars, tgt_num_vars);
-  if (host_num_funcs != tgt_num_funcs)
+  /* Receive tables for target_image from all devices.  */
+  DevAddrVect dev_table;
+  for (int dev = 0; dev < num_devices; dev++)
     {
-      fprintf (stderr, "%s: Can't map target functions\n", __FILE__);
-      exit (1);
-    }
-  if (host_num_vars != tgt_num_vars)
-    {
-      fprintf (stderr, "%s: Can't map target variables\n", __FILE__);
-      exit (1);
-    }
+      int num_funcs = 0;
+      int num_vars = 0;
+      void **table = NULL;
 
-  table = (mapping_table *) realloc (table, (table_size + host_num_funcs
-					     + host_num_vars)
-					    * sizeof (mapping_table));
-  if (table == NULL)
-    {
-      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
-      exit (1);
-    }
+      get_target_table (dev, num_funcs, num_vars, table);
 
-  for (int i = 0; i < host_num_funcs; i++)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_func_start[i];
-      t.host_end = t.host_start + 1;
-      t.tgt_start = (uintptr_t) tgt_table[i];
-      t.tgt_end = t.tgt_start + 1;
-
-      TRACE ("() lib %d, func %d:\t0x%llx -- 0x%llx",
-	     lib_num, i, t.host_start, t.tgt_start);
-
-      table[table_size++] = t;
-    }
+      AddrVect curr_dev_table;
 
-  for (int i = 0; i < host_num_vars * 2; i += 2)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_var_start[i];
-      t.host_end = t.host_start + (uintptr_t) host_var_start[i+1];
-      t.tgt_start = (uintptr_t) tgt_table[tgt_num_funcs+i];
-      t.tgt_end = t.tgt_start + (uintptr_t) tgt_table[tgt_num_funcs+i+1];
+      for (int i = 0; i < num_funcs; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[i];
+	  tgt_addr.end = tgt_addr.start + 1;
+	  TRACE ("() func %d:\t0x%llx..0x%llx", i,
+		 tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      TRACE ("() lib %d, var %d:\t0x%llx (%d) -- 0x%llx (%d)", lib_num, i/2,
-	     t.host_start, t.host_end - t.host_start,
-	     t.tgt_start, t.tgt_end - t.tgt_start);
+      for (int i = 0; i < num_vars; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[num_funcs+i*2];
+	  tgt_addr.end = tgt_addr.start + (uintptr_t) table[num_funcs+i*2+1];
+	  TRACE ("() var %d:\t0x%llx..0x%llx", i, tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      table[table_size++] = t;
+      dev_table.push_back (curr_dev_table);
     }
 
-  delete [] tgt_table;
+  address_table->insert (std::make_pair (target_image, dev_table));
+
+  free (image);
 }
 
 extern "C" int
-GOMP_OFFLOAD_get_table (int device, void *result)
+GOMP_OFFLOAD_load_image (int device, void *target_image, addr_pair **result)
 {
-  TRACE ("(num_libraries = %d)", num_libraries);
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
 
-  mapping_table *table = NULL;
-  int table_size = 0;
+  /* If target_image is already present in address_table, then there is no need
+     to offload it.  */
+  if (address_table->count (target_image) == 0)
+    offload_image (target_image);
 
-  for (int i = 0; i < num_libraries; i++)
-    load_lib_and_get_table (device, i, table, table_size);
+  AddrVect *curr_dev_table = &(*address_table)[target_image][device];
+  int table_size = curr_dev_table->size ();
+  addr_pair *table = (addr_pair *) malloc (table_size * sizeof (addr_pair));
+  if (table == NULL)
+    {
+      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
+      exit (1);
+    }
 
-  *(void **) result = table;
+  std::copy (curr_dev_table->begin (), curr_dev_table->end (), table);
+  *result = table;
   return table_size;
 }
 
+extern "C" void
+GOMP_OFFLOAD_unload_image (int device, void *target_image)
+{
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
+
+  /* TODO: Currently liboffloadmic doesn't support __offload_unregister_image
+     for libraries.  */
+
+  address_table->erase (target_image);
+}
+
 extern "C" void *
 GOMP_OFFLOAD_alloc (int device, size_t size)
 {


  -- Ilya
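
For readers following the patch above: the host table that
gomp_offload_image_to_device and GOMP_offload_unregister decode is a block of
four pointers (start and end of the function table, start and end of the
variable table).  Below is a minimal sketch of how the entry counts fall out of
that layout.  The names demo_table_counts and demo_count_entries are
hypothetical and not part of libgomp; only the pointer arithmetic mirrors the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type, used only in this sketch.  */
struct demo_table_counts
{
  int num_funcs;
  int num_vars;
};

/* HOST_TABLE is assumed to point to four pointers:
   [0] start of the function table, [1] its end,
   [2] start of the variable table, [3] its end.
   The function table holds one address per entry; the variable table
   holds an (address, size) pair per entry, hence the division by 2.  */
static struct demo_table_counts
demo_count_entries (void *host_table)
{
  assert (host_table != NULL);

  void **func_table = ((void ***) host_table)[0];
  void **funcs_end  = ((void ***) host_table)[1];
  void **var_table  = ((void ***) host_table)[2];
  void **vars_end   = ((void ***) host_table)[3];

  struct demo_table_counts counts;
  counts.num_funcs = (int) (funcs_end - func_table);
  counts.num_vars  = (int) ((vars_end - var_table) / 2);
  return counts;
}
```

With a function table of three addresses and a variable table of four slots
(two address/size pairs), this yields three functions and two variables, which
is the total the patch compares against the number of entries returned by the
plugin's load_image hook.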

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-30 16:42                                   ` Jakub Jelinek
  2015-03-30 21:43                                     ` Julian Brown
  2015-03-31 12:52                                     ` Ilya Verbin
@ 2015-03-31 18:25                                     ` Ilya Verbin
  2015-03-31 19:06                                       ` Jakub Jelinek
  2 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-03-31 18:25 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
> Shouldn't either this function, or gomp_offload_image_to_device lock
> also devicep->lock mutex and unlock at the end?
> Where exactly I guess depends on if the devicep->* hook calls should be
> guarded with the mutex or not.  If yes, it should be this function and
> gomp_init_device.
> 
> > +      if (devicep->type != target_type || !devicep->is_initialized)
> > +	continue;
> > +
> 
> Similarly.

Oops, there is a deadlock.  E.g. if gomp_map_vars locks devicep->lock and then
calls gomp_fatal, the destructors from .fini section are executed, so
gomp_mutex_lock in GOMP_offload_unregister will wait for devicep->lock.

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-31 18:25                                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Ilya Verbin
@ 2015-03-31 19:06                                       ` Jakub Jelinek
  0 siblings, 0 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-03-31 19:06 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Tue, Mar 31, 2015 at 09:25:26PM +0300, Ilya Verbin wrote:
> On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
> > Shouldn't either this function, or gomp_offload_image_to_device lock
> > also devicep->lock mutex and unlock at the end?
> > Where exactly I guess depends on if the devicep->* hook calls should be
> > guarded with the mutex or not.  If yes, it should be this function and
> > gomp_init_device.
> > 
> > > +      if (devicep->type != target_type || !devicep->is_initialized)
> > > +	continue;
> > > +
> > 
> > Similarly.
> 
> Oops, there is a deadlock.  E.g. if gomp_map_vars locks devicep->lock and then
> calls gomp_fatal, the destructors from .fini section are executed, so
> gomp_mutex_lock in GOMP_offload_unregister will wait for devicep->lock.

Thus perhaps before calling gomp_fatal you should release the device lock
(if held) and register_lock (ditto).

	Jakub


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-31 16:10                                         ` Ilya Verbin
@ 2015-03-31 23:53                                           ` Ilya Verbin
  2015-04-01  5:21                                             ` Jakub Jelinek
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-03-31 23:53 UTC (permalink / raw)
  To: Jakub Jelinek, Julian Brown; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, Mar 31, 2015 at 19:10:36 +0300, Ilya Verbin wrote:
> Ok, thanks for the clarification!  Here is the new patch with variables.
> 
> Unfortunately I see 4 fails in make check-target-libgomp with PTX patch applied
> on top, but with disabled offloading to PTX.
> Julian, have you seen them?  All other tests passed with intelmic emul.
> 
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> 
> acc_on_device-1.c aborts here:
>   /* Offloaded.  */
> #pragma acc parallel
>   {
>     if (acc_on_device (acc_device_none))
>       abort ();

And here is the next version, which fixes the potential deadlock in
GOMP_offload_unregister.  make check-target-libgomp passed with it.
(With the PTX patch applied on top, make check-target-libgomp still shows the
failures mentioned above.)
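
For readers who have not looked at the mkoffload output: the idea is that each
offload image registers itself from a constructor and unregisters itself from
a destructor.  Below is a rough standalone sketch of that pairing, assuming
GCC/Clang constructor and destructor attributes.  The fixed-size registry and
the names image_register, image_unregister, demo_host_table,
demo_target_data and DEMO_DEVICE_TYPE are hypothetical and only stand in for
the real libgomp entry points and tables.

```c
#include <stddef.h>

#define MAX_IMAGES 8		/* Hypothetical fixed capacity.  */
#define DEMO_DEVICE_TYPE 6	/* Hypothetical device type id.  */

struct image_descr
{
  int type;
  void *host_table;
  void *target_data;
};

static struct image_descr images[MAX_IMAGES];
static int num_images;

/* Record an image; returns 0 on success, -1 if the registry is full.  */
static int
image_register (void *host_table, int type, void *target_data)
{
  if (num_images == MAX_IMAGES)
    return -1;
  images[num_images++]
    = (struct image_descr) { type, host_table, target_data };
  return 0;
}

/* Drop a previously recorded image; returns -1 if it was not found.  */
static int
image_unregister (void *host_table, int type, void *target_data)
{
  for (int i = 0; i < num_images; i++)
    if (images[i].host_table == host_table
	&& images[i].type == type
	&& images[i].target_data == target_data)
      {
	/* Order does not matter, so move the last entry into the hole.  */
	images[i] = images[--num_images];
	return 0;
      }
  return -1;
}

/* Hypothetical stand-ins for the per-image tables.  */
static void *demo_host_table[4];
static char demo_target_data[1];

/* Runs before main, like the generated init function.  */
__attribute__ ((constructor))
static void
init (void)
{
  image_register (demo_host_table, DEMO_DEVICE_TYPE, demo_target_data);
}

/* Runs at exit or unload, like the generated fini function.  */
__attribute__ ((destructor))
static void
fini (void)
{
  image_unregister (demo_host_table, DEMO_DEVICE_TYPE, demo_target_data);
}
```

Because fini runs during exit, anything it calls must not wait on a lock that
the exiting thread may still hold, which is why the error paths need care.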


diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (void *, int, void *);\n\n"
+	   "void GOMP_offload_register (void *, int, void *);\n"
+	   "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
 	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
+	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+	   "__attribute__((destructor))\n"
+	   "static void\n"
+	   "fini (void)\n"
+	   "{\n"
+	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
 	   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s mem_map;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ -793,9 +777,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
@@ -811,9 +792,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_init_tables (struct gomp_device_descr *,
-			      struct gomp_memory_mapping *);
-extern void gomp_free_memmap (struct gomp_memory_mapping *);
+extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
+	GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..e4756b6 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,20 +43,18 @@ static struct gomp_device_descr host_dispatch =
     .get_caps_func = GOMP_OFFLOAD_get_caps,
     .get_type_func = GOMP_OFFLOAD_get_type,
     .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-    .register_image_func = GOMP_OFFLOAD_register_image,
     .init_device_func = GOMP_OFFLOAD_init_device,
     .fini_device_func = GOMP_OFFLOAD_fini_device,
-    .get_table_func = GOMP_OFFLOAD_get_table,
+    .load_image_func = GOMP_OFFLOAD_load_image,
+    .unload_image_func = GOMP_OFFLOAD_unload_image,
     .alloc_func = GOMP_OFFLOAD_alloc,
     .free_func = GOMP_OFFLOAD_free,
     .dev2host_func = GOMP_OFFLOAD_dev2host,
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.is_initialized = false,
-    .mem_map.splay_tree.root = NULL,
+    .mem_map.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
@@ -94,7 +92,6 @@ static struct gomp_device_descr host_dispatch =
 static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
-  gomp_mutex_init (&host_dispatch.mem_map.lock);
   gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 166eb55..1e0243e 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -284,12 +284,6 @@ lazy_open (int ord)
     = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
-
-  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
-  gomp_mutex_lock (&mem_map->lock);
-  if (!mem_map->is_initialized)
-    gomp_init_tables (acc_dev, mem_map);
-  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -351,10 +345,9 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
 
 	  walk->dev = NULL;
 	}
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..fdc82e6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -38,7 +38,7 @@
 /* Return block containing [H->S), or NULL if not contained.  */
 
 static splay_tree_key
-lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 {
   struct splay_tree_key_s node;
   splay_tree_key key;
@@ -46,11 +46,9 @@ lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&mem_map->lock);
-
-  key = splay_tree_lookup (&mem_map->splay_tree, &node);
-
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_lock (&dev->lock);
+  key = splay_tree_lookup (&dev->mem_map, &node);
+  gomp_mutex_unlock (&dev->lock);
 
   return key;
 }
@@ -65,14 +63,11 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
 {
   int i;
   struct target_mem_desc *t;
-  struct gomp_memory_mapping *mem_map;
 
   if (!tgt)
     return NULL;
 
-  mem_map = tgt->mem_map;
-
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&tgt->device_descr->lock);
 
   for (t = tgt; t != NULL; t = t->prev)
     {
@@ -80,7 +75,7 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
         break;
     }
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&tgt->device_descr->lock);
 
   if (!t)
     return NULL;
@@ -176,7 +171,7 @@ acc_deviceptr (void *h)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  n = lookup_host (&thr->dev->mem_map, h, 1);
+  n = lookup_host (thr->dev, h, 1);
 
   if (!n)
     return NULL;
@@ -229,7 +224,7 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   if (n && ((uintptr_t)h < n->host_start
 	    || (uintptr_t)h + s > n->host_end
@@ -271,7 +266,7 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
                     (void *)h, (int)s, (void *)d, (int)s);
 
-      if (lookup_host (&acc_dev->mem_map, h, s))
+      if (lookup_host (acc_dev, h, s))
 	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
 		    (int)s);
 
@@ -296,7 +291,7 @@ acc_unmap_data (void *h)
   /* No need to call lazy open, as the address must have been mapped.  */
 
   size_t host_size;
-  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  splay_tree_key n = lookup_host (acc_dev, h, 1);
   struct target_mem_desc *t;
 
   if (!n)
@@ -320,7 +315,7 @@ acc_unmap_data (void *h)
       t->tgt_end = 0;
       t->to_free = 0;
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
 	   tp = t, t = t->prev)
@@ -334,7 +329,7 @@ acc_unmap_data (void *h)
 	    break;
 	  }
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   gomp_unmap_vars (t, true);
@@ -358,7 +353,7 @@ present_create_copy (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
   if (n)
     {
       /* Present. */
@@ -389,13 +384,13 @@ present_create_copy (unsigned f, void *h, size_t s)
       tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
 			   false);
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       d = tgt->to_free;
       tgt->prev = acc_dev->openacc.data_environ;
       acc_dev->openacc.data_environ = tgt;
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   return d;
@@ -436,7 +431,7 @@ delete_copyout (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -479,7 +474,7 @@ update_dev_host (int is_dev, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -532,7 +527,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   struct target_mem_desc *t;
   int minrefs = (mapnum == 1) ? 2 : 3;
 
-  n = lookup_host (&acc_dev->mem_map, h, 1);
+  n = lookup_host (acc_dev, h, 1);
 
   if (!n)
     gomp_fatal ("%p is not a mapped block", (void *)h);
@@ -543,7 +538,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 
   struct target_mem_desc *tp;
 
-  gomp_mutex_lock (&acc_dev->mem_map.lock);
+  gomp_mutex_lock (&acc_dev->lock);
 
   if (t->refcount == minrefs)
     {
@@ -570,7 +565,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   if (force_copyfrom)
     t->list[0]->copy_from = 1;
 
-  gomp_mutex_unlock (&acc_dev->mem_map.lock);
+  gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 0c74f54..563f9bb 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -144,9 +144,9 @@ GOACC_parallel (int device, void (*fn) (void *),
     {
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
-      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map, &k);
+      gomp_mutex_unlock (&acc_dev->lock);
 
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index ebf7f11..bc60f72 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -95,12 +95,6 @@ GOMP_OFFLOAD_get_num_devices (void)
 }
 
 STATIC void
-GOMP_OFFLOAD_register_image (void *host_table __attribute__ ((unused)),
-			     void *target_data __attribute__ ((unused)))
-{
-}
-
-STATIC void
 GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
 {
 }
@@ -111,12 +105,19 @@ GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
 }
 
 STATIC int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **table __attribute__ ((unused)))
+GOMP_OFFLOAD_load_image (int n __attribute__ ((unused)),
+			 void *i __attribute__ ((unused)),
+			 struct addr_pair **r __attribute__ ((unused)))
 {
   return 0;
 }
 
+STATIC void
+GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
+			   void *i __attribute__ ((unused)))
+{
+}
+
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index c5dda3f..fd9ba6d 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -49,6 +49,9 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
+/* Mutex for offload image registration.  */
+static gomp_mutex_t register_lock;
+
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -67,14 +70,29 @@ static int num_offload_images;
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
-#ifdef PLUGIN_SUPPORT
 /* Total number of available devices.  */
 static int num_devices;
-#endif
 
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* Similar to gomp_fatal, but release mutexes before.  */
+
+static void
+gomp_fatal_unlock (const char *fmt, ...)
+{
+  int i;
+  va_list list;
+
+  for (i = 0; i < num_devices; i++)
+    gomp_mutex_unlock (&devices[i].lock);
+  gomp_mutex_unlock (&register_lock);
+
+  va_start (list, fmt);
+  gomp_vfatal (fmt, list);
+  va_end (list);
+}
+
 /* The comparison function.  */
 
 attribute_hidden int
@@ -131,10 +149,10 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
   if ((kind & GOMP_MAP_FLAG_FORCE)
       || oldn->host_start > newn->host_start
       || oldn->host_end < newn->host_end)
-    gomp_fatal ("Trying to map into device [%p..%p) object when "
-		"[%p..%p) is already mapped",
-		(void *) newn->host_start, (void *) newn->host_end,
-		(void *) oldn->host_start, (void *) oldn->host_end);
+    gomp_fatal_unlock ("Trying to map into device [%p..%p) object when "
+		       "[%p..%p) is already mapped",
+		       (void *) newn->host_start, (void *) newn->host_end,
+		       (void *) oldn->host_start, (void *) oldn->host_end);
   oldn->refcount++;
 }
 
@@ -153,14 +171,14 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mm;
+  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -174,7 +192,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < mapnum; i++)
     {
@@ -189,7 +207,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
@@ -228,7 +246,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   if (devaddrs)
     {
       if (mapnum != 1)
-        gomp_fatal ("unexpected aggregation");
+        gomp_fatal_unlock ("unexpected aggregation");
       tgt->to_free = devaddrs[0];
       tgt->tgt_start = (uintptr_t) tgt->to_free;
       tgt->tgt_end = tgt->tgt_start + sizes[0];
@@ -274,7 +292,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
@@ -294,7 +312,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&mm->splay_tree, array);
+		splay_tree_insert (mem_map, array);
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -332,22 +350,22 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+		    n = splay_tree_lookup (mem_map, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			n = splay_tree_lookup (mem_map, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    n = splay_tree_lookup (mem_map, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
 		    if (n == NULL)
-		      gomp_fatal ("Pointer target of array section "
-				  "wasn't mapped");
+		      gomp_fatal_unlock ("Pointer target of array section "
+					 "wasn't mapped");
 		    cur_node.host_start -= n->host_start;
 		    cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset
 					  + cur_node.host_start;
@@ -400,24 +418,22 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  n = splay_tree_lookup (mem_map, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&mm->splay_tree,
-						     &cur_node);
+			      n = splay_tree_lookup (mem_map, &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup (&mm->splay_tree,
-							 &cur_node);
+				  n = splay_tree_lookup (mem_map, &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
 			  if (n == NULL)
-			    gomp_fatal ("Pointer target of array section "
-					"wasn't mapped");
+			    gomp_fatal_unlock ("Pointer target of array section"
+					       " wasn't mapped");
 			  cur_node.host_start -= n->host_start;
 			  cur_node.tgt_offset = n->tgt->tgt_start
 						+ n->tgt_offset
@@ -442,14 +458,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			 was missing.  */
 		      size_t size = k->host_end - k->host_start;
 #ifdef HAVE_INTTYPES_H
-		      gomp_fatal ("present clause: !acc_is_present (%p, "
-				  "%"PRIu64" (0x%"PRIx64"))",
-				  (void *) k->host_start,
-				  (uint64_t) size, (uint64_t) size);
+		      gomp_fatal_unlock ("present clause: !acc_is_present (%p, "
+					 "%"PRIu64" (0x%"PRIx64"))",
+					 (void *) k->host_start,
+					 (uint64_t) size, (uint64_t) size);
 #else
-		      gomp_fatal ("present clause: !acc_is_present (%p, "
-				  "%lu (0x%lx))", (void *) k->host_start,
-				  (unsigned long) size, (unsigned long) size);
+		      gomp_fatal_unlock ("present clause: !acc_is_present (%p, "
+					 "%lu (0x%lx))", (void *) k->host_start,
+					 (unsigned long) size,
+					 (unsigned long) size);
 #endif
 		    }
 		    break;
@@ -463,8 +480,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    sizeof (void *));
 		    break;
 		  default:
-		    gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
-				kind);
+		    gomp_fatal_unlock ("%s: unhandled kind 0x%.2x",
+				       __FUNCTION__, kind);
 		  }
 		array++;
 	      }
@@ -489,7 +506,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
   return tgt;
 }
 
@@ -514,10 +531,9 @@ attribute_hidden void
 gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
   size_t i;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
@@ -536,7 +552,7 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
 				  k->host_end - k->host_start);
       }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -547,7 +563,6 @@ attribute_hidden void
 gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -555,7 +570,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -572,7 +587,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&mm->splay_tree, k);
+	splay_tree_remove (tgt->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -584,13 +599,12 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
   else
     gomp_unmap_tgt (tgt);
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
-	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
-	     bool is_openacc)
+gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
+	     size_t *sizes, void *kinds, bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
@@ -602,25 +616,24 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
-					      &cur_node);
+	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
 	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
-	      gomp_fatal ("Trying to update [%p..%p) object when"
-			  "only [%p..%p) is mapped",
-			  (void *) cur_node.host_start,
-			  (void *) cur_node.host_end,
-			  (void *) n->host_start,
-			  (void *) n->host_end);
+	      gomp_fatal_unlock ("Trying to update [%p..%p) object when "
+				 "only [%p..%p) is mapped",
+				 (void *) cur_node.host_start,
+				 (void *) cur_node.host_end,
+				 (void *) n->host_start,
+				 (void *) n->host_end);
 	    if (GOMP_MAP_COPY_TO_P (kind & typemask))
 	      devicep->host2dev_func (devicep->target_id,
 				      (void *) (n->tgt->tgt_start
@@ -639,14 +652,92 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
 				      cur_node.host_end - cur_node.host_start);
 	  }
 	else
-	  gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
-		      (void *) cur_node.host_start,
-		      (void *) cur_node.host_end);
+	  gomp_fatal_unlock ("Trying to update [%p..%p) object that is not "
+			     "mapped", (void *) cur_node.host_start,
+			     (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
-/* This function should be called from every offload image.
+/* Load image pointed by TARGET_DATA to the device, specified by DEVICEP.
+   And insert to splay tree the mapping between addresses from HOST_TABLE and
+   from loaded target image.  */
+
+static void
+gomp_offload_image_to_device (struct gomp_device_descr *devicep,
+			      void *host_table, void *target_data)
+{
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Load image to device and get target addresses for the image.  */
+  struct addr_pair *target_table = NULL;
+  int i, num_target_entries
+    = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+
+  if (num_target_entries != num_funcs + num_vars)
+    gomp_fatal_unlock ("Can't map target functions or variables");
+
+  /* Insert host-target address mapping into splay tree.  */
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
+  tgt->refcount = 1;
+  tgt->tgt_start = 0;
+  tgt->tgt_end = 0;
+  tgt->to_free = NULL;
+  tgt->prev = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+  splay_tree_node array = tgt->array;
+
+  for (i = 0; i < num_funcs; i++)
+    {
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_func_table[i];
+      k->host_end = k->host_start + 1;
+      k->tgt = tgt;
+      k->tgt_offset = target_table[i].start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  for (i = 0; i < num_vars; i++)
+    {
+      struct addr_pair *target_var = &target_table[num_funcs + i];
+      if (target_var->end - target_var->start
+	  != (uintptr_t) host_var_table[i * 2 + 1])
+	gomp_fatal_unlock ("Can't map target variables (size mismatch)");
+
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_var_table[i * 2];
+      k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
+      k->tgt = tgt;
+      k->tgt_offset = target_var->start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  free (target_table);
+}
+
+/* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
 
@@ -654,6 +745,20 @@ void
 GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 		       void *target_data)
 {
+  int i;
+  gomp_mutex_lock (&register_lock);
+
+  /* Load image to all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == target_type && devicep->is_initialized)
+	gomp_offload_image_to_device (devicep, host_table, target_data);
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Insert image to array of pending images.  */
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
@@ -663,74 +768,129 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   offload_images[num_offload_images].target_data = target_data;
 
   num_offload_images++;
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* This function initializes the target device, specified by DEVICEP.  DEVICEP
-   must be locked on entry, and remains locked on return.  */
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
 
-attribute_hidden void
-gomp_init_device (struct gomp_device_descr *devicep)
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
 {
-  devicep->init_device_func (devicep->target_id);
-  devicep->is_initialized = true;
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+  int i;
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  gomp_mutex_lock (&register_lock);
+
+  /* Unload image from all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type != target_type || !devicep->is_initialized)
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  continue;
+	}
+
+      devicep->unload_image_func (devicep->target_id, target_data);
+
+      /* Remove mapping from splay tree.  */
+      struct splay_tree_key_s k;
+      splay_tree_key node = NULL;
+      if (num_funcs > 0)
+	{
+	  k.host_start = (uintptr_t) host_func_table[0];
+	  k.host_end = k.host_start + 1;
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+      else if (num_vars > 0)
+	{
+	  k.host_start = (uintptr_t) host_var_table[0];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[1];
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_funcs; j++)
+	{
+	  k.host_start = (uintptr_t) host_func_table[j];
+	  k.host_end = k.host_start + 1;
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_vars; j++)
+	{
+	  k.host_start = (uintptr_t) host_var_table[j * 2];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      if (node)
+	{
+	  free (node->tgt);
+	  free (node);
+	}
+
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Remove image from array of pending images.  */
+  for (i = 0; i < num_offload_images; i++)
+    if (offload_images[i].target_data == target_data)
+      {
+	offload_images[i] = offload_images[--num_offload_images];
+	break;
+      }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* Initialize address mapping tables.  MM must be locked on entry, and remains
-   locked on return.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
-gomp_init_tables (struct gomp_device_descr *devicep,
-		  struct gomp_memory_mapping *mm)
+gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Get address mapping table for device.  */
-  struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
-
-  /* Insert host-target address mapping into dev_splay_tree.  */
   int i;
-  for (i = 0; i < num_entries; i++)
+  devicep->init_device_func (devicep->target_id);
+
+  /* Load all images registered so far to the device.  */
+  for (i = 0; i < num_offload_images; i++)
     {
-      struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-      tgt->refcount = 1;
-      tgt->array = gomp_malloc (sizeof (*tgt->array));
-      tgt->tgt_start = table[i].tgt_start;
-      tgt->tgt_end = table[i].tgt_end;
-      tgt->to_free = NULL;
-      tgt->list_count = 0;
-      tgt->device_descr = devicep;
-      splay_tree_node node = tgt->array;
-      splay_tree_key k = &node->key;
-      k->host_start = table[i].host_start;
-      k->host_end = table[i].host_end;
-      k->tgt_offset = 0;
-      k->refcount = 1;
-      k->copy_from = false;
-      k->tgt = tgt;
-      node->left = NULL;
-      node->right = NULL;
-      splay_tree_insert (&mm->splay_tree, node);
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+	gomp_offload_image_to_device (devicep, image->host_table,
+				      image->target_data);
     }
 
-  free (table);
-  mm->is_initialized = true;
+  devicep->is_initialized = true;
 }
 
 /* Free address mapping tables.  MM must be locked on entry, and remains locked
    on return.  */
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_memory_mapping *mm)
+gomp_free_memmap (struct splay_tree_s *mem_map)
 {
-  while (mm->splay_tree.root)
+  while (mem_map->root)
     {
-      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
-      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      splay_tree_remove (mem_map, &mem_map->root->key);
       free (tgt->array);
       free (tgt);
     }
-
-  mm->is_initialized = false;
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
@@ -791,22 +951,17 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     fn_addr = (void *) fn;
   else
     {
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
-      gomp_mutex_lock (&mm->lock);
-
-      if (!mm->is_initialized)
-	gomp_init_tables (devicep, mm);
-
+      gomp_mutex_lock (&devicep->lock);
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       if (tgt_fn == NULL)
-	gomp_fatal ("Target function wasn't mapped");
+	gomp_fatal_unlock ("Target function wasn't mapped");
 
-      gomp_mutex_unlock (&mm->lock);
+      gomp_mutex_unlock (&devicep->lock);
 
-      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+      fn_addr = (void *) tgt_fn->tgt_offset;
     }
 
   struct target_mem_desc *tgt_vars
@@ -856,12 +1011,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     false);
@@ -897,13 +1046,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
-  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -972,10 +1115,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
-  DLSYM (register_image);
   DLSYM (init_device);
   DLSYM (fini_device);
-  DLSYM (get_table);
+  DLSYM (load_image);
+  DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
@@ -1038,22 +1181,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return err == NULL;
 }
 
-/* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
-
-static void
-gomp_register_image_for_device (struct gomp_device_descr *device,
-				struct offload_image_descr *image)
-{
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
-}
-
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1112,17 +1239,14 @@ gomp_target_init (void)
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
-		current_device.mem_map.is_initialized = false;
-		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
@@ -1157,21 +1281,12 @@ gomp_target_init (void)
 
   for (i = 0; i < num_devices; i++)
     {
-      int j;
-
-      for (j = 0; j < num_offload_images; j++)
-	gomp_register_image_for_device (&devices[i], &offload_images[j]);
-
       /* The 'devices' array can be moved (by the realloc call) until we have
 	 found all the plugins, so registering with the OpenACC runtime (which
 	 takes a copy of the pointer argument) must be delayed until now.  */
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
-
-  free (offload_images);
-  offload_images = NULL;
-  num_offload_images = 0;
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 3e7a958..a2d61b1 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -34,6 +34,7 @@
 #include <string.h>
 #include <utility>
 #include <vector>
+#include <map>
 #include "libgomp-plugin.h"
 #include "compiler_if_host.h"
 #include "main_target_image.h"
@@ -53,6 +54,29 @@ fprintf (stderr, "\n");					    \
 #endif
 
 
+/* Start/end addresses of functions and global variables on a device.  */
+typedef std::vector<addr_pair> AddrVect;
+
+/* Addresses for one image and all devices.  */
+typedef std::vector<AddrVect> DevAddrVect;
+
+/* Addresses for all images and all devices.  */
+typedef std::map<void *, DevAddrVect> ImgDevAddrMap;
+
+
+/* Total number of available devices.  */
+static int num_devices;
+
+/* Total number of shared libraries with offloading to Intel MIC.  */
+static int num_images;
+
+/* Two dimensional array: one key is a pointer to image,
+   second key is number of device.  Contains a vector of pointer pairs.  */
+static ImgDevAddrMap *address_table;
+
+/* Thread-safe registration of the main image.  */
+static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
+
 static VarDesc vd_host2tgt = {
   { 1, 1 },		      /* dst, src			      */
   { 1, 0 },		      /* in, out			      */
@@ -90,28 +114,17 @@ static VarDesc vd_tgt2host = {
 };
 
 
-/* Total number of shared libraries with offloading to Intel MIC.  */
-static int num_libraries;
-
-/* Pointers to the descriptors, containing pointers to host-side tables and to
-   target images.  */
-static std::vector< std::pair<void *, void *> > lib_descrs;
-
-/* Thread-safe registration of the main image.  */
-static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
-
-
 /* Add path specified in LD_LIBRARY_PATH to MIC_LD_LIBRARY_PATH, which is
    required by liboffloadmic.  */
 __attribute__((constructor))
 static void
-set_mic_lib_path (void)
+init (void)
 {
   const char *ld_lib_path = getenv (LD_LIBRARY_PATH_ENV);
   const char *mic_lib_path = getenv (MIC_LD_LIBRARY_PATH_ENV);
 
   if (!ld_lib_path)
-    return;
+    goto out;
 
   if (!mic_lib_path)
     setenv (MIC_LD_LIBRARY_PATH_ENV, ld_lib_path, 1);
@@ -133,6 +146,10 @@ set_mic_lib_path (void)
       if (!use_alloca)
 	free (mic_lib_path_new);
     }
+
+out:
+  address_table = new ImgDevAddrMap;
+  num_devices = _Offload_number_of_devices ();
 }
 
 extern "C" const char *
@@ -162,18 +179,8 @@ GOMP_OFFLOAD_get_type (void)
 extern "C" int
 GOMP_OFFLOAD_get_num_devices (void)
 {
-  int res = _Offload_number_of_devices ();
-  TRACE ("(): return %d", res);
-  return res;
-}
-
-/* This should be called from every shared library with offloading.  */
-extern "C" void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_image)
-{
-  TRACE ("(host_table = %p, target_image = %p)", host_table, target_image);
-  lib_descrs.push_back (std::make_pair (host_table, target_image));
-  num_libraries++;
+  TRACE ("(): return %d", num_devices);
+  return num_devices;
 }
 
 static void
@@ -196,7 +203,8 @@ register_main_image ()
   __offload_register_image (&main_target_image);
 }
 
-/* Load offload_target_main on target.  */
+/* liboffloadmic loads and runs offload_target_main on all available devices
+   during a first call to offload ().  */
 extern "C" void
 GOMP_OFFLOAD_init_device (int device)
 {
@@ -243,9 +251,11 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
     }
 }
 
+/* Offload TARGET_IMAGE to all available devices and fill address_table with
+   corresponding target addresses.  */
+
 static void
-load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
-			int &table_size)
+offload_image (void *target_image)
 {
   struct TargetImage {
     int64_t size;
@@ -254,19 +264,11 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     char data[];
   } __attribute__ ((packed));
 
-  void ***host_table_descr = (void ***) lib_descrs[lib_num].first;
-  void **host_func_start = host_table_descr[0];
-  void **host_func_end   = host_table_descr[1];
-  void **host_var_start  = host_table_descr[2];
-  void **host_var_end    = host_table_descr[3];
+  void *image_start = ((void **) target_image)[0];
+  void *image_end   = ((void **) target_image)[1];
 
-  void **target_image_descr = (void **) lib_descrs[lib_num].second;
-  void *image_start = target_image_descr[0];
-  void *image_end   = target_image_descr[1];
-
-  TRACE ("() host_table_descr { %p, %p, %p, %p }", host_func_start,
-	 host_func_end, host_var_start, host_var_end);
-  TRACE ("() target_image_descr { %p, %p }", image_start, image_end);
+  TRACE ("(target_image = %p { %p, %p })",
+	 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
   TargetImage *image
@@ -279,94 +281,87 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     }
 
   image->size = image_size;
-  sprintf (image->name, "lib%010d.so", lib_num);
+  sprintf (image->name, "lib%010d.so", num_images++);
   memcpy (image->data, image_start, image->size);
 
   TRACE ("() __offload_register_image %s { %p, %d }",
 	 image->name, image_start, image->size);
   __offload_register_image (image);
 
-  int tgt_num_funcs = 0;
-  int tgt_num_vars = 0;
-  void **tgt_table = NULL;
-  get_target_table (device, tgt_num_funcs, tgt_num_vars, tgt_table);
-  free (image);
-
-  /* The func table contains only addresses, the var table contains addresses
-     and corresponding sizes.  */
-  int host_num_funcs = host_func_end - host_func_start;
-  int host_num_vars  = (host_var_end - host_var_start) / 2;
-  TRACE ("() host_num_funcs = %d, tgt_num_funcs = %d",
-	 host_num_funcs, tgt_num_funcs);
-  TRACE ("() host_num_vars = %d, tgt_num_vars = %d",
-	 host_num_vars, tgt_num_vars);
-  if (host_num_funcs != tgt_num_funcs)
+  /* Receive tables for target_image from all devices.  */
+  DevAddrVect dev_table;
+  for (int dev = 0; dev < num_devices; dev++)
     {
-      fprintf (stderr, "%s: Can't map target functions\n", __FILE__);
-      exit (1);
-    }
-  if (host_num_vars != tgt_num_vars)
-    {
-      fprintf (stderr, "%s: Can't map target variables\n", __FILE__);
-      exit (1);
-    }
+      int num_funcs = 0;
+      int num_vars = 0;
+      void **table = NULL;
 
-  table = (mapping_table *) realloc (table, (table_size + host_num_funcs
-					     + host_num_vars)
-					    * sizeof (mapping_table));
-  if (table == NULL)
-    {
-      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
-      exit (1);
-    }
+      get_target_table (dev, num_funcs, num_vars, table);
 
-  for (int i = 0; i < host_num_funcs; i++)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_func_start[i];
-      t.host_end = t.host_start + 1;
-      t.tgt_start = (uintptr_t) tgt_table[i];
-      t.tgt_end = t.tgt_start + 1;
-
-      TRACE ("() lib %d, func %d:\t0x%llx -- 0x%llx",
-	     lib_num, i, t.host_start, t.tgt_start);
-
-      table[table_size++] = t;
-    }
+      AddrVect curr_dev_table;
 
-  for (int i = 0; i < host_num_vars * 2; i += 2)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_var_start[i];
-      t.host_end = t.host_start + (uintptr_t) host_var_start[i+1];
-      t.tgt_start = (uintptr_t) tgt_table[tgt_num_funcs+i];
-      t.tgt_end = t.tgt_start + (uintptr_t) tgt_table[tgt_num_funcs+i+1];
+      for (int i = 0; i < num_funcs; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[i];
+	  tgt_addr.end = tgt_addr.start + 1;
+	  TRACE ("() func %d:\t0x%llx..0x%llx", i,
+		 tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      TRACE ("() lib %d, var %d:\t0x%llx (%d) -- 0x%llx (%d)", lib_num, i/2,
-	     t.host_start, t.host_end - t.host_start,
-	     t.tgt_start, t.tgt_end - t.tgt_start);
+      for (int i = 0; i < num_vars; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[num_funcs+i*2];
+	  tgt_addr.end = tgt_addr.start + (uintptr_t) table[num_funcs+i*2+1];
+	  TRACE ("() var %d:\t0x%llx..0x%llx", i, tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      table[table_size++] = t;
+      dev_table.push_back (curr_dev_table);
     }
 
-  delete [] tgt_table;
+  address_table->insert (std::make_pair (target_image, dev_table));
+
+  free (image);
 }
 
 extern "C" int
-GOMP_OFFLOAD_get_table (int device, void *result)
+GOMP_OFFLOAD_load_image (int device, void *target_image, addr_pair **result)
 {
-  TRACE ("(num_libraries = %d)", num_libraries);
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
 
-  mapping_table *table = NULL;
-  int table_size = 0;
+  /* If target_image is already present in address_table, then there is no need
+     to offload it.  */
+  if (address_table->count (target_image) == 0)
+    offload_image (target_image);
 
-  for (int i = 0; i < num_libraries; i++)
-    load_lib_and_get_table (device, i, table, table_size);
+  AddrVect *curr_dev_table = &(*address_table)[target_image][device];
+  int table_size = curr_dev_table->size ();
+  addr_pair *table = (addr_pair *) malloc (table_size * sizeof (addr_pair));
+  if (table == NULL)
+    {
+      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
+      exit (1);
+    }
 
-  *(void **) result = table;
+  std::copy (curr_dev_table->begin (), curr_dev_table->end (), table);
+  *result = table;
   return table_size;
 }
 
+extern "C" void
+GOMP_OFFLOAD_unload_image (int device, void *target_image)
+{
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
+
+  /* TODO: Currently liboffloadmic doesn't support __offload_unregister_image
+     for libraries.  */
+
+  address_table->erase (target_image);
+}
+
 extern "C" void *
 GOMP_OFFLOAD_alloc (int device, size_t size)
 {


  -- Ilya


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-03-31 23:53                                           ` Ilya Verbin
@ 2015-04-01  5:21                                             ` Jakub Jelinek
  2015-04-01 13:14                                               ` Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Jakub Jelinek @ 2015-04-01  5:21 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Julian Brown, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
> +/* Similar to gomp_fatal, but release mutexes before.  */
> +
> +static void
> +gomp_fatal_unlock (const char *fmt, ...)
> +{
> +  int i;
> +  va_list list;
> +
> +  for (i = 0; i < num_devices; i++)
> +    gomp_mutex_unlock (&devices[i].lock);

This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
locked is undefined behavior.
You really should unlock it in the caller which should be aware which 0/1/2
locks it holds.

> +  gomp_mutex_unlock (&register_lock);

	Jakub


* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-01  5:21                                             ` Jakub Jelinek
@ 2015-04-01 13:14                                               ` Ilya Verbin
  2015-04-01 13:20                                                 ` Jakub Jelinek
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-04-01 13:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Julian Brown, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Apr 01, 2015 at 07:21:47 +0200, Jakub Jelinek wrote:
> On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
> > +/* Similar to gomp_fatal, but release mutexes before.  */
> > +
> > +static void
> > +gomp_fatal_unlock (const char *fmt, ...)
> > +{
> > +  int i;
> > +  va_list list;
> > +
> > +  for (i = 0; i < num_devices; i++)
> > +    gomp_mutex_unlock (&devices[i].lock);
> 
> This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
> locked is undefined behavior.
> You really should unlock it in the caller which should be aware which 0/1/2
> locks it holds.

I was worried about the following scenario:
1. Thread 1 in GOMP_target locks device 1.
2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal.
3. GOMP_offload_unregister will wait for device 1, even if device 2 is unlocked.
Anyway, it was a bad idea to unlock mutexes from a non-owner thread.

Here is the patch, which unlocks the proper mutexes in the caller, as you suggested.
make check-target-libgomp passed.
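
For illustration only (this is not part of the patch): a minimal sketch of the
caller-side unlock pattern, assuming plain pthread mutexes instead of
gomp_mutex_t.  The names struct device, fatal and lookup_fn are hypothetical
stand-ins for struct gomp_device_descr, gomp_fatal and the lookup in
GOMP_target.  The point is that the thread releases only the lock it knows it
holds before terminating.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gomp_device_descr.  */
struct device
{
  pthread_mutex_t lock;	/* Protects the fields below.  */
  int mapped_fn;	/* Stand-in for the device's mem_map.  */
};

/* Hypothetical stand-in for gomp_fatal: print a message and terminate.
   It does not touch any mutex itself.  */
static void
fatal (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fputc ('\n', stderr);
  exit (EXIT_FAILURE);
}

/* Look up FN on DEV.  This function took DEV->lock itself, so on the
   error path it releases exactly that lock before calling fatal.  */
static int
lookup_fn (struct device *dev, int fn)
{
  assert (dev != NULL);

  pthread_mutex_lock (&dev->lock);
  if (dev->mapped_fn != fn)
    {
      pthread_mutex_unlock (&dev->lock);
      fatal ("Target function wasn't mapped");
    }
  int result = dev->mapped_fn;
  pthread_mutex_unlock (&dev->lock);
  return result;
}
```

A caller that additionally holds a second lock (for example a registration
lock) would release both, in the same way, before calling fatal.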


diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (void *, int, void *);\n\n"
+	   "void GOMP_offload_register (void *, int, void *);\n"
+	   "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
 	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
+	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+	   "__attribute__((destructor))\n"
+	   "static void\n"
+	   "fini (void)\n"
+	   "{\n"
+	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
 	   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s mem_map;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ -793,9 +777,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
      members.  */
@@ -811,9 +792,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_init_tables (struct gomp_device_descr *,
-			      struct gomp_memory_mapping *);
-extern void gomp_free_memmap (struct gomp_memory_mapping *);
+extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
+	GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..e4756b6 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,20 +43,18 @@ static struct gomp_device_descr host_dispatch =
     .get_caps_func = GOMP_OFFLOAD_get_caps,
     .get_type_func = GOMP_OFFLOAD_get_type,
     .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-    .register_image_func = GOMP_OFFLOAD_register_image,
     .init_device_func = GOMP_OFFLOAD_init_device,
     .fini_device_func = GOMP_OFFLOAD_fini_device,
-    .get_table_func = GOMP_OFFLOAD_get_table,
+    .load_image_func = GOMP_OFFLOAD_load_image,
+    .unload_image_func = GOMP_OFFLOAD_unload_image,
     .alloc_func = GOMP_OFFLOAD_alloc,
     .free_func = GOMP_OFFLOAD_free,
     .dev2host_func = GOMP_OFFLOAD_dev2host,
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.is_initialized = false,
-    .mem_map.splay_tree.root = NULL,
+    .mem_map.root = NULL,
     .is_initialized = false,
-    .offload_regions_registered = false,
 
     .openacc = {
       .open_device_func = GOMP_OFFLOAD_openacc_open_device,
@@ -94,7 +92,6 @@ static struct gomp_device_descr host_dispatch =
 static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
-  gomp_mutex_init (&host_dispatch.mem_map.lock);
   gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 166eb55..1e0243e 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -284,12 +284,6 @@ lazy_open (int ord)
     = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
-
-  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
-  gomp_mutex_lock (&mem_map->lock);
-  if (!mem_map->is_initialized)
-    gomp_init_tables (acc_dev, mem_map);
-  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -351,10 +345,9 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
-	  gomp_mutex_lock (&mem_map->lock);
-	  gomp_free_memmap (mem_map);
-	  gomp_mutex_unlock (&mem_map->lock);
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
 
 	  walk->dev = NULL;
 	}
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 0096d51..fdc82e6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -38,7 +38,7 @@
 /* Return block containing [H->S), or NULL if not contained.  */
 
 static splay_tree_key
-lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 {
   struct splay_tree_key_s node;
   splay_tree_key key;
@@ -46,11 +46,9 @@ lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&mem_map->lock);
-
-  key = splay_tree_lookup (&mem_map->splay_tree, &node);
-
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_lock (&dev->lock);
+  key = splay_tree_lookup (&dev->mem_map, &node);
+  gomp_mutex_unlock (&dev->lock);
 
   return key;
 }
@@ -65,14 +63,11 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
 {
   int i;
   struct target_mem_desc *t;
-  struct gomp_memory_mapping *mem_map;
 
   if (!tgt)
     return NULL;
 
-  mem_map = tgt->mem_map;
-
-  gomp_mutex_lock (&mem_map->lock);
+  gomp_mutex_lock (&tgt->device_descr->lock);
 
   for (t = tgt; t != NULL; t = t->prev)
     {
@@ -80,7 +75,7 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
         break;
     }
 
-  gomp_mutex_unlock (&mem_map->lock);
+  gomp_mutex_unlock (&tgt->device_descr->lock);
 
   if (!t)
     return NULL;
@@ -176,7 +171,7 @@ acc_deviceptr (void *h)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  n = lookup_host (&thr->dev->mem_map, h, 1);
+  n = lookup_host (thr->dev, h, 1);
 
   if (!n)
     return NULL;
@@ -229,7 +224,7 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   if (n && ((uintptr_t)h < n->host_start
 	    || (uintptr_t)h + s > n->host_end
@@ -271,7 +266,7 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
                     (void *)h, (int)s, (void *)d, (int)s);
 
-      if (lookup_host (&acc_dev->mem_map, h, s))
+      if (lookup_host (acc_dev, h, s))
 	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
 		    (int)s);
 
@@ -296,7 +291,7 @@ acc_unmap_data (void *h)
   /* No need to call lazy open, as the address must have been mapped.  */
 
   size_t host_size;
-  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  splay_tree_key n = lookup_host (acc_dev, h, 1);
   struct target_mem_desc *t;
 
   if (!n)
@@ -320,7 +315,7 @@ acc_unmap_data (void *h)
       t->tgt_end = 0;
       t->to_free = 0;
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
 	   tp = t, t = t->prev)
@@ -334,7 +329,7 @@ acc_unmap_data (void *h)
 	    break;
 	  }
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   gomp_unmap_vars (t, true);
@@ -358,7 +353,7 @@ present_create_copy (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
   if (n)
     {
       /* Present. */
@@ -389,13 +384,13 @@ present_create_copy (unsigned f, void *h, size_t s)
       tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
 			   false);
 
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
 
       d = tgt->to_free;
       tgt->prev = acc_dev->openacc.data_environ;
       acc_dev->openacc.data_environ = tgt;
 
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_unlock (&acc_dev->lock);
     }
 
   return d;
@@ -436,7 +431,7 @@ delete_copyout (unsigned f, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -479,7 +474,7 @@ update_dev_host (int is_dev, void *h, size_t s)
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_host (&acc_dev->mem_map, h, s);
+  n = lookup_host (acc_dev, h, s);
 
   /* No need to call lazy open, as the data must already have been
      mapped.  */
@@ -532,7 +527,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   struct target_mem_desc *t;
   int minrefs = (mapnum == 1) ? 2 : 3;
 
-  n = lookup_host (&acc_dev->mem_map, h, 1);
+  n = lookup_host (acc_dev, h, 1);
 
   if (!n)
     gomp_fatal ("%p is not a mapped block", (void *)h);
@@ -543,7 +538,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 
   struct target_mem_desc *tp;
 
-  gomp_mutex_lock (&acc_dev->mem_map.lock);
+  gomp_mutex_lock (&acc_dev->lock);
 
   if (t->refcount == minrefs)
     {
@@ -570,7 +565,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   if (force_copyfrom)
     t->list[0]->copy_from = 1;
 
-  gomp_mutex_unlock (&acc_dev->mem_map.lock);
+  gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 0c74f54..563f9bb 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -144,9 +144,9 @@ GOACC_parallel (int device, void (*fn) (void *),
     {
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      gomp_mutex_lock (&acc_dev->mem_map.lock);
-      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
-      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+      gomp_mutex_lock (&acc_dev->lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map, &k);
+      gomp_mutex_unlock (&acc_dev->lock);
 
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index ebf7f11..bc60f72 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -95,12 +95,6 @@ GOMP_OFFLOAD_get_num_devices (void)
 }
 
 STATIC void
-GOMP_OFFLOAD_register_image (void *host_table __attribute__ ((unused)),
-			     void *target_data __attribute__ ((unused)))
-{
-}
-
-STATIC void
 GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
 {
 }
@@ -111,12 +105,19 @@ GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
 }
 
 STATIC int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **table __attribute__ ((unused)))
+GOMP_OFFLOAD_load_image (int n __attribute__ ((unused)),
+			 void *i __attribute__ ((unused)),
+			 struct addr_pair **r __attribute__ ((unused)))
 {
   return 0;
 }
 
+STATIC void
+GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
+			   void *i __attribute__ ((unused)))
+{
+}
+
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index c5dda3f..dfe7fb9 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -49,6 +49,9 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
+/* Mutex for offload image registration.  */
+static gomp_mutex_t register_lock;
+
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -67,14 +70,26 @@ static int num_offload_images;
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
-#ifdef PLUGIN_SUPPORT
 /* Total number of available devices.  */
 static int num_devices;
-#endif
 
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
+
+static void *
+gomp_realloc_unlock (void *old, size_t size)
+{
+  void *ret = realloc (old, size);
+  if (ret == NULL)
+    {
+      gomp_mutex_unlock (&register_lock);
+      gomp_fatal ("Out of memory allocating %lu bytes", (unsigned long) size);
+    }
+  return ret;
+}
+
 /* The comparison function.  */
 
 attribute_hidden int
@@ -125,16 +140,19 @@ resolve_device (int device_id)
    Helper function of gomp_map_vars.  */
 
 static inline void
-gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
-			unsigned char kind)
+gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
+			splay_tree_key newn, unsigned char kind)
 {
   if ((kind & GOMP_MAP_FLAG_FORCE)
       || oldn->host_start > newn->host_start
       || oldn->host_end < newn->host_end)
-    gomp_fatal ("Trying to map into device [%p..%p) object when "
-		"[%p..%p) is already mapped",
-		(void *) newn->host_start, (void *) newn->host_end,
-		(void *) oldn->host_start, (void *) oldn->host_end);
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      gomp_fatal ("Trying to map into device [%p..%p) object when "
+		  "[%p..%p) is already mapped",
+		  (void *) newn->host_start, (void *) newn->host_end,
+		  (void *) oldn->host_start, (void *) oldn->host_end);
+    }
   oldn->refcount++;
 }
 
@@ -153,14 +171,14 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mm;
+  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -174,7 +192,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < mapnum; i++)
     {
@@ -189,11 +207,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
-	  gomp_map_vars_existing (n, &cur_node, kind & typemask);
+	  gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask);
 	}
       else
 	{
@@ -228,7 +246,10 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   if (devaddrs)
     {
       if (mapnum != 1)
-        gomp_fatal ("unexpected aggregation");
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  gomp_fatal ("unexpected aggregation");
+	}
       tgt->to_free = devaddrs[0];
       tgt->tgt_start = (uintptr_t) tgt->to_free;
       tgt->tgt_end = tgt->tgt_start + sizes[0];
@@ -274,11 +295,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
-		gomp_map_vars_existing (n, k, kind & typemask);
+		gomp_map_vars_existing (devicep, n, k, kind & typemask);
 	      }
 	    else
 	      {
@@ -294,7 +315,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&mm->splay_tree, array);
+		splay_tree_insert (mem_map, array);
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -332,22 +353,25 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+		    n = splay_tree_lookup (mem_map, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			n = splay_tree_lookup (mem_map, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    n = splay_tree_lookup (mem_map, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
 		    if (n == NULL)
-		      gomp_fatal ("Pointer target of array section "
-				  "wasn't mapped");
+		      {
+			gomp_mutex_unlock (&devicep->lock);
+			gomp_fatal ("Pointer target of array section "
+				    "wasn't mapped");
+		      }
 		    cur_node.host_start -= n->host_start;
 		    cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset
 					  + cur_node.host_start;
@@ -400,24 +424,25 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  n = splay_tree_lookup (mem_map, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&mm->splay_tree,
-						     &cur_node);
+			      n = splay_tree_lookup (mem_map, &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup (&mm->splay_tree,
-							 &cur_node);
+				  n = splay_tree_lookup (mem_map, &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
 			  if (n == NULL)
-			    gomp_fatal ("Pointer target of array section "
-					"wasn't mapped");
+			    {
+			      gomp_mutex_unlock (&devicep->lock);
+			      gomp_fatal ("Pointer target of array section "
+					  "wasn't mapped");
+			    }
 			  cur_node.host_start -= n->host_start;
 			  cur_node.tgt_offset = n->tgt->tgt_start
 						+ n->tgt_offset
@@ -441,6 +466,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		      /* We already looked up the memory region above and it
 			 was missing.  */
 		      size_t size = k->host_end - k->host_start;
+		      gomp_mutex_unlock (&devicep->lock);
 #ifdef HAVE_INTTYPES_H
 		      gomp_fatal ("present clause: !acc_is_present (%p, "
 				  "%"PRIu64" (0x%"PRIx64"))",
@@ -463,6 +489,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    sizeof (void *));
 		    break;
 		  default:
+		    gomp_mutex_unlock (&devicep->lock);
 		    gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
 				kind);
 		  }
@@ -489,7 +516,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
   return tgt;
 }
 
@@ -514,10 +541,9 @@ attribute_hidden void
 gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
   size_t i;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
@@ -536,7 +562,7 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
 				  k->host_end - k->host_start);
       }
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -547,7 +573,6 @@ attribute_hidden void
 gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
-  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -555,7 +580,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -572,7 +597,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&mm->splay_tree, k);
+	splay_tree_remove (tgt->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -584,13 +609,12 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
   else
     gomp_unmap_tgt (tgt);
 
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
-	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
-	     bool is_openacc)
+gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
+	     size_t *sizes, void *kinds, bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
@@ -602,25 +626,27 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&mm->lock);
+  gomp_mutex_lock (&devicep->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
-					      &cur_node);
+	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
 	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
-	      gomp_fatal ("Trying to update [%p..%p) object when"
-			  "only [%p..%p) is mapped",
-			  (void *) cur_node.host_start,
-			  (void *) cur_node.host_end,
-			  (void *) n->host_start,
-			  (void *) n->host_end);
+	      {
+		gomp_mutex_unlock (&devicep->lock);
+		gomp_fatal ("Trying to update [%p..%p) object when "
+			    "only [%p..%p) is mapped",
+			    (void *) cur_node.host_start,
+			    (void *) cur_node.host_end,
+			    (void *) n->host_start,
+			    (void *) n->host_end);
+	      }
 	    if (GOMP_MAP_COPY_TO_P (kind & typemask))
 	      devicep->host2dev_func (devicep->target_id,
 				      (void *) (n->tgt->tgt_start
@@ -639,14 +665,106 @@ gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
 				      cur_node.host_end - cur_node.host_start);
 	  }
 	else
-	  gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
-		      (void *) cur_node.host_start,
-		      (void *) cur_node.host_end);
+	  {
+	    gomp_mutex_unlock (&devicep->lock);
+	    gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
+			(void *) cur_node.host_start,
+			(void *) cur_node.host_end);
+	  }
       }
-  gomp_mutex_unlock (&mm->lock);
+  gomp_mutex_unlock (&devicep->lock);
+}
+
+/* Load the image pointed to by TARGET_DATA onto the device specified by
+   DEVICEP, and insert into the splay tree the mapping between addresses from
+   HOST_TABLE and addresses from the loaded target image.  */
+
+static void
+gomp_offload_image_to_device (struct gomp_device_descr *devicep,
+			      void *host_table, void *target_data,
+			      bool is_register_lock)
+{
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  /* Load image to device and get target addresses for the image.  */
+  struct addr_pair *target_table = NULL;
+  int i, num_target_entries
+    = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+
+  if (num_target_entries != num_funcs + num_vars)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      if (is_register_lock)
+	gomp_mutex_unlock (&register_lock);
+      gomp_fatal ("Can't map target functions or variables");
+    }
+
+  /* Insert host-target address mapping into splay tree.  */
+  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
+  tgt->refcount = 1;
+  tgt->tgt_start = 0;
+  tgt->tgt_end = 0;
+  tgt->to_free = NULL;
+  tgt->prev = NULL;
+  tgt->list_count = 0;
+  tgt->device_descr = devicep;
+  splay_tree_node array = tgt->array;
+
+  for (i = 0; i < num_funcs; i++)
+    {
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_func_table[i];
+      k->host_end = k->host_start + 1;
+      k->tgt = tgt;
+      k->tgt_offset = target_table[i].start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  for (i = 0; i < num_vars; i++)
+    {
+      struct addr_pair *target_var = &target_table[num_funcs + i];
+      if (target_var->end - target_var->start
+	  != (uintptr_t) host_var_table[i * 2 + 1])
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  if (is_register_lock)
+	    gomp_mutex_unlock (&register_lock);
+	  gomp_fatal ("Can't map target variables (size mismatch)");
+	}
+
+      splay_tree_key k = &array->key;
+      k->host_start = (uintptr_t) host_var_table[i * 2];
+      k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
+      k->tgt = tgt;
+      k->tgt_offset = target_var->start;
+      k->refcount = 1;
+      k->async_refcount = 0;
+      k->copy_from = false;
+      array->left = NULL;
+      array->right = NULL;
+      splay_tree_insert (&devicep->mem_map, array);
+      array++;
+    }
+
+  free (target_table);
 }
 
-/* This function should be called from every offload image.
+/* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
 
@@ -654,83 +772,152 @@ void
 GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 		       void *target_data)
 {
-  offload_images = gomp_realloc (offload_images,
-				 (num_offload_images + 1)
-				 * sizeof (struct offload_image_descr));
+  int i;
+  gomp_mutex_lock (&register_lock);
+
+  /* Load image to all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == target_type && devicep->is_initialized)
+	gomp_offload_image_to_device (devicep, host_table, target_data, true);
+      gomp_mutex_unlock (&devicep->lock);
+    }
 
+  /* Add the image to the array of pending images.  */
+  offload_images
+    = gomp_realloc_unlock (offload_images,
+			   (num_offload_images + 1)
+			   * sizeof (struct offload_image_descr));
   offload_images[num_offload_images].type = target_type;
   offload_images[num_offload_images].host_table = host_table;
   offload_images[num_offload_images].target_data = target_data;
 
   num_offload_images++;
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* This function initializes the target device, specified by DEVICEP.  DEVICEP
-   must be locked on entry, and remains locked on return.  */
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
 
-attribute_hidden void
-gomp_init_device (struct gomp_device_descr *devicep)
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
 {
-  devicep->init_device_func (devicep->target_id);
-  devicep->is_initialized = true;
+  void **host_func_table = ((void ***) host_table)[0];
+  void **host_funcs_end  = ((void ***) host_table)[1];
+  void **host_var_table  = ((void ***) host_table)[2];
+  void **host_vars_end   = ((void ***) host_table)[3];
+  int i;
+
+  /* The func table contains only addresses, the var table contains addresses
+     and corresponding sizes.  */
+  int num_funcs = host_funcs_end - host_func_table;
+  int num_vars  = (host_vars_end - host_var_table) / 2;
+
+  gomp_mutex_lock (&register_lock);
+
+  /* Unload image from all initialized devices.  */
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type != target_type || !devicep->is_initialized)
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  continue;
+	}
+
+      devicep->unload_image_func (devicep->target_id, target_data);
+
+      /* Remove mapping from splay tree.  */
+      struct splay_tree_key_s k;
+      splay_tree_key node = NULL;
+      if (num_funcs > 0)
+	{
+	  k.host_start = (uintptr_t) host_func_table[0];
+	  k.host_end = k.host_start + 1;
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+      else if (num_vars > 0)
+	{
+	  k.host_start = (uintptr_t) host_var_table[0];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[1];
+	  node = splay_tree_lookup (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_funcs; j++)
+	{
+	  k.host_start = (uintptr_t) host_func_table[j];
+	  k.host_end = k.host_start + 1;
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      for (j = 0; j < num_vars; j++)
+	{
+	  k.host_start = (uintptr_t) host_var_table[j * 2];
+	  k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+	  splay_tree_remove (&devicep->mem_map, &k);
+	}
+
+      if (node)
+	{
+	  free (node->tgt);
+	  free (node);
+	}
+
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* Remove image from array of pending images.  */
+  for (i = 0; i < num_offload_images; i++)
+    if (offload_images[i].target_data == target_data)
+      {
+	offload_images[i] = offload_images[--num_offload_images];
+	break;
+      }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
-/* Initialize address mapping tables.  MM must be locked on entry, and remains
-   locked on return.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
-gomp_init_tables (struct gomp_device_descr *devicep,
-		  struct gomp_memory_mapping *mm)
+gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Get address mapping table for device.  */
-  struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
-
-  /* Insert host-target address mapping into dev_splay_tree.  */
   int i;
-  for (i = 0; i < num_entries; i++)
+  devicep->init_device_func (devicep->target_id);
+
+  /* Load onto the device all images registered so far.  */
+  for (i = 0; i < num_offload_images; i++)
     {
-      struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-      tgt->refcount = 1;
-      tgt->array = gomp_malloc (sizeof (*tgt->array));
-      tgt->tgt_start = table[i].tgt_start;
-      tgt->tgt_end = table[i].tgt_end;
-      tgt->to_free = NULL;
-      tgt->list_count = 0;
-      tgt->device_descr = devicep;
-      splay_tree_node node = tgt->array;
-      splay_tree_key k = &node->key;
-      k->host_start = table[i].host_start;
-      k->host_end = table[i].host_end;
-      k->tgt_offset = 0;
-      k->refcount = 1;
-      k->copy_from = false;
-      k->tgt = tgt;
-      node->left = NULL;
-      node->right = NULL;
-      splay_tree_insert (&mm->splay_tree, node);
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+	gomp_offload_image_to_device (devicep, image->host_table,
+				      image->target_data, false);
     }
 
-  free (table);
-  mm->is_initialized = true;
+  devicep->is_initialized = true;
 }
 
 /* Free address mapping tables.  MM must be locked on entry, and remains locked
    on return.  */
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_memory_mapping *mm)
+gomp_free_memmap (struct splay_tree_s *mem_map)
 {
-  while (mm->splay_tree.root)
+  while (mem_map->root)
     {
-      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
-      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      splay_tree_remove (mem_map, &mem_map->root->key);
       free (tgt->array);
       free (tgt);
     }
-
-  mm->is_initialized = false;
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.
@@ -791,22 +978,19 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     fn_addr = (void *) fn;
   else
     {
-      struct gomp_memory_mapping *mm = &devicep->mem_map;
-      gomp_mutex_lock (&mm->lock);
-
-      if (!mm->is_initialized)
-	gomp_init_tables (devicep, mm);
-
+      gomp_mutex_lock (&devicep->lock);
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       if (tgt_fn == NULL)
-	gomp_fatal ("Target function wasn't mapped");
-
-      gomp_mutex_unlock (&mm->lock);
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  gomp_fatal ("Target function wasn't mapped");
+	}
+      gomp_mutex_unlock (&devicep->lock);
 
-      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+      fn_addr = (void *) tgt_fn->tgt_offset;
     }
 
   struct target_mem_desc *tgt_vars
@@ -856,12 +1040,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     false);
@@ -897,13 +1075,7 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
     gomp_init_device (devicep);
   gomp_mutex_unlock (&devicep->lock);
 
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-  gomp_mutex_lock (&mm->lock);
-  if (!mm->is_initialized)
-    gomp_init_tables (devicep, mm);
-  gomp_mutex_unlock (&mm->lock);
-
-  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -972,10 +1144,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
-  DLSYM (register_image);
   DLSYM (init_device);
   DLSYM (fini_device);
-  DLSYM (get_table);
+  DLSYM (load_image);
+  DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
@@ -1038,22 +1210,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return err == NULL;
 }
 
-/* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
-
-static void
-gomp_register_image_for_device (struct gomp_device_descr *device,
-				struct offload_image_descr *image)
-{
-  if (!device->offload_regions_registered
-      && (device->type == image->type
-	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
-    {
-      device->register_image_func (image->host_table, image->target_data);
-      device->offload_regions_registered = true;
-    }
-}
-
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1112,17 +1268,14 @@ gomp_target_init (void)
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
-		current_device.mem_map.is_initialized = false;
-		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
-		current_device.offload_regions_registered = false;
 		current_device.openacc.data_environ = NULL;
 		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
@@ -1157,21 +1310,12 @@ gomp_target_init (void)
 
   for (i = 0; i < num_devices; i++)
     {
-      int j;
-
-      for (j = 0; j < num_offload_images; j++)
-	gomp_register_image_for_device (&devices[i], &offload_images[j]);
-
       /* The 'devices' array can be moved (by the realloc call) until we have
 	 found all the plugins, so registering with the OpenACC runtime (which
 	 takes a copy of the pointer argument) must be delayed until now.  */
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
-
-  free (offload_images);
-  offload_images = NULL;
-  num_offload_images = 0;
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 3e7a958..a2d61b1 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -34,6 +34,7 @@
 #include <string.h>
 #include <utility>
 #include <vector>
+#include <map>
 #include "libgomp-plugin.h"
 #include "compiler_if_host.h"
 #include "main_target_image.h"
@@ -53,6 +54,29 @@ fprintf (stderr, "\n");					    \
 #endif
 
 
+/* Start/end addresses of functions and global variables on a device.  */
+typedef std::vector<addr_pair> AddrVect;
+
+/* Addresses for one image and all devices.  */
+typedef std::vector<AddrVect> DevAddrVect;
+
+/* Addresses for all images and all devices.  */
+typedef std::map<void *, DevAddrVect> ImgDevAddrMap;
+
+
+/* Total number of available devices.  */
+static int num_devices;
+
+/* Total number of shared libraries with offloading to Intel MIC.  */
+static int num_images;
+
+/* Two dimensional array: one key is a pointer to image,
+   second key is number of device.  Contains a vector of pointer pairs.  */
+static ImgDevAddrMap *address_table;
+
+/* Thread-safe registration of the main image.  */
+static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
+
 static VarDesc vd_host2tgt = {
   { 1, 1 },		      /* dst, src			      */
   { 1, 0 },		      /* in, out			      */
@@ -90,28 +114,17 @@ static VarDesc vd_tgt2host = {
 };
 
 
-/* Total number of shared libraries with offloading to Intel MIC.  */
-static int num_libraries;
-
-/* Pointers to the descriptors, containing pointers to host-side tables and to
-   target images.  */
-static std::vector< std::pair<void *, void *> > lib_descrs;
-
-/* Thread-safe registration of the main image.  */
-static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
-
-
 /* Add path specified in LD_LIBRARY_PATH to MIC_LD_LIBRARY_PATH, which is
    required by liboffloadmic.  */
 __attribute__((constructor))
 static void
-set_mic_lib_path (void)
+init (void)
 {
   const char *ld_lib_path = getenv (LD_LIBRARY_PATH_ENV);
   const char *mic_lib_path = getenv (MIC_LD_LIBRARY_PATH_ENV);
 
   if (!ld_lib_path)
-    return;
+    goto out;
 
   if (!mic_lib_path)
     setenv (MIC_LD_LIBRARY_PATH_ENV, ld_lib_path, 1);
@@ -133,6 +146,10 @@ set_mic_lib_path (void)
       if (!use_alloca)
 	free (mic_lib_path_new);
     }
+
+out:
+  address_table = new ImgDevAddrMap;
+  num_devices = _Offload_number_of_devices ();
 }
 
 extern "C" const char *
@@ -162,18 +179,8 @@ GOMP_OFFLOAD_get_type (void)
 extern "C" int
 GOMP_OFFLOAD_get_num_devices (void)
 {
-  int res = _Offload_number_of_devices ();
-  TRACE ("(): return %d", res);
-  return res;
-}
-
-/* This should be called from every shared library with offloading.  */
-extern "C" void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_image)
-{
-  TRACE ("(host_table = %p, target_image = %p)", host_table, target_image);
-  lib_descrs.push_back (std::make_pair (host_table, target_image));
-  num_libraries++;
+  TRACE ("(): return %d", num_devices);
+  return num_devices;
 }
 
 static void
@@ -196,7 +203,8 @@ register_main_image ()
   __offload_register_image (&main_target_image);
 }
 
-/* Load offload_target_main on target.  */
+/* liboffloadmic loads and runs offload_target_main on all available devices
+   during a first call to offload ().  */
 extern "C" void
 GOMP_OFFLOAD_init_device (int device)
 {
@@ -243,9 +251,11 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
     }
 }
 
+/* Offload TARGET_IMAGE to all available devices and fill address_table with
+   corresponding target addresses.  */
+
 static void
-load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
-			int &table_size)
+offload_image (void *target_image)
 {
   struct TargetImage {
     int64_t size;
@@ -254,19 +264,11 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     char data[];
   } __attribute__ ((packed));
 
-  void ***host_table_descr = (void ***) lib_descrs[lib_num].first;
-  void **host_func_start = host_table_descr[0];
-  void **host_func_end   = host_table_descr[1];
-  void **host_var_start  = host_table_descr[2];
-  void **host_var_end    = host_table_descr[3];
+  void *image_start = ((void **) target_image)[0];
+  void *image_end   = ((void **) target_image)[1];
 
-  void **target_image_descr = (void **) lib_descrs[lib_num].second;
-  void *image_start = target_image_descr[0];
-  void *image_end   = target_image_descr[1];
-
-  TRACE ("() host_table_descr { %p, %p, %p, %p }", host_func_start,
-	 host_func_end, host_var_start, host_var_end);
-  TRACE ("() target_image_descr { %p, %p }", image_start, image_end);
+  TRACE ("(target_image = %p { %p, %p })",
+	 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
   TargetImage *image
@@ -279,94 +281,87 @@ load_lib_and_get_table (int device, int lib_num, mapping_table *&table,
     }
 
   image->size = image_size;
-  sprintf (image->name, "lib%010d.so", lib_num);
+  sprintf (image->name, "lib%010d.so", num_images++);
   memcpy (image->data, image_start, image->size);
 
   TRACE ("() __offload_register_image %s { %p, %d }",
 	 image->name, image_start, image->size);
   __offload_register_image (image);
 
-  int tgt_num_funcs = 0;
-  int tgt_num_vars = 0;
-  void **tgt_table = NULL;
-  get_target_table (device, tgt_num_funcs, tgt_num_vars, tgt_table);
-  free (image);
-
-  /* The func table contains only addresses, the var table contains addresses
-     and corresponding sizes.  */
-  int host_num_funcs = host_func_end - host_func_start;
-  int host_num_vars  = (host_var_end - host_var_start) / 2;
-  TRACE ("() host_num_funcs = %d, tgt_num_funcs = %d",
-	 host_num_funcs, tgt_num_funcs);
-  TRACE ("() host_num_vars = %d, tgt_num_vars = %d",
-	 host_num_vars, tgt_num_vars);
-  if (host_num_funcs != tgt_num_funcs)
+  /* Receive tables for target_image from all devices.  */
+  DevAddrVect dev_table;
+  for (int dev = 0; dev < num_devices; dev++)
     {
-      fprintf (stderr, "%s: Can't map target functions\n", __FILE__);
-      exit (1);
-    }
-  if (host_num_vars != tgt_num_vars)
-    {
-      fprintf (stderr, "%s: Can't map target variables\n", __FILE__);
-      exit (1);
-    }
+      int num_funcs = 0;
+      int num_vars = 0;
+      void **table = NULL;
 
-  table = (mapping_table *) realloc (table, (table_size + host_num_funcs
-					     + host_num_vars)
-					    * sizeof (mapping_table));
-  if (table == NULL)
-    {
-      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
-      exit (1);
-    }
+      get_target_table (dev, num_funcs, num_vars, table);
 
-  for (int i = 0; i < host_num_funcs; i++)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_func_start[i];
-      t.host_end = t.host_start + 1;
-      t.tgt_start = (uintptr_t) tgt_table[i];
-      t.tgt_end = t.tgt_start + 1;
-
-      TRACE ("() lib %d, func %d:\t0x%llx -- 0x%llx",
-	     lib_num, i, t.host_start, t.tgt_start);
-
-      table[table_size++] = t;
-    }
+      AddrVect curr_dev_table;
 
-  for (int i = 0; i < host_num_vars * 2; i += 2)
-    {
-      mapping_table t;
-      t.host_start = (uintptr_t) host_var_start[i];
-      t.host_end = t.host_start + (uintptr_t) host_var_start[i+1];
-      t.tgt_start = (uintptr_t) tgt_table[tgt_num_funcs+i];
-      t.tgt_end = t.tgt_start + (uintptr_t) tgt_table[tgt_num_funcs+i+1];
+      for (int i = 0; i < num_funcs; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[i];
+	  tgt_addr.end = tgt_addr.start + 1;
+	  TRACE ("() func %d:\t0x%llx..0x%llx", i,
+		 tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      TRACE ("() lib %d, var %d:\t0x%llx (%d) -- 0x%llx (%d)", lib_num, i/2,
-	     t.host_start, t.host_end - t.host_start,
-	     t.tgt_start, t.tgt_end - t.tgt_start);
+      for (int i = 0; i < num_vars; i++)
+	{
+	  addr_pair tgt_addr;
+	  tgt_addr.start = (uintptr_t) table[num_funcs+i*2];
+	  tgt_addr.end = tgt_addr.start + (uintptr_t) table[num_funcs+i*2+1];
+	  TRACE ("() var %d:\t0x%llx..0x%llx", i, tgt_addr.start, tgt_addr.end);
+	  curr_dev_table.push_back (tgt_addr);
+	}
 
-      table[table_size++] = t;
+      dev_table.push_back (curr_dev_table);
     }
 
-  delete [] tgt_table;
+  address_table->insert (std::make_pair (target_image, dev_table));
+
+  free (image);
 }
 
 extern "C" int
-GOMP_OFFLOAD_get_table (int device, void *result)
+GOMP_OFFLOAD_load_image (int device, void *target_image, addr_pair **result)
 {
-  TRACE ("(num_libraries = %d)", num_libraries);
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
 
-  mapping_table *table = NULL;
-  int table_size = 0;
+  /* If target_image is already present in address_table, then there is no need
+     to offload it.  */
+  if (address_table->count (target_image) == 0)
+    offload_image (target_image);
 
-  for (int i = 0; i < num_libraries; i++)
-    load_lib_and_get_table (device, i, table, table_size);
+  AddrVect *curr_dev_table = &(*address_table)[target_image][device];
+  int table_size = curr_dev_table->size ();
+  addr_pair *table = (addr_pair *) malloc (table_size * sizeof (addr_pair));
+  if (table == NULL)
+    {
+      fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
+      exit (1);
+    }
 
-  *(void **) result = table;
+  std::copy (curr_dev_table->begin (), curr_dev_table->end (), table);
+  *result = table;
   return table_size;
 }
 
+extern "C" void
+GOMP_OFFLOAD_unload_image (int device, void *target_image)
+{
+  TRACE ("(device = %d, target_image = %p)", device, target_image);
+
+  /* TODO: Currently liboffloadmic doesn't support __offload_unregister_image
+     for libraries.  */
+
+  address_table->erase (target_image);
+}
+
 extern "C" void *
 GOMP_OFFLOAD_alloc (int device, size_t size)
 {


  -- Ilya
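
The lookup structure that the intelmic plugin patch above introduces (image pointer, then device number, then a vector of address pairs) can be sketched in isolation.  This is only an illustration under stated assumptions: addr_pair is redefined locally to mirror the two fields the patch uses, and copy_table is a hypothetical helper, not a libgomp or liboffloadmic function.  Unlike GOMP_OFFLOAD_load_image in the patch, which indexes the map with operator[], the sketch uses find () and a bounds check so that an unknown image or device is reported rather than silently inserted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

// Local stand-in for libgomp's addr_pair (assumption: two uintptr_t fields).
struct addr_pair {
  uintptr_t start;
  uintptr_t end;
};

typedef std::vector<addr_pair> AddrVect;              // one image, one device
typedef std::vector<AddrVect> DevAddrVect;            // one image, all devices
typedef std::map<void *, DevAddrVect> ImgDevAddrMap;  // all images, all devices

// Hypothetical helper: copy the table for (image, device) into a malloc'd
// array owned by the caller.  Returns the number of entries, or -1 if the
// image is not registered, the device is out of range, or malloc fails.
int copy_table(ImgDevAddrMap &tab, void *image, int device, addr_pair **result)
{
  ImgDevAddrMap::iterator it = tab.find(image);
  if (it == tab.end() || device < 0
      || (std::size_t) device >= it->second.size())
    return -1;

  const AddrVect &v = it->second[device];
  addr_pair *out = (addr_pair *) std::malloc(v.size() * sizeof(addr_pair));
  if (out == NULL && !v.empty())
    return -1;

  std::copy(v.begin(), v.end(), out);
  *result = out;
  return (int) v.size();
}
```

Erasing the image key, as GOMP_OFFLOAD_unload_image does in the patch, drops the tables for every device at once; arrays already handed out by a helper like copy_table are separate copies and remain valid until the caller frees them.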

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-01 13:14                                               ` Ilya Verbin
@ 2015-04-01 13:20                                                 ` Jakub Jelinek
  2015-04-01 17:26                                                   ` Ilya Verbin
  2015-04-06 12:46                                                   ` Ilya Verbin
  0 siblings, 2 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-04-01 13:20 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Julian Brown, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Apr 01, 2015 at 04:14:05PM +0300, Ilya Verbin wrote:
> On Wed, Apr 01, 2015 at 07:21:47 +0200, Jakub Jelinek wrote:
> > On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
> > > +/* Similar to gomp_fatal, but release mutexes before.  */
> > > +
> > > +static void
> > > +gomp_fatal_unlock (const char *fmt, ...)
> > > +{
> > > +  int i;
> > > +  va_list list;
> > > +
> > > +  for (i = 0; i < num_devices; i++)
> > > +    gomp_mutex_unlock (&devices[i].lock);
> > 
> > This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
> > locked is undefined behavior.
> > You really should unlock it in the caller which should be aware which 0/1/2
> > locks it holds.
> 
> I was worried about the following scenario:
> 1. Thread 1 in GOMP_target locks device 1.
> 2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal.
> 3. GOMP_offload_unregister will wait for device 1, even though device 2 is unlocked.

How is that different from
1. Thread 1 in GOMP_target locks device 1.
2. Thread 2 calls exit.
?  I mean when you unlock the device and register locks if you own them
before gomp_fatal.

> Anyway, it was a bad idea to unlock mutexes from a non-owner thread.
> 
> Here is the patch, which unlocks proper mutexes in the caller, as you suggested.
> make check-target-libgomp passed.

LGTM with proper ChangeLog entry.

	Jakub
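
The point about unlocking in the caller can be sketched as follows.  This is a minimal illustration, not libgomp code: Device, fatal and map_vars are hypothetical stand-ins (for a device descriptor, gomp_fatal and a caller such as gomp_map_vars), and std::mutex is used in place of gomp_mutex_t.  The function that took the lock is the only place that knows which lock it holds, so it releases exactly that one before reporting the error, instead of a helper unlocking every device's mutex.

```cpp
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a device descriptor: one lock per device.
struct Device {
  std::mutex lock;
  int id = 0;
};

// Hypothetical stand-in for gomp_fatal.  It throws instead of terminating
// the process so that the sketch can be exercised.
[[noreturn]] void fatal(const std::string &msg)
{
  throw std::runtime_error(msg);
}

// The caller holds exactly dev.lock, so it unlocks exactly dev.lock before
// calling fatal.  No other device's mutex is touched.
void map_vars(Device &dev, bool fail)
{
  dev.lock.lock();
  if (fail)
    {
      dev.lock.unlock();
      fatal("mapping failed on device " + std::to_string(dev.id));
    }
  dev.lock.unlock();
}
```

As with gomp_mutex_unlock in the review comment above, calling unlock on a std::mutex that the calling thread does not own is undefined behaviour, which is why the release has to happen where ownership is known.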

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-01 13:20                                                 ` Jakub Jelinek
@ 2015-04-01 17:26                                                   ` Ilya Verbin
  2015-04-06 12:46                                                   ` Ilya Verbin
  1 sibling, 0 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-04-01 17:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Julian Brown, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
> On Wed, Apr 01, 2015 at 04:14:05PM +0300, Ilya Verbin wrote:
> > I was worried about the following scenario:
> > 1. Thread 1 in GOMP_target locks device 1.
> > 2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal.
> > 3. GOMP_offload_unregister will wait for device 1, even though device 2 is unlocked.
> 
> How is that different from
> 1. Thread 1 in GOMP_target locks device 1.
> 2. Thread 2 calls exit.
> ?  I mean when you unlock the device and register locks if you own them
> before gomp_fatal.

Yeah, it's the same situation.

> > Here is the patch, which unlocks proper mutexes in the caller, as you suggested.
> > make check-target-libgomp passed.
> 
> LGTM with proper ChangeLog entry.

When should I commit it into trunk?  Without the corresponding PTX part,
offloading to PTX will not work.

  -- Ilya

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-01 13:20                                                 ` Jakub Jelinek
  2015-04-01 17:26                                                   ` Ilya Verbin
@ 2015-04-06 12:46                                                   ` Ilya Verbin
  2015-04-07 15:26                                                     ` Jakub Jelinek
  1 sibling, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-04-06 12:46 UTC (permalink / raw)
  To: Julian Brown, Thomas Schwinge; +Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin

On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
> LGTM with proper ChangeLog entry.

I've committed this patch into trunk.

Julian, you probably want to update the nvptx plugin.


gcc/
	* config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call
	GOMP_offload_unregister from the destructor.
libgomp/
	* libgomp-plugin.h (struct mapping_table): Replace with addr_pair.
	* libgomp.h (struct gomp_memory_mapping): Remove.
	(struct target_mem_desc): Change type of mem_map from
	gomp_memory_mapping * to splay_tree_s *.
	(struct gomp_device_descr): Remove register_image_func, get_table_func.
	Add load_image_func, unload_image_func.
	Change type of mem_map from gomp_memory_mapping to splay_tree_s.
	Remove offload_regions_registered.
	(gomp_init_tables): Remove.
	(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
	to splay_tree_s *.
	* libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister.
	* oacc-host.c (host_dispatch): Do not initialize register_image_func,
	get_table_func, mem_map.is_initialized, mem_map.splay_tree.root,
	offload_regions_registered.
	Initialize load_image_func, unload_image_func, mem_map.root.
	(goacc_host_init): Do not initialize host_dispatch.mem_map.lock.
	* oacc-init.c (lazy_open): Don't call gomp_init_tables.
	(acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's.
	* oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of
	gomp_memory_mapping *.  Use dev's lock and splay_tree.
	(lookup_dev): Use dev's lock.
	(acc_deviceptr): Pass dev to lookup_host instead of mem_map.
	(acc_is_present): Likewise.
	(acc_map_data): Likewise.
	(acc_unmap_data): Likewise.  Use dev's lock.
	(present_create_copy): Likewise.
	(delete_copyout): Pass dev to lookup_host instead of mem_map.
	(update_dev_host): Likewise.
	(gomp_acc_remove_pointer): Likewise.  Use dev's lock.
	* oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree.
	* plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove.
	(GOMP_OFFLOAD_get_table): Remove.
	(GOMP_OFFLOAD_load_image): New function.
	(GOMP_OFFLOAD_unload_image): New function.
	* target.c (register_lock): New mutex for offload image registration.
	(num_devices): Do not guard with PLUGIN_SUPPORT.
	(gomp_realloc_unlock): New static function.
	(gomp_map_vars_existing): Add device descriptor argument.  Unlock mutex
	before gomp_fatal.
	(gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's.
	Pass devicep to gomp_map_vars_existing.  Unlock mutex before gomp_fatal.
	(gomp_copy_from_async): Use dev's lock and splay_tree instead of
	mem_map's.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Remove gomp_memory_mapping argument.  Use dev's lock and
	splay_tree instead of mm's.  Unlock mutex before gomp_fatal.
	(gomp_offload_image_to_device): New static function.
	(GOMP_offload_register): Add mutex lock.
	Call gomp_offload_image_to_device for all initialized devices.
	Replace gomp_realloc with gomp_realloc_unlock.
	(GOMP_offload_unregister): New function.
	(gomp_init_tables): Replace with gomp_init_device.  Replace a call to
	get_table_func from the plugin with calls to init_device_func and
	gomp_offload_image_to_device.
	(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
	to splay_tree_s *.
	(GOMP_target): Do not call gomp_init_tables.  Use dev's lock and
	splay_tree instead of mem_map's.  Unlock mutex before gomp_fatal.
	(GOMP_target_data): Do not call gomp_init_tables.
	(GOMP_target_update): Likewise.  Remove argument from gomp_update.
	(gomp_load_plugin_for_device): Replace register_image and get_table
	with load_image and unload_image in DLSYM ().
	(gomp_register_image_for_device): Remove function.
	(gomp_target_init): Do not initialize current_device.mem_map.*,
	current_device.offload_regions_registered.
	Remove call to gomp_register_image_for_device.
	Do not free offload_images and num_offload_images.
liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp: Include map.
	(AddrVect, DevAddrVect, ImgDevAddrMap): New typedefs.
	(num_devices, num_images, address_table): New static vars.
	(num_libraries, lib_descrs): Remove static vars.
	(set_mic_lib_path): Rename to ...
	(init): ... this.  Allocate address_table and get num_devices.
	(GOMP_OFFLOAD_get_num_devices): Return num_devices.
	(load_lib_and_get_table): Remove static function.
	(offload_image): New static function.
	(GOMP_OFFLOAD_get_table): Remove function.
	(GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New functions.

  -- Ilya

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-06 12:46                                                   ` Ilya Verbin
@ 2015-04-07 15:26                                                     ` Jakub Jelinek
  2015-04-08 14:32                                                       ` Julian Brown
  0 siblings, 1 reply; 92+ messages in thread
From: Jakub Jelinek @ 2015-04-07 15:26 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Julian Brown, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote:
> On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
> > LGTM with proper ChangeLog entry.
> 
> I've committed this patch into trunk.
> 
> Julian, you probably want to update the nvptx plugin.

Note that as the number of P1s without posted fixes is now zero, it is
likely RC1 will be done this week, so if you want nvptx working in GCC 5,
please post a fix as soon as possible.

	Jakub

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-07 15:26                                                     ` Jakub Jelinek
@ 2015-04-08 14:32                                                       ` Julian Brown
  2015-04-08 14:34                                                         ` Jakub Jelinek
  2015-04-08 14:59                                                         ` Ilya Verbin
  0 siblings, 2 replies; 92+ messages in thread
From: Julian Brown @ 2015-04-08 14:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Verbin, Thomas Schwinge, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 927 bytes --]

On Tue, 7 Apr 2015 17:26:45 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote:
> > On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
> > > LGTM with proper ChangeLog entry.
> > 
> > I've committed this patch into trunk.
> > 
> > Julian, you probably want to update the nvptx plugin.
> 
> Note that as the number of P1s without posted fixes is now zero, it is
> likely RC1 will be done this week, so if you want nvptx working in
> GCC 5, please post a fix as soon as possible.

This version is mostly the same as the last posted version but has a
tweak in GOACC_parallel to account for the new splay tree arrangement
for target functions:

-      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;

Have there been any other changes I might have missed?

It passes libgomp testing on NVPTX. OK?

Thanks,

Julian

[-- Attachment #2: nvptx-load-unload-6.diff --]
[-- Type: text/x-patch, Size: 47121 bytes --]

commit ac06b5e25e170061bb9855b9ea4b8e5696816bf1
Author: Julian Brown <julian@codesourcery.com>
Date:   Tue Apr 7 09:23:58 2015 -0700

    NVPTX load/unload and init-rework patch.

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
     tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1d42c5..5272f01 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -655,9 +655,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct splay_tree_s *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
      at the end of region.  */
   splay_tree_key list[];
@@ -691,18 +688,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		     unsigned short *, int, int, int, int, void *);
@@ -720,7 +705,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_func (async);
 }
 
 void
 acc_wait_async (int async1, int async2)
 {
-  base_dev->openacc.async_wait_async_func (async1, async2);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_async_func (async1, async2);
 }
 
 void
 acc_wait_all (void)
 {
-  base_dev->openacc.async_wait_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_all_func ();
 }
 
 void
@@ -73,5 +98,10 @@ acc_wait_all_async (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_all_async_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->dev->openacc.async_wait_all_async_func (async);
 }
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index c8ef376..4aab422 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -34,51 +34,53 @@
 void *
 acc_get_current_cuda_device (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
-    p = base_dev->openacc.cuda.get_current_device_func ();
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
+    return thr->dev->openacc.cuda.get_current_device_func ();
 
-  return p;
+  return NULL;
 }
 
 void *
 acc_get_current_cuda_context (void)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
-    p = base_dev->openacc.cuda.get_current_context_func ();
-
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_context_func)
+    return thr->dev->openacc.cuda.get_current_context_func ();
+ 
+  return NULL;
 }
 
 void *
 acc_get_cuda_stream (int async)
 {
-  void *p = NULL;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (async < 0)
-    return p;
-
-  if (base_dev && base_dev->openacc.cuda.get_stream_func)
-    p = base_dev->openacc.cuda.get_stream_func (async);
+    return NULL;
 
-  return p;
+  if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
+    return thr->dev->openacc.cuda.get_stream_func (async);
+ 
+  return NULL;
 }
 
 int
 acc_set_cuda_stream (int async, void *stream)
 {
-  int s = -1;
+  struct goacc_thread *thr;
 
   if (async < 0 || stream == NULL)
     return 0;
 
   goacc_lazy_initialize ();
 
-  if (base_dev && base_dev->openacc.cuda.set_stream_func)
-    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+  thr = goacc_thread ();
+
+  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
+    return thr->dev->openacc.cuda.set_stream_func (async, stream);
 
-  return s;
+  return -1;
 }
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index e4756b6..6dcdbf3 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -53,16 +53,9 @@ static struct gomp_device_descr host_dispatch =
     .host2dev_func = GOMP_OFFLOAD_host2dev,
     .run_func = GOMP_OFFLOAD_run,
 
-    .mem_map.root = NULL,
     .is_initialized = false,
 
     .openacc = {
-      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
-      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
-
-      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
-      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
-
       .exec_func = GOMP_OFFLOAD_openacc_parallel,
 
       .register_async_cleanup_func
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 1e0243e..dc40fb6 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -37,14 +37,13 @@
 
 static gomp_mutex_t acc_device_lock;
 
-/* The dispatch table for the current accelerator device.  This is global, so
-   you can only have one type of device open at any given time in a program.
-   This is the "base" device in that several devices that use the same
-   dispatch table may be active concurrently: this one (the "zeroth") is used
-   for overall initialisation/shutdown, and other instances -- not necessarily
-   including this one -- may be opened and closed once the base device has
-   been initialized.  */
-struct gomp_device_descr *base_dev;
+/* A cached version of the dispatcher for the global "current" accelerator type,
+   e.g. used as the default when creating new host threads.  This is the
+   device-type equivalent of goacc_device_num (which specifies which device to
+   use out of potentially several of the same type).  If there are several
+   devices of a given type, this points at the first one.  */
+
+static struct gomp_device_descr *cached_base_dev = NULL;
 
 #if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct goacc_thread *goacc_tls_data;
@@ -53,9 +52,6 @@ pthread_key_t goacc_tls_key;
 #endif
 static pthread_key_t goacc_cleanup_key;
 
-/* Current dispatcher, and how it was initialized */
-static acc_device_t init_key = _ACC_device_hwm;
-
 static struct goacc_thread *goacc_threads;
 static gomp_mutex_t goacc_thread_lock;
 
@@ -94,6 +90,21 @@ get_openacc_name (const char *name)
     return name;
 }
 
+static const char *
+name_of_acc_device_t (enum acc_device_t type)
+{
+  switch (type)
+    {
+    case acc_device_none: return "none";
+    case acc_device_default: return "default";
+    case acc_device_host: return "host";
+    case acc_device_host_nonshm: return "host_nonshm";
+    case acc_device_not_host: return "not_host";
+    case acc_device_nvidia: return "nvidia";
+    default: gomp_fatal ("unknown device type %u", (unsigned) type);
+    }
+}
+
 static struct gomp_device_descr *
 resolve_device (acc_device_t d)
 {
@@ -159,22 +170,87 @@ resolve_device (acc_device_t d)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d)
 {
-  struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
+  int ndevs;
 
-  acc_dev = resolve_device (d);
+  base_dev = resolve_device (d);
+
+  ndevs = base_dev ? base_dev->get_num_devices_func () : 0;
+
+  if (!base_dev || ndevs <= 0 || goacc_device_num >= ndevs)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
-    gomp_fatal ("device %u not supported", (unsigned)d);
+  acc_dev = &base_dev[goacc_device_num];
 
   if (acc_dev->is_initialized)
     gomp_fatal ("device already active");
 
-  /* We need to remember what we were intialized as, to check shutdown etc.  */
-  init_key = d;
-
   gomp_init_device (acc_dev);
 
-  return acc_dev;
+  return base_dev;
+}
+
+static void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct gomp_device_descr *base_dev;
+  struct goacc_thread *walk;
+  int ndevs, i;
+  bool devices_active = false;
+
+  /* Get the base device for this device type.  */
+  base_dev = resolve_device (d);
+
+  if (!base_dev)
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      /* Similarly, if this happens then user code has done something weird.  */
+      if (walk->saved_bound_dev)
+        gomp_fatal ("shutdown during host fallback");
+
+      if (walk->dev)
+	{
+	  gomp_mutex_lock (&walk->dev->lock);
+	  gomp_free_memmap (&walk->dev->mem_map);
+	  gomp_mutex_unlock (&walk->dev->lock);
+
+	  walk->dev = NULL;
+	  walk->base_dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  ndevs = base_dev->get_num_devices_func ();
+
+  /* Close all the devices of this type that have been opened.  */
+  for (i = 0; i < ndevs; i++)
+    {
+      struct gomp_device_descr *acc_dev = &base_dev[i];
+      if (acc_dev->is_initialized)
+        {
+	  devices_active = true;
+	  gomp_fini_device (acc_dev);
+	}
+    }
+
+  if (!devices_active)
+    gomp_fatal ("no device initialized");
 }
 
 static struct goacc_thread *
@@ -207,9 +283,11 @@ goacc_destroy_thread (void *data)
 
   if (thr)
     {
-      if (base_dev && thr->target_tls)
+      struct gomp_device_descr *acc_dev = thr->dev;
+
+      if (acc_dev && thr->target_tls)
 	{
-	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  acc_dev->openacc.destroy_thread_data_func (thr->target_tls);
 	  thr->target_tls = NULL;
 	}
 
@@ -236,53 +314,49 @@ goacc_destroy_thread (void *data)
   gomp_mutex_unlock (&goacc_thread_lock);
 }
 
-/* Open the ORD'th device of the currently-active type (base_dev must be
-   initialised before calling).  If ORD is < 0, open the default-numbered
-   device (set by the ACC_DEVICE_NUM environment variable or a call to
-   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
-   consists of calling the device's open_device_func hook, and setting up
-   thread-local data (maybe allocating, then initializing with information
-   pertaining to the newly-opened or previously-opened device).  */
+/* Use the ORD'th device instance for the current host thread (or -1 for the
+   current global default).  The device (and the runtime) must be initialised
+   before calling this function.  */
 
-static void
-lazy_open (int ord)
+void
+goacc_attach_host_thread_to_device (int ord)
 {
   struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev;
-
-  if (thr && thr->dev)
-    {
-      assert (ord < 0 || ord == thr->dev->target_id);
-      return;
-    }
-
-  assert (base_dev);
-
+  struct gomp_device_descr *acc_dev = NULL, *base_dev = NULL;
+  int num_devices;
+  
+  if (thr && thr->dev && (thr->dev->target_id == ord || ord < 0))
+    return;
+  
   if (ord < 0)
     ord = goacc_device_num;
-
-  /* The OpenACC 2.0 spec leaves the runtime's behaviour when an out-of-range
-     device is requested as implementation-defined (4.2 ACC_DEVICE_NUM).
-     We choose to raise an error in such a case.  */
-  if (ord >= base_dev->get_num_devices_func ())
-    gomp_fatal ("device %u does not exist", ord);
-
+  
+  /* Decide which type of device to use.  If the current thread has a device
+     type already (e.g. set by acc_set_device_type), use that, else use the
+     global default.  */
+  if (thr && thr->base_dev)
+    base_dev = thr->base_dev;
+  else
+    {
+      assert (cached_base_dev);
+      base_dev = cached_base_dev;
+    }
+  
+  num_devices = base_dev->get_num_devices_func ();
+  if (num_devices <= 0 || ord >= num_devices)
+    gomp_fatal ("device %u out of range", ord);
+  
   if (!thr)
     thr = goacc_new_thread ();
-
-  acc_dev = thr->dev = &base_dev[ord];
-
-  assert (acc_dev->target_id == ord);
-
+  
+  thr->base_dev = base_dev;
+  thr->dev = acc_dev = &base_dev[ord];
   thr->saved_bound_dev = NULL;
   thr->mapped_data = NULL;
-
-  if (!acc_dev->openacc.target_data)
-    acc_dev->openacc.target_data = acc_dev->openacc.open_device_func (ord);
-
+  
   thr->target_tls
-    = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
-
+    = acc_dev->openacc.create_thread_data_func (ord);
+  
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
@@ -292,74 +366,20 @@ lazy_open (int ord)
 void
 acc_init (acc_device_t d)
 {
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
 
-  base_dev = acc_init_1 (d);
-
-  lazy_open (-1);
+  cached_base_dev = acc_init_1 (d);
 
   gomp_mutex_unlock (&acc_device_lock);
+  
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_init)
 
-static void
-acc_shutdown_1 (acc_device_t d)
-{
-  struct goacc_thread *walk;
-
-  /* We don't check whether d matches the actual device found, because
-     OpenACC 2.0 (3.2.12) says the parameters to the init and this
-     call must match (for the shutdown call anyway, it's silent on
-     others).  */
-
-  if (!base_dev)
-    gomp_fatal ("no device initialized");
-  if (d != init_key)
-    gomp_fatal ("device %u(%u) is initialized",
-		(unsigned) init_key, (unsigned) base_dev->type);
-
-  gomp_mutex_lock (&goacc_thread_lock);
-
-  /* Free target-specific TLS data and close all devices.  */
-  for (walk = goacc_threads; walk != NULL; walk = walk->next)
-    {
-      if (walk->target_tls)
-	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
-
-      walk->target_tls = NULL;
-
-      /* This would mean the user is shutting down OpenACC in the middle of an
-         "acc data" pragma.  Likely not intentional.  */
-      if (walk->mapped_data)
-	gomp_fatal ("shutdown in 'acc data' region");
-
-      if (walk->dev)
-	{
-	  void *target_data = walk->dev->openacc.target_data;
-	  if (walk->dev->openacc.close_device_func (target_data) < 0)
-	    gomp_fatal ("failed to close device");
-
-	  walk->dev->openacc.target_data = target_data = NULL;
-
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
-
-	  walk->dev = NULL;
-	}
-    }
-
-  gomp_mutex_unlock (&goacc_thread_lock);
-
-  gomp_fini_device (base_dev);
-
-  base_dev = NULL;
-}
-
 void
 acc_shutdown (acc_device_t d)
 {
@@ -372,59 +392,16 @@ acc_shutdown (acc_device_t d)
 
 ialias (acc_shutdown)
 
-/* This function is called after plugins have been initialized.  It deals with
-   the "base" device, and is used to prepare the runtime for dealing with a
-   number of such devices (as implemented by some particular plugin).  If the
-   argument device type D matches a previous call to the function, return the
-   current base device, else shut the old device down and re-initialize with
-   the new device type.  */
-
-static struct gomp_device_descr *
-lazy_init (acc_device_t d)
-{
-  if (base_dev)
-    {
-      /* Re-initializing the same device, do nothing.  */
-      if (d == init_key)
-	return base_dev;
-
-      acc_shutdown_1 (init_key);
-    }
-
-  assert (!base_dev);
-
-  return acc_init_1 (d);
-}
-
-/* Ensure that plugins are loaded, initialize and open the (default-numbered)
-   device.  */
-
-static void
-lazy_init_and_open (acc_device_t d)
-{
-  if (!base_dev)
-    gomp_init_targets_once ();
-
-  gomp_mutex_lock (&acc_device_lock);
-
-  base_dev = lazy_init (d);
-
-  lazy_open (-1);
-
-  gomp_mutex_unlock (&acc_device_lock);
-}
-
 int
 acc_get_num_devices (acc_device_t d)
 {
   int n = 0;
-  const struct gomp_device_descr *acc_dev;
+  struct gomp_device_descr *acc_dev;
 
   if (d == acc_device_none)
     return 0;
 
-  if (!base_dev)
-    gomp_init_targets_once ();
+  gomp_init_targets_once ();
 
   acc_dev = resolve_device (d);
   if (!acc_dev)
@@ -439,10 +416,39 @@ acc_get_num_devices (acc_device_t d)
 
 ialias (acc_get_num_devices)
 
+/* Set the device type for the current thread only (using the current global
+   default device number), initialising that device if necessary.  Also set the
+   default device type for new threads to D.  */
+
 void
 acc_set_device_type (acc_device_t d)
 {
-  lazy_init_and_open (d);
+  struct gomp_device_descr *base_dev, *acc_dev;
+  struct goacc_thread *thr = goacc_thread ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  if (!cached_base_dev)
+    gomp_init_targets_once ();
+
+  cached_base_dev = base_dev = resolve_device (d);
+  acc_dev = &base_dev[goacc_device_num];
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device (acc_dev);
+
+  gomp_mutex_unlock (&acc_device_lock);
+
+  /* We're changing device type: invalidate the current thread's dev and
+     base_dev pointers.  */
+  if (thr && thr->base_dev != base_dev)
+    {
+      thr->base_dev = thr->dev = NULL;
+      if (thr->mapped_data)
+        gomp_fatal ("acc_set_device_type in 'acc data' region");
+    }
+
+  goacc_attach_host_thread_to_device (-1);
 }
 
 ialias (acc_set_device_type)
@@ -451,10 +457,11 @@ acc_device_t
 acc_get_device_type (void)
 {
   acc_device_t res = acc_device_none;
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *dev;
+  struct goacc_thread *thr = goacc_thread ();
 
-  if (base_dev)
-    res = acc_device_type (base_dev->type);
+  if (thr && thr->base_dev)
+    res = acc_device_type (thr->base_dev->type);
   else
     {
       gomp_init_targets_once ();
@@ -475,78 +482,65 @@ int
 acc_get_device_num (acc_device_t d)
 {
   const struct gomp_device_descr *dev;
-  int num;
+  struct goacc_thread *thr = goacc_thread ();
 
   if (d >= _ACC_device_hwm)
     gomp_fatal ("device %u out of range", (unsigned)d);
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
   dev = resolve_device (d);
   if (!dev)
-    gomp_fatal ("no devices of type %u", d);
+    gomp_fatal ("device %s not supported", name_of_acc_device_t (d));
 
-  /* We might not have called lazy_open for this host thread yet, in which case
-     the get_device_num_func hook will return -1.  */
-  num = dev->openacc.get_device_num_func ();
-  if (num < 0)
-    num = goacc_device_num;
+  if (thr && thr->base_dev == dev && thr->dev)
+    return thr->dev->target_id;
 
-  return num;
+  return goacc_device_num;
 }
 
 ialias (acc_get_device_num)
 
 void
-acc_set_device_num (int n, acc_device_t d)
+acc_set_device_num (int ord, acc_device_t d)
 {
-  const struct gomp_device_descr *dev;
+  struct gomp_device_descr *base_dev, *acc_dev;
   int num_devices;
 
-  if (!base_dev)
+  if (!cached_base_dev)
     gomp_init_targets_once ();
 
-  if ((int) d == 0)
-    {
-      int i;
-
-      /* A device setting of zero sets all device types on the system to use
-         the Nth instance of that device type.  Only attempt it for initialized
-	 devices though.  */
-      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
-        {
-	  dev = resolve_device (d);
-	  if (dev && dev->is_initialized)
-	    dev->openacc.set_device_num_func (n);
-	}
+  if (ord < 0)
+    ord = goacc_device_num;
 
-      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
-      goacc_device_num = n;
-    }
+  if ((int) d == 0)
+    /* Set whatever device is being used by the current host thread to use
+       device instance ORD.  It's unclear if this is supposed to affect other
+       host threads too (OpenACC 2.0 (3.2.4) acc_set_device_num).  */
+    goacc_attach_host_thread_to_device (ord);
   else
     {
-      struct goacc_thread *thr = goacc_thread ();
-
       gomp_mutex_lock (&acc_device_lock);
 
-      base_dev = lazy_init (d);
+      cached_base_dev = base_dev = resolve_device (d);
 
       num_devices = base_dev->get_num_devices_func ();
 
-      if (n >= num_devices)
-        gomp_fatal ("device %u out of range", n);
+      if (ord >= num_devices)
+        gomp_fatal ("device %u out of range", ord);
 
-      /* If we're changing the device number, de-associate this thread with
-	 the device (but don't close the device, since it may be in use by
-	 other threads).  */
-      if (thr && thr->dev && n != thr->dev->target_id)
-	thr->dev = NULL;
+      acc_dev = &base_dev[ord];
 
-      lazy_open (n);
+      if (!acc_dev->is_initialized)
+        gomp_init_device (acc_dev);
 
       gomp_mutex_unlock (&acc_device_lock);
+
+      goacc_attach_host_thread_to_device (ord);
     }
+  
+  goacc_device_num = ord;
 }
 
 ialias (acc_set_device_num)
@@ -554,10 +548,7 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  struct goacc_thread *thr = goacc_thread ();
-
-  if (thr && thr->dev
-      && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
+  if (acc_get_device_type () == acc_device_host_nonshm)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
@@ -577,7 +568,7 @@ goacc_runtime_initialize (void)
 
   pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
 
-  base_dev = NULL;
+  cached_base_dev = NULL;
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
@@ -606,9 +597,8 @@ goacc_restore_bind (void)
 }
 
 /* This is called from any OpenACC support function that may need to implicitly
-   initialize the libgomp runtime.  On exit all such initialization will have
-   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
-   pointers will be valid.  */
+   initialize the libgomp runtime, either globally or from a new host thread.
+   On exit "goacc_thread" will return a valid & populated thread block.  */
 
 attribute_hidden void
 goacc_lazy_initialize (void)
@@ -618,12 +608,8 @@ goacc_lazy_initialize (void)
   if (thr && thr->dev)
     return;
 
-  if (!base_dev)
-    lazy_init_and_open (acc_device_default);
+  if (!cached_base_dev)
+    acc_init (acc_device_default);
   else
-    {
-      gomp_mutex_lock (&acc_device_lock);
-      lazy_open (-1);
-      gomp_mutex_unlock (&acc_device_lock);
-    }
+    goacc_attach_host_thread_to_device (-1);
 }
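
Note on the oacc-init.c changes above: device selection moves from a single
global base_dev to per-host-thread state (thr->base_dev and thr->dev), with
cached_base_dev and goacc_device_num serving only as defaults.  The following
is a minimal standalone sketch of that lookup order, not the libgomp code
itself.  All model_* names are hypothetical stand-ins, the device count is
passed in rather than queried from a plugin, and an out-of-range ordinal
returns -1 where the real runtime calls gomp_fatal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libgomp's internal types; these are not the
   real definitions.  */
struct model_device { int target_id; };

struct model_thread
{
  struct model_device *base_dev;  /* First device of the thread's type.  */
  struct model_device *dev;       /* Device this thread is attached to.  */
};

/* Global defaults, playing the roles of cached_base_dev and
   goacc_device_num in the patch.  */
static struct model_device *model_cached_base_dev;
static int model_default_num;

/* Attach THR to device ORD of its own type, or of the global default type
   if it has none yet.  ORD < 0 selects the global default number.  Return
   0 on success, -1 if ORD is out of range.  */
static int
model_attach (struct model_thread *thr, int ord, int num_devices)
{
  struct model_device *base_dev;

  /* Already attached to the requested device: nothing to do.  */
  if (thr->dev && (ord < 0 || thr->dev->target_id == ord))
    return 0;

  if (ord < 0)
    ord = model_default_num;

  /* Prefer the thread's own device type, else the global default.  */
  base_dev = thr->base_dev ? thr->base_dev : model_cached_base_dev;
  assert (base_dev);

  if (num_devices <= 0 || ord >= num_devices)
    return -1;

  thr->base_dev = base_dev;
  thr->dev = &base_dev[ord];
  return 0;
}

/* Device number as seen by THR: its own device if it is attached, else
   the global default.  */
static int
model_device_num (const struct model_thread *thr)
{
  return thr->dev ? thr->dev->target_id : model_default_num;
}
```

In this model, two host threads can be attached to different instances of
the same device type at once, and a thread that has not been attached yet
reports the global default number.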
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index 85619c8..0ace737 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -56,6 +56,9 @@ acc_device_type (enum offload_target_type type)
 
 struct goacc_thread
 {
+  /* The base device for the current thread.  */
+  struct gomp_device_descr *base_dev;
+
   /* The device for the current thread.  */
   struct gomp_device_descr *dev;
 
@@ -89,10 +92,7 @@ goacc_thread (void)
 #endif
 
 void goacc_register (struct gomp_device_descr *) __GOACC_NOTHROW;
-
-/* Current dispatcher.  */
-extern struct gomp_device_descr *base_dev;
-
+void goacc_attach_host_thread_to_device (int);
 void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index fdc82e6..89ef5fc 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -107,7 +107,9 @@ acc_malloc (size_t s)
 
   struct goacc_thread *thr = goacc_thread ();
 
-  return base_dev->alloc_func (thr->dev->target_id, s);
+  assert (thr->dev);
+
+  return thr->dev->alloc_func (thr->dev->target_id, s);
 }
 
 /* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
@@ -122,6 +124,8 @@ acc_free (void *d)
   if (!d)
     return;
 
+  assert (thr && thr->dev);
+
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
@@ -134,7 +138,7 @@ acc_free (void *d)
      acc_unmap_data ((void *)(k->host_start + offset));
    }
 
-  base_dev->free_func (thr->dev->target_id, d);
+  thr->dev->free_func (thr->dev->target_id, d);
 }
 
 void
@@ -144,7 +148,9 @@ acc_memcpy_to_device (void *d, void *h, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+  assert (thr && thr->dev);
+
+  thr->dev->host2dev_func (thr->dev->target_id, d, h, s);
 }
 
 void
@@ -154,7 +160,9 @@ acc_memcpy_from_device (void *h, void *d, size_t s)
      been obtained from a routine that did that.  */
   struct goacc_thread *thr = goacc_thread ();
 
-  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+  assert (thr && thr->dev);
+
+  thr->dev->dev2host_func (thr->dev->target_id, h, d, s);
 }
 
 /* Return the device pointer that corresponds to host data H.  Or NULL
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 563f9bb..d899946 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -49,32 +49,6 @@ find_pset (int pos, size_t mapnum, unsigned short *kinds)
   return kind == GOMP_MAP_TO_PSET;
 }
 
-
-/* Ensure that the target device for DEVICE_TYPE is initialised (and that
-   plugins have been loaded if appropriate).  The ACC_dev variable for the
-   current thread will be set appropriately for the given device type on
-   return.  */
-
-attribute_hidden void
-select_acc_device (int device_type)
-{
-  goacc_lazy_initialize ();
-
-  if (device_type == GOMP_DEVICE_HOST_FALLBACK)
-    return;
-
-  if (device_type == acc_device_none)
-    device_type = acc_device_host;
-
-  if (device_type >= 0)
-    {
-      /* NOTE: this will go badly if the surrounding data environment is set up
-         to use a different device type.  We'll just have to trust that users
-	 know what they're doing...  */
-      acc_set_device_type (device_type);
-    }
-}
-
 static void goacc_wait (int async, int num_waits, va_list ap);
 
 void
@@ -111,7 +85,7 @@ GOACC_parallel (int device, void (*fn) (void *),
 	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
 	      async);
 #endif
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -151,7 +125,7 @@ GOACC_parallel (int device, void (*fn) (void *),
       if (tgt_fn_key == NULL)
 	gomp_fatal ("target function wasn't mapped");
 
-      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
     }
   else
     tgt_fn = (void (*)) fn;
@@ -195,7 +169,7 @@ GOACC_data_start (int device, size_t mapnum,
 	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
 #endif
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
@@ -242,7 +216,7 @@ GOACC_enter_exit_data (int device, size_t mapnum,
   bool data_enter = false;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   thr = goacc_thread ();
   acc_dev = thr->dev;
@@ -429,7 +403,7 @@ GOACC_update (int device, size_t mapnum,
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   size_t i;
 
-  select_acc_device (device);
+  goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index bc60f72..1faf5bc 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -119,31 +119,6 @@ GOMP_OFFLOAD_unload_image (int n __attribute__ ((unused)),
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return (void *) (intptr_t) n;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_close_device (void *hnd)
-{
-  return 0;
-}
-
-STATIC int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  return 0;
-}
-
-STATIC void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  if (n > 0)
-    GOMP (fatal) ("device number %u out of range for host execution", n);
-}
-
-STATIC void *
 GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t s)
 {
   return GOMP (malloc) (s);
@@ -254,7 +229,7 @@ GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__ ((unused)))
 }
 
 STATIC void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
   return NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 483cb75..583ec87 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -133,7 +133,8 @@ struct targ_fn_descriptor
   const char *name;
 };
 
-static bool ptx_inited = false;
+static unsigned int instantiated_devices = 0;
+static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
 struct ptx_stream
 {
@@ -331,9 +332,21 @@ struct ptx_event
   struct ptx_event *next;
 };
 
+struct ptx_image_data
+{
+  void *target_data;
+  CUmodule module;
+  struct ptx_image_data *next;
+};
+
 static pthread_mutex_t ptx_event_lock;
 static struct ptx_event *ptx_events;
 
+static struct ptx_device **ptx_devices;
+
+static struct ptx_image_data *ptx_images = NULL;
+static pthread_mutex_t ptx_image_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
 
@@ -450,8 +463,8 @@ fini_streams_for_device (struct ptx_device *ptx_dev)
       struct ptx_stream *s = ptx_dev->active_streams;
       ptx_dev->active_streams = ptx_dev->active_streams->next;
 
-      cuStreamDestroy (s->stream);
       map_fini (s);
+      cuStreamDestroy (s->stream);
       free (s);
     }
 
@@ -575,21 +588,21 @@ select_stream_for_async (int async, pthread_t thread, bool create,
   return stream;
 }
 
-static int nvptx_get_num_devices (void);
-
-/* Initialize the device.  */
-static int
+/* Initialize the device.  Return TRUE on success, else FALSE.  PTX_DEV_LOCK
+   should be locked on entry and remains locked on exit.  */
+static bool
 nvptx_init (void)
 {
   CUresult r;
   int rc;
+  int ndevs;
 
-  if (ptx_inited)
-    return nvptx_get_num_devices ();
+  if (instantiated_devices != 0)
+    return true;
 
   rc = verify_device_library ();
   if (rc < 0)
-    return -1;
+    return false;
 
   r = cuInit (0);
   if (r != CUDA_SUCCESS)
@@ -599,22 +612,64 @@ nvptx_init (void)
 
   pthread_mutex_init (&ptx_event_lock, NULL);
 
-  ptx_inited = true;
+  r = cuDeviceGetCount (&ndevs);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
 
-  return nvptx_get_num_devices ();
+  ptx_devices = GOMP_PLUGIN_malloc_cleared (sizeof (struct ptx_device *)
+					    * ndevs);
+
+  return true;
 }
 
+/* Select the N'th PTX device for the current host thread.  The device must
+   have been opened before calling this function.  */
+
 static void
-nvptx_fini (void)
+nvptx_attach_host_thread_to_device (int n)
 {
-  ptx_inited = false;
+  CUdevice dev;
+  CUresult r;
+  struct ptx_device *ptx_dev;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetDevice (&dev);
+  if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+
+  if (r != CUDA_ERROR_INVALID_CONTEXT && dev == n)
+    return;
+  else
+    {
+      CUcontext old_ctx;
+
+      ptx_dev = ptx_devices[n];
+      assert (ptx_dev);
+
+      r = cuCtxGetCurrent (&thd_ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+      /* We don't necessarily have a current context (e.g. if it has been
+         destroyed).  Pop it if we do though.  */
+      if (thd_ctx != NULL)
+	{
+	  r = cuCtxPopCurrent (&old_ctx);
+	  if (r != CUDA_SUCCESS)
+            GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+	}
+
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
 }
 
-static void *
+static struct ptx_device *
 nvptx_open_device (int n)
 {
   struct ptx_device *ptx_dev;
-  CUdevice dev;
+  CUdevice dev, ctx_dev;
   CUresult r;
   int async_engines, pi;
 
@@ -628,6 +683,21 @@ nvptx_open_device (int n)
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = false;
 
+  r = cuCtxGetDevice (&ctx_dev);
+  if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
+    GOMP_PLUGIN_fatal ("cuCtxGetDevice error: %s", cuda_error (r));
+  
+  if (r != CUDA_ERROR_INVALID_CONTEXT && ctx_dev != dev)
+    {
+      /* The current host thread has an active context for a different device.
+         Detach it.  */
+      CUcontext old_ctx;
+      
+      r = cuCtxPopCurrent (&old_ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxPopCurrent error: %s", cuda_error (r));
+    }
+
   r = cuCtxGetCurrent (&ptx_dev->ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
@@ -678,17 +748,16 @@ nvptx_open_device (int n)
 
   init_streams_for_device (ptx_dev, async_engines);
 
-  return (void *) ptx_dev;
+  return ptx_dev;
 }
 
-static int
-nvptx_close_device (void *targ_data)
+static void
+nvptx_close_device (struct ptx_device *ptx_dev)
 {
   CUresult r;
-  struct ptx_device *ptx_dev = targ_data;
 
   if (!ptx_dev)
-    return 0;
+    return;
 
   fini_streams_for_device (ptx_dev);
 
@@ -700,8 +769,6 @@ nvptx_close_device (void *targ_data)
     }
 
   free (ptx_dev);
-
-  return 0;
 }
 
 static int
@@ -714,7 +781,7 @@ nvptx_get_num_devices (void)
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
      further initialization).  */
-  if (!ptx_inited)
+  if (instantiated_devices == 0)
     cuInit (0);
 
   r = cuDeviceGetCount (&n);
@@ -1507,64 +1574,84 @@ GOMP_OFFLOAD_get_num_devices (void)
   return nvptx_get_num_devices ();
 }
 
-static void **kernel_target_data;
-static void **kernel_host_table;
-
 void
-GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+GOMP_OFFLOAD_init_device (int n)
 {
-  kernel_target_data = target_data;
-  kernel_host_table = host_table;
-}
+  pthread_mutex_lock (&ptx_dev_lock);
 
-void
-GOMP_OFFLOAD_init_device (int n __attribute__ ((unused)))
-{
-  (void) nvptx_init ();
+  if (!nvptx_init () || ptx_devices[n] != NULL)
+    {
+      pthread_mutex_unlock (&ptx_dev_lock);
+      return;
+    }
+
+  ptx_devices[n] = nvptx_open_device (n);
+  instantiated_devices++;
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 void
-GOMP_OFFLOAD_fini_device (int n __attribute__ ((unused)))
+GOMP_OFFLOAD_fini_device (int n)
 {
-  nvptx_fini ();
+  pthread_mutex_lock (&ptx_dev_lock);
+
+  if (ptx_devices[n] != NULL)
+    {
+      nvptx_attach_host_thread_to_device (n);
+      nvptx_close_device (ptx_devices[n]);
+      ptx_devices[n] = NULL;
+      instantiated_devices--;
+    }
+
+  pthread_mutex_unlock (&ptx_dev_lock);
 }
 
 int
-GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
-			struct mapping_table **tablep)
+GOMP_OFFLOAD_load_image (int ord, void *target_data,
+			 struct addr_pair **target_table)
 {
   CUmodule module;
-  void **fn_table;
-  char **fn_names;
-  int fn_entries, i;
+  char **fn_names, **var_names;
+  unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
+  void **img_header = (void **) target_data;
+  struct ptx_image_data *new_image;
 
-  if (nvptx_init () <= 0)
-    return 0;
+  GOMP_OFFLOAD_init_device (ord);
 
-  /* This isn't an error, because an image may legitimately have no offloaded
-     regions and so will not call GOMP_offload_register.  */
-  if (kernel_target_data == NULL)
-    return 0;
+  nvptx_attach_host_thread_to_device (ord);
+
+  link_ptx (&module, img_header[0]);
 
-  link_ptx (&module, kernel_target_data[0]);
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
-  /* kernel_target_data[0] -> ptx code
-     kernel_target_data[1] -> variable mappings
-     kernel_target_data[2] -> array of kernel names in ascii
+  /* The mkoffload utility emits a table of pointers/integers at the start of
+     each offload image:
 
-     kernel_host_table[0] -> start of function addresses (__offload_func_table)
-     kernel_host_table[1] -> end of function addresses (__offload_funcs_end)
+     img_header[0] -> ptx code
+     img_header[1] -> number of variables
+     img_header[2] -> array of variable names (pointers to strings)
+     img_header[3] -> number of kernels
+     img_header[4] -> array of kernel names (pointers to strings)
 
      The array of kernel names and the functions addresses form a
      one-to-one correspondence.  */
 
-  fn_table = kernel_host_table[0];
-  fn_names = (char **) kernel_target_data[2];
-  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+  var_entries = (uintptr_t) img_header[1];
+  var_names = (char **) img_header[2];
+  fn_entries = (uintptr_t) img_header[3];
+  fn_names = (char **) img_header[4];
 
-  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
+				      * (fn_entries + var_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1579,38 +1666,86 @@ GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
       targ_fns[i].fn = function;
       targ_fns[i].name = (const char *) fn_names[i];
 
-      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
-      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
-      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
-      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+      (*target_table)[i].start = (uintptr_t) &targ_fns[i];
+      (*target_table)[i].end = (*target_table)[i].start + 1;
     }
 
-  return fn_entries;
+  for (j = 0; j < var_entries; j++, i++)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+
+      r = cuModuleGetGlobal (&var, &bytes, module, var_names[j]);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+
+      (*target_table)[i].start = (uintptr_t) var;
+      (*target_table)[i].end = (*target_table)[i].start + bytes;
+    }
+
+  return i;
+}
+
+void
+GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+{
+  void **img_header = (void **) target_data;
+  struct targ_fn_descriptor *targ_fns
+    = (struct targ_fn_descriptor *) img_header[0];
+  struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
+
+  free (targ_fns);
+
+  pthread_mutex_lock (&ptx_image_lock);
+  for (image = ptx_images; image != NULL;)
+    {
+      struct ptx_image_data *next = image->next;
+
+      if (image->target_data == target_data)
+	{
+	  cuModuleUnload (image->module);
+	  free (image);
+	  if (prev)
+	    prev->next = next;
+	}
+      else
+	{
+	  prev = image;
+	  if (!newhd)
+	    newhd = image;
+	}
+
+      image = next;
+    }
+  ptx_images = newhd;
+  pthread_mutex_unlock (&ptx_image_lock);
 }
 
 void *
-GOMP_OFFLOAD_alloc (int n __attribute__ ((unused)), size_t size)
+GOMP_OFFLOAD_alloc (int ord, size_t size)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_alloc (size);
 }
 
 void
-GOMP_OFFLOAD_free (int n __attribute__ ((unused)), void *ptr)
+GOMP_OFFLOAD_free (int ord, void *ptr)
 {
+  nvptx_attach_host_thread_to_device (ord);
   nvptx_free (ptr);
 }
 
 void *
-GOMP_OFFLOAD_dev2host (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_dev2host (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_dev2host (dst, src, n);
 }
 
 void *
-GOMP_OFFLOAD_host2dev (int ord __attribute__ ((unused)), void *dst,
-		       const void *src, size_t n)
+GOMP_OFFLOAD_host2dev (int ord, void *dst, const void *src, size_t n)
 {
+  nvptx_attach_host_thread_to_device (ord);
   return nvptx_host2dev (dst, src, n);
 }
 
@@ -1627,45 +1762,6 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
 	    num_workers, vector_length, async, targ_mem_desc);
 }
 
-void *
-GOMP_OFFLOAD_openacc_open_device (int n)
-{
-  return nvptx_open_device (n);
-}
-
-int
-GOMP_OFFLOAD_openacc_close_device (void *h)
-{
-  return nvptx_close_device (h);
-}
-
-void
-GOMP_OFFLOAD_openacc_set_device_num (int n)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  assert (n >= 0);
-
-  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
-    (void) nvptx_open_device (n);
-}
-
-/* This can be called before the device is "opened" for the current thread, in
-   which case we can't tell which device number should be returned.  We don't
-   actually want to open the device here, so just return -1 and let the caller
-   (oacc-init.c:acc_get_device_num) handle it.  */
-
-int
-GOMP_OFFLOAD_openacc_get_device_num (void)
-{
-  struct nvptx_thread *nvthd = nvptx_thread ();
-
-  if (nvthd && nvthd->ptx_dev)
-    return nvthd->ptx_dev->ord;
-  else
-    return -1;
-}
-
 void
 GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
 {
@@ -1729,14 +1825,18 @@ GOMP_OFFLOAD_openacc_async_set_async (int async)
 }
 
 void *
-GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+GOMP_OFFLOAD_openacc_create_thread_data (int ord)
 {
-  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct ptx_device *ptx_dev;
   struct nvptx_thread *nvthd
     = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
   CUresult r;
   CUcontext thd_ctx;
 
+  ptx_dev = ptx_devices[ord];
+
+  assert (ptx_dev);
+
   r = cuCtxGetCurrent (&thd_ctx);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
diff --git a/libgomp/target.c b/libgomp/target.c
index dfe7fb9..d8da783 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -178,7 +178,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
-  tgt->mem_map = mem_map;
 
   if (mapnum == 0)
     return tgt;
@@ -597,7 +596,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (tgt->mem_map, k);
+	splay_tree_remove (&devicep->mem_map, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -1159,10 +1158,6 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
     {
       optional_present = optional_total = 0;
       DLSYM_OPT (openacc.exec, openacc_parallel);
-      DLSYM_OPT (openacc.open_device, openacc_open_device);
-      DLSYM_OPT (openacc.close_device, openacc_close_device);
-      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
-      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
       DLSYM_OPT (openacc.register_async_cleanup,
 		 openacc_register_async_cleanup);
       DLSYM_OPT (openacc.async_test, openacc_async_test);
@@ -1271,7 +1266,6 @@ gomp_target_init (void)
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
-		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
index 84045db..a4cf7f2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
@@ -58,7 +58,7 @@ main (int argc, char **argv)
       acc_set_device_num (1, (acc_device_t) 0);
 
       devnum = acc_get_device_num (devtype);
-      if (devnum != 0)
+      if (devnum != 1)
 	abort ();
   }
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-08 14:32                                                       ` Julian Brown
@ 2015-04-08 14:34                                                         ` Jakub Jelinek
  2015-04-08 14:59                                                         ` Ilya Verbin
  1 sibling, 0 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-04-08 14:34 UTC (permalink / raw)
  To: Julian Brown; +Cc: Ilya Verbin, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Apr 08, 2015 at 03:31:42PM +0100, Julian Brown wrote:
> It passes libgomp testing on NVPTX. OK?

Please write a proper ChangeLog entry for it.
Ok with that.

	Jakub

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-08 14:32                                                       ` Julian Brown
  2015-04-08 14:34                                                         ` Jakub Jelinek
@ 2015-04-08 14:59                                                         ` Ilya Verbin
  2015-04-08 16:14                                                           ` Julian Brown
  2015-04-14 14:15                                                           ` Julian Brown
  1 sibling, 2 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-04-08 14:59 UTC (permalink / raw)
  To: Julian Brown; +Cc: Jakub Jelinek, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> This version is mostly the same as the last posted version but has a
> tweak in GOACC_parallel to account for the new splay tree arrangement
> for target functions:
> 
> -      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> +      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> 
> Have there been any other changes I might have missed?

No.

> It passes libgomp testing on NVPTX. OK?

Have you tested it with disabled offloading?

I see several regressions:
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-08 14:59                                                         ` Ilya Verbin
@ 2015-04-08 16:14                                                           ` Julian Brown
  2015-04-14 14:15                                                           ` Julian Brown
  1 sibling, 0 replies; 92+ messages in thread
From: Julian Brown @ 2015-04-08 16:14 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Jakub Jelinek, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> Have you tested it with disabled offloading?
> 
> I see several regressions:
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

No -- thanks for the note. I've committed the patch now, but I'll try
to get to looking at these in the next day or two (it's probably
something relatively minor, I guess).

Julian

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
  2015-04-08 14:59                                                         ` Ilya Verbin
  2015-04-08 16:14                                                           ` Julian Brown
@ 2015-04-14 14:15                                                           ` Julian Brown
  2015-04-14 15:35                                                             ` Using -foffload=[...] to cycle through accelerators (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
                                                                               ` (2 more replies)
  1 sibling, 3 replies; 92+ messages in thread
From: Julian Brown @ 2015-04-14 14:15 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Jakub Jelinek, Thomas Schwinge, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> > This version is mostly the same as the last posted version but has a
> > tweak in GOACC_parallel to account for the new splay tree
> > arrangement for target functions:
> > 
> > -      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> > +      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> > 
> > Have there been any other changes I might have missed?
> 
> No.
> 
> > It passes libgomp testing on NVPTX. OK?
> 
> Have you tested it with disabled offloading?
> 
> I see several regressions:
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

I think there may be multiple issues here. The attached patch addresses
one -- acc_device_type not distinguishing between "offloaded" and host
code with the host_nonshm plugin.

The other problem is that it appears that the ACC_DEVICE_TYPE
environment variable is not getting set properly on the target for (any
of) the OpenACC tests: this means a lot of the time the "wrong" plugin
is being tested, and means that the above tests (and several others)
still fail. That will apparently need some more engineering (on our
part).

(Not asking for review just yet, JFYI.)

Julian

ChangeLog

    libgomp/
    * oacc-init.c (acc_on_device): Check whether we're in an offloaded
    region for host_nonshm plugin.
    * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
    nonshm_exec flag in thread-local data.
    (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
    data for host_nonshm plugin.
    (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
    for host_nonshm plugin.
    * plugin/plugin-host.h: New.

[-- Attachment #2: nonshm-acc-on-device-1.diff --]
[-- Type: text/x-patch, Size: 3949 bytes --]

Index: libgomp/oacc-init.c
===================================================================
--- libgomp/oacc-init.c	(revision 221922)
+++ libgomp/oacc-init.c	(working copy)
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include "oacc-int.h"
 #include "openacc.h"
+#include "plugin/plugin-host.h"
 #include <assert.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -548,7 +549,14 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  if (acc_get_device_type () == acc_device_host_nonshm)
+  struct goacc_thread *thr = goacc_thread ();
+
+  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
+     code -- i.e. within a parallel region.  Test a flag set by the
+     openacc_parallel hook of the host_nonshm plugin to determine that.  */
+  if (acc_get_device_type () == acc_device_host_nonshm
+      && thr && thr->target_tls
+      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
Index: libgomp/plugin/plugin-host.c
===================================================================
--- libgomp/plugin/plugin-host.c	(revision 221922)
+++ libgomp/plugin/plugin-host.c	(working copy)
@@ -44,6 +44,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
+#include <stdbool.h>
 
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
@@ -55,6 +56,10 @@
 #define SELF "host: "
 #endif
 
+#ifdef HOST_NONSHM_PLUGIN
+#include "plugin-host.h"
+#endif
+
 STATIC const char *
 GOMP_OFFLOAD_get_name (void)
 {
@@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn
 			       void *targ_mem_desc __attribute__ ((unused)))
 {
 #ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
+  thd->nonshm_exec = true;
   fn (devaddrs);
+  thd->nonshm_exec = false;
 #else
   fn (hostaddrs);
 #endif
@@ -232,11 +240,20 @@ STATIC void *
 GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
+#ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd
+    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
+  thd->nonshm_exec = false;
+  return thd;
+#else
   return NULL;
+#endif
 }
 
 STATIC void
-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
-					  __attribute__ ((unused)))
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
 {
+#ifdef HOST_NONSHM_PLUGIN
+  free (tls_data);
+#endif
 }
Index: libgomp/plugin/plugin-host.h
===================================================================
--- libgomp/plugin/plugin-host.h	(revision 0)
+++ libgomp/plugin/plugin-host.h	(revision 0)
@@ -0,0 +1,37 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef PLUGIN_HOST_H
+#define PLUGIN_HOST_H
+
+struct nonshm_thread
+{
+  bool nonshm_exec;
+};
+
+#endif

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Using -foffload=[...] to cycle through accelerators (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-04-14 14:15                                                           ` Julian Brown
@ 2015-04-14 15:35                                                             ` Thomas Schwinge
  2015-04-14 15:43                                                             ` acc_on_device for device_type_host_nonshm " Thomas Schwinge
  2015-04-17  9:54                                                             ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742) Julian Brown
  2 siblings, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-04-14 15:35 UTC (permalink / raw)
  To: Julian Brown, Jakub Jelinek, Kirill Yukhin, Ilya Verbin; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

Hi!

On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown <julian@codesourcery.com> wrote:
> The other problem is that it appears that the ACC_DEVICE_TYPE
> environment variable is not getting set properly on the target for (any
> of) the OpenACC tests: this means a lot of the time the "wrong" plugin
> is being tested, and means that the above tests (and several others)
> still fail. That will apparently need some more engineering (on our
> part).

This should be working fine for "local" testing (that is, build-tree
testing, without a remote board).  Setting/communicating environment
variables to remote boards (which we use a lot for our internal testing)
indeed does not work in DejaGnu.  This has been reported several times in
the last few years, by different people, but nobody ever came up with a
solution that was sufficiently generic to be acceptable for
inclusion.  (Need a way to specify whether environment variables should
be set for the host and/or target system, and so on.)


For the problem at hand, I once had a different suggestion, which I'm
paraphrasing here: the existing code should be changed such that when
-foffload=nvptx,intelmic,... is specified (also considering a compile-time
default value based on --enable-offload-targets=[...]), the first
offloading target (nvptx in my example) is the one that is used by
default (acc_device_default) with OpenACC, and for OpenMP as device ID 0,
and so on.  That way, we could compile the libgomp test cases with
-foffload=$offload_target_openacc (see
libgomp/testsuite/libgomp.oacc-c/c.exp for context), and wouldn't have to
care about the ACC_DEVICE_TYPE environment variable anymore.
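
As a rough illustration of the "first listed target wins" idea, here is a
minimal sketch in C.  The helper first_offload_target is hypothetical and
does not exist in GCC or libgomp; it only shows how the default could be
derived from the order of a comma-separated -foffload=[...] style list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC or libgomp: copy the first entry of
   a comma-separated offload target list (e.g. "nvptx,intelmic") into BUF.
   Returns the length of the copied entry, or 0 if the list is empty.  */
static size_t
first_offload_target (const char *list, char *buf, size_t bufsize)
{
  assert (list != NULL && buf != NULL && bufsize > 0);

  /* Length of the prefix up to the first comma (or the whole string).  */
  size_t len = strcspn (list, ",");
  if (len >= bufsize)
    len = bufsize - 1;		/* Truncate rather than overflow BUF.  */
  memcpy (buf, list, len);
  buf[len] = '\0';
  return len;
}
```

With a list of "nvptx,intelmic" the helper yields "nvptx", which under this
proposal would be what acc_device_default (and OpenMP device ID 0) maps to.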

Does that make sense?


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-04-14 14:15                                                           ` Julian Brown
  2015-04-14 15:35                                                             ` Using -foffload=[...] to cycle through accelerators (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
@ 2015-04-14 15:43                                                             ` Thomas Schwinge
  2015-04-17 13:16                                                               ` Jakub Jelinek
  2015-04-17  9:54                                                             ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742) Julian Brown
  2 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-04-14 15:43 UTC (permalink / raw)
  To: Julian Brown; +Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 2322 bytes --]

Hi!

On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown <julian@codesourcery.com> wrote:
> On Wed, 8 Apr 2015 17:58:56 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> > I see several regressions:
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> 
> I think there may be multiple issues here. The attached patch addresses
> one -- acc_device_type not distinguishing between "offloaded" and host
> code with the host_nonshm plugin.

(You mean acc_on_device?)

> --- libgomp/oacc-init.c	(revision 221922)
> +++ libgomp/oacc-init.c	(working copy)
> @@ -548,7 +549,14 @@ ialias (acc_set_device_num)
>  int
>  acc_on_device (acc_device_t dev)
>  {
> -  if (acc_get_device_type () == acc_device_host_nonshm)
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
> +     code -- i.e. within a parallel region.  Test a flag set by the
> +     openacc_parallel hook of the host_nonshm plugin to determine that.  */
> +  if (acc_get_device_type () == acc_device_host_nonshm
> +      && thr && thr->target_tls
> +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
>      return dev == acc_device_host_nonshm || dev == acc_device_not_host;
>  
>    /* Just rely on the compiler builtin.  */

Really, acc_on_device is implemented as a compiler builtin (which is just
disabled for a few libgomp test cases, in order to test the acc_on_device
library function in libgomp), and I never understood why the "fallback"
implementation in libgomp (cited above) should be doing anything
different from the GCC builtin.  Is the "problem" actually, that some
libgomp test cases are expecting from acc_on_device for
acc_device_host_nonshm a different answer than the one they're currently
getting?  What is the expected answer?  Given that the OpenACC
specification doesn't talk about a host_nonshm device type, can we
accordingly define what the expected behavior is, so that we can just
have libgomp/oacc-init.c:acc_on_device »rely on the compiler builtin«?
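
To make the question concrete, the behavior of the quoted patch can be
modelled with a small self-contained sketch.  All names here (sketch_device,
sketch_thread, sketch_parallel, sketch_on_device, sketch_region) are
illustrative stand-ins, not the libgomp types; a plain static variable
replaces the real thread-local data, and the answer given outside a region
is an assumption about what "rely on the compiler builtin" amounts to on
the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the libgomp device types (illustrative only).  */
enum sketch_device { SK_NONE, SK_HOST, SK_HOST_NONSHM, SK_NOT_HOST };

struct sketch_thread
{
  bool nonshm_exec;		/* True only while "offloaded" code runs.  */
};

/* A plain static replaces the per-thread data used by the real code.  */
static struct sketch_thread sketch_thr;

/* Models the host_nonshm openacc_parallel hook: the flag is set for the
   duration of the region and cleared again afterwards.  */
static void
sketch_parallel (void (*fn) (void *), void *data)
{
  assert (fn != NULL);
  sketch_thr.nonshm_exec = true;
  fn (data);
  sketch_thr.nonshm_exec = false;
}

/* Models the library fallback: inside a region we claim to be host_nonshm
   (and "not host"); outside we answer as plain host code (assumption).  */
static int
sketch_on_device (enum sketch_device dev)
{
  if (sketch_thr.nonshm_exec)
    return dev == SK_HOST_NONSHM || dev == SK_NOT_HOST;
  return dev == SK_HOST || dev == SK_NONE;
}

/* Region body that records the answer seen from "offloaded" code.  */
static void
sketch_region (void *data)
{
  int *result = data;
  *result = sketch_on_device (SK_HOST_NONSHM);
}
```

In this model the same query gives different answers inside and outside
sketch_parallel, which is exactly the part a compile-time expansion on the
host cannot reproduce.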


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742)
  2015-04-14 14:15                                                           ` Julian Brown
  2015-04-14 15:35                                                             ` Using -foffload=[...] to cycle through accelerators (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
  2015-04-14 15:43                                                             ` acc_on_device for device_type_host_nonshm " Thomas Schwinge
@ 2015-04-17  9:54                                                             ` Julian Brown
  2 siblings, 0 replies; 92+ messages in thread
From: Julian Brown @ 2015-04-17  9:54 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Jakub Jelinek, Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, 14 Apr 2015 15:15:02 +0100
Julian Brown <julian@codesourcery.com> wrote:

> On Wed, 8 Apr 2015 17:58:56 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> 
> > On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> > > This version is mostly the same as the last posted version but
> > > has a tweak in GOACC_parallel to account for the new splay tree
> > > arrangement for target functions:
> > > 
> > > -      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> > > +      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> > > 
> > > Have there been any other changes I might have missed?
> > 
> > No.
> > 
> > > It passes libgomp testing on NVPTX. OK?
> > 
> > Have you tested it with disabled offloading?
> > 
> > I see several regressions:
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> 
> I think there may be multiple issues here. The attached patch
> addresses one -- acc_device_type not distinguishing between
> "offloaded" and host code with the host_nonshm plugin.

The patch appears to fix the original issue after all: I've re-run
tests with host==target and the failures no longer appear.  Dominique
d'Humieres has noted the same in PR65742.

> The other problem is that it appears that the ACC_DEVICE_TYPE
> environment variable is not getting set properly on the target for
> (any of) the OpenACC tests: this means a lot of the time the "wrong"
> plugin is being tested, and means that the above tests (and several
> others) still fail. That will apparently need some more engineering
> (on our part).

Fixing this turns out to require more DejaGNU-fu than I have: AFAICT,
setting a per-test environment variable from an .exp file can't easily
be done at present. The potentially useful-looking
{dg-}set-target-env-var doesn't look quite suitable for this purpose,
and besides, it doesn't actually seem to be implemented for host !=
target anyway.

(At least, if this fragment of gcc-dg.exp is anything to go by:

   if { [info exists set_target_env_var] \
        && [llength $set_target_env_var] != 0 } {
     if { [is_remote target] } {
       return [list "unsupported" ""]
     } ...
).

So: OK for trunk?

Thanks,

Julian

> ChangeLog
> 
>     libgomp/
>     * oacc-init.c (acc_on_device): Check whether we're in an offloaded
>     region for host_nonshm plugin.
>     * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
>     nonshm_exec flag in thread-local data.
>     (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
>     data for host_nonshm plugin.
>     (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
>     for host_nonshm plugin.
>     * plugin/plugin-host.h: New.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-04-14 15:43                                                             ` acc_on_device for device_type_host_nonshm " Thomas Schwinge
@ 2015-04-17 13:16                                                               ` Jakub Jelinek
  2015-05-07 18:32                                                                 ` acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742) Julian Brown
  0 siblings, 1 reply; 92+ messages in thread
From: Jakub Jelinek @ 2015-04-17 13:16 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Julian Brown, gcc-patches, Kirill Yukhin, Ilya Verbin

On Tue, Apr 14, 2015 at 05:43:26PM +0200, Thomas Schwinge wrote:
> On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown <julian@codesourcery.com> wrote:
> > On Wed, 8 Apr 2015 17:58:56 +0300
> > Ilya Verbin <iverbin@gmail.com> wrote:
> > > I see several regressions:
> > > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > 
> > I think there may be multiple issues here. The attached patch addresses
> > one -- acc_device_type not distinguishing between "offloaded" and host
> > code with the host_nonshm plugin.
> 
> (You mean acc_on_device?)
> 
> > --- libgomp/oacc-init.c	(revision 221922)
> > +++ libgomp/oacc-init.c	(working copy)
> > @@ -548,7 +549,14 @@ ialias (acc_set_device_num)
> >  int
> >  acc_on_device (acc_device_t dev)
> >  {
> > -  if (acc_get_device_type () == acc_device_host_nonshm)
> > +  struct goacc_thread *thr = goacc_thread ();
> > +
> > +  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
> > +     code -- i.e. within a parallel region.  Test a flag set by the
> > +     openacc_parallel hook of the host_nonshm plugin to determine that.  */
> > +  if (acc_get_device_type () == acc_device_host_nonshm
> > +      && thr && thr->target_tls
> > +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
> >      return dev == acc_device_host_nonshm || dev == acc_device_not_host;
> >  
> >    /* Just rely on the compiler builtin.  */
> 
> Really, acc_on_device is implemented as a compiler builtin (which is just
> disabled for a few libgomp test cases, in order to test the acc_on_device
> library function in libgomp), and I never understood why the "fallback"
> implementation in libgomp (cited above) should be doing anything
> different from the GCC builtin.  Is the "problem" actually, that some

The question is if the builtin expansion isn't wrong, at least as long as
the host_nonshm device is meant to be supported.  The
#ifdef ACCEL_COMPILER
case is easier, at least as long as ACCEL_COMPILER compiled code is not
meant to be able to offload to other devices (or host again), but the
non-ACCEL_COMPILER case means the code is either on the host, or
host_nonshm, or e.g. with Intel MIC you could have some shared library be
compiled by the host compiler, but then actually linked into the MIC
offloaded path.  In all those cases, I think it is just the library that
can determine the return value.

E.g. OpenMP omp_is_initial_device function is also only implemented in the
library, perhaps at some point I could expand it for #ifdef ACCEL_COMPILER
as builtin, but not for the host code, at least not due to the host-nonshm
plugin.

	Jakub

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
                   ` (9 preceding siblings ...)
  2015-02-23 10:31 ` Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_* (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
@ 2015-04-20 14:24 ` Thomas Schwinge
  2015-04-20 20:14   ` Gerald Pfeifer
  10 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-04-20 14:24 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!
> 
> Note that this is an experimental feature, incomplete, and subject to
> change in future versions of GCC.  We shall update -- and keep updated --
> <https://gcc.gnu.org/wiki/OpenACC>, to track the current status.

(This has now happened, finally...)

Gerald, is it OK to commit the following to update GCC 5 changes' »New
Languages and Language specific improvements« section?

Index: htdocs/gcc-5/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.109
diff -u -p -r1.109 changes.html
--- htdocs/gcc-5/changes.html	20 Apr 2015 08:22:35 -0000	1.109
+++ htdocs/gcc-5/changes.html	20 Apr 2015 14:20:54 -0000
@@ -193,6 +193,12 @@
 	  <li>Card emulator.</li>
 	</ul>
     </li>
+    <li id="openacc">
+      GCC 5 includes a preliminary implementation of the OpenACC 2.0a
+      specification.  OpenACC is intended for programming accelerator devices
+      such as GPUs.  See <a href="https://gcc.gnu.org/wiki/OpenACC">the OpenACC
+      wiki page</a> for more information.
+    </li>
   </ul>
 
 


Grüße,
 Thomas

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Merge current set of OpenACC changes from gomp-4_0-branch
  2015-04-20 14:24 ` Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
@ 2015-04-20 20:14   ` Gerald Pfeifer
  0 siblings, 0 replies; 92+ messages in thread
From: Gerald Pfeifer @ 2015-04-20 20:14 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 196 bytes --]

On Mon, 20 Apr 2015, Thomas Schwinge wrote:
> Gerald, is it OK to commit the following to update GCC 5 changes' »New
> Languages and Language specific improvements« section?

Sure thing.

Gerald

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742)
  2015-04-17 13:16                                                               ` Jakub Jelinek
@ 2015-05-07 18:32                                                                 ` Julian Brown
  2015-05-21 11:32                                                                   ` acc_on_device for device_type_host_nonshm Thomas Schwinge
                                                                                     ` (2 more replies)
  0 siblings, 3 replies; 92+ messages in thread
From: Julian Brown @ 2015-05-07 18:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 4972 bytes --]

On Fri, 17 Apr 2015 15:16:19 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> On Tue, Apr 14, 2015 at 05:43:26PM +0200, Thomas Schwinge wrote:
> > On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown
> > <julian@codesourcery.com> wrote:
> > > On Wed, 8 Apr 2015 17:58:56 +0300
> > > Ilya Verbin <iverbin@gmail.com> wrote:
> > > > I see several regressions:
> > > > FAIL:
> > > > libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > > > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution
> > > > test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > > > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution
> > > > test
> > > 
> > > I think there may be multiple issues here. The attached patch
> > > addresses one -- acc_device_type not distinguishing between
> > > "offloaded" and host code with the host_nonshm plugin.
> > 
> > (You mean acc_on_device?)
> > 
> > > --- libgomp/oacc-init.c	(revision 221922)
> > > +++ libgomp/oacc-init.c	(working copy)
> > > @@ -548,7 +549,14 @@ ialias (acc_set_device_num)
> > >  int
> > >  acc_on_device (acc_device_t dev)
> > >  {
> > > -  if (acc_get_device_type () == acc_device_host_nonshm)
> > > +  struct goacc_thread *thr = goacc_thread ();
> > > +
> > > +  /* We only want to appear to be the "host_nonshm" plugin from
> > > "offloaded"
> > > +     code -- i.e. within a parallel region.  Test a flag set by
> > > the
> > > +     openacc_parallel hook of the host_nonshm plugin to
> > > determine that.  */
> > > +  if (acc_get_device_type () == acc_device_host_nonshm
> > > +      && thr && thr->target_tls
> > > +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
> > >      return dev == acc_device_host_nonshm || dev ==
> > > acc_device_not_host; 
> > >    /* Just rely on the compiler builtin.  */
> > 
> > Really, acc_on_device is implemented as a compiler builtin (which
> > is just disabled for a few libgomp test cases, in order to test the
> > acc_on_device library function in libgomp), and I never understood
> > why the "fallback" implementation in libgomp (cited above) should
> > be doing anything different from the GCC builtin.  Is the "problem"
> > actually, that some
> 
> The question is if the builtin expansion isn't wrong, at least as
> long as the host_nonshm device is meant to be supported.  The
> #ifdef ACCEL_COMPILER
> case is easier, at least as long as ACCEL_COMPILER compiled code is
> not meant to be able to offload to other devices (or host again), but
> the non-ACCEL_COMPILER case means the code is either on the host, or
> host_nonshm, or e.g. with Intel MIC you could have some shared
> library be compiled by the host compiler, but then actually linked
> into the MIC offloaded path.  In all those cases, I think it is just
> the library that can determine the return value.
> 
> E.g. OpenMP omp_is_initial_device function is also only implemented
> in the library, perhaps at some point I could expand it for #ifdef
> ACCEL_COMPILER as builtin, but not for the host code, at least not
> due to the host-nonshm plugin.

Here's a new version of the patch that doesn't use the open-coded
expansion for acc_on_device for the host compiler at all. This means
that the host and the host_nonshm plugin should DTRT without any
special compiler options (which have thus been removed from the libgomp
tests that set them or refer to them).

So now, for the host, acc_on_device returns:

acc_on_device (acc_device_none): true
acc_on_device (acc_device_host): true
otherwise: false

When the host_nonshm plugin is active, acc_on_device returns:

acc_on_device (acc_device_host_nonshm): true (except when "host
fallback" is in effect, i.e. because of a false "if" clause).
acc_on_device (acc_device_not_host): likewise.
otherwise: false

In particular, the host_nonshm plugin doesn't consider itself to be
running code "on the host".
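
To make the two tables above concrete, here is a minimal sketch of the
decision they describe.  It is only a model: the enumerators and the
function and parameter names are hypothetical stand-ins (the real
acc_device_t values come from openacc.h), and the "in offloaded region"
state is passed in explicitly rather than read from thread-local data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for acc_device_t enumerators; the real ones are
   declared in openacc.h and their values are not reproduced here.  */
enum model_device
{
  MODEL_DEVICE_NONE,
  MODEL_DEVICE_HOST,
  MODEL_DEVICE_HOST_NONSHM,
  MODEL_DEVICE_NOT_HOST,
  MODEL_DEVICE_OTHER
};

/* ACTIVE is the currently selected device type.  IN_OFFLOADED_REGION is
   true only while code runs inside a region offloaded to the host_nonshm
   plugin, so it is false when "host fallback" is in effect.  */
static bool
model_acc_on_device (enum model_device dev, enum model_device active,
                     bool in_offloaded_region)
{
  if (active == MODEL_DEVICE_HOST_NONSHM && in_offloaded_region)
    return dev == MODEL_DEVICE_HOST_NONSHM || dev == MODEL_DEVICE_NOT_HOST;

  /* Everything else is treated as plain host execution.  */
  return dev == MODEL_DEVICE_HOST || dev == MODEL_DEVICE_NONE;
}

/* Check the model against the two tables above.  */
static void
model_self_check (void)
{
  /* Host.  */
  assert (model_acc_on_device (MODEL_DEVICE_NONE, MODEL_DEVICE_HOST, false));
  assert (model_acc_on_device (MODEL_DEVICE_HOST, MODEL_DEVICE_HOST, false));
  assert (!model_acc_on_device (MODEL_DEVICE_NOT_HOST, MODEL_DEVICE_HOST,
                                false));

  /* host_nonshm plugin, inside an offloaded region.  */
  assert (model_acc_on_device (MODEL_DEVICE_HOST_NONSHM,
                               MODEL_DEVICE_HOST_NONSHM, true));
  assert (model_acc_on_device (MODEL_DEVICE_NOT_HOST,
                               MODEL_DEVICE_HOST_NONSHM, true));
  assert (!model_acc_on_device (MODEL_DEVICE_HOST,
                                MODEL_DEVICE_HOST_NONSHM, true));

  /* host_nonshm plugin selected, but host fallback in effect.  */
  assert (model_acc_on_device (MODEL_DEVICE_HOST,
                               MODEL_DEVICE_HOST_NONSHM, false));
  assert (!model_acc_on_device (MODEL_DEVICE_HOST_NONSHM,
                                MODEL_DEVICE_HOST_NONSHM, false));
}
```

Calling model_self_check () from a test driver exercises every row of
both tables, including the host-fallback exception.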

OK for trunk?

Julian

ChangeLog

    PR libgomp/65742

    gcc/
    * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
    sequence for !ACCEL_COMPILER.

    libgomp/
    * oacc-init.c (plugin/plugin-host.h): Include.
    (acc_on_device): Check whether we're in an offloaded region for
    host_nonshm plugin.  Don't use __builtin_acc_on_device.
    * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
    nonshm_exec flag in thread-local data.
    (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
    data for host_nonshm plugin.
    (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
    for host_nonshm plugin.
    * plugin/plugin-host.h: New.
    * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Remove
    -fno-builtin-acc_on_device flag.
    * testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
    * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove
    comment re: acc_on_device builtin.
    * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
    * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
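
The libgomp part of this ChangeLog boils down to one pattern: allocate a
per-thread record, set a flag in it for the duration of the offloaded
function call, and let the query function read that flag.  A minimal
standalone sketch of that pattern follows.  All names are hypothetical,
and the thread record is passed explicitly instead of being fetched via
goacc_thread () or GOMP_PLUGIN_acc_thread ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread record, playing the role of
   struct nonshm_thread in the patch.  */
struct sketch_thread
{
  bool in_offload;
};

/* Allocate the record with the flag cleared.  */
static struct sketch_thread *
sketch_create_thread_data (void)
{
  struct sketch_thread *thd = malloc (sizeof *thd);
  assert (thd != NULL);
  thd->in_offload = false;
  return thd;
}

/* Run FN with the flag set, and clear it again afterwards.  Like the
   patch, this neither handles nesting nor non-local exits from FN.  */
static void
sketch_run_offloaded (struct sketch_thread *thd, void (*fn) (void *),
                      void *arg)
{
  thd->in_offload = true;
  fn (arg);
  thd->in_offload = false;
}

/* Query used by the "am I on the device?" function; tolerates a thread
   that has no record yet.  */
static bool
sketch_in_offload (const struct sketch_thread *thd)
{
  return thd != NULL && thd->in_offload;
}

/* Example callback: records what the query returns while it runs.  */
struct sketch_probe
{
  const struct sketch_thread *thd;
  bool seen;
};

static void
sketch_probe_fn (void *p)
{
  struct sketch_probe *probe = p;
  probe->seen = sketch_in_offload (probe->thd);
}
```

Passing sketch_probe_fn to sketch_run_offloaded shows the flag as set
inside the call and clear again once it returns; the record is released
with free (), as in the destroy_thread_data hook.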

[-- Attachment #2: builtin-acc-on-dev-2.diff --]
[-- Type: text/x-patch, Size: 8059 bytes --]

commit adccf2e7d313263d585f63e752a4d36653d47811
Author: Julian Brown <julian@codesourcery.com>
Date:   Tue Apr 21 12:40:45 2015 -0700

    Non-SHM acc_on_device fixes

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 6fe1456..5930fe4 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5917,6 +5917,7 @@ expand_stack_save (void)
 static rtx
 expand_builtin_acc_on_device (tree exp, rtx target)
 {
+#ifdef ACCEL_COMPILER
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
     return NULL_RTX;
 
@@ -5925,13 +5926,8 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
   machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
   rtx v = expand_normal (arg), v1, v2;
-#ifdef ACCEL_COMPILER
   v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
   v2 = GEN_INT (ACCEL_COMPILER_acc_device);
-#else
-  v1 = GEN_INT (GOMP_DEVICE_NONE);
-  v2 = GEN_INT (GOMP_DEVICE_HOST);
-#endif
   machine_mode target_mode = TYPE_MODE (integer_type_node);
   if (!target || !register_operand (target, target_mode))
     target = gen_reg_rtx (target_mode);
@@ -5945,6 +5941,9 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   emit_label (done_label);
 
   return target;
+#else
+  return NULL;
+#endif
 }
 
 
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 335ffd4..157147a 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include "oacc-int.h"
 #include "openacc.h"
+#include "plugin/plugin-host.h"
 #include <assert.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -611,11 +612,18 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  if (acc_get_device_type () == acc_device_host_nonshm)
+  struct goacc_thread *thr = goacc_thread ();
+
+  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
+     code -- i.e. within a parallel region.  Test a flag set by the
+     openacc_parallel hook of the host_nonshm plugin to determine that.  */
+  if (acc_get_device_type () == acc_device_host_nonshm
+      && thr && thr->target_tls
+      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
-  /* Just rely on the compiler builtin.  */
-  return __builtin_acc_on_device (dev);
+  /* For OpenACC, libgomp is only built for the host, so this is sufficient.  */
+  return dev == acc_device_host || dev == acc_device_none;
 }
 
 ialias (acc_on_device)
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
index 1faf5bc..3cb4dab 100644
--- a/libgomp/plugin/plugin-host.c
+++ b/libgomp/plugin/plugin-host.c
@@ -44,6 +44,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
+#include <stdbool.h>
 
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
@@ -55,6 +56,10 @@
 #define SELF "host: "
 #endif
 
+#ifdef HOST_NONSHM_PLUGIN
+#include "plugin-host.h"
+#endif
+
 STATIC const char *
 GOMP_OFFLOAD_get_name (void)
 {
@@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
 			       void *targ_mem_desc __attribute__ ((unused)))
 {
 #ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
+  thd->nonshm_exec = true;
   fn (devaddrs);
+  thd->nonshm_exec = false;
 #else
   fn (hostaddrs);
 #endif
@@ -232,11 +240,20 @@ STATIC void *
 GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
+#ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd
+    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
+  thd->nonshm_exec = false;
+  return thd;
+#else
   return NULL;
+#endif
 }
 
 STATIC void
-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
-					  __attribute__ ((unused)))
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
 {
+#ifdef HOST_NONSHM_PLUGIN
+  free (tls_data);
+#endif
 }
diff --git a/libgomp/plugin/plugin-host.h b/libgomp/plugin/plugin-host.h
new file mode 100644
index 0000000..96955d1
--- /dev/null
+++ b/libgomp/plugin/plugin-host.h
@@ -0,0 +1,37 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef PLUGIN_HOST_H
+#define PLUGIN_HOST_H
+
+struct nonshm_thread
+{
+  bool nonshm_exec;
+};
+
+#endif
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
index 81ea476..25cc15a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
@@ -1,7 +1,3 @@
-/* Disable the acc_on_device builtin; we want to test the libgomp library
-   function.  */
-/* { dg-additional-options "-fno-builtin-acc_on_device" } */
-
 #include <stdlib.h>
 #include <openacc.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
index 184b355..6aa3bb7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
@@ -1,5 +1,4 @@
 /* { dg-do run } */
-/* { dg-additional-options "-fno-builtin-acc_on_device" } */
 
 #include <openacc.h>
 #include <stdlib.h>
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
index 4488818..729b685 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -1,8 +1,4 @@
 ! { dg-additional-options "-cpp" }
-! TODO: Have to disable the acc_on_device builtin for we want to test the
-! libgomp library function?  The command line option
-! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not for
-! Fortran.
 
 use openacc
 implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
index 0047a19..19ff4a5 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
@@ -1,8 +1,4 @@
 ! { dg-additional-options "-cpp" }
-! TODO: Have to disable the acc_on_device builtin for we want to test
-! the libgomp library function?  The command line option
-! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
-! for Fortran.
 
       USE OPENACC
       IMPLICIT NONE
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
index 49d7a72..b01c553 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
@@ -1,8 +1,4 @@
 ! { dg-additional-options "-cpp" }
-! TODO: Have to disable the acc_on_device builtin for we want to test
-! the libgomp library function?  The command line option
-! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
-! for Fortran.
 
       IMPLICIT NONE
       INCLUDE "openacc_lib.h"

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: acc_on_device for device_type_host_nonshm
  2015-05-07 18:32                                                                 ` acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742) Julian Brown
@ 2015-05-21 11:32                                                                   ` Thomas Schwinge
  2015-05-21 11:42                                                                     ` Jakub Jelinek
  2015-06-02 12:08                                                                   ` [PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device compile-time evaluation (was: acc_on_device for device_type_host_nonshm) Thomas Schwinge
  2015-07-14 20:26                                                                   ` PR65742: OpenACC acc_on_device fixes Thomas Schwinge
  2 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-05-21 11:32 UTC (permalink / raw)
  To: Julian Brown, Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 10471 bytes --]

Hi!

On Thu, 7 May 2015 19:32:26 +0100, Julian Brown <julian@codesourcery.com> wrote:
> Here's a new version of the patch [...]

> OK for trunk?

Makes sense to me (with just a request to drop the testsuite changes, see
below), to get the existing regressions under control.  Jakub?

>     PR libgomp/65742
> 
>     gcc/
>     * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
>     sequence for !ACCEL_COMPILER.
> 
>     libgomp/
>     * oacc-init.c (plugin/plugin-host.h): Include.
>     (acc_on_device): Check whether we're in an offloaded region for
>     host_nonshm
>     plugin. Don't use __builtin_acc_on_device.
>     * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
>     nonshm_exec flag in thread-local data.
>     (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
>     data for host_nonshm plugin.
>     (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
>     for host_nonshm plugin.
>     * plugin/plugin-host.h: New.
>     * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Remove
>     -fno-builtin-acc_on_device flag.
>     * testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
>     * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove
>     comment re: acc_on_device builtin.
>     * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
>     * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.

> commit adccf2e7d313263d585f63e752a4d36653d47811
> Author: Julian Brown <julian@codesourcery.com>
> Date:   Tue Apr 21 12:40:45 2015 -0700
> 
>     Non-SHM acc_on_device fixes
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 6fe1456..5930fe4 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5917,6 +5917,7 @@ expand_stack_save (void)
>  static rtx
>  expand_builtin_acc_on_device (tree exp, rtx target)
>  {
> +#ifdef ACCEL_COMPILER
>    if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
>      return NULL_RTX;
>  
> @@ -5925,13 +5926,8 @@ expand_builtin_acc_on_device (tree exp, rtx target)
>    /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
>    machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
>    rtx v = expand_normal (arg), v1, v2;
> -#ifdef ACCEL_COMPILER
>    v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
>    v2 = GEN_INT (ACCEL_COMPILER_acc_device);
> -#else
> -  v1 = GEN_INT (GOMP_DEVICE_NONE);
> -  v2 = GEN_INT (GOMP_DEVICE_HOST);
> -#endif
>    machine_mode target_mode = TYPE_MODE (integer_type_node);
>    if (!target || !register_operand (target, target_mode))
>      target = gen_reg_rtx (target_mode);
> @@ -5945,6 +5941,9 @@ expand_builtin_acc_on_device (tree exp, rtx target)
>    emit_label (done_label);
>  
>    return target;
> +#else
> +  return NULL;
> +#endif
>  }
>  
>  
> diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
> index 335ffd4..157147a 100644
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -29,6 +29,7 @@
>  #include "libgomp.h"
>  #include "oacc-int.h"
>  #include "openacc.h"
> +#include "plugin/plugin-host.h"
>  #include <assert.h>
>  #include <stdlib.h>
>  #include <strings.h>
> @@ -611,11 +612,18 @@ ialias (acc_set_device_num)
>  int
>  acc_on_device (acc_device_t dev)
>  {
> -  if (acc_get_device_type () == acc_device_host_nonshm)
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
> +     code -- i.e. within a parallel region.  Test a flag set by the
> +     openacc_parallel hook of the host_nonshm plugin to determine that.  */
> +  if (acc_get_device_type () == acc_device_host_nonshm
> +      && thr && thr->target_tls
> +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
>      return dev == acc_device_host_nonshm || dev == acc_device_not_host;
>  
> -  /* Just rely on the compiler builtin.  */
> -  return __builtin_acc_on_device (dev);
> +  /* For OpenACC, libgomp is only built for the host, so this is sufficient.  */
> +  return dev == acc_device_host || dev == acc_device_none;
>  }
>  
>  ialias (acc_on_device)
> diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
> index 1faf5bc..3cb4dab 100644
> --- a/libgomp/plugin/plugin-host.c
> +++ b/libgomp/plugin/plugin-host.c
> @@ -44,6 +44,7 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <stdio.h>
> +#include <stdbool.h>
>  
>  #ifdef HOST_NONSHM_PLUGIN
>  #define STATIC
> @@ -55,6 +56,10 @@
>  #define SELF "host: "
>  #endif
>  
> +#ifdef HOST_NONSHM_PLUGIN
> +#include "plugin-host.h"
> +#endif
> +
>  STATIC const char *
>  GOMP_OFFLOAD_get_name (void)
>  {
> @@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
>  			       void *targ_mem_desc __attribute__ ((unused)))
>  {
>  #ifdef HOST_NONSHM_PLUGIN
> +  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
> +  thd->nonshm_exec = true;
>    fn (devaddrs);
> +  thd->nonshm_exec = false;
>  #else
>    fn (hostaddrs);
>  #endif
> @@ -232,11 +240,20 @@ STATIC void *
>  GOMP_OFFLOAD_openacc_create_thread_data (int ord
>  					 __attribute__ ((unused)))
>  {
> +#ifdef HOST_NONSHM_PLUGIN
> +  struct nonshm_thread *thd
> +    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
> +  thd->nonshm_exec = false;
> +  return thd;
> +#else
>    return NULL;
> +#endif
>  }
>  
>  STATIC void
> -GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
> -					  __attribute__ ((unused)))
> +GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
>  {
> +#ifdef HOST_NONSHM_PLUGIN
> +  free (tls_data);
> +#endif
>  }
> diff --git a/libgomp/plugin/plugin-host.h b/libgomp/plugin/plugin-host.h
> new file mode 100644
> index 0000000..96955d1
> --- /dev/null
> +++ b/libgomp/plugin/plugin-host.h
> @@ -0,0 +1,37 @@
> +/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
> +
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +   Contributed by Mentor Embedded.
> +
> +   This file is part of the GNU Offloading and Multi Processing Library
> +   (libgomp).
> +
> +   Libgomp is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef PLUGIN_HOST_H
> +#define PLUGIN_HOST_H
> +
> +struct nonshm_thread
> +{
> +  bool nonshm_exec;
> +};
> +
> +#endif

For the following files, don't the dg-additional-options as well as
comments still apply, and should thus remain as they are?  We're still
using (and should continue to use) the open-coded acc_on_device in
offloaded code.

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
> index 81ea476..25cc15a 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
> @@ -1,7 +1,3 @@
> -/* Disable the acc_on_device builtin; we want to test the libgomp library
> -   function.  */
> -/* { dg-additional-options "-fno-builtin-acc_on_device" } */
> -
>  #include <stdlib.h>
>  #include <openacc.h>
>  
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
> index 184b355..6aa3bb7 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
> @@ -1,5 +1,4 @@
>  /* { dg-do run } */
> -/* { dg-additional-options "-fno-builtin-acc_on_device" } */
>  
>  #include <openacc.h>
>  #include <stdlib.h>
> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
> index 4488818..729b685 100644
> --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
> @@ -1,8 +1,4 @@
>  ! { dg-additional-options "-cpp" }
> -! TODO: Have to disable the acc_on_device builtin for we want to test the
> -! libgomp library function?  The command line option
> -! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not for
> -! Fortran.
>  
>  use openacc
>  implicit none
> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
> index 0047a19..19ff4a5 100644
> --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
> @@ -1,8 +1,4 @@
>  ! { dg-additional-options "-cpp" }
> -! TODO: Have to disable the acc_on_device builtin for we want to test
> -! the libgomp library function?  The command line option
> -! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
> -! for Fortran.
>  
>        USE OPENACC
>        IMPLICIT NONE
> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
> index 49d7a72..b01c553 100644
> --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
> @@ -1,8 +1,4 @@
>  ! { dg-additional-options "-cpp" }
> -! TODO: Have to disable the acc_on_device builtin for we want to test
> -! the libgomp library function?  The command line option
> -! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
> -! for Fortran.
>  
>        IMPLICIT NONE
>        INCLUDE "openacc_lib.h"


Grüße,
 Thomas

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: acc_on_device for device_type_host_nonshm
  2015-05-21 11:32                                                                   ` acc_on_device for device_type_host_nonshm Thomas Schwinge
@ 2015-05-21 11:42                                                                     ` Jakub Jelinek
  2015-05-28 11:56                                                                       ` H.J. Lu
  0 siblings, 1 reply; 92+ messages in thread
From: Jakub Jelinek @ 2015-05-21 11:42 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Julian Brown, gcc-patches, Kirill Yukhin, Ilya Verbin

On Thu, May 21, 2015 at 01:02:12PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 7 May 2015 19:32:26 +0100, Julian Brown <julian@codesourcery.com> wrote:
> > Here's a new version of the patch [...]
> 
> > OK for trunk?
> 
> Makes sense to me (with just a request to drop the testsuite changes, see
> below), to get the existing regressions under control.  Jakub?

Ok for trunk.
> 
> >     PR libgomp/65742
> > 
> >     gcc/
> >     * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
> >     sequence for !ACCEL_COMPILER.
> > 
> >     libgomp/
> >     * oacc-init.c (plugin/plugin-host.h): Include.
> >     (acc_on_device): Check whether we're in an offloaded region for
> >     host_nonshm
> >     plugin. Don't use __builtin_acc_on_device.
> >     * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
> >     nonshm_exec flag in thread-local data.
> >     (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
> >     data for host_nonshm plugin.
> >     (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
> >     for host_nonshm plugin.
> >     * plugin/plugin-host.h: New.
> >     * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Remove
> >     -fno-builtin-acc_on_device flag.
> >     * testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
> >     * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove
> >     comment re: acc_on_device builtin.
> >     * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
> > >     * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
> > > commit adccf2e7d313263d585f63e752a4d36653d47811

	Jakub

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: acc_on_device for device_type_host_nonshm
  2015-05-21 11:42                                                                     ` Jakub Jelinek
@ 2015-05-28 11:56                                                                       ` H.J. Lu
  2015-05-28 13:29                                                                         ` Julian Brown
  0 siblings, 1 reply; 92+ messages in thread
From: H.J. Lu @ 2015-05-28 11:56 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Julian Brown, GCC Patches, Kirill Yukhin, Ilya Verbin

On Thu, May 21, 2015 at 4:10 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 21, 2015 at 01:02:12PM +0200, Thomas Schwinge wrote:
>> Hi!
>>
>> On Thu, 7 May 2015 19:32:26 +0100, Julian Brown <julian@codesourcery.com> wrote:
>> > Here's a new version of the patch [...]
>>
>> > OK for trunk?
>>
>> Makes sense to me (with just a request to drop the testsuite changes, see
>> below), to get the existing regressions under control.  Jakub?
>
> Ok for trunk.
>>
>> >     PR libgomp/65742
>> >
>> >     gcc/
>> >     * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
>> >     sequence for !ACCEL_COMPILER.
>> >

It breaks bootstrap on x86:

https://gcc.gnu.org/ml/gcc-regression/2015-05/msg00389.html

I checked in this to fix it.

-- 
H.J.
---
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 223804)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2015-05-28  H.J. Lu  <hongjiu.lu@intel.com>
+
+ * builtins.c (expand_builtin_acc_on_device): Mark parameters
+ with ATTRIBUTE_UNUSED.
+
 2015-05-28  Julian Brown  <julian@codesourcery.com>

  PR libgomp/65742
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c (revision 223804)
+++ gcc/builtins.c (working copy)
@@ -5911,7 +5911,8 @@
    acceleration device (ACCEL_COMPILER conditional).  */

 static rtx
-expand_builtin_acc_on_device (tree exp, rtx target)
+expand_builtin_acc_on_device (tree exp ATTRIBUTE_UNUSED,
+      rtx target ATTRIBUTE_UNUSED)
 {
 #ifdef ACCEL_COMPILER
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
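
For readers wondering why this fixes the build: once the whole function
body sits under #ifdef ACCEL_COMPILER, a host compiler build sees a
function that never touches its parameters, and an unused-parameter
warning (presumably promoted to an error during bootstrap) is the
result.  A minimal standalone sketch of the same situation is below.
SKETCH_UNUSED and SKETCH_ACCEL are hypothetical names; in GCC itself
ATTRIBUTE_UNUSED expands to the same attribute on GNU-compatible
compilers.

```c
#include <assert.h>

/* Hypothetical equivalent of ATTRIBUTE_UNUSED.  */
#if defined (__GNUC__)
# define SKETCH_UNUSED __attribute__ ((__unused__))
#else
# define SKETCH_UNUSED
#endif

/* Both parameters are used only when SKETCH_ACCEL is defined.  Without
   the attribute, building with -Wunused-parameter -Werror and
   SKETCH_ACCEL undefined would reject this function.  */
static int
sketch_expand (int exp SKETCH_UNUSED, int target SKETCH_UNUSED)
{
#ifdef SKETCH_ACCEL
  return exp + target;
#else
  return 0;
#endif
}

/* Sanity check for whichever configuration is being compiled.  */
static void
sketch_check (void)
{
#ifdef SKETCH_ACCEL
  assert (sketch_expand (1, 2) == 3);
#else
  assert (sketch_expand (1, 2) == 0);
#endif
}
```

The attribute only says the parameter may be unused, so it is harmless
in the configuration where the body does use it.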

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: acc_on_device for device_type_host_nonshm
  2015-05-28 11:56                                                                       ` H.J. Lu
@ 2015-05-28 13:29                                                                         ` Julian Brown
  2015-06-04  7:25                                                                           ` [gomp4] " Tom de Vries
  0 siblings, 1 reply; 92+ messages in thread
From: Julian Brown @ 2015-05-28 13:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Jakub Jelinek, Thomas Schwinge, GCC Patches, Kirill Yukhin, Ilya Verbin

On Thu, 28 May 2015 04:48:58 -0700
"H.J. Lu" <hjl.tools@gmail.com> wrote:

> On Thu, May 21, 2015 at 4:10 AM, Jakub Jelinek <jakub@redhat.com>
> wrote:
> > On Thu, May 21, 2015 at 01:02:12PM +0200, Thomas Schwinge wrote:
> >> Hi!
> >>
> >> On Thu, 7 May 2015 19:32:26 +0100, Julian Brown
> >> <julian@codesourcery.com> wrote:
> >> > Here's a new version of the patch [...]
> >>
> >> > OK for trunk?
> >>
> >> Makes sense to me (with just a request to drop the testsuite
> >> changes, see below), to get the existing regressions under
> >> control.  Jakub?
> >
> > Ok for trunk.
> >>
> >> >     PR libgomp/65742
> >> >
> >> >     gcc/
> >> >     * builtins.c (expand_builtin_acc_on_device): Don't use
> >> > open-coded sequence for !ACCEL_COMPILER.
> >> >
> 
> It breaks bootstrap on x86:
> 
> https://gcc.gnu.org/ml/gcc-regression/2015-05/msg00389.html
> 
> I checked in this to fix it.

Apologies, and thanks!

Julian

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device compile-time evaluation (was: acc_on_device for device_type_host_nonshm)
  2015-05-07 18:32                                                                 ` acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742) Julian Brown
  2015-05-21 11:32                                                                   ` acc_on_device for device_type_host_nonshm Thomas Schwinge
@ 2015-06-02 12:08                                                                   ` Thomas Schwinge
  2015-07-14 20:26                                                                   ` PR65742: OpenACC acc_on_device fixes Thomas Schwinge
  2 siblings, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-06-02 12:08 UTC (permalink / raw)
  To: gcc-patches, Julian Brown; +Cc: jpsinthemix, dominiq, danglin, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 5364 bytes --]

Hi!

On Thu, 7 May 2015 19:32:26 +0100, Julian Brown <julian@codesourcery.com> wrote:
> On Fri, 17 Apr 2015 15:16:19 +0200
> Jakub Jelinek <jakub@redhat.com> wrote:
> 
> > On Tue, Apr 14, 2015 at 05:43:26PM +0200, Thomas Schwinge wrote:
> > > Really, acc_on_device is implemented as a compiler builtin (which
> > > is just disabled for a few libgomp test cases, in order to test the
> > > acc_on_device library function in libgomp), and I never understood
> > > why the "fallback" implementation in libgomp (cited above) should
> > > be doing anything different from the GCC builtin.  Is the "problem"
> > > actually, that some
> > 
> > The question is if the builtin expansion isn't wrong, at least as
> > long as the host_nonshm device is meant to be supported.  The
> > #ifdef ACCEL_COMPILER
> > case is easier, at least as long as ACCEL_COMPILER compiled code is
> > not meant to be able to offload to other devices (or host again), but
> > the non-ACCEL_COMPILER case means the code is either on the host, or
> > host_nonshm, or e.g. with Intel MIC you could have some shared
> > library be compiled by the host compiler, but then actually linked
> > into the MIC offloaded path.  In all those cases, I think it is just
> > the library that can determine the return value.
> > 
> > E.g. OpenMP omp_is_initial_device function is also only implemented
> > in the library, perhaps at some point I could expand it for #ifdef
> > ACCEL_COMPILER as builtin, but not for the host code, at least not
> > due to the host-nonshm plugin.
> 
> Here's a new version of the patch that doesn't use the open-coded
> expansion for acc_on_device for the host compiler at all. This means
> that the host and the host_nonshm plugin should DTRT without any
> special compiler options (which have thus been removed from the libgomp
> tests that set them or refer to them).
> 
> So now, for the host, acc_on_device returns:
> 
> acc_on_device (acc_device_none): true
> acc_on_device (acc_device_host): true
> otherwise: false
> 
> When the host_nonshm plugin is active, acc_on_device returns:
> 
> acc_on_device (acc_device_host_nonshm): true (except when "host
> fallback" is in effect, i.e. because of a false "if" clause).
> acc_on_device (acc_device_not_host): likewise.
> otherwise: false
> 
> In particular, the host_nonshm plugin doesn't consider itself to be
> running code "on the host".
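
The return values listed above can be restated as a small decision
function.  This is only a sketch under stated assumptions: the enum and
function names are hypothetical, the enumerator values are not those of
acc_device_t in <openacc.h>, and the "in offloaded region" state is passed
in as a plain parameter rather than read from libgomp's thread data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the acc_device_t values; the numeric values
   are illustrative only.  */
enum sketch_device
{
  SKETCH_DEVICE_NONE,
  SKETCH_DEVICE_DEFAULT,
  SKETCH_DEVICE_HOST,
  SKETCH_DEVICE_HOST_NONSHM,
  SKETCH_DEVICE_NOT_HOST
};

/* ACTIVE is the currently selected device type.  IN_OFFLOADED_REGION is
   true only while the host_nonshm plugin is executing offloaded code; it
   is false during host fallback (e.g. a false "if" clause).  */
bool
sketch_acc_on_device (enum sketch_device dev, enum sketch_device active,
		      bool in_offloaded_region)
{
  assert (dev >= SKETCH_DEVICE_NONE && dev <= SKETCH_DEVICE_NOT_HOST);

  /* Offloaded code under host_nonshm does not count as "on the host".  */
  if (active == SKETCH_DEVICE_HOST_NONSHM && in_offloaded_region)
    return dev == SKETCH_DEVICE_HOST_NONSHM || dev == SKETCH_DEVICE_NOT_HOST;

  /* Everything else is plain host execution.  */
  return dev == SKETCH_DEVICE_HOST || dev == SKETCH_DEVICE_NONE;
}
```

Note that with host_nonshm selected but host fallback in effect, the sketch
answers as the host does, which matches the exception described above.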

>     PR libgomp/65742
> 
>     gcc/
>     * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
>     sequence for !ACCEL_COMPILER.

As reported in <https://gcc.gnu.org/PR66332>, this caused the following
regression (C testing):

    PASS: c-c++-common/goacc/acc_on_device-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc/acc_on_device-2.c scan-rtl-dump-times expand "\\(call [^\\n]* acc_on_device" 0

Committed to trunk in r224028:

commit 1c2d9da9cee04516151b3894edb107e3cdf2c8b9
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Jun 2 11:48:56 2015 +0000

    [PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device compile-time evaluation
    
    The OpenACC 2.0a specification mandates differently, but we currently do get a
    library call in the host code.
    
    	PR libgomp/65742
    	PR middle-end/66332
    
    	gcc/testsuite/
    	* c-c++-common/goacc/acc_on_device-2.c: XFAIL for C, too.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@224028 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog                            |  6 ++++++
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c | 10 +++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index d91cf7c..3f51b10 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2015-06-02  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/65742
+	PR middle-end/66332
+	* c-c++-common/goacc/acc_on_device-2.c: XFAIL for C, too.
+
 2015-06-02  Uros Bizjak  <ubizjak@gmail.com>
 
 	* g++.dg/abi/mangle-regparm.C (dg-do): Fix x86_32 target selector.
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
index 8db0a66..6e3d292 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
@@ -20,9 +20,17 @@ f (void)
 }
 
 /* With -fopenacc, we're expecting the builtin to be expanded, so no calls.
+
    TODO: in C++, even under extern "C", the use of enum for acc_device_t
    perturbs expansion as a builtin, which expects an int parameter.  It's fine
    when changing acc_device_t to plain int, but that's not what we're doing in
    <openacc.h>.
-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail c++ } } } */
+
+   TODO: given that we can't expand acc_on_device in
+   gcc/builtins.c:expand_builtin_acc_on_device for in the !ACCEL_COMPILER case
+   (because at that point we don't know whether we're acc_device_host or
+   acc_device_host_nonshm), we'll (erroneously) get a library call in the host
+   code.
+
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail { c || c++ } } } } */
 


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [gomp4] Re: acc_on_device for device_type_host_nonshm
  2015-05-28 13:29                                                                         ` Julian Brown
@ 2015-06-04  7:25                                                                           ` Tom de Vries
  0 siblings, 0 replies; 92+ messages in thread
From: Tom de Vries @ 2015-06-04  7:25 UTC (permalink / raw)
  To: Julian Brown, H.J. Lu
  Cc: Jakub Jelinek, Thomas Schwinge, GCC Patches, Kirill Yukhin, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]

On 28/05/15 15:16, Julian Brown wrote:
> On Thu, 28 May 2015 04:48:58 -0700
> "H.J. Lu" <hjl.tools@gmail.com> wrote:
>
>> On Thu, May 21, 2015 at 4:10 AM, Jakub Jelinek <jakub@redhat.com>
>> wrote:
>>> On Thu, May 21, 2015 at 01:02:12PM +0200, Thomas Schwinge wrote:
>>>> Hi!
>>>>
>>>> On Thu, 7 May 2015 19:32:26 +0100, Julian Brown
>>>> <julian@codesourcery.com> wrote:
>>>>> Here's a new version of the patch [...]
>>>>
>>>>> OK for trunk?
>>>>
>>>> Makes sense to me (with just a request to drop the testsuite
>>>> changes, see below), to get the existing regressions under
>>>> control.  Jakub?
>>>
>>> Ok for trunk.
>>>>
>>>>>      PR libgomp/65742
>>>>>
>>>>>      gcc/
>>>>>      * builtins.c (expand_builtin_acc_on_device): Don't use
>>>>> open-coded sequence for !ACCEL_COMPILER.
>>>>>
>>
>> It breaks bootstrap on x86:
>>
>> https://gcc.gnu.org/ml/gcc-regression/2015-05/msg00389.html
>>
>> I checked in this to fix it.
>
> Apologies, and thanks!
>

And backported it to gomp-4_0-branch.

Thanks,
- Tom




[-- Attachment #2: 0001-Mark-parameters-with-ATTRIBUTE_UNUSED.patch --]
[-- Type: text/x-patch, Size: 800 bytes --]

Mark parameters with ATTRIBUTE_UNUSED

2015-06-04  Tom de Vries  <tom@codesourcery.com>

	backport from trunk:
	2015-05-28  H.J. Lu  <hongjiu.lu@intel.com>

	* builtins.c (expand_builtin_acc_on_device): Mark parameters
	with ATTRIBUTE_UNUSED.
---
 gcc/builtins.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index bfa9832..6574413 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5915,7 +5915,8 @@ expand_stack_save (void)
    acceleration device (ACCEL_COMPILER conditional).  */
 
 static rtx
-expand_builtin_acc_on_device (tree exp, rtx target)
+expand_builtin_acc_on_device (tree exp ATTRIBUTE_UNUSED,
+			      rtx target ATTRIBUTE_UNUSED)
 {
 #ifdef ACCEL_COMPILER
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
-- 
1.9.1


^ permalink raw reply	[flat|nested] 92+ messages in thread

* PR65742: OpenACC acc_on_device fixes
  2015-05-07 18:32                                                                 ` acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742) Julian Brown
  2015-05-21 11:32                                                                   ` acc_on_device for device_type_host_nonshm Thomas Schwinge
  2015-06-02 12:08                                                                   ` [PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device compile-time evaluation (was: acc_on_device for device_type_host_nonshm) Thomas Schwinge
@ 2015-07-14 20:26                                                                   ` Thomas Schwinge
  2015-07-15  7:27                                                                     ` Richard Biener
  2 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-07-14 20:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8975 bytes --]

Hi!

Per your request in
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65742#c8>, »can you please
fix the gcc 5 branch«, I'm planning to apply the following to
gcc-5-branch tomorrow (but wanted to give you a chance to veto, given
that your backport request pre-dates the branch freeze by a week):

commit b73b9881a781f8e5572ce6c6a38f51696fc09b83
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Tue Jul 14 15:27:49 2015 +0200

    OpenACC acc_on_device fixes
    
    Backport trunk r223801:
    
        PR libgomp/65742
    
        gcc/
        * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
        sequence for !ACCEL_COMPILER.
    
        libgomp/
        * oacc-init.c (plugin/plugin-host.h): Include.
        (acc_on_device): Check whether we're in an offloaded region for
        host_nonshm
        plugin. Don't use __builtin_acc_on_device.
        * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
        nonshm_exec flag in thread-local data.
        (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
        data for host_nonshm plugin.
        (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
        for host_nonshm plugin.
        * plugin/plugin-host.h: New.
    
    Mark parameters with ATTRIBUTE_UNUSED
    
    Backport trunk r223805:
    
    	* builtins.c (expand_builtin_acc_on_device): Mark parameters
    	with ATTRIBUTE_UNUSED.
    
    [PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device compile-time evaluation
    
    The OpenACC 2.0a specification mandates differently, but we currently do get a
    library call in the host code.
    
    Backport trunk r224028:
    
    	PR libgomp/65742
    	PR middle-end/66332
    
    	gcc/testsuite/
    	* c-c++-common/goacc/acc_on_device-2.c: XFAIL for C, too.
---
 gcc/builtins.c                                     |   12 +++----
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c |   10 +++++-
 libgomp/oacc-init.c                                |   14 ++++++--
 libgomp/plugin/plugin-host.c                       |   21 +++++++++--
 libgomp/plugin/plugin-host.h                       |   37 ++++++++++++++++++++
 8 files changed, 127 insertions(+), 12 deletions(-)

diff --git gcc/builtins.c gcc/builtins.c
index 9263777..bcbc11d 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -5915,8 +5915,10 @@ expand_stack_save (void)
    acceleration device (ACCEL_COMPILER conditional).  */
 
 static rtx
-expand_builtin_acc_on_device (tree exp, rtx target)
+expand_builtin_acc_on_device (tree exp ATTRIBUTE_UNUSED,
+			      rtx target ATTRIBUTE_UNUSED)
 {
+#ifdef ACCEL_COMPILER
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
     return NULL_RTX;
 
@@ -5925,13 +5927,8 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
   machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
   rtx v = expand_normal (arg), v1, v2;
-#ifdef ACCEL_COMPILER
   v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
   v2 = GEN_INT (ACCEL_COMPILER_acc_device);
-#else
-  v1 = GEN_INT (GOMP_DEVICE_NONE);
-  v2 = GEN_INT (GOMP_DEVICE_HOST);
-#endif
   machine_mode target_mode = TYPE_MODE (integer_type_node);
   if (!target || !register_operand (target, target_mode))
     target = gen_reg_rtx (target_mode);
@@ -5945,6 +5942,9 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   emit_label (done_label);
 
   return target;
+#else
+  return NULL;
+#endif
 }
 
 
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
index 2f4ee2b..7fe4e4e 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
@@ -20,10 +20,18 @@ f (void)
 }
 
 /* With -fopenacc, we're expecting the builtin to be expanded, so no calls.
+
    TODO: in C++, even under extern "C", the use of enum for acc_device_t
    perturbs expansion as a builtin, which expects an int parameter.  It's fine
    when changing acc_device_t to plain int, but that's not what we're doing in
    <openacc.h>.
-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail c++ } } } */
+
+   TODO: given that we can't expand acc_on_device in
+   gcc/builtins.c:expand_builtin_acc_on_device for in the !ACCEL_COMPILER case
+   (because at that point we don't know whether we're acc_device_host or
+   acc_device_host_nonshm), we'll (erroneously) get a library call in the host
+   code.
+
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" { xfail { c || c++ } } } } */
 
 /* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index dc40fb6..a7c2e0d 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include "oacc-int.h"
 #include "openacc.h"
+#include "plugin/plugin-host.h"
 #include <assert.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -548,11 +549,18 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  if (acc_get_device_type () == acc_device_host_nonshm)
+  struct goacc_thread *thr = goacc_thread ();
+
+  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
+     code -- i.e. within a parallel region.  Test a flag set by the
+     openacc_parallel hook of the host_nonshm plugin to determine that.  */
+  if (acc_get_device_type () == acc_device_host_nonshm
+      && thr && thr->target_tls
+      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
-  /* Just rely on the compiler builtin.  */
-  return __builtin_acc_on_device (dev);
+  /* For OpenACC, libgomp is only built for the host, so this is sufficient.  */
+  return dev == acc_device_host || dev == acc_device_none;
 }
 
 ialias (acc_on_device)
diff --git libgomp/plugin/plugin-host.c libgomp/plugin/plugin-host.c
index 1faf5bc..3cb4dab 100644
--- libgomp/plugin/plugin-host.c
+++ libgomp/plugin/plugin-host.c
@@ -44,6 +44,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
+#include <stdbool.h>
 
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
@@ -55,6 +56,10 @@
 #define SELF "host: "
 #endif
 
+#ifdef HOST_NONSHM_PLUGIN
+#include "plugin-host.h"
+#endif
+
 STATIC const char *
 GOMP_OFFLOAD_get_name (void)
 {
@@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
 			       void *targ_mem_desc __attribute__ ((unused)))
 {
 #ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
+  thd->nonshm_exec = true;
   fn (devaddrs);
+  thd->nonshm_exec = false;
 #else
   fn (hostaddrs);
 #endif
@@ -232,11 +240,20 @@ STATIC void *
 GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
+#ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd
+    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
+  thd->nonshm_exec = false;
+  return thd;
+#else
   return NULL;
+#endif
 }
 
 STATIC void
-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
-					  __attribute__ ((unused)))
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
 {
+#ifdef HOST_NONSHM_PLUGIN
+  free (tls_data);
+#endif
 }
diff --git libgomp/plugin/plugin-host.h libgomp/plugin/plugin-host.h
new file mode 100644
index 0000000..96955d1
--- /dev/null
+++ libgomp/plugin/plugin-host.h
@@ -0,0 +1,37 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef PLUGIN_HOST_H
+#define PLUGIN_HOST_H
+
+struct nonshm_thread
+{
+  bool nonshm_exec;
+};
+
+#endif
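
As a reading aid (not part of the backport), the flag handling in the
plugin-host.c and oacc-init.c hunks above boils down to the pattern
sketched here: allocate per-thread state with the flag cleared, raise the
flag only around the call into the offloaded function, and treat a missing
state pointer as "not offloaded".  All sketch_* names are hypothetical, and
plain malloc stands in for GOMP_PLUGIN_malloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread state, modelled on struct nonshm_thread.  */
struct sketch_thread
{
  bool offload_exec;
};

/* Allocate the state with the flag cleared; returns NULL on failure.  */
struct sketch_thread *
sketch_create_thread_data (void)
{
  struct sketch_thread *thd = malloc (sizeof *thd);
  if (thd)
    thd->offload_exec = false;
  return thd;
}

void
sketch_destroy_thread_data (struct sketch_thread *thd)
{
  free (thd);
}

/* Tolerates a missing state pointer, much as the patch checks
   thr && thr->target_tls before reading the flag.  */
bool
sketch_in_offloaded_region (const struct sketch_thread *thd)
{
  return thd != NULL && thd->offload_exec;
}

/* Run FN with the flag raised, the way the openacc_parallel hook brackets
   the call to the offloaded function.  */
void
sketch_parallel (struct sketch_thread *thd, void (*fn) (void *), void *data)
{
  assert (thd != NULL);
  thd->offload_exec = true;
  fn (data);
  thd->offload_exec = false;
}

/* Probe used to observe the flag from inside the "offloaded" function.  */
struct sketch_probe
{
  const struct sketch_thread *thd;
  bool seen_inside;
};

void
sketch_probe_fn (void *data)
{
  struct sketch_probe *probe = data;
  probe->seen_inside = sketch_in_offloaded_region (probe->thd);
}
```

Calling sketch_parallel with sketch_probe_fn shows the flag as set inside
the callback and cleared again afterwards.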


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: PR65742: OpenACC acc_on_device fixes
  2015-07-14 20:26                                                                   ` PR65742: OpenACC acc_on_device fixes Thomas Schwinge
@ 2015-07-15  7:27                                                                     ` Richard Biener
  0 siblings, 0 replies; 92+ messages in thread
From: Richard Biener @ 2015-07-15  7:27 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

On July 14, 2015 10:15:30 PM GMT+02:00, Thomas Schwinge <thomas@codesourcery.com> wrote:
>Hi!
>
>Per your request in
><https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65742#c8>, »can you
>please
>fix the gcc 5 branch«, I'm planning to apply the following to
>gcc-5-branch tomorrow (but wanted to give you a chance to veto, given
>that your backport request pre-dates the branch freeze by a week):

OK

Richard

>commit b73b9881a781f8e5572ce6c6a38f51696fc09b83
>Author: Thomas Schwinge <thomas@codesourcery.com>
>Date:   Tue Jul 14 15:27:49 2015 +0200
>
>    OpenACC acc_on_device fixes
>    
>    Backport trunk r223801:
>    
>        PR libgomp/65742
>    
>        gcc/
>      * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
>        sequence for !ACCEL_COMPILER.
>    
>        libgomp/
>        * oacc-init.c (plugin/plugin-host.h): Include.
>        (acc_on_device): Check whether we're in an offloaded region for
>        host_nonshm
>        plugin. Don't use __builtin_acc_on_device.
>        * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
>        nonshm_exec flag in thread-local data.
>       (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
>        data for host_nonshm plugin.
>     (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
>        for host_nonshm plugin.
>        * plugin/plugin-host.h: New.
>    
>    Mark parameters with ATTRIBUTE_UNUSED
>    
>    Backport trunk r223805:
>    
>    	* builtins.c (expand_builtin_acc_on_device): Mark parameters
>    	with ATTRIBUTE_UNUSED.
>    
>[PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device
>compile-time evaluation
>    
>The OpenACC 2.0a specification mandates differently, but we currently
>do get a
>    library call in the host code.
>    
>    Backport trunk r224028:
>    
>    	PR libgomp/65742
>    	PR middle-end/66332
>    
>    	gcc/testsuite/
>    	* c-c++-common/goacc/acc_on_device-2.c: XFAIL for C, too.
>---
> gcc/builtins.c                                     |   12 +++----
> gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c |   10 +++++-
> libgomp/oacc-init.c                                |   14 ++++++--
> libgomp/plugin/plugin-host.c                       |   21 +++++++++--
>libgomp/plugin/plugin-host.h                       |   37
>++++++++++++++++++++
> 8 files changed, 127 insertions(+), 12 deletions(-)
>
>diff --git gcc/builtins.c gcc/builtins.c
>index 9263777..bcbc11d 100644
>--- gcc/builtins.c
>+++ gcc/builtins.c
>@@ -5915,8 +5915,10 @@ expand_stack_save (void)
>    acceleration device (ACCEL_COMPILER conditional).  */
> 
> static rtx
>-expand_builtin_acc_on_device (tree exp, rtx target)
>+expand_builtin_acc_on_device (tree exp ATTRIBUTE_UNUSED,
>+			      rtx target ATTRIBUTE_UNUSED)
> {
>+#ifdef ACCEL_COMPILER
>   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
>     return NULL_RTX;
> 
>@@ -5925,13 +5927,8 @@ expand_builtin_acc_on_device (tree exp, rtx
>target)
>   /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
>   machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
>   rtx v = expand_normal (arg), v1, v2;
>-#ifdef ACCEL_COMPILER
>   v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
>   v2 = GEN_INT (ACCEL_COMPILER_acc_device);
>-#else
>-  v1 = GEN_INT (GOMP_DEVICE_NONE);
>-  v2 = GEN_INT (GOMP_DEVICE_HOST);
>-#endif
>   machine_mode target_mode = TYPE_MODE (integer_type_node);
>   if (!target || !register_operand (target, target_mode))
>     target = gen_reg_rtx (target_mode);
>@@ -5945,6 +5942,9 @@ expand_builtin_acc_on_device (tree exp, rtx
>target)
>   emit_label (done_label);
> 
>   return target;
>+#else
>+  return NULL;
>+#endif
> }
> 
> 
>diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
>gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
>index 2f4ee2b..7fe4e4e 100644
>--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
>+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
>@@ -20,10 +20,18 @@ f (void)
> }
> 
>/* With -fopenacc, we're expecting the builtin to be expanded, so no
>calls.
>+
>  TODO: in C++, even under extern "C", the use of enum for acc_device_t
>perturbs expansion as a builtin, which expects an int parameter.  It's
>fine
>when changing acc_device_t to plain int, but that's not what we're
>doing in
>    <openacc.h>.
>-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device"
>0 "expand" { xfail c++ } } } */
>+
>+   TODO: given that we can't expand acc_on_device in
>+   gcc/builtins.c:expand_builtin_acc_on_device for in the
>!ACCEL_COMPILER case
>+   (because at that point we don't know whether we're acc_device_host
>or
>+   acc_device_host_nonshm), we'll (erroneously) get a library call in
>the host
>+   code.
>+
>+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device"
>0 "expand" { xfail { c || c++ } } } } */
> 
> /* { dg-final { cleanup-rtl-dump "expand" } } */
>diff --git libgomp/oacc-init.c libgomp/oacc-init.c
>index dc40fb6..a7c2e0d 100644
>--- libgomp/oacc-init.c
>+++ libgomp/oacc-init.c
>@@ -29,6 +29,7 @@
> #include "libgomp.h"
> #include "oacc-int.h"
> #include "openacc.h"
>+#include "plugin/plugin-host.h"
> #include <assert.h>
> #include <stdlib.h>
> #include <strings.h>
>@@ -548,11 +549,18 @@ ialias (acc_set_device_num)
> int
> acc_on_device (acc_device_t dev)
> {
>-  if (acc_get_device_type () == acc_device_host_nonshm)
>+  struct goacc_thread *thr = goacc_thread ();
>+
>+  /* We only want to appear to be the "host_nonshm" plugin from
>"offloaded"
>+     code -- i.e. within a parallel region.  Test a flag set by the
>+     openacc_parallel hook of the host_nonshm plugin to determine
>that.  */
>+  if (acc_get_device_type () == acc_device_host_nonshm
>+      && thr && thr->target_tls
>+      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
>    return dev == acc_device_host_nonshm || dev == acc_device_not_host;
> 
>-  /* Just rely on the compiler builtin.  */
>-  return __builtin_acc_on_device (dev);
>+  /* For OpenACC, libgomp is only built for the host, so this is
>sufficient.  */
>+  return dev == acc_device_host || dev == acc_device_none;
> }
> 
> ialias (acc_on_device)
>diff --git libgomp/plugin/plugin-host.c libgomp/plugin/plugin-host.c
>index 1faf5bc..3cb4dab 100644
>--- libgomp/plugin/plugin-host.c
>+++ libgomp/plugin/plugin-host.c
>@@ -44,6 +44,7 @@
> #include <stdlib.h>
> #include <string.h>
> #include <stdio.h>
>+#include <stdbool.h>
> 
> #ifdef HOST_NONSHM_PLUGIN
> #define STATIC
>@@ -55,6 +56,10 @@
> #define SELF "host: "
> #endif
> 
>+#ifdef HOST_NONSHM_PLUGIN
>+#include "plugin-host.h"
>+#endif
>+
> STATIC const char *
> GOMP_OFFLOAD_get_name (void)
> {
>@@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void
>*),
> 			       void *targ_mem_desc __attribute__ ((unused)))
> {
> #ifdef HOST_NONSHM_PLUGIN
>+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
>+  thd->nonshm_exec = true;
>   fn (devaddrs);
>+  thd->nonshm_exec = false;
> #else
>   fn (hostaddrs);
> #endif
>@@ -232,11 +240,20 @@ STATIC void *
> GOMP_OFFLOAD_openacc_create_thread_data (int ord
> 					 __attribute__ ((unused)))
> {
>+#ifdef HOST_NONSHM_PLUGIN
>+  struct nonshm_thread *thd
>+    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
>+  thd->nonshm_exec = false;
>+  return thd;
>+#else
>   return NULL;
>+#endif
> }
> 
> STATIC void
>-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
>-					  __attribute__ ((unused)))
>+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
> {
>+#ifdef HOST_NONSHM_PLUGIN
>+  free (tls_data);
>+#endif
> }
>diff --git libgomp/plugin/plugin-host.h libgomp/plugin/plugin-host.h
>new file mode 100644
>index 0000000..96955d1
>--- /dev/null
>+++ libgomp/plugin/plugin-host.h
>@@ -0,0 +1,37 @@
>+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
>+
>+   Copyright (C) 2015 Free Software Foundation, Inc.
>+
>+   Contributed by Mentor Embedded.
>+
>+   This file is part of the GNU Offloading and Multi Processing
>Library
>+   (libgomp).
>+
>+   Libgomp is free software; you can redistribute it and/or modify it
>+   under the terms of the GNU General Public License as published by
>+   the Free Software Foundation; either version 3, or (at your option)
>+   any later version.
>+
>+   Libgomp is distributed in the hope that it will be useful, but
>WITHOUT ANY
>+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
>FITNESS
>+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>+   more details.
>+
>+   Under Section 7 of GPL version 3, you are granted additional
>+   permissions described in the GCC Runtime Library Exception, version
>+   3.1, as published by the Free Software Foundation.
>+
>+   You should have received a copy of the GNU General Public License
>and
>+   a copy of the GCC Runtime Library Exception along with this
>program;
>+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not,
>see
>+   <http://www.gnu.org/licenses/>.  */
>+
>+#ifndef PLUGIN_HOST_H
>+#define PLUGIN_HOST_H
>+
>+struct nonshm_thread
>+{
>+  bool nonshm_exec;
>+};
>+
>+#endif
>
>
>Regards,
> Thomas


^ permalink raw reply	[flat|nested] 92+ messages in thread

* libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-03-26 20:41                                 ` Ilya Verbin
  2015-03-30 16:42                                   ` Jakub Jelinek
@ 2015-09-25 15:10                                   ` Thomas Schwinge
  2015-09-25 15:59                                     ` Jakub Jelinek
  2015-09-25 16:56                                   ` libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
  2 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-09-25 15:10 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek; +Cc: Julian Brown, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 4537 bytes --]

Hi!

It's Friday afternoon -- but anyway, is the following analysis correct?

On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > to be run just once) vs. concurrent GOMP_offload_register calls
> > (if those are run from ctors, then I guess something like dl_load_lock
> > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > performed at the same time) in accessing/reallocating offload_images
> > and num_offload_images and the lack of support to register further
> > images after the gomp_target_init call (if you dlopen further shared
> > libraries) is really bad.  And it would be really nice to support the
> > unloading.

> Here is the latest patch for libgomp and mic plugin.

> libgomp/

> 	* target.c (register_lock): New mutex for offload image registration.

> 	(GOMP_offload_register): Add mutex lock.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -49,6 +49,9 @@ static void gomp_target_init (void);
>  /* The whole initialization code for offloading plugins is only run one.  */
>  static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
>  
> +/* Mutex for offload image registration.  */
> +static gomp_mutex_t register_lock;
> +
>  /* This structure describes an offload image.
>     It contains type of the target device, pointer to host table descriptor, and
>     pointer to target data.  */

No gomp_mutex_init call for register_lock has been added -- there is no
sensible place to put it, because...

> @@ -654,6 +727,18 @@ void
>  GOMP_offload_register (void *host_table, enum offload_target_type target_type,
>  		       void *target_data)
>  {
> +  int i;
> +  gomp_mutex_lock (&register_lock);
> +
> +  /* Load image to all initialized devices.  */
> +  for (i = 0; i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      if (devicep->type == target_type && devicep->is_initialized)
> +	gomp_offload_image_to_device (devicep, host_table, target_data);
> +    }
> +
> +  /* Insert image to array of pending images.  */
>    offload_images = gomp_realloc (offload_images,
>  				 (num_offload_images + 1)
>  				 * sizeof (struct offload_image_descr));
> @@ -663,74 +748,105 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
>    offload_images[num_offload_images].target_data = target_data;
>  
>    num_offload_images++;
> +  gomp_mutex_unlock (&register_lock);
>  }

... it's used in this function, which is invoked from
__attribute__((constructor)) functions generated by
gcc/config/*/*mkoffload.c, and there is no guaranteed ordering for
constructor functions [... -- see source code comment added in the
following patch]:

commit 81d25f393d9b0fdca309354e6a61f707ff403fe2
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Fri Sep 25 16:41:29 2015 +0200

    libgomp: Compile-time error for non-portable gomp_mutex_t initialization
    
    	libgomp/
    	* target.c [PLUGIN_SUPPORT]: Compile-time error if
    	!GOMP_MUTEX_INIT_0.
---
 libgomp/target.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git libgomp/target.c libgomp/target.c
index 758ece5..49cb395 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -52,6 +52,16 @@ static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 /* Mutex for offload image registration.  */
 static gomp_mutex_t register_lock;
 
+#if !GOMP_MUTEX_INIT_0
+# error Missing initialization for gomp_mutex_t register_lock.
+/* The problem is: gomp_mutex_t register_lock is used in
+   GOMP_offload_register_ver/GOMP_offload_register, which is called from a
+   __attribute__((constructor)) function (see the mkoffload files), so due to
+   non-deterministic constructor ordering, we can't have an initialization
+   constructor for gomp_mutex_t register_lock, such as
+   critical.c:initialize_critical, for example.  */
+#endif
+
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */

I guess we don't have to worry about this non-portable gomp_mutex_t usage
right now; OK to commit the patch to at least document it (patch not yet
tested)?


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-09-25 15:10                                   ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
@ 2015-09-25 15:59                                     ` Jakub Jelinek
  2015-11-18 15:20                                       ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization Ilya Verbin
  0 siblings, 1 reply; 92+ messages in thread
From: Jakub Jelinek @ 2015-09-25 15:59 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Ilya Verbin, Julian Brown, gcc-patches, Kirill Yukhin

On Fri, Sep 25, 2015 at 05:04:47PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> It's Friday afternoon -- but anyway, is the following analysis correct?
> 
> On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > (if those are run from ctors, then I guess something like dl_load_lock
> > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > performed at the same time) in accessing/reallocating offload_images
> > > and num_offload_images and the lack of support to register further
> > > images after the gomp_target_init call (if you dlopen further shared
> > > libraries) is really bad.  And it would be really nice to support the
> > > unloading.
> 
> > Here is the latest patch for libgomp and mic plugin.
> 
> > libgomp/
> 
> > 	* target.c (register_lock): New mutex for offload image registration.
> 
> > 	(GOMP_offload_register): Add mutex lock.

That is definitely wrong.  You'd totally break --disable-linux-futex support
on linux and bootstrap on e.g. Solaris and various other pthread targets.
At least for ELF and dynamic linking, shared libraries that contain
constructors that call GOMP_offload_register* should have DT_NEEDED libgomp
and thus libgomp's constructors should be run before the constructors of
the libraries that call GOMP_offload_register*.

For the targets without a known zero initializer for gomp_mutex_t, either
there is an option to use pthread_once to make sure it is initialized once,
or there is an option to define a macro like GOMP_MUTEX_INITIALIZER,
defined to PTHREAD_MUTEX_INITIALIZER in config/posix/mutex.h and to
{ 0 } in config/linux/mutex.h and something like {} or whatever in
config/rtems/mutex.h.  Then for the non-automatic non-heap
gomp_mutex_t's you could just initialize them in their initializers
with GOMP_MUTEX_INITIALIZER.
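
To illustrate the second option, here is a minimal sketch using plain
pthreads instead of libgomp's internal types.  The names my_mutex_t,
MY_MUTEX_INITIALIZER and register_image are hypothetical stand-ins for
gomp_mutex_t, the proposed GOMP_MUTEX_INITIALIZER and
GOMP_offload_register; this is not the libgomp code:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for gomp_mutex_t and the proposed
   GOMP_MUTEX_INITIALIZER, as they might look on a POSIX target.  */
typedef pthread_mutex_t my_mutex_t;
#define MY_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER

/* Statically initialized, so the lock is usable before any constructor
   has run and constructor ordering no longer matters.  */
static my_mutex_t register_lock = MY_MUTEX_INITIALIZER;
static int num_images;

/* May be called from any constructor; returns the new image count.  */
int
register_image (void)
{
  int n;
  int err = pthread_mutex_lock (&register_lock);
  assert (err == 0);
  n = ++num_images;
  err = pthread_mutex_unlock (&register_lock);
  assert (err == 0);
  return n;
}
```

With a static initializer there is no separate init call that could run
too late, which is the point of preferring it for non-automatic,
non-heap mutexes.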

	Jakub

^ permalink raw reply	[flat|nested] 92+ messages in thread

* libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-03-26 20:41                                 ` Ilya Verbin
  2015-03-30 16:42                                   ` Jakub Jelinek
  2015-09-25 15:10                                   ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
@ 2015-09-25 16:56                                   ` Thomas Schwinge
  2015-09-25 16:58                                     ` Ilya Verbin
  2 siblings, 1 reply; 92+ messages in thread
From: Thomas Schwinge @ 2015-09-25 16:56 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek; +Cc: Julian Brown, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 5734 bytes --]

Hi!

It's still Friday afternoon -- so please bear with me once again...

On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > to be run just once) vs. concurrent GOMP_offload_register calls
> > (if those are run from ctors, then I guess something like dl_load_lock
> > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > performed at the same time) in accessing/reallocating offload_images
> > and num_offload_images and the lack of support to register further
> > images after the gomp_target_init call (if you dlopen further shared
> > libraries) is really bad.  And it would be really nice to support the
> > unloading.

> Here is the latest patch for libgomp and mic plugin.

What about the scenario where one thread is inside
GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
shared library with such a mkoffload-generated constructor) and is
modifying offload_images with register_lock held, and another thread is
inside a GOMP_target* construct -> gomp_init_device and is accessing
offload_images without register_lock held?  Or, why isn't that a
reachable scenario?

Would the following patch (untested) do the right thing (locking added to
gomp_init_device and gomp_unload_device)?  We can then also remove the
is_register_lock parameter from gomp_load_image_to_device, and simplify
the code.
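
Reduced to its essentials, the idea is that the reader and the writer of
the image array take the same lock.  A rough sketch with plain pthreads
follows; images, num_images, register_image and count_images_of_type are
hypothetical, simplified stand-ins and not the libgomp code:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;
static int *images;		/* Stand-in for offload_images.  */
static int num_images;		/* Stand-in for num_offload_images.  */

/* Writer: may reallocate the array, so readers must not walk it
   concurrently.  Returns 0 on success, -1 on allocation failure.  */
int
register_image (int type)
{
  int ret = -1;
  pthread_mutex_lock (&register_lock);
  int *p = realloc (images, (num_images + 1) * sizeof *p);
  if (p != NULL)
    {
      images = p;
      images[num_images++] = type;
      ret = 0;
    }
  pthread_mutex_unlock (&register_lock);
  return ret;
}

/* Reader: takes the same lock while iterating over the array.  */
int
count_images_of_type (int type)
{
  int i, n = 0;
  pthread_mutex_lock (&register_lock);
  for (i = 0; i < num_images; i++)
    if (images[i] == type)
      n++;
  pthread_mutex_unlock (&register_lock);
  return n;
}
```

Without the lock in the reader, the realloc in the writer could move the
array while the reader is still iterating over the old storage.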

commit 702090223a8d43e7371d6cbfc9d8a39e3c5c2986
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Fri Sep 25 17:37:41 2015 +0200

    libgomp: Guard all offload_images/num_offload_images access by register_lock
---
 libgomp/target.c |   25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git libgomp/target.c libgomp/target.c
index 758ece5..1fbbe31 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -640,12 +640,13 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
 /* Load image pointed by TARGET_DATA to the device, specified by DEVICEP.
    And insert to splay tree the mapping between addresses from HOST_TABLE and
    from loaded target image.  We rely in the host and device compiler
-   emitting variable and functions in the same order.  */
+   emitting variable and functions in the same order.
+
+   The device must be locked, and REGISTER_LOCK must be held.  */
 
 static void
 gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
-			   const void *host_table, const void *target_data,
-			   bool is_register_lock)
+			   const void *host_table, const void *target_data)
 {
   void **host_func_table = ((void ***) host_table)[0];
   void **host_funcs_end  = ((void ***) host_table)[1];
@@ -668,8 +669,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   if (num_target_entries != num_funcs + num_vars)
     {
       gomp_mutex_unlock (&devicep->lock);
-      if (is_register_lock)
-	gomp_mutex_unlock (&register_lock);
+      gomp_mutex_unlock (&register_lock);
       gomp_fatal ("Cannot map target functions or variables"
 		  " (expected %u, have %u)", num_funcs + num_vars,
 		  num_target_entries);
@@ -710,8 +710,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 	  != (uintptr_t) host_var_table[i * 2 + 1])
 	{
 	  gomp_mutex_unlock (&devicep->lock);
-	  if (is_register_lock)
-	    gomp_mutex_unlock (&register_lock);
+	  gomp_mutex_unlock (&register_lock);
 	  gomp_fatal ("Can't map target variables (size mismatch)");
 	}
 
@@ -733,7 +732,8 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 }
 
 /* Unload the mappings described by target_data from device DEVICE_P.
-   The device must be locked.   */
+
+   The device must be locked.  */
 
 static void
 gomp_unload_image_from_device (struct gomp_device_descr *devicep,
@@ -810,7 +810,7 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
       gomp_mutex_lock (&devicep->lock);
       if (devicep->type == target_type && devicep->is_initialized)
 	gomp_load_image_to_device (devicep, version,
-				   host_table, target_data, true);
+				   host_table, target_data);
       gomp_mutex_unlock (&devicep->lock);
     }
 
@@ -886,14 +886,15 @@ gomp_init_device (struct gomp_device_descr *devicep)
   devicep->init_device_func (devicep->target_id);
 
   /* Load to device all images registered by the moment.  */
+  gomp_mutex_lock (&register_lock);
   for (i = 0; i < num_offload_images; i++)
     {
       struct offload_image_descr *image = &offload_images[i];
       if (image->type == devicep->type)
 	gomp_load_image_to_device (devicep, image->version,
-				   image->host_table, image->target_data,
-				   false);
+				   image->host_table, image->target_data);
     }
+  gomp_mutex_unlock (&register_lock);
 
   devicep->is_initialized = true;
 }
@@ -906,6 +907,7 @@ gomp_unload_device (struct gomp_device_descr *devicep)
       unsigned i;
       
       /* Unload from device all images registered at the moment.  */
+      gomp_mutex_lock (&register_lock);
       for (i = 0; i < num_offload_images; i++)
 	{
 	  struct offload_image_descr *image = &offload_images[i];
@@ -914,6 +916,7 @@ gomp_unload_device (struct gomp_device_descr *devicep)
 					   image->host_table,
 					   image->target_data);
 	}
+      gomp_mutex_unlock (&register_lock);
     }
 }
 


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)
  2015-09-25 16:56                                   ` libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
@ 2015-09-25 16:58                                     ` Ilya Verbin
  2015-09-28 10:03                                       ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock) Thomas Schwinge
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-09-25 16:58 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Jakub Jelinek, Julian Brown, gcc-patches, Kirill Yukhin

On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > (if those are run from ctors, then I guess something like dl_load_lock
> > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > performed at the same time) in accessing/reallocating offload_images
> > > and num_offload_images and the lack of support to register further
> > > images after the gomp_target_init call (if you dlopen further shared
> > > libraries) is really bad.  And it would be really nice to support the
> > > unloading.
> 
> > Here is the latest patch for libgomp and mic plugin.
> 
> What about the scenario where one thread is inside
> GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> shared library with such a mkoffload-generated constructor) and is
> modifying offload_images with register_lock held, and another thread is
> inside a GOMP_target* construct -> gomp_init_device and is accessing
> offload_images without register_lock held?  Or, why isn't that a
> reachable scenario?
> 
> Would the following patch (untested) do the right thing (locking added to
> gomp_init_device and gomp_unload_device)?  We can then also remove the
> is_register_lock parameter from gomp_load_image_to_device, and simplify
> the code.

Looks like you're right, and this scenario is possible.

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock)
  2015-09-25 16:58                                     ` Ilya Verbin
@ 2015-09-28 10:03                                       ` Thomas Schwinge
  2015-10-06 11:49                                         ` Thomas Schwinge
  2015-10-09 11:58                                         ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock Bernd Schmidt
  0 siblings, 2 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-09-28 10:03 UTC (permalink / raw)
  To: Jakub Jelinek, Ilya Verbin; +Cc: Julian Brown, gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 4001 bytes --]

Hi!

On Fri, 25 Sep 2015 19:49:50 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > > performed at the same time) in accessing/reallocating offload_images
> > > > and num_offload_images and the lack of support to register further
> > > > images after the gomp_target_init call (if you dlopen further shared
> > > > libraries) is really bad.  And it would be really nice to support the
> > > > unloading.
> > 
> > > Here is the latest patch for libgomp and mic plugin.
> > 
> > What about the scenario where one thread is inside
> > GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> > shared library with such a mkoffload-generated constructor) and is
> > modifying offload_images with register_lock held, and another thread is
> > inside a GOMP_target* construct -> gomp_init_device and is accessing
> > offload_images without register_lock held?  Or, why isn't that a
> > reachable scenario?
> > 
> > Would the following patch (untested) do the right thing (locking added to
> > gomp_init_device and gomp_unload_device)?  We can then also remove the
> > is_register_lock parameter from gomp_load_image_to_device, and simplify
> > the code.
> 
> Looks like you're right, and this scenario is possible.

Thanks for your review!  Jakub, OK to commit the patch I had posted?


Then, in context of a similar scenario, I think we'll also want the
following.  Please confirm that my reasoning in gomp_get_num_devices and
resolve_device is correct.  OK for trunk?

commit b0cf4dcc588e432c0a0d19d85727a20210b4d837
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Sat Sep 26 15:48:09 2015 +0200

    libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock
---
 libgomp/target.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git libgomp/target.c libgomp/target.c
index 1fbbe31..6f0a339 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -49,7 +49,7 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Mutex for offload image registration.  */
+/* Mutex for offload targets setup and image registration.  */
 static gomp_mutex_t register_lock;
 
 /* This structure describes an offload image.
@@ -118,6 +118,8 @@ attribute_hidden int
 gomp_get_num_devices (void)
 {
   gomp_init_targets_once ();
+  /* As it is immutable once it has been initialized, it's safe to access
+     num_devices_openmp without register_lock held.  */
   return num_devices_openmp;
 }
 
@@ -133,6 +135,8 @@ resolve_device (int device_id)
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  /* As it is immutable once it has been initialized, it's safe to access
+     devices without register_lock held.  */
   return &devices[device_id];
 }
 
@@ -1228,6 +1232,8 @@ gomp_target_init (void)
   char *plugin_name;
   int i, new_num_devices;
 
+  gomp_mutex_lock (&register_lock);
+
   num_devices = 0;
   devices = NULL;
 
@@ -1317,6 +1323,8 @@ gomp_target_init (void)
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
 #else /* PLUGIN_SUPPORT */


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock)
  2015-09-28 10:03                                       ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock) Thomas Schwinge
@ 2015-10-06 11:49                                         ` Thomas Schwinge
  2015-10-09 11:58                                         ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock Bernd Schmidt
  1 sibling, 0 replies; 92+ messages in thread
From: Thomas Schwinge @ 2015-10-06 11:49 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Julian Brown, gcc-patches, Kirill Yukhin, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 4288 bytes --]

Hi!

On Mon, 28 Sep 2015 10:52:38 +0200, I wrote:
> On Fri, 25 Sep 2015 19:49:50 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> > > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > > > performed at the same time) in accessing/reallocating offload_images
> > > > > and num_offload_images and the lack of support to register further
> > > > > images after the gomp_target_init call (if you dlopen further shared
> > > > > libraries) is really bad.  And it would be really nice to support the
> > > > > unloading.
> > > 
> > > > Here is the latest patch for libgomp and mic plugin.
> > > 
> > > What about the scenario where one thread is inside
> > > GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> > > shared library with such a mkoffload-generated constructor) and is
> > > modifying offload_images with register_lock held, and another thread is
> > > inside a GOMP_target* construct -> gomp_init_device and is accessing
> > > offload_images without register_lock held?  Or, why isn't that a
> > > reachable scenario?
> > > 
> > > Would the following patch (untested) do the right thing (locking added to
> > > gomp_init_device and gomp_unload_device)?  We can then also remove the
> > > is_register_lock parameter from gomp_load_image_to_device, and simplify
> > > the code.
> > 
> > Looks like you're right, and this scenario is possible.
> 
> Thanks for your review!  Jakub, OK to commit the patch I had posted?

Ping.


And, likewise, ping for the following:

> Then, in context of a similar scenario, I think we'll also want the
> following.  Please confirm that my reasoning in gomp_get_num_devices and
> resolve_device is correct.  OK for trunk?
> 
> commit b0cf4dcc588e432c0a0d19d85727a20210b4d837
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Sat Sep 26 15:48:09 2015 +0200
> 
>     libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock
> ---
>  libgomp/target.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git libgomp/target.c libgomp/target.c
> index 1fbbe31..6f0a339 100644
> --- libgomp/target.c
> +++ libgomp/target.c
> @@ -49,7 +49,7 @@ static void gomp_target_init (void);
>  /* The whole initialization code for offloading plugins is only run one.  */
>  static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
>  
> -/* Mutex for offload image registration.  */
> +/* Mutex for offload targets setup and image registration.  */
>  static gomp_mutex_t register_lock;
>  
>  /* This structure describes an offload image.
> @@ -118,6 +118,8 @@ attribute_hidden int
>  gomp_get_num_devices (void)
>  {
>    gomp_init_targets_once ();
> +  /* As it is immutable once it has been initialized, it's safe to access
> +     num_devices_openmp without register_lock held.  */
>    return num_devices_openmp;
>  }
>  
> @@ -133,6 +135,8 @@ resolve_device (int device_id)
>    if (device_id < 0 || device_id >= gomp_get_num_devices ())
>      return NULL;
>  
> +  /* As it is immutable once it has been initialized, it's safe to access
> +     devices without register_lock held.  */
>    return &devices[device_id];
>  }
>  
> @@ -1228,6 +1232,8 @@ gomp_target_init (void)
>    char *plugin_name;
>    int i, new_num_devices;
>  
> +  gomp_mutex_lock (&register_lock);
> +
>    num_devices = 0;
>    devices = NULL;
>  
> @@ -1317,6 +1323,8 @@ gomp_target_init (void)
>        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
>  	goacc_register (&devices[i]);
>      }
> +
> +  gomp_mutex_unlock (&register_lock);
>  }
>  
>  #else /* PLUGIN_SUPPORT */


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock
  2015-09-28 10:03                                       ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock) Thomas Schwinge
  2015-10-06 11:49                                         ` Thomas Schwinge
@ 2015-10-09 11:58                                         ` Bernd Schmidt
  2015-10-09 14:39                                           ` Ilya Verbin
  1 sibling, 1 reply; 92+ messages in thread
From: Bernd Schmidt @ 2015-10-09 11:58 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek, Ilya Verbin
  Cc: Julian Brown, gcc-patches, Kirill Yukhin

On 09/28/2015 10:52 AM, Thomas Schwinge wrote:
> On Fri, 25 Sep 2015 19:49:50 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
>>
>> Looks like you're right, and this scenario is possible.
>
> Thanks for your review!  Jakub, OK to commit the patch I had posted?
>
>
> Then, in context of a similar scenario, I think we'll also want the
> following.  Please confirm that my reasoning in gomp_get_num_devices and
> resolve_device is correct.  OK for trunk?

I've looked at that for a while. I don't see anything immediately wrong 
with the patches, but I think it would still be good for Jakub to have a 
look.

One oddity I noticed in target.c is that there are two different 
num_devices variables:

   /* Total number of available devices.  */
   static int num_devices;

   /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
   static int num_devices_openmp;

Confusingly, the get_num_devices function returns num_devices_openmp. 
That function includes a pthread_once call to gomp_target_init, which 
sets up these variables. References to num_devices_openmp through 
get_num_devices are therefore guaranteed to be initialized. However,
there are direct references to num_devices, in GOMP_offload_register_ver 
and GOMP_offload_unregister_ver, and they don't seem to enforce any kind 
of initialization:

   /* Load image to all initialized devices.  */
   for (i = 0; i < num_devices; i++)
     {
       struct gomp_device_descr *devicep = &devices[i];
       gomp_mutex_lock (&devicep->lock);
       if (devicep->type == target_type && devicep->is_initialized)
         gomp_load_image_to_device (devicep, version,
                                    host_table, target_data, true);
       gomp_mutex_unlock (&devicep->lock);
     }

I'm guessing this only triggers when dlopening something with an offload 
image after devices have been initialized already, and it looks like we 
have symmetrical code in gomp_init_device. Wouldn't it be 
possible/better to force a gomp_target_init before referencing 
num_devices, and then rely on the code I quoted and delete the
image loading from gomp_init_device?


Bernd

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock
  2015-10-09 11:58                                         ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock Bernd Schmidt
@ 2015-10-09 14:39                                           ` Ilya Verbin
  0 siblings, 0 replies; 92+ messages in thread
From: Ilya Verbin @ 2015-10-09 14:39 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Thomas Schwinge, Jakub Jelinek, Julian Brown, gcc-patches, Kirill Yukhin

On Fri, Oct 09, 2015 at 13:58:32 +0200, Bernd Schmidt wrote:
> One oddity I noticed in target.c is that there are two different num_devices
> variables:
> 
>   /* Total number of available devices.  */
>   static int num_devices;
> 
>   /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>   static int num_devices_openmp;
> 
> Confusingly, the get_num_devices function returns num_devices_openmp. That
> function includes a pthread_once call to gomp_target_init, which sets up
> these variables. References to num_devices_openmp through get_num_devices
> are therefore guaranteed to be initialized. However, there are direct
> references to num_devices, in GOMP_offload_register_ver and
> GOMP_offload_unregister_ver, and they don't seem to enforce any kind of
> initialization:
> 
>   /* Load image to all initialized devices.  */
>   for (i = 0; i < num_devices; i++)
>     {
>       struct gomp_device_descr *devicep = &devices[i];
>       gomp_mutex_lock (&devicep->lock);
>       if (devicep->type == target_type && devicep->is_initialized)
>         gomp_load_image_to_device (devicep, version,
>                                    host_table, target_data, true);
>       gomp_mutex_unlock (&devicep->lock);
>     }
> 
> I'm guessing this only triggers when dlopening something with an offload
> image after devices have been initialized already, and it looks like we have
> symmetrical code in gomp_init_device.

Right, this code offloads given image to all initialized devices, and similar
code in gomp_init_device offloads all registered images to a given device.

> Wouldn't it be possible/better to
> force a gomp_target_init before referencing num_devices, and then rely on
> the code I quoted and delete the image loading from gomp_init_device?

gomp_target_init only loads plugins and sets num_devices/num_devices_openmp, but
it doesn't call gomp_init_device, because we wanted to defer device
initialization as much as possible.  So, gomp_init_device is called immediately
before usage of that device.
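
The two symmetric paths can be sketched roughly as follows.  This is a
simplified model with hypothetical names (struct device, register_image,
init_device, images_loaded) and without the locking discussed earlier in
this thread; it is not the actual target.c code:

```c
#include <stdbool.h>

#define MAX_DEVICES 4
#define MAX_IMAGES 8

struct device
{
  bool is_initialized;
  int loaded;			/* Number of images loaded so far.  */
};

static struct device devices[MAX_DEVICES];
static int num_images;

/* Registration: push the new image to already initialized devices
   only; uninitialized devices pick it up later in init_device.  */
void
register_image (void)
{
  int i;
  if (num_images == MAX_IMAGES)
    return;
  for (i = 0; i < MAX_DEVICES; i++)
    if (devices[i].is_initialized)
      devices[i].loaded++;
  num_images++;
}

/* Deferred device initialization: load every image registered so far.  */
void
init_device (int id)
{
  if (devices[id].is_initialized)
    return;
  devices[id].loaded = num_images;
  devices[id].is_initialized = true;
}

/* Accessor for the number of images loaded onto device ID.  */
int
images_loaded (int id)
{
  return devices[id].loaded;
}
```

Either path alone would miss images: registration cannot load onto a
device that is not yet initialized, and initialization cannot know about
images registered afterwards.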

  -- Ilya

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: libgomp: Compile-time error for non-portable gomp_mutex_t initialization
  2015-09-25 15:59                                     ` Jakub Jelinek
@ 2015-11-18 15:20                                       ` Ilya Verbin
  2015-11-19 12:31                                         ` Jakub Jelinek
  0 siblings, 1 reply; 92+ messages in thread
From: Ilya Verbin @ 2015-11-18 15:20 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Fri, Sep 25, 2015 at 17:28:25 +0200, Jakub Jelinek wrote:
> On Fri, Sep 25, 2015 at 05:04:47PM +0200, Thomas Schwinge wrote:
> > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > > performed at the same time) in accessing/reallocating offload_images
> > > > and num_offload_images and the lack of support to register further
> > > > images after the gomp_target_init call (if you dlopen further shared
> > > > libraries) is really bad.  And it would be really nice to support the
> > > > unloading.
> > 
> > > Here is the latest patch for libgomp and mic plugin.
> > 
> > > libgomp/
> > 
> > > 	* target.c (register_lock): New mutex for offload image registration.
> > 
> > > 	(GOMP_offload_register): Add mutex lock.
> 
> That is definitely wrong.  You'd totally break --disable-linux-futex support
> on linux and bootstrap on e.g. Solaris and various other pthread targets.

I don't quite understand, do you mean that gcc 5 and trunk are broken, because
register_lock doesn't have initialization?  But it seems that bootstrap on
Solaris and other targets works fine...

> At least for ELF and dynamic linking, shared libraries that contain
> constructors that call GOMP_offload_register* should have DT_NEEDED libgomp
> and thus libgomp's constructors should be run before the constructors of
> the libraries that call GOMP_offload_register*.

So, libgomp should contain a constructor, which will call gomp_mutex_init
(&register_lock) before any call to GOMP_offload_register*, right?

> For the targets without known zero initializer for gomp_mutex_t, either
> there is an option to use pthread_once to make sure it is initialized once,
> or there is an option to define a macro like GOMP_MUTEX_INITIALIZER,
> defined to PTHREAD_MUTEX_INITIALIZER in config/posix/mutex.h and to
> { 0 } in config/linux/mutex.h and something like {} or whatever in
> config/rtems/mutex.h.  Then for the non-automatic non-heap
> gomp_mutex_t's you could just initialize them in their initializers
> with GOMP_MUTEX_INITIALIZER.

  -- Ilya


* Re: libgomp: Compile-time error for non-portable gomp_mutex_t initialization
  2015-11-18 15:20                                       ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization Ilya Verbin
@ 2015-11-19 12:31                                         ` Jakub Jelinek
  0 siblings, 0 replies; 92+ messages in thread
From: Jakub Jelinek @ 2015-11-19 12:31 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, Julian Brown, gcc-patches, Kirill Yukhin

On Wed, Nov 18, 2015 at 06:19:29PM +0300, Ilya Verbin wrote:
> On Fri, Sep 25, 2015 at 17:28:25 +0200, Jakub Jelinek wrote:
> > On Fri, Sep 25, 2015 at 05:04:47PM +0200, Thomas Schwinge wrote:
> > > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > > > performed at the same time) in accessing/reallocating offload_images
> > > > > and num_offload_images and the lack of support to register further
> > > > > images after the gomp_target_init call (if you dlopen further shared
> > > > > libraries) is really bad.  And it would be really nice to support the
> > > > > unloading.
> > > 
> > > > Here is the latest patch for libgomp and mic plugin.
> > > 
> > > > libgomp/
> > > 
> > > > 	* target.c (register_lock): New mutex for offload image registration.
> > > 
> > > > 	(GOMP_offload_register): Add mutex lock.
> > 
> > That is definitely wrong.  You'd totally break --disable-linux-futex support
> > on linux and bootstrap on e.g. Solaris and various other pthread targets.
> 
> I don't quite understand, do you mean that gcc 5 and trunk are broken, because
> register_lock doesn't have initialization?  But it seems that bootstrap on
> Solaris and other targets works fine...

Thomas has been proposing to add an #error to target.c when
!GOMP_MUTEX_INIT_0, which would break the libgomp build on all targets where
config/posix/mutex.h is used.  That includes --disable-linux-futex on Linux,
and various other targets.

> > At least for ELF and dynamic linking, shared libraries that contain
> > constructors that call GOMP_offload_register* should have DT_NEEDED libgomp
> > and thus libgomp's constructors should be run before the constructors of
> > the libraries that call GOMP_offload_register*.
> 
> So, libgomp should contain a constructor, which will call gomp_mutex_init
> (&register_lock) before any call to GOMP_offload_register*, right?

No, I think the GOMP_MUTEX_INITIALIZER approach is better.
All pthread targets need to support PTHREAD_MUTEX_INITIALIZER, and the other
config/*/mutex.h implementations define GOMP_MUTEX_INIT_0 to 1.
So, config/posix/mutex.h would
#define GOMP_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
config/linux/mutex.h would
#define GOMP_MUTEX_INITIALIZER { 0 }
and config/rtems/mutex.h would
#define GOMP_MUTEX_INITIALIZER {}
or something similar, and then just initialize the file scope locks with that
static initializer.
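
A minimal sketch of what that could look like for the posix variant only.
The typedef, the two inline wrappers and register_image are simplified
stand-ins written for this illustration, not the actual libgomp sources.

```c
#include <pthread.h>

/* Stand-in for config/posix/mutex.h (assumed shape, simplified).  */
typedef pthread_mutex_t gomp_mutex_t;
#define GOMP_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER

static inline void
gomp_mutex_lock (gomp_mutex_t *mutex)
{
  pthread_mutex_lock (mutex);
}

static inline void
gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  pthread_mutex_unlock (mutex);
}

/* File scope lock with a static initializer: it is valid before any
   constructor runs, so no gomp_mutex_init call is needed.  */
static gomp_mutex_t register_lock = GOMP_MUTEX_INITIALIZER;
static int num_offload_images;

/* Hypothetical stand-in for GOMP_offload_register: it only counts the
   registrations, under the lock.  */
int
register_image (void)
{
  gomp_mutex_lock (&register_lock);
  int n = ++num_offload_images;
  gomp_mutex_unlock (&register_lock);
  return n;
}
```

With this, the order in which libgomp's and other libraries' constructors
run no longer matters for register_lock.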

	Jakub


Thread overview: 92+ messages
2015-01-15 20:44 Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
2015-01-15 20:47 ` Jeff Law
2015-01-15 22:47 ` Tobias Burnus
2015-01-16 12:34 ` Gerald Pfeifer
2015-01-16 20:37   ` Thomas Schwinge
2015-01-16 15:04 ` Gerald Pfeifer
2015-01-16 15:06 ` Jakub Jelinek
2015-01-16 15:37   ` David Malcolm
2015-01-16 21:13 ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
2015-01-16 23:19   ` Ilya Verbin
2015-01-16 23:38     ` Jack Howarth
2015-01-16 23:48       ` Ilya Verbin
2015-01-17  0:37         ` Jack Howarth
2015-01-17  1:23           ` Ilya Verbin
2015-01-17  3:09   ` Jack Howarth
2015-02-24 17:23   ` [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter Thomas Schwinge
2015-01-16 22:41 ` Merge current set of OpenACC changes from gomp-4_0-branch Andreas Schwab
2015-02-04  9:41   ` [RFC testsuite] Fix PR64850, tweak acc_on_device* tests Thomas Schwinge
2015-02-10 12:02     ` Thomas Schwinge
2015-02-12  0:23       ` Kaz Kojima
2015-01-16 23:22 ` Merge current set of OpenACC changes from gomp-4_0-branch Ilya Verbin
2015-01-23 18:28   ` Ilya Verbin
2015-01-23 19:11     ` Jakub Jelinek
2015-01-26 14:01     ` Thomas Schwinge
2015-01-26 15:23       ` Ilya Verbin
2015-01-27 14:41         ` Julian Brown
2015-02-03 11:28           ` Ilya Verbin
2015-02-03 13:00             ` Julian Brown
2015-02-03 20:01               ` Ilya Verbin
2015-02-04 15:06                 ` Julian Brown
2015-02-18 12:25                   ` Ilya Verbin
2015-02-24 12:49                   ` Julian Brown
2015-02-25  9:54                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
2015-02-25 12:17                       ` Julian Brown
2015-02-25 12:23                       ` Ilya Verbin
2015-02-26 17:31                       ` Ilya Verbin
2015-03-06 14:01                         ` Ilya Verbin
2015-03-09 14:46                           ` Julian Brown
2015-03-23 19:44                             ` Ilya Verbin
2015-03-26 10:07                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks Thomas Schwinge
2015-03-26 12:09                               ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Jakub Jelinek
2015-03-26 20:41                                 ` Ilya Verbin
2015-03-30 16:42                                   ` Jakub Jelinek
2015-03-30 21:43                                     ` Julian Brown
2015-03-31 12:52                                     ` Ilya Verbin
2015-03-31 13:08                                       ` Jakub Jelinek
2015-03-31 16:10                                         ` Ilya Verbin
2015-03-31 23:53                                           ` Ilya Verbin
2015-04-01  5:21                                             ` Jakub Jelinek
2015-04-01 13:14                                               ` Ilya Verbin
2015-04-01 13:20                                                 ` Jakub Jelinek
2015-04-01 17:26                                                   ` Ilya Verbin
2015-04-06 12:46                                                   ` Ilya Verbin
2015-04-07 15:26                                                     ` Jakub Jelinek
2015-04-08 14:32                                                       ` Julian Brown
2015-04-08 14:34                                                         ` Jakub Jelinek
2015-04-08 14:59                                                         ` Ilya Verbin
2015-04-08 16:14                                                           ` Julian Brown
2015-04-14 14:15                                                           ` Julian Brown
2015-04-14 15:35                                                             ` Using -foffload=[...] to cycle through accelerators (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
2015-04-14 15:43                                                             ` acc_on_device for device_type_host_nonshm " Thomas Schwinge
2015-04-17 13:16                                                               ` Jakub Jelinek
2015-05-07 18:32                                                                 ` acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742) Julian Brown
2015-05-21 11:32                                                                   ` acc_on_device for device_type_host_nonshm Thomas Schwinge
2015-05-21 11:42                                                                     ` Jakub Jelinek
2015-05-28 11:56                                                                       ` H.J. Lu
2015-05-28 13:29                                                                         ` Julian Brown
2015-06-04  7:25                                                                           ` [gomp4] " Tom de Vries
2015-06-02 12:08                                                                   ` [PR libgomp/65742, PR middle-end/66332] XFAIL acc_on_device compile-time evaluation (was: acc_on_device for device_type_host_nonshm) Thomas Schwinge
2015-07-14 20:26                                                                   ` PR65742: OpenACC acc_on_device fixes Thomas Schwinge
2015-07-15  7:27                                                                     ` Richard Biener
2015-04-17  9:54                                                             ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742) Julian Brown
2015-03-31 18:25                                     ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Ilya Verbin
2015-03-31 19:06                                       ` Jakub Jelinek
2015-09-25 15:10                                   ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
2015-09-25 15:59                                     ` Jakub Jelinek
2015-11-18 15:20                                       ` libgomp: Compile-time error for non-portable gomp_mutex_t initialization Ilya Verbin
2015-11-19 12:31                                         ` Jakub Jelinek
2015-09-25 16:56                                   ` libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) Thomas Schwinge
2015-09-25 16:58                                     ` Ilya Verbin
2015-09-28 10:03                                       ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock) Thomas Schwinge
2015-10-06 11:49                                         ` Thomas Schwinge
2015-10-09 11:58                                         ` libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock Bernd Schmidt
2015-10-09 14:39                                           ` Ilya Verbin
2015-03-27 15:21                                 ` libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) Julian Brown
2015-01-27 13:43       ` Merge current set of OpenACC changes from gomp-4_0-branch Julian Brown
2015-01-27 19:50       ` Jack Howarth
2015-02-17 18:06 ` Thomas Schwinge
2015-02-23 10:31 ` Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_* (was: Merge current set of OpenACC changes from gomp-4_0-branch) Thomas Schwinge
2015-04-20 14:24 ` Merge current set of OpenACC changes from gomp-4_0-branch Thomas Schwinge
2015-04-20 20:14   ` Gerald Pfeifer
2015-02-03 23:41 [RFC testsuite] Fix PR64850, tweak acc_on_device* tests Kaz Kojima
