public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
@ 2022-04-14 16:06 Marcel Vollweiler
  2022-06-30 13:16 ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Vollweiler @ 2022-04-14 16:06 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3076 bytes --]

Hi,

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices.

The patch builds on the following patches which are submitted, but not yet
approved/committed:
- [PATCH] OpenMP, libgomp: Environment variable syntax extension.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html
- [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591556.html

The OpenMP runtime routines omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit were introduced in OpenMP 5.1 and where already
implemented for the host usage with patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581283.html

The new patch extends the functionality of these OpenMP runtime routines by the
usage also on the device, i.e. device-specific values for nteams-var and
teams-thread-limit-var ICVs can now be retrieved and set also on the device. The
updated number of teams/threads are then used when launching the kernel.

The following main aspects are considered:
(a) Implemented the functions in the according icv-device files.
(b) Added structures to not only store initial device-specific values (they have
to be kept for omp_display_env) but also device-specific ICV values that can be
changed on the device at runtime.
(c) Changed the gimplification:
(c.1) Introduced integer_minus_two_node.
(c.2) For target regions that do not include teams constructs, now the clause
num_teams(-2) is added instead num_teams(1). This was necessary as num_teams(1)
is ambigious: it can also mean that a teams construct with explicit num_teams(1)
clause was specified inside the target region. The disambiguation is needed in
order to choose the correct thread limit: teams-thread-limit-var is only
intended for teams constructs such that if there is no teams construct, then the
number of threads is limited by thread-limit-var.
(d) Extend GOMP_target_ext. The host needs to set the device-specific ICV values
before the kernel is launched. The number of teams and threads are members of
the args list and are modified when no value was specified in an explicit clause
and the computation of the value was not postponed due to mapped variables.
(d.1) The arguments list is copied in order to guarantee immutability.
(e) Added copy back mechanism for ICVs which are modified on the device. The
only way to change device-specific ICVs is to do it on the device. As the
device-specific values are sometimes needed also on the host when the kernel is
launched (particularly number of teams and threads) they have to be copied back.

The patch was tested on x86_64-linux with nvptx and gcn offloading. All with no
regressions.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: teams_and_threads_on_device.diff --]
[-- Type: text/plain, Size: 75768 bytes --]

gcc/ChangeLog:

	* gimplify.cc (optimize_target_teams): Changed integer_one_node to
	integer_minus_two_node in case of non-existing teams construct in target
	region due to disambiguation. Previously, num_teams(1) was used as
	clause on the target construct when (a) no teams construct exists in the
	target region or (b) a teams construct with explicit num_teams(1)
	clause was specified.
	* tree-core.h (enum tree_index): Added TI_INTEGER_MINUS_TWO.
	* tree.cc (build_common_tree_nodes): Added integer_minus_two_node.
	* tree.h (integer_minus_two_node): Likewise.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_set_num_teams): Added.
	(omp_get_teams_thread_limit): Added.
	(omp_set_teams_thread_limit): Added.
	(ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit.
	* config/nvptx/icv-device.c (omp_set_num_teams): Likewise.
	(omp_get_teams_thread_limit): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* env.c (struct gomp_default_icv_t): Added to hold default ICV values.
	(struct gomp_icv_list): Removed static.
	(omp_display_env): Renaming of used lists.
	(add_device_specific_icv): Removed static.
	(gomp_add_device_specific_icv): Removed static.
	(parse_device_specific): Renaming of used lists and added storing of
	parsed values in lists of modifiable ICV values. 
	* icv-device.c (omp_set_num_teams): Added.
	(ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit.
	(omp_get_teams_thread_limit): Added.
	(omp_set_teams_thread_limit): Added.
	* icv.c (omp_set_num_teams): Removed.
	(omp_set_teams_thread_limit): Removed.
	(omp_get_teams_thread_limit): Removed.
	(ialias): Removed for omp_set_num_teams and
	omp_{gs}et_teams_thread_limit.
	* libgomp-plugin.h (GOMP_TEAMS_THREAD_LIMIT_VAR): Added definition.
	* libgomp.h (struct gomp_default_icv_t): Added.
	(gomp_add_device_specific_icv): Added external declaration.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Added
	GOMP_TEAMS_THREAD_LIMIT_VAR.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_get_icv): Added parameter and changed behaviour
	considering also default ICV values.
	(gomp_get_or_add_icv_int): Added. Adds ICV values to a list of
	device-specific ICVs if they not already there.
	(gomp_load_image_to_device): Implemented copy back for device-specific
	ICVs which are changed on the device but needed on the host. Currently
	only nteams-var and teams-thread-limit-var are copied back.
	(gomp_copy_back_icv): Added. Copies back ICV values from target to
	device using the pointer mapping.
	(GOMP_target_ext): Changing TEAMS and THREADS in the arguments list
	considering device-specific ICV values. Also invokes the copy back for
	ICV values from target to host.
	* testsuite/libgomp.c-c++-common/icv-4.c: Fixed typo for getting
	environment variable OMP_TEAMS_THREAD_LIMIT.
	* testsuite/libgomp.c-c++-common/icv-5.c: Added several test cases now also including teams-thread-limit-var.
	* testsuite/libgomp.c-c++-common/icv-6.c: Likewise.
	* testsuite/libgomp.c-c++-common/icv-7.c: Likewise.
	* testsuite/libgomp.c-c++-common/icv-8.c: New test.
	* testsuite/libgomp.fortran/icv-5.f90: New test.
	* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-teams-1.c: Changed num_teams(1) expectation
	to num_teams(-2) since now non-existing teams constructs are specified
	with num_teams(-2) instead of num_teams(1) which was ambigious.
	* g++.dg/gomp/target-teams-1.C: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 2588824..91f75ec 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
 
   if (teams == NULL_TREE)
-    num_teams_upper = integer_one_node;
+    num_teams_upper = integer_minus_two_node;
   else
     for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
       {
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 51b8d48..74d60e1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -81,5 +81,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index f78a608..29e5597 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -88,5 +88,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
index 7b182b5..9081159 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
index 1391274..91566ed 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\*strxp \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
index 9a81d0f..867e41a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
@@ -101,4 +101,4 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index f1c2b64..7aec940 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -642,6 +642,7 @@ enum tree_index {
   TI_INTEGER_ONE,
   TI_INTEGER_THREE,
   TI_INTEGER_MINUS_ONE,
+  TI_INTEGER_MINUS_TWO,
   TI_NULL_POINTER,
 
   TI_SIZE_ZERO,
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 8f83ea1..8cb474d 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char)
   integer_one_node = build_int_cst (integer_type_node, 1);
   integer_three_node = build_int_cst (integer_type_node, 3);
   integer_minus_one_node = build_int_cst (integer_type_node, -1);
+  integer_minus_two_node = build_int_cst (integer_type_node, -2);
 
   size_zero_node = size_int (0);
   size_one_node = size_int (1);
diff --git a/gcc/tree.h b/gcc/tree.h
index cea49a5..1aeb009 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp)
 #define integer_one_node		global_trees[TI_INTEGER_ONE]
 #define integer_three_node              global_trees[TI_INTEGER_THREE]
 #define integer_minus_one_node		global_trees[TI_INTEGER_MINUS_ONE]
+#define integer_minus_two_node		global_trees[TI_INTEGER_MINUS_TWO]
 #define size_zero_node			global_trees[TI_SIZE_ZERO]
 #define size_one_node			global_trees[TI_SIZE_ONE]
 #define bitsize_zero_node		global_trees[TI_BITSIZE_ZERO]
diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 26b2e7a..a99aa0d 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR;
 volatile int GOMP_MAX_ACTIVE_LEVELS_VAR;
 volatile omp_proc_bind_t GOMP_BIND_VAR;
 volatile int GOMP_NTEAMS_VAR;
+volatile int GOMP_TEAMS_THREAD_LIMIT_VAR;
 
 void
 omp_set_default_device (int device_num __attribute__((unused)))
@@ -80,6 +81,26 @@ omp_get_max_teams (void)
   return GOMP_NTEAMS_VAR;
 }
 
+void
+omp_set_num_teams (int num_teams)
+{
+  if (num_teams >= 0)
+    GOMP_NTEAMS_VAR = num_teams;
+}
+
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_TEAMS_THREAD_LIMIT_VAR;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_TEAMS_THREAD_LIMIT_VAR = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -87,3 +108,6 @@ ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
+ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index f744e75..4b1106b 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -37,6 +37,7 @@ static volatile int GOMP_DEFAULT_DEVICE_VAR;
 static volatile int GOMP_MAX_ACTIVE_LEVELS_VAR;
 static volatile omp_proc_bind_t GOMP_BIND_VAR;
 static volatile int GOMP_NTEAMS_VAR;
+static volatile int GOMP_TEAMS_THREAD_LIMIT_VAR;
 
 void
 omp_set_default_device (int device_num __attribute__((unused)))
@@ -80,6 +81,26 @@ omp_get_max_teams (void)
   return GOMP_NTEAMS_VAR;
 }
 
+void
+omp_set_num_teams (int num_teams)
+{
+  if (num_teams >= 0)
+    GOMP_NTEAMS_VAR = num_teams;
+}
+
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_TEAMS_THREAD_LIMIT_VAR;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_TEAMS_THREAD_LIMIT_VAR = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -87,3 +108,6 @@ ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
+ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/env.c b/libgomp/env.c
index 74803ec..f3a86fa 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -84,6 +84,16 @@ struct gomp_initial_icv_t gomp_initial_icv_all;
    by using environment variables like OMP_NUM_TEAMS_DEV.  */
 struct gomp_initial_icv_t gomp_initial_icv_dev;
 
+/* Default values of ICVs according to the OpenMP standard.  */
+struct gomp_default_icv_t gomp_default_icv_values = {
+  .run_sched_var = GFS_DYNAMIC,
+  .run_sched_chunk_size = 1,
+  .max_active_levels_var = 1,
+  .bind_var = omp_proc_bind_false,
+  .nteams_var = 0,
+  .teams_thread_limit_var = 0
+};
+
 /* Returns the element of the list for the specified device number.  */
 struct gomp_icv_list*
 gomp_get_icv_list (struct gomp_icv_list **list, int device_num)
@@ -109,21 +119,25 @@ gomp_get_icv_value_ptr (struct gomp_icv_list **list, int device_num)
 
 /* Lists for initial device-specific ICVs, i.e. ICVs that are configured for
    particular devices (with environment variables like OMP_NUM_TEAMS_DEV_42). */
-struct gomp_icv_list *gomp_dyn_var_dev_list = NULL;
-struct gomp_icv_list *gomp_nthreads_var_dev_list = NULL;
-struct gomp_icv_list *gomp_nthreads_var_list_dev_list = NULL;
-struct gomp_icv_list *gomp_nthreads_var_list_len_dev_list = NULL;
-struct gomp_icv_list *gomp_run_sched_var_dev_list = NULL;
-struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list = NULL;
-struct gomp_icv_list *gomp_nteams_var_dev_list = NULL;
-struct gomp_icv_list *gomp_thread_limit_var_dev_list = NULL;
-struct gomp_icv_list *gomp_max_active_levels_var_dev_list = NULL;
-struct gomp_icv_list *gomp_proc_bind_var_dev_list = NULL;
-struct gomp_icv_list *gomp_proc_bind_var_list_dev_list = NULL;
-struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_dyn_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_nthreads_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_nthreads_var_list_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_nthreads_var_list_len_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_run_sched_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_nteams_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_thread_limit_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list = NULL;
 struct gomp_icv_list *stacksize_dev_list = NULL;
 struct gomp_icv_list *wait_policy_dev_list = NULL;
-struct gomp_icv_list *teams_thread_limit_var_dev_list = NULL;
+struct gomp_icv_list *gomp_initial_teams_thread_limit_var_dev_list = NULL;
+
+/* Lists for device-specific ICVs which can be changed, e.g. on the device.  */
+struct gomp_icv_list *gomp_nteams_var_dev_list = NULL;
+struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list = NULL;
 
 /* Flags for non-global ICVs to store by which environment variables they are
    affected.  */
@@ -1425,7 +1439,7 @@ omp_display_env (int verbose)
   if (gomp_initial_icv_flags.dyn_var & GOMP_ENV_VAR_SUFFIX_DEV)
     fprintf (stderr, "  [device] OMP_DYNAMIC = '%s'\n",
 	     gomp_initial_icv_dev.dyn_var ? "TRUE" : "FALSE");
-  struct gomp_icv_list* l_dyn_var = gomp_dyn_var_dev_list;
+  struct gomp_icv_list* l_dyn_var = gomp_initial_dyn_var_dev_list;
   while (l_dyn_var != NULL)
     {
       fprintf (stderr, "  [%d] OMP_DYNAMIC = '%s'\n", l_dyn_var->device_num,
@@ -1458,19 +1472,19 @@ omp_display_env (int verbose)
 	fprintf (stderr, ",%lu", gomp_initial_icv_dev.nthreads_var_list[i]);
       fputs ("'\n", stderr);
     }
-  struct gomp_icv_list* l_nthreads_var = gomp_nthreads_var_dev_list;
+  struct gomp_icv_list* l_nthreads_var = gomp_initial_nthreads_var_dev_list;
   while (l_nthreads_var != NULL)
     {
       fprintf (stderr, "  [%d] OMP_NUM_THREADS = '%lu",
 	       l_nthreads_var->device_num,
 	       *(unsigned long*)l_nthreads_var->value);
       struct gomp_icv_list *len
-	= gomp_get_icv_list (&gomp_nthreads_var_list_len_dev_list,
+	= gomp_get_icv_list (&gomp_initial_nthreads_var_list_len_dev_list,
 			     l_nthreads_var->device_num);
       if (len != NULL)
 	{
 	  struct gomp_icv_list *list
-	    = gomp_get_icv_list (&gomp_nthreads_var_list_dev_list,
+	    = gomp_get_icv_list (&gomp_initial_nthreads_var_list_dev_list,
 				 l_nthreads_var->device_num);
 	  for (i = 1; i < *(unsigned long*)len->value; i++)
 	    fprintf (stderr, ",%lu", (*(unsigned long**)list->value)[i]);
@@ -1487,11 +1501,11 @@ omp_display_env (int verbose)
   if (gomp_initial_icv_flags.run_sched_var & GOMP_ENV_VAR_SUFFIX_DEV)
     print_schedule (gomp_initial_icv_dev.run_sched_var,
 		    gomp_initial_icv_dev.run_sched_chunk_size, "device");
-  struct gomp_icv_list* l_run_sched_var = gomp_run_sched_var_dev_list;
+  struct gomp_icv_list* l_run_sched_var = gomp_initial_run_sched_var_dev_list;
   while (l_run_sched_var != NULL)
     {
       struct gomp_icv_list* l_run_sched_chunk_size
-	= gomp_get_icv_list (&gomp_run_sched_chunk_size_dev_list,
+	= gomp_get_icv_list (&gomp_initial_run_sched_chunk_size_dev_list,
 			     l_run_sched_var->device_num);
       int chunk_size = *(int*)l_run_sched_chunk_size->value;
       char dev_num[10];
@@ -1513,14 +1527,14 @@ omp_display_env (int verbose)
     print_proc_bind (gomp_initial_icv_dev.bind_var,
 		     gomp_initial_icv_dev.bind_var_list_len,
 		     &gomp_initial_icv_dev.bind_var_list, "device");
-  struct gomp_icv_list* l_proc_bind_var = gomp_proc_bind_var_dev_list;
+  struct gomp_icv_list* l_proc_bind_var = gomp_initial_proc_bind_var_dev_list;
   while (l_proc_bind_var != NULL)
     {
       struct gomp_icv_list *list
-	= gomp_get_icv_list (&gomp_proc_bind_var_list_dev_list,
+	= gomp_get_icv_list (&gomp_initial_proc_bind_var_list_dev_list,
 			     l_proc_bind_var->device_num);
       struct gomp_icv_list *len
-	= gomp_get_icv_list (&gomp_proc_bind_var_list_len_dev_list,
+	= gomp_get_icv_list (&gomp_initial_proc_bind_var_list_len_dev_list,
 			     l_proc_bind_var->device_num);
       char dev_num[10];
       sprintf (dev_num, "%d", l_proc_bind_var->device_num);
@@ -1581,7 +1595,7 @@ omp_display_env (int verbose)
   if (gomp_initial_icv_flags.thread_limit_var & GOMP_ENV_VAR_SUFFIX_DEV)
     fprintf (stderr, "  [device] OMP_THREAD_LIMIT = '%d'\n",
 	     gomp_initial_icv_dev.thread_limit_var);
-  struct gomp_icv_list* l_thread_limit = gomp_thread_limit_var_dev_list;
+  struct gomp_icv_list* l_thread_limit = gomp_initial_thread_limit_var_dev_list;
   while (l_thread_limit != NULL)
     {
       fprintf (stderr, "  [%d] OMP_THREAD_LIMIT = '%d'\n",
@@ -1598,7 +1612,7 @@ omp_display_env (int verbose)
     fprintf (stderr, "  [device] OMP_MAX_ACTIVE_LEVELS = '%u'\n",
 	     gomp_initial_icv_dev.max_active_levels_var);
   struct gomp_icv_list* l_max_active_levels
-    = gomp_max_active_levels_var_dev_list;
+    = gomp_initial_max_active_levels_var_dev_list;
   while (l_max_active_levels != NULL)
     {
       fprintf (stderr, "  [%d] OMP_MAX_ACTIVE_LEVELS = '%u'\n",
@@ -1615,7 +1629,7 @@ omp_display_env (int verbose)
   if (gomp_initial_icv_flags.nteams_var & GOMP_ENV_VAR_SUFFIX_DEV)
     fprintf (stderr, "  [device] OMP_NUM_TEAMS = '%d'\n",
 	     gomp_initial_icv_dev.nteams_var);
-  struct gomp_icv_list* l_nteams_var = gomp_nteams_var_dev_list;
+  struct gomp_icv_list* l_nteams_var = gomp_initial_nteams_var_dev_list;
   while (l_nteams_var != NULL)
     {
       fprintf (stderr, "  [%d] OMP_NUM_TEAMS = '%d'\n",
@@ -1631,7 +1645,8 @@ omp_display_env (int verbose)
   if (gomp_initial_icv_flags.teams_thread_limit_var & GOMP_ENV_VAR_SUFFIX_DEV)
     fprintf (stderr, "  [device] OMP_TEAMS_THREAD_LIMIT = '%u'\n",
 	     gomp_initial_icv_dev.teams_thread_limit_var);
-  struct gomp_icv_list* l_teams_thr_limit = teams_thread_limit_var_dev_list;
+  struct gomp_icv_list* l_teams_thr_limit =
+    gomp_initial_teams_thread_limit_var_dev_list;
   while (l_teams_thr_limit != NULL)
     {
       fprintf (stderr, "  [%d] OMP_TEAMS_THREAD_LIMIT = '%u'\n",
@@ -1764,8 +1779,9 @@ get_device_num (char **env, int prefix_len, int *dev_num, int *name_len)
 
 /* Helper function for parse_device_specific. Adds a new node to the given
    list.  */
-static struct gomp_icv_list*
-add_device_specific_icv (int dev_num, size_t size, struct gomp_icv_list **list)
+struct gomp_icv_list *
+gomp_add_device_specific_icv (int dev_num, size_t size,
+			      struct gomp_icv_list **list)
 {
   if (list == NULL)
     return NULL;
@@ -1801,12 +1817,12 @@ parse_device_specific ()
 	enum gomp_schedule_type schedule_type;
 	int chunk_size;
 	parse_schedule (name, &schedule_type, &chunk_size);
-	new_node = add_device_specific_icv (dev_num,
+	new_node = gomp_add_device_specific_icv (dev_num,
 					    sizeof (enum gomp_schedule_type),
-					    &gomp_run_sched_var_dev_list);
+					    &gomp_initial_run_sched_var_dev_list);
 	*((enum gomp_schedule_type*)(new_node->value)) = schedule_type;
-	new_node = add_device_specific_icv (dev_num, sizeof (int),
-					    &gomp_run_sched_chunk_size_dev_list);
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (int),
+						 &gomp_initial_run_sched_chunk_size_dev_list);
 	*((int*)(new_node->value)) = chunk_size;
 	goto next;
       }
@@ -1819,8 +1835,8 @@ parse_device_specific ()
 	bool value;
 	if (!parse_boolean (name, &value))
 	  continue;
-	new_node = add_device_specific_icv (dev_num, sizeof (bool),
-					    &gomp_dyn_var_dev_list);
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (bool),
+					    &gomp_initial_dyn_var_dev_list);
 	*((bool*)(new_node->value)) = value;
 	goto next;
       }
@@ -1834,8 +1850,8 @@ parse_device_specific ()
 	if (!parse_unsigned_long (name, &value, false))
 	  continue;
 	value = value > INT_MAX ? UINT_MAX : value;
-	new_node = add_device_specific_icv (dev_num, sizeof (unsigned long),
-					    &gomp_thread_limit_var_dev_list);
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long),
+					    &gomp_initial_thread_limit_var_dev_list);
 	*((unsigned long*)(new_node->value)) = value;
 	goto next;
       }
@@ -1850,17 +1866,17 @@ parse_device_specific ()
 	unsigned long nvalues = 0;
 	if (!parse_unsigned_long_list (name, &value, &pvalues, &nvalues))
 	  continue;
-	new_node = add_device_specific_icv (dev_num, sizeof (unsigned long),
-					    &gomp_nthreads_var_dev_list);
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long),
+					    &gomp_initial_nthreads_var_dev_list);
 	*((unsigned long*)(new_node->value)) = value;
 	if (nvalues > 0)
 	  {
-	    new_node = add_device_specific_icv (dev_num, sizeof (unsigned long*),
-						&gomp_nthreads_var_list_dev_list);
+	    new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long*),
+						&gomp_initial_nthreads_var_list_dev_list);
 	    *((unsigned long**)(new_node->value)) = pvalues;
 
-	    new_node = add_device_specific_icv (dev_num, sizeof (unsigned long*),
-						&gomp_nthreads_var_list_len_dev_list);
+	    new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long*),
+						&gomp_initial_nthreads_var_list_len_dev_list);
 	    *((unsigned long*)(new_node->value)) = nvalues;
 	  }
 	goto next;
@@ -1874,7 +1890,10 @@ parse_device_specific ()
 	int value;
 	if (!parse_int (name, &value, false))
 	  continue;
-	new_node = add_device_specific_icv (dev_num, sizeof (int),
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (int),
+					    &gomp_initial_nteams_var_dev_list);
+	*((int*)(new_node->value)) = value;
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (int),
 					    &gomp_nteams_var_dev_list);
 	*((int*)(new_node->value)) = value;
 	goto next;
@@ -1904,18 +1923,18 @@ parse_device_specific ()
 	  if (value == omp_proc_bind_false)
 	    value = true;
 
-	new_node = add_device_specific_icv (dev_num, sizeof (char),
-					    &gomp_proc_bind_var_dev_list);
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (char),
+					    &gomp_initial_proc_bind_var_dev_list);
 	*((char*)(new_node->value)) = value;
 
 	if (nvalues > 0)
 	  {
-	    new_node = add_device_specific_icv (dev_num, sizeof (char*),
-						&gomp_proc_bind_var_list_dev_list);
+	    new_node = gomp_add_device_specific_icv (dev_num, sizeof (char*),
+						&gomp_initial_proc_bind_var_list_dev_list);
 	    *((char**)(new_node->value)) = pvalues;
 
-	    new_node = add_device_specific_icv (dev_num, sizeof (unsigned long),
-						&gomp_proc_bind_var_list_len_dev_list);
+	    new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long),
+						&gomp_initial_proc_bind_var_list_len_dev_list);
 	    *((unsigned long*)(new_node->value)) = nvalues;
 	  }
 	goto next;
@@ -1931,8 +1950,8 @@ parse_device_specific ()
 	  continue;
 	value = (value > gomp_supported_active_levels)
 		? gomp_supported_active_levels : value;
-	new_node = add_device_specific_icv (dev_num, sizeof (unsigned long),
-					    &gomp_max_active_levels_var_dev_list);
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long),
+					    &gomp_initial_max_active_levels_var_dev_list);
 	*((unsigned long*)(new_node->value)) = value;
 	goto next;
       }
@@ -1945,7 +1964,7 @@ parse_device_specific ()
 	unsigned long value;
 	if (!parse_stacksize (name, &value))
 	  continue;
-	new_node = add_device_specific_icv (dev_num, sizeof (unsigned long),
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (unsigned long),
 					    &stacksize_dev_list);
 	*((unsigned long*)(new_node->value)) = value;
 	goto next;
@@ -1959,7 +1978,7 @@ parse_device_specific ()
 	int value;
 	if (!parse_wait_policy (name, &value))
 	  continue;
-	new_node = add_device_specific_icv (dev_num, sizeof (int),
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (int),
 					    &wait_policy_dev_list);
 	*((int*)(new_node->value)) = value;
 	goto next;
@@ -1973,8 +1992,12 @@ parse_device_specific ()
 	int value;
 	if (!parse_int (name, &value, false))
 	  continue;
-	new_node = add_device_specific_icv (dev_num, sizeof (int),
-					    &teams_thread_limit_var_dev_list);
+	new_node = gomp_add_device_specific_icv (
+			dev_num, sizeof (int),
+			&gomp_initial_teams_thread_limit_var_dev_list);
+	*((int*)(new_node->value)) = value;
+	new_node = gomp_add_device_specific_icv (dev_num, sizeof (int),
+					    &gomp_teams_thread_limit_var_dev_list);
 	*((int*)(new_node->value)) = value;
 	goto next;
       }
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index bc1af97..10b2e33 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -88,3 +88,29 @@ omp_get_max_teams (void)
 }
 
 ialias (omp_get_max_teams)
+
+void
+omp_set_num_teams (int num_teams)
+{
+  if (num_teams >= 0)
+    gomp_nteams_var = num_teams;
+}
+
+ialias (omp_set_num_teams)
+
+int
+omp_get_teams_thread_limit (void)
+{
+  return gomp_teams_thread_limit_var;
+}
+
+ialias (omp_get_teams_thread_limit)
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    gomp_teams_thread_limit_var = thread_limit;
+}
+
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv.c b/libgomp/icv.c
index 5c435d6..9aef91c 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -148,26 +148,6 @@ omp_get_supported_active_levels (void)
   return gomp_supported_active_levels;
 }
 
-void
-omp_set_num_teams (int num_teams)
-{
-  if (num_teams >= 0)
-    gomp_nteams_var = num_teams;
-}
-
-void
-omp_set_teams_thread_limit (int thread_limit)
-{
-  if (thread_limit >= 0)
-    gomp_teams_thread_limit_var = thread_limit;
-}
-
-int
-omp_get_teams_thread_limit (void)
-{
-  return gomp_teams_thread_limit_var;
-}
-
 int
 omp_get_cancellation (void)
 {
@@ -268,9 +248,6 @@ ialias (omp_get_thread_limit)
 ialias (omp_set_max_active_levels)
 ialias (omp_get_max_active_levels)
 ialias (omp_get_supported_active_levels)
-ialias (omp_set_num_teams)
-ialias (omp_set_teams_thread_limit)
-ialias (omp_get_teams_thread_limit)
 ialias (omp_get_cancellation)
 ialias (omp_get_proc_bind)
 ialias (omp_get_max_task_priority)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 072cc47..75d7663 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -116,6 +116,7 @@ struct addr_pair
 #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels
 #define GOMP_BIND_VAR __gomp_bind
 #define GOMP_NTEAMS_VAR __gomp_nteams
+#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var
 
 /* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 89d1453..b585c05 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -504,6 +504,17 @@ struct gomp_initial_icv_t
   unsigned long stacksize;
 };
 
+struct gomp_default_icv_t
+{
+  enum gomp_schedule_type run_sched_var;
+  int run_sched_chunk_size;
+  unsigned char max_active_levels_var;
+  char bind_var;
+  int nteams_var;
+  int teams_thread_limit_var;
+};
+extern struct gomp_default_icv_t gomp_default_icv_values;
+
 struct gomp_icv_flags_t
 {
   enum gomp_env_var_suffix_t nthreads_var;
@@ -527,13 +538,19 @@ struct gomp_icv_list {
 
 extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list,
 				     int device_num);
-extern struct gomp_icv_list *gomp_run_sched_var_dev_list;
-extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list;
+extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num,
+							   size_t size,
+							    struct gomp_icv_list **list);
+extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list;
+extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list;
+extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list;
+extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list;
+extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list;
+extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list;
+extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list;
+
 extern struct gomp_icv_list *gomp_nteams_var_dev_list;
-extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list;
-extern struct gomp_icv_list *gomp_proc_bind_var_dev_list;
-extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list;
-extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list;
+extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list;
 
 extern struct gomp_initial_icv_t gomp_initial_icv_all;
 extern struct gomp_initial_icv_t gomp_initial_icv_dev;
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 44ab369..3236c38 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -572,7 +572,8 @@ static char *GOMP_ICV_STRINGS[] =
   XSTRING (GOMP_DYN_VAR),
   XSTRING (GOMP_MAX_ACTIVE_LEVELS_VAR),
   XSTRING (GOMP_BIND_VAR),
-  XSTRING (GOMP_NTEAMS_VAR)
+  XSTRING (GOMP_NTEAMS_VAR),
+  XSTRING (GOMP_TEAMS_THREAD_LIMIT_VAR)
 };
 
 /* }}}  */
@@ -3371,7 +3372,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   struct kernel_info *kernel;
   int kernel_count = image_desc->kernel_count;
   unsigned var_count = image_desc->global_variable_count;
-  int other_count = 10;
+  int other_count = 11;
 
   agent = get_agent_info (ord);
   if (!agent)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9d603fd..ced24ca 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -346,7 +346,8 @@ static char *GOMP_ICV_STRINGS[] =
   XSTRING (GOMP_DYN_VAR),
   XSTRING (GOMP_MAX_ACTIVE_LEVELS_VAR),
   XSTRING (GOMP_BIND_VAR),
-  XSTRING (GOMP_NTEAMS_VAR)
+  XSTRING (GOMP_NTEAMS_VAR),
+  XSTRING (GOMP_TEAMS_THREAD_LIMIT_VAR)
 };
 
 static inline struct nvptx_thread *
@@ -1316,7 +1317,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   fn_descs = img_header->fn_descs;
 
   /* Currently, other entry kinds are 'device number' and further ICVs.  */
-  other_entries = 10;
+  other_entries = 11;
 
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				 * (fn_entries + var_entries + other_entries));
diff --git a/libgomp/target.c b/libgomp/target.c
index 4a9e533..1a6afe6 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2077,13 +2077,51 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
    (_DEV_42, _DEV, _ALL).  */
 static void*
 gomp_get_icv (struct gomp_icv_list **list, int dev_num,
-	      enum gomp_env_var_suffix_t flag, void *dev_val, void *all_val)
+	      enum gomp_env_var_suffix_t flag, void *dev_val, void *all_val,
+	      void *default_val)
 {
   void *val = gomp_get_icv_value_ptr (list, dev_num);
   if (val == NULL && (flag & GOMP_ENV_VAR_SUFFIX_DEV))
-      val = dev_val;
+    val = dev_val;
   if (val == NULL && (flag & GOMP_ENV_VAR_SUFFIX_ALL))
-      val = all_val;
+    val = all_val;
+  if (val == NULL)
+    val = default_val;
+  if (val == NULL)
+    {
+      gomp_error ("Expected value for ICV.");
+      return NULL;
+    }
+  return val;
+}
+
+/* Helper function for 'gomp_load_image_to_device'. Similar to gomp_get_icv but
+   also adds the ICV value to LIST if not already there.  */
+static void*
+gomp_get_or_add_icv_int (struct gomp_icv_list **list, int dev_num,
+			 enum gomp_env_var_suffix_t flag, void *dev_val,
+			 void *all_val, void *default_val)
+{
+  void *val = gomp_get_icv_value_ptr (list, dev_num);
+  if (val == NULL)
+    {
+      if (flag & GOMP_ENV_VAR_SUFFIX_DEV)
+	val = dev_val;
+      if (val == NULL && (flag & GOMP_ENV_VAR_SUFFIX_ALL))
+	val = all_val;
+      if (val == NULL)
+	val = default_val;
+      if (val == NULL)
+	{
+	  gomp_error ("Expected value for ICV.");
+	  return NULL;
+	}
+
+      struct gomp_icv_list *new_node;
+      new_node = gomp_add_device_specific_icv (dev_num, sizeof (int), list);
+      *((int*)(new_node->value)) = *(int*)val;
+      val = new_node->value;
+    }
   return val;
 }
 
@@ -2107,8 +2145,12 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
-  /* Others currently is only 'device_num' */
-  int num_others = 10;
+  /* Others is 'device_num' and further ICVs.  */
+  int num_others = 11;
+
+  /* Number of ICVs, which need to be copied back to the host.
+     Currently nteams-var and teams-thread-limit-var.  */
+  int num_ICVs_copied_back = 2;
 
   /* Load image to device and get target addresses for the image.  */
   struct addr_pair *target_table = NULL;
@@ -2132,7 +2174,8 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
   /* Insert host-target address mapping into splay tree.  */
   struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
-  tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
+  tgt->array = gomp_malloc ((num_funcs + num_vars + num_ICVs_copied_back)
+			    * sizeof (*tgt->array));
   tgt->refcount = REFCOUNT_INFINITY;
   tgt->tgt_start = 0;
   tgt->tgt_end = 0;
@@ -2204,6 +2247,10 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       for (i = 0; i < num_others; ++i)
 	{
 	  struct addr_pair *var = &target_table[num_funcs + num_vars + i];
+	  /* COPY_BACK_SIZE is used if the device-specific ICV needs to be
+	     copied back from device to host.  */
+	  size_t copy_back_size = 0;
+
 	  /* Start address will be non-zero for the current entry if
 	     the variable was found in this image.  */
 	  if (var->start != 0)
@@ -2223,16 +2270,23 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 		       values that are set during kernel entering.  */
 		    break;
 		  case 3:  // GOMP_RUN_SCHED_VAR
-		    val = gomp_get_icv (&gomp_run_sched_var_dev_list, dev_num,
-					gomp_initial_icv_flags.run_sched_var,
-					&gomp_initial_icv_dev.run_sched_var,
-					&gomp_initial_icv_all.run_sched_var);
+		    val =
+		      gomp_get_icv (&gomp_initial_run_sched_var_dev_list,
+				    dev_num,
+				    gomp_initial_icv_flags.run_sched_var,
+				    &gomp_initial_icv_dev.run_sched_var,
+				    &gomp_initial_icv_all.run_sched_var,
+				    &gomp_default_icv_values.run_sched_var);
 		    break;
 		  case 4:  // GOMP_RUN_SCHED_CHUNK_SIZE
-		    val = gomp_get_icv (&gomp_run_sched_chunk_size_dev_list, dev_num,
-					gomp_initial_icv_flags.run_sched_chunk_size,
-					&gomp_initial_icv_dev.run_sched_chunk_size,
-					&gomp_initial_icv_all.run_sched_chunk_size);
+		    val =
+		      gomp_get_icv (&gomp_initial_run_sched_chunk_size_dev_list,
+				    dev_num,
+				    gomp_initial_icv_flags.run_sched_chunk_size,
+				    &gomp_initial_icv_dev.run_sched_chunk_size,
+				    &gomp_initial_icv_all.run_sched_chunk_size,
+				    &gomp_default_icv_values.
+				    run_sched_chunk_size);
 		    break;
 		  case 5:  // GOMP_DEFAULT_DEVICE_VAR
 		    val = &gomp_global_icv.default_device_var;
@@ -2242,22 +2296,41 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 		       values that are set during kernel entering.  */
 		    break;
 		  case 7:  // GOMP_MAX_ACTIVE_LEVELS_VAR
-		    val = gomp_get_icv (&gomp_max_active_levels_var_dev_list, dev_num,
-					gomp_initial_icv_flags.max_active_levels_var,
-					&gomp_initial_icv_dev.max_active_levels_var,
-					&gomp_initial_icv_all.max_active_levels_var);
+		    val =
+		      gomp_get_icv
+		      (&gomp_initial_max_active_levels_var_dev_list, dev_num,
+		       gomp_initial_icv_flags.max_active_levels_var,
+		       &gomp_initial_icv_dev.max_active_levels_var,
+		       &gomp_initial_icv_all.max_active_levels_var,
+		       &gomp_default_icv_values.max_active_levels_var);
 		    break;
 		  case 8:  // GOMP_BIND_VAR
-		    val = gomp_get_icv (&gomp_proc_bind_var_dev_list, dev_num,
+		    val = gomp_get_icv (&gomp_initial_proc_bind_var_dev_list,
+					dev_num,
 					gomp_initial_icv_flags.bind_var,
 					&gomp_initial_icv_dev.bind_var,
-					&gomp_initial_icv_all.bind_var);
+					&gomp_initial_icv_all.bind_var,
+					&gomp_default_icv_values.bind_var);
 		    break;
 		  case 9: // GOMP_NTEAMS_VAR
-		    val = gomp_get_icv (&gomp_nteams_var_dev_list, dev_num,
-					gomp_initial_icv_flags.nteams_var,
-					&gomp_initial_icv_dev.nteams_var,
-					&gomp_initial_icv_all.nteams_var);
+		    val =
+		      gomp_get_or_add_icv_int
+		      (&gomp_nteams_var_dev_list, dev_num,
+		       gomp_initial_icv_flags.nteams_var,
+		       &gomp_initial_icv_dev.nteams_var,
+		       &gomp_initial_icv_all.nteams_var,
+		       &gomp_default_icv_values.nteams_var);
+		    copy_back_size = sizeof (int);
+		    break;
+		  case 10: // GOMP_TEAMS_THREAD_LIMIT_VAR
+		    val =
+		      gomp_get_or_add_icv_int
+		      (&gomp_teams_thread_limit_var_dev_list, dev_num,
+		       gomp_initial_icv_flags.teams_thread_limit_var,
+		       &gomp_initial_icv_dev.teams_thread_limit_var,
+		       &gomp_initial_icv_all.teams_thread_limit_var,
+		       &gomp_default_icv_values.teams_thread_limit_var);
+		    copy_back_size = sizeof (int);
 		    break;
 		}
 	      if (val != NULL)
@@ -2267,6 +2340,22 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 		     actually designating its device number into effect.  */
 		  gomp_copy_host2dev (devicep, NULL, (void *) var->start, val,
 				      var_size, false, NULL);
+		  if (copy_back_size)
+		    {
+		      splay_tree_key k = &array->key;
+		      k->host_start = (uintptr_t) val;
+		      k->host_end =
+			 k->host_start + (size_mask & copy_back_size);
+		      k->tgt = tgt;
+		      k->tgt_offset = var->start;
+		      k->refcount = REFCOUNT_INFINITY;
+		      k->dynamic_refcount = 0;
+		      k->aux = NULL;
+		      array->left = NULL;
+		      array->right = NULL;
+		      splay_tree_insert (&devicep->mem_map, array);
+		      array++;
+		    }
 		}
 	  }
 	}
@@ -2661,6 +2750,16 @@ clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
   return flags;
 }
 
+static void gomp_copy_back_icv (struct gomp_icv_list **list,
+				struct gomp_device_descr *devicep, int device,
+				size_t size)
+{
+  void *host_ptr = gomp_get_icv_value_ptr (list, device);
+  void *dev_ptr = omp_get_mapped_ptr (host_ptr, device);
+  if (dev_ptr != NULL)
+    gomp_copy_dev2host (devicep, NULL, host_ptr, dev_ptr, size);
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
    and several arguments have been added:
    FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2693,6 +2792,144 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool fpc_done = false;
 
+  /* Obtain the original TEAMS and THREADS values from ARGS. */
+  intptr_t orig_teams = 1, orig_threads = 0;
+  size_t num_args = 0, len = 1, teams_len = 1, threads_len = 1;
+  void **tmpargs = args;
+  while (*tmpargs)
+    {
+      intptr_t id = (intptr_t) *tmpargs++, val;
+      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	{
+	  val = (intptr_t) *tmpargs++;
+	  len = 2;
+	}
+      else
+	{
+	  val = id >> GOMP_TARGET_ARG_VALUE_SHIFT;
+	  len = 1;
+	}
+      num_args += len;
+      if ((id & GOMP_TARGET_ARG_DEVICE_MASK) != GOMP_TARGET_ARG_DEVICE_ALL)
+	continue;
+      val = val > INT_MAX ? INT_MAX : val;
+      if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS)
+	{
+	  orig_teams = val;
+	  teams_len = len;
+	}
+      else if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT)
+	{
+	  orig_threads = val;
+	  threads_len = len;
+	}
+    }
+
+  intptr_t new_teams = orig_teams, new_threads = orig_threads;
+  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.
+     ORIG_TEAMS == -1: Teams construct with NUM_TEAMS clause specified, but the
+		       value could not be specified. No Change.
+     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
+		      Set device-specific value.
+     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
+		     No change.  */
+  if (orig_teams == -2)
+    new_teams = 1;
+  else if (orig_teams == 0)
+    {
+      void *dev_val =
+	gomp_get_icv_value_ptr (&gomp_nteams_var_dev_list, device);
+      if (dev_val != NULL)
+	new_teams = *((int*) dev_val);
+    }
+  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
+     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
+     e.g. a THREAD_LIMIT clause.  */
+  if (orig_teams >= -2 && orig_threads == 0)
+    {
+      void *dev_val =
+	gomp_get_icv_value_ptr (&gomp_teams_thread_limit_var_dev_list, device);
+      if (dev_val != NULL)
+	new_threads = *((int*) dev_val);
+    }
+
+  /* Copy and change the arguments list only if TEAMS or THREADS need to be
+     updated.  */
+  void **new_args = args;
+  if (orig_teams != new_teams || orig_threads != new_threads)
+    {
+      size_t tms_len = (orig_teams == new_teams
+			? teams_len
+			: (new_teams > -(1 << 15) && new_teams < (1 << 15)
+			   ? 1 : 2));
+      size_t ths_len = (orig_threads == new_threads
+			? threads_len
+			: (new_threads > -(1 << 15) && new_threads < (1 << 15)
+			   ? 1 : 2));
+      /* One additional item since the last arg must be NULL.  */
+      size_t new_args_cnt = num_args - teams_len - threads_len + tms_len
+			    + ths_len + 1;
+      new_args = (void **) gomp_alloca (new_args_cnt * sizeof (void*));
+
+      tmpargs = args;
+      void **tmp_new_args = new_args;
+      while (*tmpargs)
+	{
+	  intptr_t id = (intptr_t) *tmpargs;
+	  if (((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS
+	       && orig_teams != new_teams)
+	      || ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT
+		  && orig_threads != new_threads))
+	    {
+	      tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		tmpargs++;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = *tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		*tmp_new_args++ = *tmpargs++;
+	    }
+	}
+
+      if (orig_teams != new_teams)
+	{
+	  intptr_t new_val = new_teams;
+	  if (tms_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_NUM_TEAMS;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_NUM_TEAMS);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      if (orig_threads != new_threads)
+	{
+	  intptr_t new_val = new_threads;
+	  if (ths_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_THREAD_LIMIT;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_THREAD_LIMIT);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      *tmp_new_args = NULL;
+    }
+
   flags = clear_unsupported_flags (devicep, flags);
 
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
@@ -2731,7 +2968,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	  && !thr->task->final_task)
 	{
 	  gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
-				   sizes, kinds, flags, depend, args,
+				   sizes, kinds, flags, depend, new_args,
 				   GOMP_TARGET_TASK_BEFORE_MAP);
 	  return;
 	}
@@ -2777,7 +3014,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 				      tgt_align, tgt_size);
 	    }
 	}
-      gomp_target_fallback (fn, hostaddrs, devicep, args);
+      gomp_target_fallback (fn, hostaddrs, devicep, new_args);
       return;
     }
 
@@ -2805,9 +3042,10 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
       tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds,
 				true, &refcount_set, GOMP_MAP_VARS_TARGET);
     }
+
   devicep->run_func (devicep->target_id, fn_addr,
 		     tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,
-		     args);
+		     new_args);
   if (tgt_vars)
     {
       htab_clear (refcount_set);
@@ -2815,6 +3053,14 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   if (refcount_set)
     htab_free (refcount_set);
+
+  /* Copy back ICVs which were probably changed on the device.
+     HOST_PTR is expected to exist since it was added in
+     gomp_load_image_to_device if not already available.  */
+  gomp_copy_back_icv (&gomp_nteams_var_dev_list, devicep, device, sizeof (int));
+  gomp_copy_back_icv (&gomp_teams_thread_limit_var_dev_list, devicep, device,
+		      sizeof (int));
+
 }
 
 /* Host fallback for GOMP_target_data{,_ext} routines.  */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
index b987a33..9da0d63 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
@@ -16,7 +16,7 @@ main ()
     }
   else
     omp_set_num_teams (6);
-  if (getenv ("OMP_TEAMS_THREAD_LIMIT") == NULL
+  if (getenv ("OMP_TEAMS_THREAD_LIMIT") != NULL
       && strcmp (getenv ("OMP_TEAMS_THREAD_LIMIT"), "12") == 0)
     {
       if (omp_get_teams_thread_limit () != 12)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
index 21bf44d..8f768b3 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-DAMD" { target openacc_radeon_accel_selected } } */
 /* { dg-do run } */
 
 #include <omp.h>
@@ -6,15 +7,22 @@
 #include <unistd.h>
 
 char const *varnames[] = {
+  "OMP_NUM_TEAMS_ALL",
+  "OMP_NUM_TEAMS_DEV",
+  "OMP_NUM_TEAMS",
   "OMP_NUM_TEAMS_DEV_0",
   "OMP_NUM_TEAMS_DEV_1",
   "OMP_NUM_TEAMS_DEV_2",
-  "OMP_NUM_TEAMS_ALL",
-  "OMP_NUM_TEAMS_DEV",
-  "OMP_NUM_TEAMS"
+  "OMP_TEAMS_THREAD_LIMIT_ALL",
+  "OMP_TEAMS_THREAD_LIMIT_DEV",
+  "OMP_TEAMS_THREAD_LIMIT",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_0",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_1",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_2"
 };
-char const *values[] = { "42", "43", "44", "45", "46", "47" };
-const int cnt = 6;
+char const *values[] = { "3", "4", "5", "6", "7", "8",
+			 "2", "3", "4", "5", "6", "7" };
+const int cnt = 12;
 
 int
 main (int argc, char *const *argv)
@@ -35,14 +43,192 @@ main (int argc, char *const *argv)
       abort ();
     }
 
-  if (omp_get_max_teams () != 47)
+  if (omp_get_max_teams () != 5
+      || omp_get_teams_thread_limit () != 4)
+    abort ();
+
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 5
+	|| omp_get_team_num () >= 5)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
+  omp_set_num_teams (4);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 4
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 4
+	|| omp_get_team_num () >= 4)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 3
+	|| omp_get_thread_num () >= 3)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(3) thread_limit(2)
+  {
+    if (omp_get_num_teams () != 3
+	|| omp_get_team_num () >= 3)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 2
+	|| omp_get_thread_num () >= 2)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(5) thread_limit(4)
+  {
+    if (omp_get_num_teams () != 5
+	|| omp_get_team_num () >= 5)
     abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
 
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
+  if (num_devices <= 0)
+    return 0;
+
   for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42 + i)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 6 + i
+	  || omp_get_teams_thread_limit () != 5 + i)
 	abort ();
 
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 5 + i
+	    || omp_get_thread_num () >= 5 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (5 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 5 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+      /* omp_set_num_teams and omp_set_teams_thread_limit above set the value
+	 of nteams-var and teams-thread-limit-var ICVs on device 'i', which has
+	 scope 'device' and should be avaible in subsequent target regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 5 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 5 + i
+	    || omp_get_team_num () >= 5 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 4 + i
+	    || omp_get_thread_num () >= 4 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(6 + i) thread_limit(5 + i)
+      {
+	if (omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+	#pragma omp parallel
+	  if (omp_get_thread_limit () > 5 + i
+	      || omp_get_thread_num () >= 5 + i
+	      || omp_get_num_teams () > 6 + i
+	      || omp_get_team_num () >= 6 + i)
+	    abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(4 + i) thread_limit(3 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	  if (omp_get_thread_limit () > 3 + i
+	      || omp_get_thread_num () >= 3 + i
+	      || omp_get_num_teams () > 4 + i
+	      || omp_get_team_num () >= 4 + i)
+	    abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams thread_limit(3 + i) num_teams(4 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	  if (omp_get_thread_limit () > 3 + i
+	      || omp_get_thread_num () >= 3 + i
+	      || omp_get_num_teams () > 4 + i
+	      || omp_get_team_num () >= 4 + i)
+	    abort ();
+      }
+
+      /* This tests a large number of teams and threads. If it is larger than
+	 2^15+1 then the according argument in the kernels arguments list
+	 is encoded with two items instead of one. On NVIDIA there is an
+	 adjustment for too large teams and threads. For AMD such adjustment
+	 exists only for threads and will cause runtime errors with a two large
+	 number of teams.  */
+      intptr_t large_num_teams = 66000;
+#ifdef AMD
+      large_num_teams = 8;
+#endif
+      intptr_t large_threads_limit = 67000;
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (large_num_teams + i);
+	omp_set_teams_thread_limit (large_threads_limit + i);
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > large_num_teams + i
+	    || omp_get_team_num () >= large_num_teams + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > large_threads_limit + i
+	    || omp_get_thread_num () >= large_threads_limit + i)
+	  abort ();
+      }
+
+    }
+
   return 0;
-}
\ No newline at end of file
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
index 943147b..5e761aa 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
@@ -1,8 +1,8 @@
 /* { dg-do run } */
 
-/* This tests the hierarchical usage of ICVs on the device, i.e. if
-   OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
-   OMP_NUM_TEAMS_DEV should be used.  */
+/* This test considers the hierarchical usage of ICVs on the device, i.e. if
+   e.g. OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
+   OMP_NUM_TEAMS_DEV should be used for the targets.  */
 
 #include <omp.h>
 #include <stdlib.h>
@@ -12,16 +12,21 @@
 char const *varnames[] = {
   "OMP_NUM_TEAMS_ALL",
   "OMP_NUM_TEAMS_DEV",
+  "OMP_TEAMS_THREAD_LIMIT_ALL",
+  "OMP_TEAMS_THREAD_LIMIT_DEV",
 };
-char const *values[] = { "42", "43" };
-const int cnt = 2;
+char const *values[] = { "3", "4", "2", "3" };
+const int cnt = 4;
 
 char const *excludes[] = {
   "OMP_NUM_TEAMS_DEV_0",
   "OMP_NUM_TEAMS_DEV_1",
   "OMP_NUM_TEAMS_DEV_2",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_0",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_1",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_2"
 };
-const int cnt_exludes = 3;
+const int cnt_exludes = 6;
 
 int
 main (int argc, char *const *argv)
@@ -50,14 +55,55 @@ main (int argc, char *const *argv)
       abort ();
     }
 
-  if (omp_get_max_teams () != 42)
-    abort ();
-
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
   for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 43)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 4
+	  || omp_get_teams_thread_limit () != 3)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 4
+	    || omp_get_team_num () >= 4)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3
+	    || omp_get_thread_num () >= 3)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (3 + i);
+	omp_set_teams_thread_limit (2 + i);
+	if (omp_get_max_teams () != 3 + i
+	    || omp_get_teams_thread_limit () != 2 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 3 + i
+	  || omp_get_teams_thread_limit () != 2 + i)
 	abort ();
 
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 3 + i
+	    || omp_get_team_num () >= 3 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 2 + i
+	    || omp_get_thread_num () >= 2 + i)
+	  abort ();
+      }
+    }
+
   return 0;
-}
\ No newline at end of file
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
index 857d796..b28a33a 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
@@ -12,19 +12,25 @@
 #include <unistd.h>
 
 char const *varnames[] = {
-  "OMP_NUM_TEAMS_ALL"
+  "OMP_NUM_TEAMS_ALL",
+  "OMP_TEAMS_THREAD_LIMIT_ALL"
 };
-char const *values[] = { "42" };
-const int cnt = 1;
+char const *values[] = { "7", "2" };
+const int cnt = 2;
 
 char const *excludes[] = {
   "OMP_NUM_TEAMS_DEV_0",
   "OMP_NUM_TEAMS_DEV_1",
   "OMP_NUM_TEAMS_DEV_2",
   "OMP_NUM_TEAMS_DEV",
-  "OMP_NUM_TEAMS"
+  "OMP_NUM_TEAMS",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_0",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_1",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_2",
+  "OMP_TEAMS_THREAD_LIMIT_DEV",
+  "OMP_TEAMS_THREAD_LIMIT"
 };
-const int cnt_exludes = 5;
+const int cnt_exludes = 10;
 
 int
 main (int argc, char *const *argv)
@@ -53,14 +59,73 @@ main (int argc, char *const *argv)
       abort ();
     }
 
-  if (omp_get_max_teams () != 42)
+  if (omp_get_max_teams () != 7
+      || omp_get_teams_thread_limit () != 2)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 7
+      || omp_get_team_num () >= 7)
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
     abort ();
 
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
   for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 7
+	  || omp_get_teams_thread_limit () != 2)
 	abort ();
 
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 7
+	  || omp_get_team_num () >= 7)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () != 5 + i)
+	abort ();
+    }
+
   return 0;
-}
\ No newline at end of file
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-8.c b/libgomp/testsuite/libgomp.c-c++-common/icv-8.c
new file mode 100644
index 0000000..1eda2fd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-8.c
@@ -0,0 +1,105 @@
+/* { dg-do run } */
+
+/* This tests usage of ICVs on the host and on devices if no corresponding
+   environment variables are configured.  */
+
+#include <omp.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+char const *excludes[] = {
+  "OMP_NUM_TEAMS_DEV_0",
+  "OMP_NUM_TEAMS_DEV_1",
+  "OMP_NUM_TEAMS_DEV_2",
+  "OMP_NUM_TEAMS_DEV",
+  "OMP_NUM_TEAMS_ALL",
+  "OMP_NUM_TEAMS",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_0",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_1",
+  "OMP_TEAMS_THREAD_LIMIT_DEV_2",
+  "OMP_TEAMS_THREAD_LIMIT_DEV",
+  "OMP_TEAMS_THREAD_LIMIT_ALL",
+  "OMP_TEAMS_THREAD_LIMIT"
+};
+const int cnt_exludes = 12;
+
+int
+main (int argc, char *const *argv)
+{
+  int updated = 0;
+  for (int i = 0; i < cnt_exludes; i++)
+    if (getenv (excludes[i]) != NULL)
+      {
+	unsetenv (excludes[i]);
+	updated = 1;
+      }
+  if (updated)
+    {
+      execv (argv[0], argv);
+      abort ();
+    }
+
+  if (omp_get_max_teams () != 0
+      || omp_get_teams_thread_limit () != 0)
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (2);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 2)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
+    abort ();
+
+  int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
+  for (int i=0; i < num_devices; i++)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 0
+	  || omp_get_teams_thread_limit () != 0)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (3 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 3 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 3 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () > 5 + i
+	  || omp_get_team_num () >= 5 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      ;
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/icv-5.f90 b/libgomp/testsuite/libgomp.fortran/icv-5.f90
new file mode 100644
index 0000000..9462ca9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-5.f90
@@ -0,0 +1,231 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_NUM_TEAMS "5" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" }
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, large_num_teams, large_threads_limit
+  logical :: err
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set_dev ("OMP_NUM_TEAMS_DEV_", i, 6 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 6 + i) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (5 + i)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 5 + i .or. omp_get_team_num () >= 5 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(6 + i)
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(4 + i)
+      if (omp_get_num_teams () > 4 + i .or. omp_get_team_num () >= 4 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      large_num_teams = 66000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (large_num_teams + i)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > large_num_teams + i &
+          .or. omp_get_team_num () >= large_num_teams + i) err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set_dev ("OMP_TEAMS_THREAD_LIMIT_DEV_", i, 5 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 11
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 12
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (4 + i)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 13
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 14
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 4 + i .or. omp_get_thread_num () >= 4 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 15
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(5 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 16
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(3 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 + i .or. omp_get_thread_num () >= 3 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 17
+
+      large_threads_limit = 67000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (large_threads_limit + i)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 18
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 19
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > large_threads_limit + i &
+          .or. omp_get_thread_num () >= large_threads_limit + i) err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 20
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 30
+    endif
+    env_is_set = .false.
+  end
+  logical function env_is_set_dev (name, dev_num, val)
+    character(len=*) :: name
+    integer :: dev_num, val
+    character(len=64) :: dev_num_str, env_var, val_str
+    write (dev_num_str,*) dev_num
+    dev_num_str = ADJUSTL(dev_num_str)
+    env_var = name // dev_num_str
+    write (val_str,*) val
+    val_str = ADJUSTL(val_str)
+    write (*,*) ">", TRIM(env_var), "<"
+    write (*,*) ">", TRIM(val_str), "<"
+    write (*,*) "env_is_set (TRIM(env_var), TRIM(device_num): ", env_is_set(TRIM(env_var), TRIM(val_str))
+    env_is_set_dev = env_is_set (TRIM(env_var), TRIM(val_str))
+  end
+end
diff --git a/libgomp/testsuite/libgomp.fortran/icv-6.f90 b/libgomp/testsuite/libgomp.fortran/icv-6.f90
new file mode 100644
index 0000000..c1be2b2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-6.f90
@@ -0,0 +1,142 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+
+! This test considers the hierarchical usage of ICVs on the device, i.e. if
+! e.g. OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
+! OMP_NUM_TEAMS_DEV should be used for the targets.
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, stat, tmp
+  logical :: err
+  character(len=40) :: val
+
+  ! The following environment variables should not be set.
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_2", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_2", val, status=stat)
+  if (stat /= 1) return
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    write (*,*) 'env_is_set ("OMP_NUM_TEAMS_DEV", "4"): ', env_is_set ("OMP_NUM_TEAMS_DEV", "4")
+    if (env_is_set ("OMP_NUM_TEAMS_DEV", "4")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 4) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 4 .or. omp_get_team_num () >= 4) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (3 + i)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 3 + i .or. omp_get_team_num () >= 3 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    write (*,*) 'env_is_set ("OMP_TEAMS_THREAD_LIMIT_DEV", "3"): ', env_is_set ("OMP_TEAMS_THREAD_LIMIT_DEV", "3")
+    if (env_is_set ("OMP_TEAMS_THREAD_LIMIT_DEV", "3")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 3) err = .true.
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 .or. omp_get_thread_num () >= 3) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (2 + i)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 2 + i .or. omp_get_thread_num () >= 2 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 10
+    endif
+    env_is_set = .false.
+  end
+end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  2022-04-14 16:06 [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices Marcel Vollweiler
@ 2022-06-30 13:16 ` Jakub Jelinek
  2022-08-03 12:40   ` Marcel Vollweiler
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2022-06-30 13:16 UTC (permalink / raw)
  To: Marcel Vollweiler; +Cc: gcc-patches

On Thu, Apr 14, 2022 at 06:06:24PM +0200, Marcel Vollweiler wrote:
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>    struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>  
>    if (teams == NULL_TREE)
> -    num_teams_upper = integer_one_node;
> +    num_teams_upper = integer_minus_two_node;

No, please don't introduce this, it is quite costly to have a GC trees
like integer_one_node, so they should stay for the most commonly used
numbers, -2 isn't like that.  Just build_int_cst (integer_type_node, -2).

> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -642,6 +642,7 @@ enum tree_index {
>    TI_INTEGER_ONE,
>    TI_INTEGER_THREE,
>    TI_INTEGER_MINUS_ONE,
> +  TI_INTEGER_MINUS_TWO,
>    TI_NULL_POINTER,
>  
>    TI_SIZE_ZERO,
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 8f83ea1..8cb474d 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char)
>    integer_one_node = build_int_cst (integer_type_node, 1);
>    integer_three_node = build_int_cst (integer_type_node, 3);
>    integer_minus_one_node = build_int_cst (integer_type_node, -1);
> +  integer_minus_two_node = build_int_cst (integer_type_node, -2);
>  
>    size_zero_node = size_int (0);
>    size_one_node = size_int (1);
> diff --git a/gcc/tree.h b/gcc/tree.h
> index cea49a5..1aeb009 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp)
>  #define integer_one_node		global_trees[TI_INTEGER_ONE]
>  #define integer_three_node              global_trees[TI_INTEGER_THREE]
>  #define integer_minus_one_node		global_trees[TI_INTEGER_MINUS_ONE]
> +#define integer_minus_two_node		global_trees[TI_INTEGER_MINUS_TWO]
>  #define size_zero_node			global_trees[TI_SIZE_ZERO]
>  #define size_one_node			global_trees[TI_SIZE_ONE]
>  #define bitsize_zero_node		global_trees[TI_BITSIZE_ZERO]

And drop the above 3 hunks.

> --- a/libgomp/config/gcn/icv-device.c
> +++ b/libgomp/config/gcn/icv-device.c
> @@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR;
>  volatile int GOMP_MAX_ACTIVE_LEVELS_VAR;
>  volatile omp_proc_bind_t GOMP_BIND_VAR;
>  volatile int GOMP_NTEAMS_VAR;
> +volatile int GOMP_TEAMS_THREAD_LIMIT_VAR;

I really don't like this copying of individual ICVs one by one to the
device, copy a struct containing them and access fields in that struct.

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -116,6 +116,7 @@ struct addr_pair
>  #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels
>  #define GOMP_BIND_VAR __gomp_bind
>  #define GOMP_NTEAMS_VAR __gomp_nteams
> +#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var

Likewise here.

> @@ -527,13 +538,19 @@ struct gomp_icv_list {
>  
>  extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list,
>  				     int device_num);
> -extern struct gomp_icv_list *gomp_run_sched_var_dev_list;
> -extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list;
> +extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num,
> +							   size_t size,
> +							    struct gomp_icv_list **list);
> +extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list;
> +extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list;
> +extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list;
> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list;
> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list;
> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list;
> +extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list;
> +
>  extern struct gomp_icv_list *gomp_nteams_var_dev_list;
> -extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list;
> -extern struct gomp_icv_list *gomp_proc_bind_var_dev_list;
> -extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list;
> -extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list;
> +extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list;

Nor these per-var lists.  For a specific device, walk the list with
all the vars in it, start with the most specific (matching dev number),
then just dev and then all and fill in from it what is going to be copied.
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -572,7 +572,8 @@ static char *GOMP_ICV_STRINGS[] =
>    XSTRING (GOMP_DYN_VAR),
>    XSTRING (GOMP_MAX_ACTIVE_LEVELS_VAR),
>    XSTRING (GOMP_BIND_VAR),
> -  XSTRING (GOMP_NTEAMS_VAR)
> +  XSTRING (GOMP_NTEAMS_VAR),
> +  XSTRING (GOMP_TEAMS_THREAD_LIMIT_VAR)

Then you don't need to e.g. track the names of the individual vars, just
one for the whole ICV block.

	Jakub


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  2022-06-30 13:16 ` Jakub Jelinek
@ 2022-08-03 12:40   ` Marcel Vollweiler
  2022-09-18  8:24     ` Marcel Vollweiler
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Vollweiler @ 2022-08-03 12:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6570 bytes --]

Hi Jakub,

This patch was reduced a bit and most of your comments were considered in the
last submission of the environment variable syntax extension patch
(https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599175.html). This patch
also builds on that envvar patch version.

The nteams-var related content was moved from this patch to the envvar patch as
that is closely connected. However, additional testing and testing of copy back
device-specific nteams-var ICV values is still included in this patch together
with the teams-thread-limit-var content.

>> --- a/gcc/gimplify.cc
>> +++ b/gcc/gimplify.cc
>> @@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>>     struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>>
>>     if (teams == NULL_TREE)
>> -    num_teams_upper = integer_one_node;
>> +    num_teams_upper = integer_minus_two_node;
>
> No, please don't introduce this, it is quite costly to have a GC trees
> like integer_one_node, so they should stay for the most commonly used
> numbers, -2 isn't like that.  Just build_int_cst (integer_type_node, -2).

integer_minus_two_node was replaced by "build_int_cst (integer_type_node, -2)".

>
>> --- a/gcc/tree-core.h
>> +++ b/gcc/tree-core.h
>> @@ -642,6 +642,7 @@ enum tree_index {
>>     TI_INTEGER_ONE,
>>     TI_INTEGER_THREE,
>>     TI_INTEGER_MINUS_ONE,
>> +  TI_INTEGER_MINUS_TWO,
>>     TI_NULL_POINTER,
>>
>>     TI_SIZE_ZERO,
>> diff --git a/gcc/tree.cc b/gcc/tree.cc
>> index 8f83ea1..8cb474d 100644
>> --- a/gcc/tree.cc
>> +++ b/gcc/tree.cc
>> @@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char)
>>     integer_one_node = build_int_cst (integer_type_node, 1);
>>     integer_three_node = build_int_cst (integer_type_node, 3);
>>     integer_minus_one_node = build_int_cst (integer_type_node, -1);
>> +  integer_minus_two_node = build_int_cst (integer_type_node, -2);
>>
>>     size_zero_node = size_int (0);
>>     size_one_node = size_int (1);
>> diff --git a/gcc/tree.h b/gcc/tree.h
>> index cea49a5..1aeb009 100644
>> --- a/gcc/tree.h
>> +++ b/gcc/tree.h
>> @@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp)
>>   #define integer_one_node           global_trees[TI_INTEGER_ONE]
>>   #define integer_three_node              global_trees[TI_INTEGER_THREE]
>>   #define integer_minus_one_node             global_trees[TI_INTEGER_MINUS_ONE]
>> +#define integer_minus_two_node              global_trees[TI_INTEGER_MINUS_TWO]
>>   #define size_zero_node                     global_trees[TI_SIZE_ZERO]
>>   #define size_one_node                      global_trees[TI_SIZE_ONE]
>>   #define bitsize_zero_node          global_trees[TI_BITSIZE_ZERO]
>
> And drop the above 3 hunks.

Removed.

>
>> --- a/libgomp/config/gcn/icv-device.c
>> +++ b/libgomp/config/gcn/icv-device.c
>> @@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR;
>>   volatile int GOMP_MAX_ACTIVE_LEVELS_VAR;
>>   volatile omp_proc_bind_t GOMP_BIND_VAR;
>>   volatile int GOMP_NTEAMS_VAR;
>> +volatile int GOMP_TEAMS_THREAD_LIMIT_VAR;
>
> I really don't like this copying of individual ICVs one by one to the
> device, copy a struct containing them and access fields in that struct.

I recently changed this in
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599175.html. So there is
one struct containing all ICVs that are copied from host to the device and back.

>
>> --- a/libgomp/libgomp-plugin.h
>> +++ b/libgomp/libgomp-plugin.h
>> @@ -116,6 +116,7 @@ struct addr_pair
>>   #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels
>>   #define GOMP_BIND_VAR __gomp_bind
>>   #define GOMP_NTEAMS_VAR __gomp_nteams
>> +#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var
>
> Likewise here.

Those were all removed.

>
>> @@ -527,13 +538,19 @@ struct gomp_icv_list {
>>
>>   extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list,
>>                                   int device_num);
>> -extern struct gomp_icv_list *gomp_run_sched_var_dev_list;
>> -extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list;
>> +extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num,
>> +                                                       size_t size,
>> +                                                        struct gomp_icv_list **list);
>> +extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list;
>> +extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list;
>> +extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list;
>> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list;
>> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list;
>> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list;
>> +extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list;
>> +
>>   extern struct gomp_icv_list *gomp_nteams_var_dev_list;
>> -extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list;
>> -extern struct gomp_icv_list *gomp_proc_bind_var_dev_list;
>> -extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list;
>> -extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list;
>> +extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list;
>
> Nor these per-var lists.  For a specific device, walk the list with
> all the vars in it, start with the most specific (matching dev number),
> then just dev and then all and fill in from it what is going to be copied.

The above lists were removed and instead one list for device-specific ICV
structs was introduced in the above mentioned patch.

>> --- a/libgomp/plugin/plugin-gcn.c
>> +++ b/libgomp/plugin/plugin-gcn.c
>> @@ -572,7 +572,8 @@ static char *GOMP_ICV_STRINGS[] =
>>     XSTRING (GOMP_DYN_VAR),
>>     XSTRING (GOMP_MAX_ACTIVE_LEVELS_VAR),
>>     XSTRING (GOMP_BIND_VAR),
>> -  XSTRING (GOMP_NTEAMS_VAR)
>> +  XSTRING (GOMP_NTEAMS_VAR),
>> +  XSTRING (GOMP_TEAMS_THREAD_LIMIT_VAR)
>
> Then you don't need to e.g. track the names of the individual vars, just
> one for the whole ICV block.

That array was also removed.

The patch was tested on x86_64-linux with nvptx and
amdgcn offloading without regression.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: teams_and_threads_on_device.diff --]
[-- Type: text/plain, Size: 47831 bytes --]

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host. 

gcc/ChangeLog:

	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
	to "-2" instead of "1" for non-existing num_teams clause in order to
	disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
	allow processing of device-specific values.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* icv-device.c (omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	* icv.c (omp_set_teams_thread_limit): Removed.
	(omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
	handling.
	(gomp_load_image_to_device): Added a size check for the ICVs struct
	variable.
	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
	copy back the ICV values from device to host.
	(GOMP_target_ext): Update the number of teams and threads in the kernel
	args also considering device-specific values.
	* testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.
	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-8.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
	* testsuite/libgomp.fortran/icv-5.f90: New test.
	* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
	num_teams from "1" to "-2" in cases without num_teams clause.
	* g++.dg/gomp/target-teams-1.C: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 2ac7ca0..468fc2b 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -14103,7 +14103,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
 
   if (teams == NULL_TREE)
-    num_teams_upper = integer_one_node;
+    num_teams_upper = build_int_cst (integer_type_node, -2);
   else
     for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
       {
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 51b8d48..74d60e1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -81,5 +81,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index f78a608..29e5597 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -88,5 +88,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
index 7b182b5..9081159 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
index 1391274..91566ed 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\*strxp \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
index 9a81d0f..867e41a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
@@ -101,4 +101,4 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index bf757ba..eb68881 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -81,6 +81,19 @@ omp_set_num_teams (int num_teams)
     GOMP_ADDITIONAL_ICVS.nteams = num_teams;
 }
 
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_ADDITIONAL_ICVS.teams_thread_limit;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_ADDITIONAL_ICVS.teams_thread_limit = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -89,3 +102,5 @@ ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
 ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 6f869be..a3f00cf 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -81,6 +81,19 @@ omp_set_num_teams (int num_teams)
     GOMP_ADDITIONAL_ICVS.nteams = num_teams;
 }
 
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_ADDITIONAL_ICVS.teams_thread_limit;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_ADDITIONAL_ICVS.teams_thread_limit = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -89,3 +102,5 @@ ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
 ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index d8acf0e..48607ce 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -97,3 +97,20 @@ omp_set_num_teams (int num_teams)
 }
 
 ialias (omp_set_num_teams)
+
+int
+omp_get_teams_thread_limit (void)
+{
+  return gomp_teams_thread_limit_var;
+}
+
+ialias (omp_get_teams_thread_limit)
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    gomp_teams_thread_limit_var = thread_limit;
+}
+
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv.c b/libgomp/icv.c
index df423c0..9aef91c 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -148,19 +148,6 @@ omp_get_supported_active_levels (void)
   return gomp_supported_active_levels;
 }
 
-void
-omp_set_teams_thread_limit (int thread_limit)
-{
-  if (thread_limit >= 0)
-    gomp_teams_thread_limit_var = thread_limit;
-}
-
-int
-omp_get_teams_thread_limit (void)
-{
-  return gomp_teams_thread_limit_var;
-}
-
 int
 omp_get_cancellation (void)
 {
@@ -261,8 +248,6 @@ ialias (omp_get_thread_limit)
 ialias (omp_set_max_active_levels)
 ialias (omp_get_max_active_levels)
 ialias (omp_get_supported_active_levels)
-ialias (omp_set_teams_thread_limit)
-ialias (omp_get_teams_thread_limit)
 ialias (omp_get_cancellation)
 ialias (omp_get_proc_bind)
 ialias (omp_get_max_task_priority)
diff --git a/libgomp/target.c b/libgomp/target.c
index 1624938..6160deb 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2143,6 +2143,19 @@ get_gomp_offload_icvs (int dev_num)
     new->icvs.nteams = gomp_default_icv_values.nteams_var;
 
   if (dev_x != NULL
+      && gomp_get_icv_flag (dev_x->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = dev_x->icvs.teams_thread_limit_var;
+  else if (dev != NULL
+	   && gomp_get_icv_flag (dev->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = dev->icvs.teams_thread_limit_var;
+  else if (all != NULL
+	   && gomp_get_icv_flag (all->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = all->icvs.teams_thread_limit_var;
+  else
+    new->icvs.teams_thread_limit
+      = gomp_default_icv_values.teams_thread_limit_var;
+
+  if (dev_x != NULL
       && gomp_get_icv_flag (dev_x->flags, GOMP_ICV_DEFAULT_DEVICE))
     new->icvs.default_device = dev_x->icvs.default_device_var;
   else if (dev != NULL
@@ -2278,24 +2291,31 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 	  int dev_num = (int) (devicep - &devices[0]);
 	  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
 	  size_t var_size = var->end - var->start;
-
+	  if (var_size != sizeof (struct gomp_offload_icvs))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      if (is_register_lock)
+		gomp_mutex_unlock (&register_lock);
+	      gomp_fatal ("offload plugin managed 'icv struct' not of expected "
+			  "format");
+	    }
 	  /* Copy the ICVs variable to place on device memory, hereby
 	     actually designating its device number into effect.  */
 	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
 			      var_size, false, NULL);
-	    splay_tree_key k = &array->key;
-	    k->host_start = (uintptr_t) icvs;
-	    k->host_end =
-	      k->host_start + (size_mask & sizeof (struct gomp_offload_icvs));
-	    k->tgt = tgt;
-	    k->tgt_offset = var->start;
-	    k->refcount = REFCOUNT_INFINITY;
-	    k->dynamic_refcount = 0;
-	    k->aux = NULL;
-	    array->left = NULL;
-	    array->right = NULL;
-	    splay_tree_insert (&devicep->mem_map, array);
-	    array++;
+	  splay_tree_key k = &array->key;
+	  k->host_start = (uintptr_t) icvs;
+	  k->host_end =
+	    k->host_start + (size_mask & sizeof (struct gomp_offload_icvs));
+	  k->tgt = tgt;
+	  k->tgt_offset = var->start;
+	  k->refcount = REFCOUNT_INFINITY;
+	  k->dynamic_refcount = 0;
+	  k->aux = NULL;
+	  array->left = NULL;
+	  array->right = NULL;
+	  splay_tree_insert (&devicep->mem_map, array);
+	  array++;
 	}
     }
 
@@ -2757,6 +2777,20 @@ clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
   return flags;
 }
 
+static void
+gomp_copy_back_icvs (struct gomp_device_descr *devicep, int device)
+{
+  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+  if (item == NULL)
+    return;
+
+  void *host_ptr = &item->icvs;
+  void *dev_ptr = omp_get_mapped_ptr (host_ptr, device);
+  if (dev_ptr != NULL)
+    gomp_copy_dev2host (devicep, NULL, host_ptr, dev_ptr,
+			sizeof (struct gomp_offload_icvs));
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
    and several arguments have been added:
    FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2789,6 +2823,142 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool fpc_done = false;
 
+  /* Obtain the original TEAMS and THREADS values from ARGS. */
+  intptr_t orig_teams = 1, orig_threads = 0;
+  size_t num_args = 0, len = 1, teams_len = 1, threads_len = 1;
+  void **tmpargs = args;
+  while (*tmpargs)
+    {
+      intptr_t id = (intptr_t) *tmpargs++, val;
+      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	{
+	  val = (intptr_t) *tmpargs++;
+	  len = 2;
+	}
+      else
+	{
+	  val = id >> GOMP_TARGET_ARG_VALUE_SHIFT;
+	  len = 1;
+	}
+      num_args += len;
+      if ((id & GOMP_TARGET_ARG_DEVICE_MASK) != GOMP_TARGET_ARG_DEVICE_ALL)
+	continue;
+      val = val > INT_MAX ? INT_MAX : val;
+      if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS)
+	{
+	  orig_teams = val;
+	  teams_len = len;
+	}
+      else if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT)
+	{
+	  orig_threads = val;
+	  threads_len = len;
+	}
+    }
+
+  intptr_t new_teams = orig_teams, new_threads = orig_threads;
+  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.
+     ORIG_TEAMS == -1: Teams construct with NUM_TEAMS clause specified, but the
+		       value could not be specified. No Change.
+     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
+		      Set device-specific value.
+     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
+		     No change.  */
+  if (orig_teams == -2)
+    new_teams = 1;
+  else if (orig_teams == 0)
+    {
+      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+      if (item != NULL)
+	new_teams = item->icvs.nteams;
+    }
+  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
+     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
+     e.g. a THREAD_LIMIT clause.  */
+  if (orig_teams >= -2 && orig_threads == 0)
+    {
+      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+      if (item != NULL)
+	new_threads = item->icvs.teams_thread_limit;
+    }
+
+  /* Copy and change the arguments list only if TEAMS or THREADS need to be
+     updated.  */
+  void **new_args = args;
+  if (orig_teams != new_teams || orig_threads != new_threads)
+    {
+      size_t tms_len = (orig_teams == new_teams
+			? teams_len
+			: (new_teams > -(1 << 15) && new_teams < (1 << 15)
+			   ? 1 : 2));
+      size_t ths_len = (orig_threads == new_threads
+			? threads_len
+			: (new_threads > -(1 << 15) && new_threads < (1 << 15)
+			   ? 1 : 2));
+      /* One additional item after the last arg must be NULL.  */
+      size_t new_args_cnt = num_args - teams_len - threads_len + tms_len
+			    + ths_len + 1;
+      new_args = (void **) gomp_alloca (new_args_cnt * sizeof (void*));
+
+      tmpargs = args;
+      void **tmp_new_args = new_args;
+      while (*tmpargs)
+	{
+	  intptr_t id = (intptr_t) *tmpargs;
+	  if (((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS
+	       && orig_teams != new_teams)
+	      || ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT
+		  && orig_threads != new_threads))
+	    {
+	      tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		tmpargs++;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = *tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		*tmp_new_args++ = *tmpargs++;
+	    }
+	}
+
+      if (orig_teams != new_teams)
+	{
+	  intptr_t new_val = new_teams;
+	  if (tms_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_NUM_TEAMS;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_NUM_TEAMS);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      if (orig_threads != new_threads)
+	{
+	  intptr_t new_val = new_threads;
+	  if (ths_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_THREAD_LIMIT;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_THREAD_LIMIT);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      *tmp_new_args = NULL;
+    }
+
   flags = clear_unsupported_flags (devicep, flags);
 
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
@@ -2827,7 +2997,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	  && !thr->task->final_task)
 	{
 	  gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
-				   sizes, kinds, flags, depend, args,
+				   sizes, kinds, flags, depend, new_args,
 				   GOMP_TARGET_TASK_BEFORE_MAP);
 	  return;
 	}
@@ -2873,7 +3043,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 				      tgt_align, tgt_size);
 	    }
 	}
-      gomp_target_fallback (fn, hostaddrs, devicep, args);
+      gomp_target_fallback (fn, hostaddrs, devicep, new_args);
       return;
     }
 
@@ -2903,7 +3073,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   devicep->run_func (devicep->target_id, fn_addr,
 		     tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,
-		     args);
+		     new_args);
   if (tgt_vars)
     {
       htab_clear (refcount_set);
@@ -2911,6 +3081,12 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   if (refcount_set)
     htab_free (refcount_set);
+
+  /* Copy back ICVs from device to host.
+     HOST_PTR is expected to exist since it was added in
+     gomp_load_image_to_device if not already available.  */
+  gomp_copy_back_icvs (devicep, device);
+
 }
 
 /* Host fallback for GOMP_target_data{,_ext} routines.  */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
index b987a33..9da0d63 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
@@ -16,7 +16,7 @@ main ()
     }
   else
     omp_set_num_teams (6);
-  if (getenv ("OMP_TEAMS_THREAD_LIMIT") == NULL
+  if (getenv ("OMP_TEAMS_THREAD_LIMIT") != NULL
       && strcmp (getenv ("OMP_TEAMS_THREAD_LIMIT"), "12") == 0)
     {
       if (omp_get_teams_thread_limit () != 12)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
index 82108bce..82d8e76 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
@@ -1,25 +1,205 @@
+/* { dg-additional-options "-DAMD" { target offload_target_amdgcn } } */
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "42" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "43" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "44" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "45" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "46" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS "47" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS "5" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" } */
 
 #include <omp.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 int
 main (int argc, char *const *argv)
 {
-  if (omp_get_max_teams () != 47)
+  if (omp_get_max_teams () != 5
+      || omp_get_teams_thread_limit () != 4)
     abort ();
 
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 5
+	|| omp_get_team_num () >= 5)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
+  omp_set_num_teams (4);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 4
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 4
+	|| omp_get_team_num () >= 4)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 3
+	|| omp_get_thread_num () >= 3)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(3) thread_limit(2)
+  {
+    if (omp_get_num_teams () != 3
+	|| omp_get_team_num () >= 3)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 2
+	|| omp_get_thread_num () >= 2)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(5) thread_limit(4)
+  {
+    if (omp_get_num_teams () != 5
+	|| omp_get_team_num () >= 5)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
+  if (num_devices <= 0)
+    return 0;
+
   for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42 + i)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 6 + i
+	  || omp_get_teams_thread_limit () != 5 + i)
 	abort ();
+      #pragma omp target device (i)
+      #pragma omp teams
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 5 + i
+	    || omp_get_thread_num () >= 5 + i)
+	  abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (5 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 5 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+      /* omp_set_num_teams and omp_set_teams_thread_limit above set the value
+	 of nteams-var and teams-thread-limit-var ICVs on device 'i', which has
+	 scope 'device' and should be avaible in subsequent target regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 5 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 5 + i
+	    || omp_get_team_num () >= 5 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 4 + i
+	    || omp_get_thread_num () >= 4 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(6 + i) thread_limit(5 + i)
+      {
+	if (omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+	#pragma omp parallel
+	  if (omp_get_thread_limit () > 5 + i
+	      || omp_get_thread_num () >= 5 + i
+	      || omp_get_num_teams () > 6 + i
+	      || omp_get_team_num () >= 6 + i)
+	    abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(4 + i) thread_limit(3 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	  if (omp_get_thread_limit () > 3 + i
+	      || omp_get_thread_num () >= 3 + i
+	      || omp_get_num_teams () > 4 + i
+	      || omp_get_team_num () >= 4 + i)
+	    abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams thread_limit(3 + i) num_teams(4 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	  if (omp_get_thread_limit () > 3 + i
+	      || omp_get_thread_num () >= 3 + i
+	      || omp_get_num_teams () > 4 + i
+	      || omp_get_team_num () >= 4 + i)
+	    abort ();
+      }
+
+      /* This tests a large number of teams and threads. If it is larger than
+	 2^15+1 then the according argument in the kernels arguments list
+	 is encoded with two items instead of one. On NVIDIA there is an
+	 adjustment for too large teams and threads. For AMD such adjustment
+	 exists only for threads and will cause runtime errors with a two large
+	 number of teams.  */
+      intptr_t large_num_teams = 66000;
+#ifdef AMD
+      large_num_teams = 8;
+#endif
+      intptr_t large_threads_limit = 67000;
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (large_num_teams + i);
+	omp_set_teams_thread_limit (large_threads_limit + i);
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > large_num_teams + i
+	    || omp_get_team_num () >= large_num_teams + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > large_threads_limit + i
+	    || omp_get_thread_num () >= large_threads_limit + i)
+	  abort ();
+      }
+
+    }
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
index 05f07c7..7fdcaf0 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
@@ -1,9 +1,10 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "42" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "43" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" } */
 /* { dg-set-target-env-var OMP_SCHEDULE_ALL "guided,4" } */
 /* { dg-set-target-env-var OMP_DYNAMIC_ALL "true" } */
-/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "44" } */
 /* { dg-set-target-env-var OMP_THREAD_LIMIT_ALL "45" } */
 /* { dg-set-target-env-var OMP_NUM_THREADS_ALL "46,3,2" } */
 /* { dg-set-target-env-var OMP_MAX_ACTIVE_LEVELS_ALL "47" } */
@@ -12,7 +13,8 @@
 
 /* This tests the hierarchical usage of ICVs on the device, i.e. if
    OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
-   OMP_NUM_TEAMS_DEV should be used.  And if */
+   OMP_NUM_TEAMS_DEV should be used.  And if OMP_NUM_TEAMS (without suffix) is
+   not defined, then OMP_NUM_TEAMS_ALL should be used for the host.  */
 
 #include <omp.h>
 #include <stdlib.h>
@@ -24,10 +26,10 @@ main (int argc, char *const *argv)
   int chunk_size;
   omp_get_schedule(&kind, &chunk_size);
 
-  if (omp_get_max_teams () != 42
+  if (omp_get_max_teams () != 3
       || !omp_get_dynamic ()
       || kind != 3 || chunk_size != 4
-      || omp_get_teams_thread_limit () != 44
+      || omp_get_teams_thread_limit () != 2
       || omp_get_thread_limit () != 45
       || omp_get_max_threads () != 46
       || omp_get_proc_bind () != omp_proc_bind_spread
@@ -36,9 +38,52 @@ main (int argc, char *const *argv)
 
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
   for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 43)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 4
+	  || omp_get_teams_thread_limit () != 3)
 	abort ();
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 4
+	    || omp_get_team_num () >= 4)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3
+	    || omp_get_thread_num () >= 3)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (3 + i);
+	omp_set_teams_thread_limit (2 + i);
+	if (omp_get_max_teams () != 3 + i
+	    || omp_get_teams_thread_limit () != 2 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 3 + i
+	  || omp_get_teams_thread_limit () != 2 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 3 + i
+	    || omp_get_team_num () >= 3 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 2 + i
+	    || omp_get_thread_num () >= 2 + i)
+	  abort ();
+      }
+    }
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
index 67081dc..ffceaf3 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "42" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "7" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
 
 /* This tests the hierarchical usage of ICVs on the host and on devices, i.e. if
    OMP_NUM_TEAMS_DEV_<device_num>, OMP_NUM_TEAMS_DEV, and
@@ -13,14 +14,73 @@
 int
 main (int argc, char *const *argv)
 {
-  if (omp_get_max_teams () != 42)
+  if (omp_get_max_teams () != 7
+      || omp_get_teams_thread_limit () != 2)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 7
+      || omp_get_team_num () >= 7)
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
     abort ();
 
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
   for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 7
+	  || omp_get_teams_thread_limit () != 2)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 7
+	  || omp_get_team_num () >= 7)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () != 5 + i)
 	abort ();
+    }
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-8.c b/libgomp/testsuite/libgomp.c-c++-common/icv-8.c
index adaff5a..ad1dbfc 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-8.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-8.c
@@ -19,4 +19,4 @@ main (int argc, char *const *argv)
 /* { dg-output ".*Invalid device number in OMP_NUM_TEAMS_DEV_01=44 (leading zero).*" } */
 /* { dg-output ".*Invalid device number in OMP_NUM_TEAMS_DEV_a=45.*" } */
 /* { dg-output ".*Invalid device number in OMP_NUM_TEAMS_DEV_12345678901=46 (too long).*" } */
-/* { dg-output ".*Non-negative device number expected in OMP_NUM_TEAMS_DEV_-1=47.*" } */
+/* { dg-output ".*Non-negative device number expected in OMP_NUM_TEAMS_DEV_-1=47.*" } */
\ No newline at end of file
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-9.c b/libgomp/testsuite/libgomp.c-c++-common/icv-9.c
new file mode 100644
index 0000000..c850342
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-9.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+
+/* This tests usage of ICVs on the host and on devices if no corresponding
+   environment variables are configured.  */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main (int argc, char *const *argv)
+{
+  if (omp_get_max_teams () != 0
+      || omp_get_teams_thread_limit () != 0)
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (2);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 2)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
+    abort ();
+
+  int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
+  for (int i=0; i < num_devices; i++)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 0
+	  || omp_get_teams_thread_limit () != 0)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (3 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 3 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 3 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () > 5 + i
+	  || omp_get_team_num () >= 5 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      ;
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/icv-5.f90 b/libgomp/testsuite/libgomp.fortran/icv-5.f90
new file mode 100644
index 0000000..05a35fa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-5.f90
@@ -0,0 +1,226 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_NUM_TEAMS "5" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" }
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, large_num_teams, large_threads_limit
+  logical :: err
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set_dev ("OMP_NUM_TEAMS_DEV_", i, 6 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 6 + i) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (5 + i)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 5 + i .or. omp_get_team_num () >= 5 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(6 + i)
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(4 + i)
+      if (omp_get_num_teams () > 4 + i .or. omp_get_team_num () >= 4 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      large_num_teams = 66000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (large_num_teams + i)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > large_num_teams + i &
+          .or. omp_get_team_num () >= large_num_teams + i) err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set_dev ("OMP_TEAMS_THREAD_LIMIT_DEV_", i, 5 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 11
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 12
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (4 + i)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 13
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 14
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 4 + i .or. omp_get_thread_num () >= 4 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 15
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(5 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 16
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(3 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 + i .or. omp_get_thread_num () >= 3 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 17
+
+      large_threads_limit = 67000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (large_threads_limit + i)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 18
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 19
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > large_threads_limit + i &
+          .or. omp_get_thread_num () >= large_threads_limit + i) err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 20
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 30
+    endif
+    env_is_set = .false.
+  end
+  logical function env_is_set_dev (name, dev_num, val)
+    character(len=*) :: name
+    integer :: dev_num, val
+    character(len=64) :: dev_num_str, env_var, val_str
+    dev_num_str = ADJUSTL(dev_num_str)
+    env_var = name // dev_num_str
+    val_str = ADJUSTL(val_str)
+    env_is_set_dev = env_is_set (TRIM(env_var), TRIM(val_str))
+  end
+end
diff --git a/libgomp/testsuite/libgomp.fortran/icv-6.f90 b/libgomp/testsuite/libgomp.fortran/icv-6.f90
new file mode 100644
index 0000000..c8e6a0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-6.f90
@@ -0,0 +1,140 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+
+! This test considers the hierarchical usage of ICVs on the device, i.e. if
+! e.g. OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
+! OMP_NUM_TEAMS_DEV should be used for the targets.
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, stat, tmp
+  logical :: err
+  character(len=40) :: val
+
+  ! The following environment variables should not be set.
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_2", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_2", val, status=stat)
+  if (stat /= 1) return
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set ("OMP_NUM_TEAMS_DEV", "4")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 4) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 4 .or. omp_get_team_num () >= 4) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (3 + i)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 3 + i .or. omp_get_team_num () >= 3 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set ("OMP_TEAMS_THREAD_LIMIT_DEV", "3")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 3) err = .true.
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 .or. omp_get_thread_num () >= 3) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (2 + i)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 2 + i .or. omp_get_thread_num () >= 2 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 10
+    endif
+    env_is_set = .false.
+  end
+end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  2022-08-03 12:40   ` Marcel Vollweiler
@ 2022-09-18  8:24     ` Marcel Vollweiler
  2022-09-30  9:35       ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Vollweiler @ 2022-09-18  8:24 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

Hi Jakub,

The last version of this patch was slightly adapted to the latest changes of the
device-specific environment variable syntax
(https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601145.html), also
considering the latest related bug fixes (commits 994ea892bd02d and 7d37c7f67c1bb).

The new patch version was bootstrapped and tested on x86_64-linux with nvptx and
amdgcn offloading without regressions.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: teams_and_threads_on_device.diff --]
[-- Type: text/plain, Size: 47686 bytes --]

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host. 

gcc/ChangeLog:

	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
	to "-2" instead of "1" for non-existing num_teams clause in order to
	disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
	allow processing of device-specific values.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* icv-device.c (omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	* icv.c (omp_set_teams_thread_limit): Removed.
	(omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
	handling.
	(gomp_load_image_to_device): Added a size check for the ICVs struct
	variable.
	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
	copy back the ICV values from device to host.
	(GOMP_target_ext): Update the number of teams and threads in the kernel
	args also considering device-specific values.
	* testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.
	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
	* testsuite/libgomp.fortran/icv-5.f90: New test.
	* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
	num_teams from "1" to "-2" in cases without num_teams clause.
	* g++.dg/gomp/target-teams-1.C: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index dcdc852..b393ed8 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
 
   if (teams == NULL_TREE)
-    num_teams_upper = integer_one_node;
+    num_teams_upper = build_int_cst (integer_type_node, -2);
   else
     for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
       {
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 51b8d48..74d60e1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -81,5 +81,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index f78a608..29e5597 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -88,5 +88,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
index 7b182b5..9081159 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
index 1391274..91566ed 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\*strxp \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
index 9a81d0f..867e41a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
@@ -101,4 +101,4 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index bf757ba..eb68881 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -81,6 +81,19 @@ omp_set_num_teams (int num_teams)
     GOMP_ADDITIONAL_ICVS.nteams = num_teams;
 }
 
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_ADDITIONAL_ICVS.teams_thread_limit;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_ADDITIONAL_ICVS.teams_thread_limit = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -89,3 +102,5 @@ ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
 ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 6f869be..a3f00cf 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -81,6 +81,19 @@ omp_set_num_teams (int num_teams)
     GOMP_ADDITIONAL_ICVS.nteams = num_teams;
 }
 
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_ADDITIONAL_ICVS.teams_thread_limit;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_ADDITIONAL_ICVS.teams_thread_limit = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -89,3 +102,5 @@ ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
 ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index d8acf0e..48607ce 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -97,3 +97,20 @@ omp_set_num_teams (int num_teams)
 }
 
 ialias (omp_set_num_teams)
+
+int
+omp_get_teams_thread_limit (void)
+{
+  return gomp_teams_thread_limit_var;
+}
+
+ialias (omp_get_teams_thread_limit)
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    gomp_teams_thread_limit_var = thread_limit;
+}
+
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv.c b/libgomp/icv.c
index df423c0..9aef91c 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -148,19 +148,6 @@ omp_get_supported_active_levels (void)
   return gomp_supported_active_levels;
 }
 
-void
-omp_set_teams_thread_limit (int thread_limit)
-{
-  if (thread_limit >= 0)
-    gomp_teams_thread_limit_var = thread_limit;
-}
-
-int
-omp_get_teams_thread_limit (void)
-{
-  return gomp_teams_thread_limit_var;
-}
-
 int
 omp_get_cancellation (void)
 {
@@ -261,8 +248,6 @@ ialias (omp_get_thread_limit)
 ialias (omp_set_max_active_levels)
 ialias (omp_get_max_active_levels)
 ialias (omp_get_supported_active_levels)
-ialias (omp_set_teams_thread_limit)
-ialias (omp_get_teams_thread_limit)
 ialias (omp_get_cancellation)
 ialias (omp_get_proc_bind)
 ialias (omp_get_max_task_priority)
diff --git a/libgomp/target.c b/libgomp/target.c
index 5763483..193c828 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2155,6 +2155,19 @@ get_gomp_offload_icvs (int dev_num)
     new->icvs.nteams = gomp_default_icv_values.nteams_var;
 
   if (dev_x != NULL
+      && gomp_get_icv_flag (dev_x->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = dev_x->icvs.teams_thread_limit_var;
+  else if (dev != NULL
+	   && gomp_get_icv_flag (dev->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = dev->icvs.teams_thread_limit_var;
+  else if (all != NULL
+	   && gomp_get_icv_flag (all->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = all->icvs.teams_thread_limit_var;
+  else
+    new->icvs.teams_thread_limit
+      = gomp_default_icv_values.teams_thread_limit_var;
+
+  if (dev_x != NULL
       && gomp_get_icv_flag (dev_x->flags, GOMP_ICV_DEFAULT_DEVICE))
     new->icvs.default_device = dev_x->icvs.default_device_var;
   else if (dev != NULL
@@ -2290,7 +2303,14 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 	  int dev_num = (int) (devicep - &devices[0]);
 	  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
 	  size_t var_size = var->end - var->start;
-
+	  if (var_size != sizeof (struct gomp_offload_icvs))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      if (is_register_lock)
+		gomp_mutex_unlock (&register_lock);
+	      gomp_fatal ("offload plugin managed 'icv struct' not of expected "
+			  "format");
+	    }
 	  /* Copy the ICVs variable to place on device memory, hereby
 	     actually designating its device number into effect.  */
 	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
@@ -2769,6 +2789,20 @@ clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
   return flags;
 }
 
+static void
+gomp_copy_back_icvs (struct gomp_device_descr *devicep, int device)
+{
+  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+  if (item == NULL)
+    return;
+
+  void *host_ptr = &item->icvs;
+  void *dev_ptr = omp_get_mapped_ptr (host_ptr, device);
+  if (dev_ptr != NULL)
+    gomp_copy_dev2host (devicep, NULL, host_ptr, dev_ptr,
+			sizeof (struct gomp_offload_icvs));
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
    and several arguments have been added:
    FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2801,6 +2835,146 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool fpc_done = false;
 
+  /* Obtain the original TEAMS and THREADS values from ARGS. */
+  intptr_t orig_teams = 1, orig_threads = 0;
+  size_t num_args = 0, len = 1, teams_len = 1, threads_len = 1;
+  void **tmpargs = args;
+  while (*tmpargs)
+    {
+      intptr_t id = (intptr_t) *tmpargs++, val;
+      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	{
+	  val = (intptr_t) *tmpargs++;
+	  len = 2;
+	}
+      else
+	{
+	  val = id >> GOMP_TARGET_ARG_VALUE_SHIFT;
+	  len = 1;
+	}
+      num_args += len;
+      if ((id & GOMP_TARGET_ARG_DEVICE_MASK) != GOMP_TARGET_ARG_DEVICE_ALL)
+	continue;
+      val = val > INT_MAX ? INT_MAX : val;
+      if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS)
+	{
+	  orig_teams = val;
+	  teams_len = len;
+	}
+      else if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT)
+	{
+	  orig_threads = val;
+	  threads_len = len;
+	}
+    }
+
+  intptr_t new_teams = orig_teams, new_threads = orig_threads;
+  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.
+     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
+		       value could not be specified. No Change.
+     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
+		      Set device-specific value.
+     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
+		     No change.  */
+  if (orig_teams == -2)
+    new_teams = 1;
+  else if (orig_teams == 0)
+    {
+      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+      if (item != NULL)
+	new_teams = item->icvs.nteams;
+    }
+  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
+     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
+     e.g. a THREAD_LIMIT clause.  */
+  if (orig_teams >= -2 && orig_threads == 0)
+    {
+      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+      if (item != NULL)
+	new_threads = item->icvs.teams_thread_limit;
+    }
+
+  /* Copy and change the arguments list only if TEAMS or THREADS need to be
+     updated.  */
+  void **new_args = args;
+  if (orig_teams != new_teams || orig_threads != new_threads)
+    {
+      size_t tms_len = (orig_teams == new_teams
+			? teams_len
+			: (new_teams > -(1 << 15) && new_teams < (1 << 15)
+			   ? 1 : 2));
+      size_t ths_len = (orig_threads == new_threads
+			? threads_len
+			: (new_threads > -(1 << 15) && new_threads < (1 << 15)
+			   ? 1 : 2));
+      /* One additional item after the last arg must be NULL.  */
+      size_t new_args_cnt = num_args - teams_len - threads_len + tms_len
+			    + ths_len + 1;
+      new_args = (void **) gomp_alloca (new_args_cnt * sizeof (void*));
+
+      tmpargs = args;
+      void **tmp_new_args = new_args;
+      /* Copy all args except TEAMS and THREADS.  TEAMS and THREADS are copied
+	 too if they have not been changed and skipped otherwise.  */
+      while (*tmpargs)
+	{
+	  intptr_t id = (intptr_t) *tmpargs;
+	  if (((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS
+	       && orig_teams != new_teams)
+	      || ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT
+		  && orig_threads != new_threads))
+	    {
+	      tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		tmpargs++;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = *tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		*tmp_new_args++ = *tmpargs++;
+	    }
+	}
+
+      /* Add the new TEAMS arg to the new args list if it has been changed.  */
+      if (orig_teams != new_teams)
+	{
+	  intptr_t new_val = new_teams;
+	  if (tms_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_NUM_TEAMS;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_NUM_TEAMS);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      /* Add the new THREADS arg to the new args list if it has been changed. */
+      if (orig_threads != new_threads)
+	{
+	  intptr_t new_val = new_threads;
+	  if (ths_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_THREAD_LIMIT;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_THREAD_LIMIT);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      *tmp_new_args = NULL;
+    }
+
   flags = clear_unsupported_flags (devicep, flags);
 
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
@@ -2839,7 +3013,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	  && !thr->task->final_task)
 	{
 	  gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
-				   sizes, kinds, flags, depend, args,
+				   sizes, kinds, flags, depend, new_args,
 				   GOMP_TARGET_TASK_BEFORE_MAP);
 	  return;
 	}
@@ -2885,7 +3059,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 				      tgt_align, tgt_size);
 	    }
 	}
-      gomp_target_fallback (fn, hostaddrs, devicep, args);
+      gomp_target_fallback (fn, hostaddrs, devicep, new_args);
       return;
     }
 
@@ -2915,7 +3089,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   devicep->run_func (devicep->target_id, fn_addr,
 		     tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,
-		     args);
+		     new_args);
   if (tgt_vars)
     {
       htab_clear (refcount_set);
@@ -2923,6 +3097,12 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   if (refcount_set)
     htab_free (refcount_set);
+
+  /* Copy back ICVs from device to host.
+     HOST_PTR is expected to exist since it was added in
+     gomp_load_image_to_device if not already available.  */
+  gomp_copy_back_icvs (devicep, device);
+
 }
 
 /* Host fallback for GOMP_target_data{,_ext} routines.  */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
index b987a33..9da0d63 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
@@ -16,7 +16,7 @@ main ()
     }
   else
     omp_set_num_teams (6);
-  if (getenv ("OMP_TEAMS_THREAD_LIMIT") == NULL
+  if (getenv ("OMP_TEAMS_THREAD_LIMIT") != NULL
       && strcmp (getenv ("OMP_TEAMS_THREAD_LIMIT"), "12") == 0)
     {
       if (omp_get_teams_thread_limit () != 12)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
index 431cfc7..57e960e 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
@@ -1,25 +1,210 @@
+/* { dg-additional-options "-DAMD" { target offload_target_amdgcn } } */
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "42" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "43" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "44" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "45" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "46" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS "47" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS "5" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" } */
 
 #include <omp.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 int
 main ()
 {
-  if (omp_get_max_teams () != 47)
+  if (omp_get_max_teams () != 5
+      || omp_get_teams_thread_limit () != 4)
     abort ();
 
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 5
+	|| omp_get_team_num () >= 5)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
+  omp_set_num_teams (4);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 4
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 4
+	|| omp_get_team_num () >= 4)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 3
+	|| omp_get_thread_num () >= 3)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(3) thread_limit(2)
+  {
+    if (omp_get_num_teams () != 3
+	|| omp_get_team_num () >= 3)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 2
+	|| omp_get_thread_num () >= 2)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(5) thread_limit(4)
+  {
+    if (omp_get_num_teams () != 5
+	|| omp_get_team_num () >= 5)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
-  for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42 + i)
+
+  for (int i = 0; i < num_devices; i++)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 6 + i
+	  || omp_get_teams_thread_limit () != 5 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      #pragma omp parallel
+      if (omp_get_thread_limit () > 5 + i
+	  || omp_get_thread_num () >= 5 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (5 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 5 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+      /* omp_set_num_teams and omp_set_teams_thread_limit above set the value
+	 of nteams-var and teams-thread-limit-var ICVs on device 'i', which has
+	 scope 'device' and should be avaible in subsequent target regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 5 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 5 + i
+	    || omp_get_team_num () >= 5 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 4 + i
+	    || omp_get_thread_num () >= 4 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(6 + i) thread_limit(5 + i)
+      {
+	if (omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 5 + i
+	    || omp_get_thread_num () >= 5 + i
+	    || omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(4 + i) thread_limit(3 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3 + i
+	    || omp_get_thread_num () >= 3 + i
+	    || omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams thread_limit(3 + i) num_teams(4 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3 + i
+	    || omp_get_thread_num () >= 3 + i
+	    || omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+      }
+
+      /* The NUM_TEAMS and THREAD_LIMIT clauses should not change the values
+	 of the corresponding ICVs.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 5 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      /* This tests a large number of teams and threads. If it is larger than
+	 2^15+1 then the according argument in the kernels arguments list
+	 is encoded with two items instead of one. On NVIDIA there is an
+	 adjustment for too large teams and threads. For AMD such adjustment
+	 exists only for threads and will cause runtime errors with a two large
+	 number of teams.  */
+      intptr_t large_num_teams = 66000;
+#ifdef AMD
+      large_num_teams = 8;
+#endif
+      intptr_t large_threads_limit = 67000;
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (large_num_teams + i);
+	omp_set_teams_thread_limit (large_threads_limit + i);
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
 	abort ();
 
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > large_num_teams + i
+	    || omp_get_team_num () >= large_num_teams + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > large_threads_limit + i
+	    || omp_get_thread_num () >= large_threads_limit + i)
+	  abort ();
+      }
+    }
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
index e199a18..1b17f2d 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
@@ -1,9 +1,10 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "42" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "43" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" } */
 /* { dg-set-target-env-var OMP_SCHEDULE_ALL "guided,4" } */
 /* { dg-set-target-env-var OMP_DYNAMIC_ALL "true" } */
-/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "44" } */
 /* { dg-set-target-env-var OMP_THREAD_LIMIT_ALL "45" } */
 /* { dg-set-target-env-var OMP_NUM_THREADS_ALL "46,3,2" } */
 /* { dg-set-target-env-var OMP_MAX_ACTIVE_LEVELS_ALL "47" } */
@@ -12,8 +13,8 @@
 
 /* This tests the hierarchical usage of ICVs on the device, i.e. if
    OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
-   OMP_NUM_TEAMS_DEV should be used.  And if there is no environment variable
-   without suffix, then the corresponding _ALL variant should be used.  */
+   OMP_NUM_TEAMS_DEV should be used.  And if OMP_NUM_TEAMS (without suffix) is
+   not defined, then OMP_NUM_TEAMS_ALL should be used for the host.  */
 
 #include <omp.h>
 #include <stdlib.h>
@@ -26,10 +27,10 @@ main ()
   int chunk_size;
   omp_get_schedule(&kind, &chunk_size);
 
-  if ((!getenv ("OMP_NUM_TEAMS") && omp_get_max_teams () != 42)
+  if ((!getenv ("OMP_NUM_TEAMS") && omp_get_max_teams () != 3)
       || (!getenv ("OMP_DYNAMIC") && !omp_get_dynamic ())
       || (!getenv ("OMP_SCHEDULE") && (kind != 3 || chunk_size != 4))
-      || (!getenv ("OMP_TEAMS_THREAD_LIMIT") && omp_get_teams_thread_limit () != 44)
+      || (!getenv ("OMP_TEAMS_THREAD_LIMIT") && omp_get_teams_thread_limit () != 2)
       || (!getenv ("OMP_THREAD_LIMIT") && omp_get_thread_limit () != 45)
       || (!getenv ("OMP_NUM_THREADS") && omp_get_max_threads () != 46)
       || (!getenv ("OMP_PROC_BIND") && omp_get_proc_bind () != omp_proc_bind_spread)
@@ -44,9 +45,51 @@ main ()
       name[sizeof ("OMP_NUM_TEAMS_DEV_1") - 2] = '0' + i;
       if (getenv (name))
 	continue;
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 43)
+
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 4
+	  || omp_get_teams_thread_limit () != 3)
 	abort ();
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 4
+	    || omp_get_team_num () >= 4)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3
+	    || omp_get_thread_num () >= 3)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (3 + i);
+	omp_set_teams_thread_limit (2 + i);
+	if (omp_get_max_teams () != 3 + i
+	    || omp_get_teams_thread_limit () != 2 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 3 + i
+	  || omp_get_teams_thread_limit () != 2 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 3 + i
+	    || omp_get_team_num () >= 3 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 2 + i
+	    || omp_get_thread_num () >= 2 + i)
+	  abort ();
+      }
     }
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
index 70a716d..bbbd6df 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "42" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "7" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
 
 /* This tests the hierarchical usage of ICVs on the host and on devices, i.e. if
    OMP_NUM_TEAMS_DEV_<device_num>, OMP_NUM_TEAMS_DEV, and
@@ -9,18 +10,90 @@
 
 #include <omp.h>
 #include <stdlib.h>
+#include <string.h>
 
 int
 main ()
 {
-  if (omp_get_max_teams () != 42)
+  if ((!getenv ("OMP_NUM_TEAMS") && omp_get_max_teams () != 7)
+      || (!getenv ("OMP_TEAMS_THREAD_LIMIT") && omp_get_teams_thread_limit () != 2))
     abort ();
 
+  #pragma omp teams
+  if ((!getenv ("OMP_NUM_TEAMS"))
+      && (omp_get_num_teams () > 7 || omp_get_team_num () >= 7))
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
+    abort ();
+
+  if (getenv ("OMP_NUM_TEAMS_DEV") || getenv ("OMP_TEAMS_THREAD_LIMIT_DEV"))
+    return 0;
+
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
-  for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42)
+  for (int i = 0; i < num_devices; i++)
+    {
+      char nteams[sizeof ("OMP_NUM_TEAMS_DEV_1")];
+      strcpy (nteams, "OMP_NUM_TEAMS_DEV_1");
+      nteams[sizeof ("OMP_NUM_TEAMS_DEV_1") - 2] = '0' + i;
+      char teams_thread_limit[sizeof ("OMP_TEAMS_THREAD_LIMIT_DEV_1")];
+      strcpy (teams_thread_limit, "OMP_TEAMS_THREAD_LIMIT_DEV_1");
+      teams_thread_limit[sizeof ("OMP_TEAMS_THREAD_LIMIT_DEV_1") - 2] = '0' + i;
+      if (getenv (nteams) || getenv (teams_thread_limit))
+	continue;
+
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 7
+	  || omp_get_teams_thread_limit () != 2)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 7
+	  || omp_get_team_num () >= 7)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () != 5 + i)
 	abort ();
+    }
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-9.c b/libgomp/testsuite/libgomp.c-c++-common/icv-9.c
new file mode 100644
index 0000000..21b874f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-9.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+
+/* This tests usage of ICVs on the host and on devices if no corresponding
+   environment variables are configured.  */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+  if (omp_get_max_teams () != 0
+      || omp_get_teams_thread_limit () != 0)
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (2);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 2)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
+    abort ();
+
+  int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
+  for (int i = 0; i < num_devices; i++)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 0
+	  || omp_get_teams_thread_limit () != 0)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (3 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 3 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 3 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      /* NUM_TEAMS clause has priority over previously set NUM_TEAMS value.  */
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () > 5 + i
+	  || omp_get_team_num () >= 5 + i)
+	abort ();
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/icv-5.f90 b/libgomp/testsuite/libgomp.fortran/icv-5.f90
new file mode 100644
index 0000000..05a35fa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-5.f90
@@ -0,0 +1,226 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_NUM_TEAMS "5" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" }
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, large_num_teams, large_threads_limit
+  logical :: err
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set_dev ("OMP_NUM_TEAMS_DEV_", i, 6 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 6 + i) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (5 + i)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 5 + i .or. omp_get_team_num () >= 5 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(6 + i)
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(4 + i)
+      if (omp_get_num_teams () > 4 + i .or. omp_get_team_num () >= 4 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      large_num_teams = 66000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (large_num_teams + i)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > large_num_teams + i &
+          .or. omp_get_team_num () >= large_num_teams + i) err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set_dev ("OMP_TEAMS_THREAD_LIMIT_DEV_", i, 5 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 11
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 12
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (4 + i)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 13
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 14
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 4 + i .or. omp_get_thread_num () >= 4 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 15
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(5 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 16
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(3 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 + i .or. omp_get_thread_num () >= 3 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 17
+
+      large_threads_limit = 67000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (large_threads_limit + i)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 18
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 19
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > large_threads_limit + i &
+          .or. omp_get_thread_num () >= large_threads_limit + i) err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 20
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 30
+    endif
+    env_is_set = .false.
+  end
+  logical function env_is_set_dev (name, dev_num, val)
+    character(len=*) :: name
+    integer :: dev_num, val
+    character(len=64) :: dev_num_str, env_var, val_str
+    dev_num_str = ADJUSTL(dev_num_str)
+    env_var = name // dev_num_str
+    val_str = ADJUSTL(val_str)
+    env_is_set_dev = env_is_set (TRIM(env_var), TRIM(val_str))
+  end
+end
diff --git a/libgomp/testsuite/libgomp.fortran/icv-6.f90 b/libgomp/testsuite/libgomp.fortran/icv-6.f90
new file mode 100644
index 0000000..c8e6a0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-6.f90
@@ -0,0 +1,140 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+
+! This test considers the hierarchical usage of ICVs on the device, i.e. if
+! e.g. OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
+! OMP_NUM_TEAMS_DEV should be used for the targets.
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, stat, tmp
+  logical :: err
+  character(len=40) :: val
+
+  ! The following environment variables should not be set.
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_2", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_2", val, status=stat)
+  if (stat /= 1) return
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set ("OMP_NUM_TEAMS_DEV", "4")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 4) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 4 .or. omp_get_team_num () >= 4) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (3 + i)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 3 + i .or. omp_get_team_num () >= 3 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set ("OMP_TEAMS_THREAD_LIMIT_DEV", "3")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 3) err = .true.
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 .or. omp_get_thread_num () >= 3) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (2 + i)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 2 + i .or. omp_get_thread_num () >= 2 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 10
+    endif
+    env_is_set = .false.
+  end
+end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  2022-09-18  8:24     ` Marcel Vollweiler
@ 2022-09-30  9:35       ` Jakub Jelinek
  2022-11-24 14:09         ` Marcel Vollweiler
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2022-09-30  9:35 UTC (permalink / raw)
  To: Marcel Vollweiler; +Cc: gcc-patches

On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
> 	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
> 	to "-2" instead of "1" for non-existing num_teams clause in order to
> 	disambiguate from the case of an existing num_teams clause with value 1.
> 
> libgomp/ChangeLog:
> 
> 	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
> 	allow processing of device-specific values.
> 	(omp_set_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
> 	(omp_set_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	* icv-device.c (omp_get_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	(omp_set_teams_thread_limit): Likewise.
> 	* icv.c (omp_set_teams_thread_limit): Removed.
> 	(omp_get_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
> 	handling.
> 	(gomp_load_image_to_device): Added a size check for the ICVs struct
> 	variable.
> 	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
> 	copy back the ICV values from device to host.
> 	(GOMP_target_ext): Update the number of teams and threads in the kernel
> 	args also considering device-specific values.
> 	* testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

Better say what exactly you changed in words.

> 	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
> 	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
> 	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
> 	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
> 	* testsuite/libgomp.fortran/icv-5.f90: New test.
> 	* testsuite/libgomp.fortran/icv-6.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
> 	num_teams from "1" to "-2" in cases without num_teams clause.
> 	* g++.dg/gomp/target-teams-1.C: Likewise.
> 	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
> 	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
> 	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>    struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>  
>    if (teams == NULL_TREE)
> -    num_teams_upper = integer_one_node;
> +    num_teams_upper = build_int_cst (integer_type_node, -2);
>    else
>      for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>        {

The function comment above optimize_target_teams contains detailed
description on what the values mean and why, so it definitely should
document what -2 means and when it is used.
I know you have documentation in libgomp for it, but it should be in both
places.

> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .

> +     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
> +		       value could not be specified. No Change.

Likewise.
lowercase change ?

> +     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> +		      Set device-specific value.
> +     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +		     No change.  */
> +  if (orig_teams == -2)
> +    new_teams = 1;
> +  else if (orig_teams == 0)
> +    {
> +      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
> +      if (item != NULL)
> +	new_teams = item->icvs.nteams;
> +    }
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
> +     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
> +     e.g. a THREAD_LIMIT clause.  */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?

> +      /* This tests a large number of teams and threads. If it is larger than
> +	 2^15+1 then the according argument in the kernels arguments list
> +	 is encoded with two items instead of one. On NVIDIA there is an
> +	 adjustment for too large teams and threads. For AMD such adjustment
> +	 exists only for threads and will cause runtime errors with a two large

s/two/too/ ?
Shouldn't amdgcn adjusts also number of teams?

As for testcases, have you tested this in a native setup where dg-set-target-env-var
actually works?

	Jakub


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  2022-09-30  9:35       ` Jakub Jelinek
@ 2022-11-24 14:09         ` Marcel Vollweiler
  2022-12-05 13:50           ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Vollweiler @ 2022-11-24 14:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4985 bytes --]

Hi Jakub,

> >     * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.
>
> Better say what exactly you changed in words.

Changed.

> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq
> *pre_p)
> >    struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
> >
> >    if (teams == NULL_TREE)
> > -    num_teams_upper = integer_one_node;
> > +    num_teams_upper = build_int_cst (integer_type_node, -2);
> >    else
> >      for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
> >        {
>
> The function comment above optimize_target_teams contains detailed description
> on what the values mean and why, so it definitely should document what -2 means
> and when it is used.
> I know you have documentation in libgomp for it, but it should be in both places.

I updated the comment with an explanation for "-2".

>
> > +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> > +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.
>
> Two spaces after .

Corrected here and at other places.

>
> > +     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but
> the
> > +                  value could not be specified. No Change.
>
> Likewise.
> lowercase change ?

Corrected.

>
> > +     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> > +                 Set device-specific value.
> > +     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> > +                No change.  */
> > +  if (orig_teams == -2)
> > +    new_teams = 1;
> > +  else if (orig_teams == 0)
> > +    {
> > +      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
> > +      if (item != NULL)
> > +   new_teams = item->icvs.nteams;
> > +    }
> > +  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
> > +     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
> > +     e.g. a THREAD_LIMIT clause.  */
> > +  if (orig_teams >= -2 && orig_threads == 0)
>
> The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
> So which one is it?

Thanks for the hint. It should be indeed "> -2" since teams_thread_limit "sets
the maximum number of OpenMP threads to use in each contention group created by
a teams construct" (OpenMP 5.2, section 21.6.2). So if there is no (explicit)
teams construct, then teams_thread_limit doesn't need to be copied to the device.

>
> > +      /* This tests a large number of teams and threads. If it is larger than
> > +    2^15+1 then the according argument in the kernels arguments list
> > +    is encoded with two items instead of one. On NVIDIA there is an
> > +    adjustment for too large teams and threads. For AMD such adjustment
> > +    exists only for threads and will cause runtime errors with a two
> > +large
>
> s/two/too/ ?
> Shouldn't amdgcn adjusts also number of teams?

I adjusted now also the number of teams in the amdgcn plugin. As upper bound I
chose two times the number of compute units. This seems to be sufficient when
one team is executed at one compute unit. This at least avoids the queueing of a
large amount of teams and the corresponding memory allocation.

The drawback is that a user is probably not aware of the actual number of
compute units (which is not very large on gfx cards, e.g. 120 for gfx908 and 104
for gfx90a) and thus maybe expects different values from omp_get_team_num(). For
instance in something like the following:

#pragma omp target
#pragma omp teams num_teams(220)
#pragma omp distribute parallel for
   for(int i = 0; i < 220; ++i)
     {
#pragma omp critical
        ... omp_get_team_num () ...
     }

On a gfx90a card with 104 compute units 12 threads are assigned to "reused"
teams (instead of having their own teams) that would not be the case without the
limit.

Alternatively, we could just define some (larger) constant number (though I
don't know a reasonable value here). But this does actually not solve the above
mentioned drawback. I think, we need to find a compromise between an
unneccessary small upper bound and the chance to get memory allocation failures
due to a too large number of teams.

>
> As for testcases, have you tested this in a native setup where dg-set-target-env-var
> actually works?

Besides remote testing with offloading (which does not yet work with
dg-set-target-env-var), I also tested locally on x86_64-pc-linux-gnu with one
nvptx offload device without issues (using "make check" and verifying that
offloading is indeed used).

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: teams_and_threads_on_device.diff --]
[-- Type: text/plain, Size: 51281 bytes --]

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.

Additionally, a limitation of the number of teams on gcn offload devices is
implemented.  The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit).  This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory.  Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).

gcc/ChangeLog:

	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
	to "-2" instead of "1" for non-existing num_teams clause in order to
	disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
	allow processing of device-specific values.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* icv-device.c (omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	* icv.c (omp_set_teams_thread_limit): Removed.
	(omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* libgomp.texi: Updated documentation for nvptx and gcn corresponding
	to the limitation of the number of teams.
	* plugin/plugin-gcn.c (limit_teams): New helper function that limits
	the number of teams by twice the number of compute units.
	(parse_target_attributes): Limit the number of teams on gcn offload
	devices.
	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
	handling.
	(gomp_load_image_to_device): Added a size check for the ICVs struct
	variable.
	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
	copy back the ICV values from device to host.
	(GOMP_target_ext): Update the number of teams and threads in the kernel
	args also considering device-specific values.
	* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
	of OMP_TEAMS_THREAD_LIMIT from the environment.
	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
	* testsuite/libgomp.fortran/icv-5.f90: New test.
	* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
	num_teams from "1" to "-2" in cases without num_teams clause.
	* g++.dg/gomp/target-teams-1.C: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f06ce3c..e25854c 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -15225,6 +15225,7 @@ computable_teams_clause (tree *tp, int *walk_subtrees, void *)
    0 stands for clause not specified at all, use implementation default
    -1 stands for value that can't be determined easily before entering
       the target construct.
+   -2 means that no explicit teams construct was specified
    If teams construct is not present at all, use 1 for num_teams
    and 0 for thread_limit (only one team is involved, and the thread
    limit is implementation defined.  */
@@ -15243,7 +15244,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
 
   if (teams == NULL_TREE)
-    num_teams_upper = integer_one_node;
+    num_teams_upper = build_int_cst (integer_type_node, -2);
   else
     for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
       {
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 51b8d48..74d60e1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -81,5 +81,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index f78a608..29e5597 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -88,5 +88,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
index 7b182b5..9081159 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(alloc:scalar\\) defaultmap\\(to:aggregate\\) defaultmap\\(tofrom:allocatable\\) defaultmap\\(firstprivate:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
index 1391274..91566ed 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90
@@ -141,5 +141,5 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\*strxp \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(to:scalar\\) defaultmap\\(tofrom:aggregate\\) defaultmap\\(firstprivate:allocatable\\) defaultmap\\(default:pointer\\)" 1 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90 b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
index 9a81d0f..867e41a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/defaultmap-6.f90
@@ -101,4 +101,4 @@ end
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxparr \\\[len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:strxparr \\\[pointer set, len:" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "map\\(to:\\.strxp \\\[len:" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(1\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(default\\)" 1 "gimple" } }
diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index bf757ba..eb68881 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -81,6 +81,19 @@ omp_set_num_teams (int num_teams)
     GOMP_ADDITIONAL_ICVS.nteams = num_teams;
 }
 
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_ADDITIONAL_ICVS.teams_thread_limit;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_ADDITIONAL_ICVS.teams_thread_limit = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -89,3 +102,5 @@ ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
 ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index eef151c..818e696 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -81,6 +81,19 @@ omp_set_num_teams (int num_teams)
     GOMP_ADDITIONAL_ICVS.nteams = num_teams;
 }
 
+int
+omp_get_teams_thread_limit (void)
+{
+  return GOMP_ADDITIONAL_ICVS.teams_thread_limit;
+}
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    GOMP_ADDITIONAL_ICVS.teams_thread_limit = thread_limit;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
@@ -89,3 +102,5 @@ ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
 ialias (omp_get_max_teams)
 ialias (omp_set_num_teams)
+ialias (omp_get_teams_thread_limit)
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index d8acf0e..48607ce 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -97,3 +97,20 @@ omp_set_num_teams (int num_teams)
 }
 
 ialias (omp_set_num_teams)
+
+int
+omp_get_teams_thread_limit (void)
+{
+  return gomp_teams_thread_limit_var;
+}
+
+ialias (omp_get_teams_thread_limit)
+
+void
+omp_set_teams_thread_limit (int thread_limit)
+{
+  if (thread_limit >= 0)
+    gomp_teams_thread_limit_var = thread_limit;
+}
+
+ialias (omp_set_teams_thread_limit)
diff --git a/libgomp/icv.c b/libgomp/icv.c
index df423c0..9aef91c 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -148,19 +148,6 @@ omp_get_supported_active_levels (void)
   return gomp_supported_active_levels;
 }
 
-void
-omp_set_teams_thread_limit (int thread_limit)
-{
-  if (thread_limit >= 0)
-    gomp_teams_thread_limit_var = thread_limit;
-}
-
-int
-omp_get_teams_thread_limit (void)
-{
-  return gomp_teams_thread_limit_var;
-}
-
 int
 omp_get_cancellation (void)
 {
@@ -261,8 +248,6 @@ ialias (omp_get_thread_limit)
 ialias (omp_set_max_active_levels)
 ialias (omp_get_max_active_levels)
 ialias (omp_get_supported_active_levels)
-ialias (omp_set_teams_thread_limit)
-ialias (omp_get_teams_thread_limit)
 ialias (omp_get_cancellation)
 ialias (omp_get_proc_bind)
 ialias (omp_get_max_task_priority)
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 10fefa9..44e1383 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4351,7 +4351,7 @@ On the hardware side, there is the hierarchy (fine to coarse):
 @item work item (thread)
 @item wavefront
 @item work group
-@item compute unite (CU)
+@item compute unit (CU)
 @end itemize
 
 All OpenMP and OpenACC levels are used, i.e.
@@ -4366,7 +4366,8 @@ All OpenMP and OpenACC levels are used, i.e.
 The used sizes are
 @itemize
 @item Number of teams is the specified @code{num_teams} (OpenMP) or
-      @code{num_gangs} (OpenACC) or otherwise the number of CU
+      @code{num_gangs} (OpenACC) or otherwise the number of CU. It is limited
+      by two times the number of CU.
 @item Number of wavefronts is 4 for gfx900 and 16 otherwise;
       @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
       overrides this if smaller.
@@ -4412,6 +4413,8 @@ The used sizes are
 @itemize
 @item The @code{warp_size} is always 32
 @item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
+@item The number of teams is limited by the number of blocks the device can
+      host simultaneously.
 @end itemize
 
 Additional information can be obtained by setting the environment variable to
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 04b122f..9247b46 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1163,6 +1163,18 @@ limit_worker_threads (int threads)
   return threads;
 }
 
+/* This sets the maximum number of teams to twice the number of GPU Compute
+   Units to avoid memory waste and corresponding memory access faults.  */
+
+static int
+limit_teams (int teams, struct agent_info *agent)
+{
+  int max_teams = 2 * get_cu_count (agent);
+  if (teams > max_teams)
+    teams = max_teams;
+  return teams;
+}
+
 /* Parse the target attributes INPUT provided by the compiler and return true
    if we should run anything all.  If INPUT is NULL, fill DEF with default
    values, then store INPUT or DEF into *RESULT.
@@ -1207,7 +1219,7 @@ parse_target_attributes (void **input,
 	  switch (id & GOMP_TARGET_ARG_ID_MASK)
 	    {
 	    case GOMP_TARGET_ARG_NUM_TEAMS:
-	      gcn_teams = val;
+	      gcn_teams = limit_teams (val, agent);
 	      break;
 	    case GOMP_TARGET_ARG_THREAD_LIMIT:
 	      gcn_threads = limit_worker_threads (val);
diff --git a/libgomp/target.c b/libgomp/target.c
index 189b201..06809db 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2155,6 +2155,19 @@ get_gomp_offload_icvs (int dev_num)
     new->icvs.nteams = gomp_default_icv_values.nteams_var;
 
   if (dev_x != NULL
+      && gomp_get_icv_flag (dev_x->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = dev_x->icvs.teams_thread_limit_var;
+  else if (dev != NULL
+	   && gomp_get_icv_flag (dev->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = dev->icvs.teams_thread_limit_var;
+  else if (all != NULL
+	   && gomp_get_icv_flag (all->flags, GOMP_ICV_TEAMS_THREAD_LIMIT))
+    new->icvs.teams_thread_limit = all->icvs.teams_thread_limit_var;
+  else
+    new->icvs.teams_thread_limit
+      = gomp_default_icv_values.teams_thread_limit_var;
+
+  if (dev_x != NULL
       && gomp_get_icv_flag (dev_x->flags, GOMP_ICV_DEFAULT_DEVICE))
     new->icvs.default_device = dev_x->icvs.default_device_var;
   else if (dev != NULL
@@ -2290,7 +2303,14 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 	  int dev_num = (int) (devicep - &devices[0]);
 	  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
 	  size_t var_size = var->end - var->start;
-
+	  if (var_size != sizeof (struct gomp_offload_icvs))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      if (is_register_lock)
+		gomp_mutex_unlock (&register_lock);
+	      gomp_fatal ("offload plugin managed 'icv struct' not of expected "
+			  "format");
+	    }
 	  /* Copy the ICVs variable to place on device memory, hereby
 	     actually designating its device number into effect.  */
 	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
@@ -2769,6 +2789,20 @@ clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
   return flags;
 }
 
+static void
+gomp_copy_back_icvs (struct gomp_device_descr *devicep, int device)
+{
+  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+  if (item == NULL)
+    return;
+
+  void *host_ptr = &item->icvs;
+  void *dev_ptr = omp_get_mapped_ptr (host_ptr, device);
+  if (dev_ptr != NULL)
+    gomp_copy_dev2host (devicep, NULL, host_ptr, dev_ptr,
+			sizeof (struct gomp_offload_icvs));
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
    and several arguments have been added:
    FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2801,6 +2835,146 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool fpc_done = false;
 
+  /* Obtain the original TEAMS and THREADS values from ARGS.  */
+  intptr_t orig_teams = 1, orig_threads = 0;
+  size_t num_args = 0, len = 1, teams_len = 1, threads_len = 1;
+  void **tmpargs = args;
+  while (*tmpargs)
+    {
+      intptr_t id = (intptr_t) *tmpargs++, val;
+      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	{
+	  val = (intptr_t) *tmpargs++;
+	  len = 2;
+	}
+      else
+	{
+	  val = id >> GOMP_TARGET_ARG_VALUE_SHIFT;
+	  len = 1;
+	}
+      num_args += len;
+      if ((id & GOMP_TARGET_ARG_DEVICE_MASK) != GOMP_TARGET_ARG_DEVICE_ALL)
+	continue;
+      val = val > INT_MAX ? INT_MAX : val;
+      if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS)
+	{
+	  orig_teams = val;
+	  teams_len = len;
+	}
+      else if ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT)
+	{
+	  orig_threads = val;
+	  threads_len = len;
+	}
+    }
+
+  intptr_t new_teams = orig_teams, new_threads = orig_threads;
+  /* ORIG_TEAMS == -2: No explicit teams construct specified.  Set to 1.
+     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
+		       value could not be determined.  No change.
+     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
+		      Set device-specific value.
+     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
+		     No change.  */
+  if (orig_teams == -2)
+    new_teams = 1;
+  else if (orig_teams == 0)
+    {
+      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+      if (item != NULL)
+	new_teams = item->icvs.nteams;
+    }
+  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
+     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
+     e.g. a THREAD_LIMIT clause.  */
+  if (orig_teams > -2 && orig_threads == 0)
+    {
+      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
+      if (item != NULL)
+	new_threads = item->icvs.teams_thread_limit;
+    }
+
+  /* Copy and change the arguments list only if TEAMS or THREADS need to be
+     updated.  */
+  void **new_args = args;
+  if (orig_teams != new_teams || orig_threads != new_threads)
+    {
+      size_t tms_len = (orig_teams == new_teams
+			? teams_len
+			: (new_teams > -(1 << 15) && new_teams < (1 << 15)
+			   ? 1 : 2));
+      size_t ths_len = (orig_threads == new_threads
+			? threads_len
+			: (new_threads > -(1 << 15) && new_threads < (1 << 15)
+			   ? 1 : 2));
+      /* One additional item after the last arg must be NULL.  */
+      size_t new_args_cnt = num_args - teams_len - threads_len + tms_len
+			    + ths_len + 1;
+      new_args = (void **) gomp_alloca (new_args_cnt * sizeof (void*));
+
+      tmpargs = args;
+      void **tmp_new_args = new_args;
+      /* Copy all args except TEAMS and THREADS.  TEAMS and THREADS are copied
+	 too if they have not been changed and skipped otherwise.  */
+      while (*tmpargs)
+	{
+	  intptr_t id = (intptr_t) *tmpargs;
+	  if (((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_NUM_TEAMS
+	       && orig_teams != new_teams)
+	      || ((id & GOMP_TARGET_ARG_ID_MASK) == GOMP_TARGET_ARG_THREAD_LIMIT
+		  && orig_threads != new_threads))
+	    {
+	      tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		tmpargs++;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = *tmpargs++;
+	      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+		*tmp_new_args++ = *tmpargs++;
+	    }
+	}
+
+      /* Add the new TEAMS arg to the new args list if it has been changed.  */
+      if (orig_teams != new_teams)
+	{
+	  intptr_t new_val = new_teams;
+	  if (tms_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_NUM_TEAMS;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_NUM_TEAMS);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      /* Add the new THREADS arg to the new args list if it has been changed. */
+      if (orig_threads != new_threads)
+	{
+	  intptr_t new_val = new_threads;
+	  if (ths_len == 1)
+	    {
+	      new_val = (new_val << GOMP_TARGET_ARG_VALUE_SHIFT)
+			 | GOMP_TARGET_ARG_THREAD_LIMIT;
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	  else
+	    {
+	      *tmp_new_args++ = (void *) (GOMP_TARGET_ARG_SUBSEQUENT_PARAM
+					  | GOMP_TARGET_ARG_THREAD_LIMIT);
+	      *tmp_new_args++ = (void *) new_val;
+	    }
+	}
+
+      *tmp_new_args = NULL;
+    }
+
   flags = clear_unsupported_flags (devicep, flags);
 
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
@@ -2848,7 +3022,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	  && !thr->task->final_task)
 	{
 	  gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
-				   sizes, kinds, flags, depend, args,
+				   sizes, kinds, flags, depend, new_args,
 				   GOMP_TARGET_TASK_BEFORE_MAP);
 	  return;
 	}
@@ -2894,7 +3068,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 				      tgt_align, tgt_size);
 	    }
 	}
-      gomp_target_fallback (fn, hostaddrs, devicep, args);
+      gomp_target_fallback (fn, hostaddrs, devicep, new_args);
       return;
     }
 
@@ -2924,7 +3098,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   devicep->run_func (devicep->target_id, fn_addr,
 		     tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,
-		     args);
+		     new_args);
   if (tgt_vars)
     {
       htab_clear (refcount_set);
@@ -2932,6 +3106,12 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     }
   if (refcount_set)
     htab_free (refcount_set);
+
+  /* Copy back ICVs from device to host.
+     HOST_PTR is expected to exist since it was added in
+     gomp_load_image_to_device if not already available.  */
+  gomp_copy_back_icvs (devicep, device);
+
 }
 
 /* Handle reverse offload.  This is called by the device plugins for a
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
index b987a33..9da0d63 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-4.c
@@ -16,7 +16,7 @@ main ()
     }
   else
     omp_set_num_teams (6);
-  if (getenv ("OMP_TEAMS_THREAD_LIMIT") == NULL
+  if (getenv ("OMP_TEAMS_THREAD_LIMIT") != NULL
       && strcmp (getenv ("OMP_TEAMS_THREAD_LIMIT"), "12") == 0)
     {
       if (omp_get_teams_thread_limit () != 12)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
index 431cfc7..72d7af6 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c
@@ -1,25 +1,203 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "42" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "43" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "44" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "45" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "46" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS "47" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS "5" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" } */
 
 #include <omp.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 int
 main ()
 {
-  if (omp_get_max_teams () != 47)
+  if (omp_get_max_teams () != 5
+      || omp_get_teams_thread_limit () != 4)
     abort ();
 
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 5
+	|| omp_get_team_num () >= 5)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
+  omp_set_num_teams (4);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 4
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  {
+    if (omp_get_num_teams () > 4
+	|| omp_get_team_num () >= 4)
+      abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 3
+	|| omp_get_thread_num () >= 3)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(3) thread_limit(2)
+  {
+    if (omp_get_num_teams () != 3
+	|| omp_get_team_num () >= 3)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 2
+	|| omp_get_thread_num () >= 2)
+      abort ();
+  }
+
+  #pragma omp teams num_teams(5) thread_limit(4)
+  {
+    if (omp_get_num_teams () != 5
+	|| omp_get_team_num () >= 5)
+    abort ();
+    #pragma omp parallel
+    if (omp_get_thread_limit () > 4
+	|| omp_get_thread_num () >= 4)
+      abort ();
+  }
+
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
-  for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42 + i)
+
+  for (int i = 0; i < num_devices; i++)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 6 + i
+	  || omp_get_teams_thread_limit () != 5 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      #pragma omp parallel
+      if (omp_get_thread_limit () > 5 + i
+	  || omp_get_thread_num () >= 5 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (5 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 5 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+      /* omp_set_num_teams and omp_set_teams_thread_limit above set the value
+	 of nteams-var and teams-thread-limit-var ICVs on device 'i', which has
+	 scope 'device' and should be avaible in subsequent target regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 5 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 5 + i
+	    || omp_get_team_num () >= 5 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 4 + i
+	    || omp_get_thread_num () >= 4 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(6 + i) thread_limit(5 + i)
+      {
+	if (omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 5 + i
+	    || omp_get_thread_num () >= 5 + i
+	    || omp_get_num_teams () > 6 + i
+	    || omp_get_team_num () >= 6 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(4 + i) thread_limit(3 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3 + i
+	    || omp_get_thread_num () >= 3 + i
+	    || omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      #pragma omp teams thread_limit(3 + i) num_teams(4 + i)
+      {
+	if (omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3 + i
+	    || omp_get_thread_num () >= 3 + i
+	    || omp_get_num_teams () > 4 + i
+	    || omp_get_team_num () >= 4 + i)
+	  abort ();
+      }
+
+      /* The NUM_TEAMS and THREAD_LIMIT clauses should not change the values
+	 of the corresponding ICVs.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 5 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      /* This tests a large number of teams and threads.  If it is larger than
+	 2^15+1 then the according argument in the kernels arguments list
+	 is encoded with two items instead of one.  */
+      intptr_t large_num_teams = 66000;
+      intptr_t large_threads_limit = 67000;
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (large_num_teams + i);
+	omp_set_teams_thread_limit (large_threads_limit + i);
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+	if (omp_get_max_teams () != large_num_teams + i
+	    || omp_get_teams_thread_limit () != large_threads_limit + i)
 	abort ();
 
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > large_num_teams + i
+	    || omp_get_team_num () >= large_num_teams + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > large_threads_limit + i
+	    || omp_get_thread_num () >= large_threads_limit + i)
+	  abort ();
+      }
+    }
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
index e199a18..1b17f2d 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-6.c
@@ -1,9 +1,10 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "42" } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "43" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" } */
 /* { dg-set-target-env-var OMP_SCHEDULE_ALL "guided,4" } */
 /* { dg-set-target-env-var OMP_DYNAMIC_ALL "true" } */
-/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "44" } */
 /* { dg-set-target-env-var OMP_THREAD_LIMIT_ALL "45" } */
 /* { dg-set-target-env-var OMP_NUM_THREADS_ALL "46,3,2" } */
 /* { dg-set-target-env-var OMP_MAX_ACTIVE_LEVELS_ALL "47" } */
@@ -12,8 +13,8 @@
 
 /* This tests the hierarchical usage of ICVs on the device, i.e. if
    OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
-   OMP_NUM_TEAMS_DEV should be used.  And if there is no environment variable
-   without suffix, then the corresponding _ALL variant should be used.  */
+   OMP_NUM_TEAMS_DEV should be used.  And if OMP_NUM_TEAMS (without suffix) is
+   not defined, then OMP_NUM_TEAMS_ALL should be used for the host.  */
 
 #include <omp.h>
 #include <stdlib.h>
@@ -26,10 +27,10 @@ main ()
   int chunk_size;
   omp_get_schedule(&kind, &chunk_size);
 
-  if ((!getenv ("OMP_NUM_TEAMS") && omp_get_max_teams () != 42)
+  if ((!getenv ("OMP_NUM_TEAMS") && omp_get_max_teams () != 3)
       || (!getenv ("OMP_DYNAMIC") && !omp_get_dynamic ())
       || (!getenv ("OMP_SCHEDULE") && (kind != 3 || chunk_size != 4))
-      || (!getenv ("OMP_TEAMS_THREAD_LIMIT") && omp_get_teams_thread_limit () != 44)
+      || (!getenv ("OMP_TEAMS_THREAD_LIMIT") && omp_get_teams_thread_limit () != 2)
       || (!getenv ("OMP_THREAD_LIMIT") && omp_get_thread_limit () != 45)
       || (!getenv ("OMP_NUM_THREADS") && omp_get_max_threads () != 46)
       || (!getenv ("OMP_PROC_BIND") && omp_get_proc_bind () != omp_proc_bind_spread)
@@ -44,9 +45,51 @@ main ()
       name[sizeof ("OMP_NUM_TEAMS_DEV_1") - 2] = '0' + i;
       if (getenv (name))
 	continue;
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 43)
+
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 4
+	  || omp_get_teams_thread_limit () != 3)
 	abort ();
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 4
+	    || omp_get_team_num () >= 4)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 3
+	    || omp_get_thread_num () >= 3)
+	  abort ();
+      }
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (3 + i);
+	omp_set_teams_thread_limit (2 + i);
+	if (omp_get_max_teams () != 3 + i
+	    || omp_get_teams_thread_limit () != 2 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 3 + i
+	  || omp_get_teams_thread_limit () != 2 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      {
+	if (omp_get_num_teams () > 3 + i
+	    || omp_get_team_num () >= 3 + i)
+	  abort ();
+	#pragma omp parallel
+	if (omp_get_thread_limit () > 2 + i
+	    || omp_get_thread_num () >= 2 + i)
+	  abort ();
+      }
     }
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
index 70a716d..bbbd6df 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-7.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "42" } */
+/* { dg-set-target-env-var OMP_NUM_TEAMS_ALL "7" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" } */
 
 /* This tests the hierarchical usage of ICVs on the host and on devices, i.e. if
    OMP_NUM_TEAMS_DEV_<device_num>, OMP_NUM_TEAMS_DEV, and
@@ -9,18 +10,90 @@
 
 #include <omp.h>
 #include <stdlib.h>
+#include <string.h>
 
 int
 main ()
 {
-  if (omp_get_max_teams () != 42)
+  if ((!getenv ("OMP_NUM_TEAMS") && omp_get_max_teams () != 7)
+      || (!getenv ("OMP_TEAMS_THREAD_LIMIT") && omp_get_teams_thread_limit () != 2))
     abort ();
 
+  #pragma omp teams
+  if ((!getenv ("OMP_NUM_TEAMS"))
+      && (omp_get_num_teams () > 7 || omp_get_team_num () >= 7))
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (3);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 3)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
+    abort ();
+
+  if (getenv ("OMP_NUM_TEAMS_DEV") || getenv ("OMP_TEAMS_THREAD_LIMIT_DEV"))
+    return 0;
+
   int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
-  for (int i=0; i < num_devices; i++)
-    #pragma omp target device (i)
-      if (omp_get_max_teams () != 42)
+  for (int i = 0; i < num_devices; i++)
+    {
+      char nteams[sizeof ("OMP_NUM_TEAMS_DEV_1")];
+      strcpy (nteams, "OMP_NUM_TEAMS_DEV_1");
+      nteams[sizeof ("OMP_NUM_TEAMS_DEV_1") - 2] = '0' + i;
+      char teams_thread_limit[sizeof ("OMP_TEAMS_THREAD_LIMIT_DEV_1")];
+      strcpy (teams_thread_limit, "OMP_TEAMS_THREAD_LIMIT_DEV_1");
+      teams_thread_limit[sizeof ("OMP_TEAMS_THREAD_LIMIT_DEV_1") - 2] = '0' + i;
+      if (getenv (nteams) || getenv (teams_thread_limit))
+	continue;
+
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 7
+	  || omp_get_teams_thread_limit () != 2)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 7
+	  || omp_get_team_num () >= 7)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (4 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 4 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 4 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () != 5 + i)
 	abort ();
+    }
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/icv-9.c b/libgomp/testsuite/libgomp.c-c++-common/icv-9.c
new file mode 100644
index 0000000..21b874f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/icv-9.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+
+/* This tests usage of ICVs on the host and on devices if no corresponding
+   environment variables are configured.  */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+  if (omp_get_max_teams () != 0
+      || omp_get_teams_thread_limit () != 0)
+    abort ();
+
+  omp_set_num_teams (9);
+  omp_set_teams_thread_limit (2);
+  if (omp_get_max_teams () != 9
+      || omp_get_teams_thread_limit () != 2)
+    abort ();
+
+  #pragma omp teams
+  if (omp_get_num_teams () > 9
+      || omp_get_team_num () >= 9)
+    abort ();
+
+  #pragma omp teams num_teams(5)
+  if (omp_get_num_teams () > 5
+      || omp_get_team_num () >= 5)
+    abort ();
+
+  int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices ();
+  for (int i = 0; i < num_devices; i++)
+    {
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 0
+	  || omp_get_teams_thread_limit () != 0)
+	abort ();
+
+      #pragma omp target device (i)
+      {
+	omp_set_num_teams (8 + i);
+	omp_set_teams_thread_limit (3 + i);
+	if (omp_get_max_teams () != 8 + i
+	    || omp_get_teams_thread_limit () != 3 + i)
+	  abort ();
+      }
+
+     /* omp_set_num_teams above set the value of nteams-var ICV on device 'i',
+	 which has scope 'device' and should be avaible in subsequent target
+	 regions.  */
+      #pragma omp target device (i)
+      if (omp_get_max_teams () != 8 + i
+	  || omp_get_teams_thread_limit () != 3 + i)
+	abort ();
+
+      #pragma omp target device (i)
+      #pragma omp teams
+      if (omp_get_num_teams () > 8 + i
+	  || omp_get_team_num () >= 8 + i)
+	abort ();
+
+      /* NUM_TEAMS clause has priority over previously set NUM_TEAMS value.  */
+      #pragma omp target device (i)
+      #pragma omp teams num_teams(5 + i)
+      if (omp_get_num_teams () > 5 + i
+	  || omp_get_team_num () >= 5 + i)
+	abort ();
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/icv-5.f90 b/libgomp/testsuite/libgomp.fortran/icv-5.f90
new file mode 100644
index 0000000..05a35fa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-5.f90
@@ -0,0 +1,226 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_NUM_TEAMS "5" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_0 "6" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_1 "7" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV_2 "8" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_0 "5" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_1 "6" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV_2 "7" }
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, large_num_teams, large_threads_limit
+  logical :: err
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set_dev ("OMP_NUM_TEAMS_DEV_", i, 6 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 6 + i) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (5 + i)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 5 + i .or. omp_get_team_num () >= 5 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(6 + i)
+      if (omp_get_num_teams () > 6 + i .or. omp_get_team_num () >= 6 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams num_teams(4 + i)
+      if (omp_get_num_teams () > 4 + i .or. omp_get_team_num () >= 4 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      large_num_teams = 66000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (large_num_teams + i)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= large_num_teams + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > large_num_teams + i &
+          .or. omp_get_team_num () >= large_num_teams + i) err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set_dev ("OMP_TEAMS_THREAD_LIMIT_DEV_", i, 5 + i)) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 5 + i) err = .true.
+      !$omp end target
+      if (err) stop 11
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 12
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (4 + i)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 13
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 4 + i) err = .true.
+      !$omp end target
+      if (err) stop 14
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 4 + i .or. omp_get_thread_num () >= 4 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 15
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(5 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 5 + i .or. omp_get_thread_num () >= 5 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 16
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams thread_limit(3 + i)
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 + i .or. omp_get_thread_num () >= 3 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 17
+
+      large_threads_limit = 67000
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (large_threads_limit + i)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 18
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= large_threads_limit + i) err = .true.
+      !$omp end target
+      if (err) stop 19
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > large_threads_limit + i &
+          .or. omp_get_thread_num () >= large_threads_limit + i) err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 20
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 30
+    endif
+    env_is_set = .false.
+  end
+  logical function env_is_set_dev (name, dev_num, val)
+    character(len=*) :: name
+    integer :: dev_num, val
+    character(len=64) :: dev_num_str, env_var, val_str
+    dev_num_str = ADJUSTL(dev_num_str)
+    env_var = name // dev_num_str
+    val_str = ADJUSTL(val_str)
+    env_is_set_dev = env_is_set (TRIM(env_var), TRIM(val_str))
+  end
+end
diff --git a/libgomp/testsuite/libgomp.fortran/icv-6.f90 b/libgomp/testsuite/libgomp.fortran/icv-6.f90
new file mode 100644
index 0000000..c8e6a0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-6.f90
@@ -0,0 +1,140 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS_ALL "3" }
+! { dg-set-target-env-var OMP_NUM_TEAMS_DEV "4" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_ALL "2" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT_DEV "3" }
+
+! This test considers the hierarchical usage of ICVs on the device, i.e. if
+! e.g. OMP_NUM_TEAMS_DEV_<device_num> is not configured, then the value of
+! OMP_NUM_TEAMS_DEV should be used for the targets.
+
+use omp_lib
+implicit none (type, external)
+  integer :: num_devices, i, stat, tmp
+  logical :: err
+  character(len=40) :: val
+
+  ! The following environment variables should not be set.
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_NUM_TEAMS_DEV_2", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_0", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_1", val, status=stat)
+  if (stat /= 1) return
+  call get_environment_variable ("OMP_TEAMS_THREAD_LIMIT_DEV_2", val, status=stat)
+  if (stat /= 1) return
+
+  if (omp_get_num_devices () > 3) then
+    num_devices = 3
+  else
+    num_devices = omp_get_num_devices ()
+  end if
+
+  do i=0,num_devices-1
+
+    ! Testing NUM_TEAMS.
+    if (env_is_set ("OMP_NUM_TEAMS_DEV", "4")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 4) err = .true.
+      !$omp end target
+      if (err) stop 1
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 4 .or. omp_get_team_num () >= 4) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 2
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_num_teams (3 + i)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 3
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_max_teams () /= 3 + i) err = .true.
+      !$omp end target
+      if (err) stop 4
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      if (omp_get_num_teams () > 3 + i .or. omp_get_team_num () >= 3 + i) &
+        err = .true.
+      !$omp end teams
+      !$omp end target
+      if (err) stop 5
+    end if
+
+    ! Testing TEAMS-THREAD-LIMIT
+    if (env_is_set ("OMP_TEAMS_THREAD_LIMIT_DEV", "3")) then
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 3) err = .true.
+      !$omp end target
+      if (err) stop 6
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 3 .or. omp_get_thread_num () >= 3) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 7
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      call omp_set_teams_thread_limit (2 + i)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 8
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      if (omp_get_teams_thread_limit () /= 2 + i) err = .true.
+      !$omp end target
+      if (err) stop 9
+
+      err = .false.
+      !$omp target device(i) map(tofrom: err)
+      !$omp teams
+      !$omp parallel
+      if (omp_get_thread_limit () > 2 + i .or. omp_get_thread_num () >= 2 + i) &
+        err = .true.
+      !$omp end parallel
+      !$omp end teams
+      !$omp end target
+      if (err) stop 10
+    end if
+
+  end do
+
+contains
+  logical function env_is_set (name, val)
+    character(len=*) :: name, val
+    character(len=40) :: val2
+    integer :: stat
+    call get_environment_variable (name, val2, status=stat)
+    if (stat == 0) then
+      if (val == val2) then
+        env_is_set = .true.
+        return
+      end if
+    else if (stat /= 1) then
+      error stop 10
+    endif
+    env_is_set = .false.
+  end
+end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  2022-11-24 14:09         ` Marcel Vollweiler
@ 2022-12-05 13:50           ` Jakub Jelinek
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Jelinek @ 2022-12-05 13:50 UTC (permalink / raw)
  To: Marcel Vollweiler; +Cc: gcc-patches

On Thu, Nov 24, 2022 at 03:09:02PM +0100, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
> 	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
> 	to "-2" instead of "1" for non-existing num_teams clause in order to
> 	disambiguate from the case of an existing num_teams clause with value 1.
> 
> libgomp/ChangeLog:
> 
> 	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
> 	allow processing of device-specific values.
> 	(omp_set_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
> 	(omp_set_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	* icv-device.c (omp_get_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	(omp_set_teams_thread_limit): Likewise.
> 	* icv.c (omp_set_teams_thread_limit): Removed.
> 	(omp_get_teams_thread_limit): Likewise.
> 	(ialias): Likewise.
> 	* libgomp.texi: Updated documentation for nvptx and gcn corresponding
> 	to the limitation of the number of teams.
> 	* plugin/plugin-gcn.c (limit_teams): New helper function that limits
> 	the number of teams by twice the number of compute units.
> 	(parse_target_attributes): Limit the number of teams on gcn offload
> 	devices.
> 	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
> 	handling.
> 	(gomp_load_image_to_device): Added a size check for the ICVs struct
> 	variable.
> 	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
> 	copy back the ICV values from device to host.
> 	(GOMP_target_ext): Update the number of teams and threads in the kernel
> 	args also considering device-specific values.
> 	* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
> 	of OMP_TEAMS_THREAD_LIMIT from the environment.
> 	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
> 	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
> 	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
> 	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
> 	* testsuite/libgomp.fortran/icv-5.f90: New test.
> 	* testsuite/libgomp.fortran/icv-6.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
> 	num_teams from "1" to "-2" in cases without num_teams clause.
> 	* g++.dg/gomp/target-teams-1.C: Likewise.
> 	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
> 	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
> 	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

Ok, thanks.

	Jakub


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-12-05 13:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-14 16:06 [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices Marcel Vollweiler
2022-06-30 13:16 ` Jakub Jelinek
2022-08-03 12:40   ` Marcel Vollweiler
2022-09-18  8:24     ` Marcel Vollweiler
2022-09-30  9:35       ` Jakub Jelinek
2022-11-24 14:09         ` Marcel Vollweiler
2022-12-05 13:50           ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).