Hi Jakub, I updated (and simplified) my last submitted patch (https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598770.html) considering the following aspects: 1. For copying ICVs to devices a struct ("gomp_offload_icvs") is used now instead of copying all ICVs individually. This is a somehow reduced struct as we don't want to copy ICVs which are not used on the device. 2. A linked list of those gomp_offload_icvs structs was introduced (gomp_offload_icv_list) that holds only device-specific ICV values. In a following patch (for https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593260.html, will be submitted soon) this list is also used to keep ICV values that were changed on a particular device and copied back to the host. 3. The value of a device-specific ICV according to the hierarchy (DEV_X, DEV, ALL) can be determined earliest when we copy to the device, since we do not know the actual existing device numbers (only those that were explicitly specified). The device-specific ICV struct is added to gomp_offload_icv_list when it is copied the first time. Once it is contained it is taken from the list for each copy process (to the device and back). The updated patch is attached and tested again on x86_64-linux with nvptx and amdgcn offloading without regression. Marcel Am 25.07.2022 um 15:38 schrieb Marcel Vollweiler: > Hi Jakub, > >>>> I'm not sure we can rely on execv on all targets that do support libgomp. >>>> Any reason why you actually need this, rather than using >>>> dg-set-target-env-var directive(s) and perhaps return 0; if getenv doesn't >>>> return the expected values? >>> >>> Interesting topic. After some (internal) discussions I think the best way is to >>> set the environment variables explicitely instead using dg-set-target-env-var. >>> The reason is that dg-set-target-env-var does not work for remote testing (which >>> seems to be a common test environment). For remote testing dejagnu immediately >>> aborts the test case with UNSUPPORTED which is specified in the corresponding >>> extension and makes sence from my point of view as the test assumption cannot be >>> fulfilled (since the environment variables are not set on remote targets). >>> It also means that whenever dg-set-target-env-var is set in the test file, the >>> execution of the test case is not tested on remote targets. >> >> The only reason why dg-set-target-env-var is supported on native only right >> now is that I'm never doing remote testing myself and so couldn't test that. >> There is no inherent reason why the env vars couldn't be propagated over to >> the remote and set in the environment there. >> So trying to work around that rather than at least trying to change >> dg-set-target-env-var so that it works with the remote testing you do looks >> wrong. >> If dg-set-target-env-var can be made to work remotely, it will magically >> improve those 130+ tests that use it already together with the newly added >> tests. >> >> So, I'd suggest to just use dg-set-target-env-var and incrementally work on >> making it work for remote testing if that is important to whomever does >> that kind of testing. Could be e.g. a matter of invoking remotely >> env VAR1=val1 VAR2=val2 program args >> instead of program args. If env is missing on the remote side, it could >> be UNSUPPORTED then. > > I agree. So I changed the tests using dg-set-target-env-var and removed the > execv parts. > >> >>> +/* The initial ICV values for the host, which are configured with environment >>> + variables without a suffix, e.g. OMP_NUM_TEAMS. */ >>> +struct gomp_initial_icvs gomp_initial_icvs_none; >>> + >>> +/* Initial ICV values that were configured for the host and for all devices by >>> + using environment variables like OMP_NUM_TEAMS_ALL. */ >>> +struct gomp_initial_icvs gomp_initial_icvs_all; >>> + >>> +/* Initial ICV values that were configured only for devices (not for the host) >>> + by using environment variables like OMP_NUM_TEAMS_DEV. */ >>> +struct gomp_initial_icvs gomp_initial_icvs_dev; >> >> As I said last time, I don't like allocating these >> all the time in the data section of libgomp when at least for a few upcoming >> years, most users will never use those suffixes. >> Can't *_DEV and *_ALL go into the gomp_initial_icv_dev_list >> chain too, perhaps > > gomp_initial_icvs_{none, all, dev} are now defined as pointers (as you proposed > previously). gomp_initial_icvs_{all, dev} are only instantiated if at least one > according environment variable is parsed. gomp_initial_icvs_none is always > initialized with the initial global ICV values. > > All three structures are now also included in gomp_initial_icv_list (previously > named gomp_initial_icv_dev_list) with "magic device numbers" -1, -2, and -3. > The list items for _DEV, _ALL and no suffix are stored at the beginning of the > list whereas the device-specific list items are attached at the end. > >> >>> +static const struct envvar >>> +{ >>> + const char *name; >>> + int name_len; >>> + unsigned char flag_vars[3]; >>> + unsigned char flag; >>> + void *params[3]; >>> + bool (*parse_func) (const char *, const char *, void * const[]); >>> +} envvars[] = { >>> + {ENTRY ("OMP_SCHEDULE_DEV"), {OMP_SCHEDULE_DEV_, >>> OMP_SCHEDULE_CHUNK_SIZE_DEV_}, GOMP_ENV_VAR_SUFFIX_DEV, >>> {&gomp_initial_icvs_dev.run_sched_var, >>> &gomp_initial_icvs_dev.run_sched_chunk_size}, &parse_schedule}, >>> + {ENTRY ("OMP_SCHEDULE_ALL"), {OMP_SCHEDULE_DEV_, >>> OMP_SCHEDULE_CHUNK_SIZE_DEV_}, GOMP_ENV_VAR_SUFFIX_ALL, >>> {&gomp_initial_icvs_all.run_sched_var, >>> &gomp_initial_icvs_all.run_sched_chunk_size}, &parse_schedule}, >>> + {ENTRY ("OMP_SCHEDULE"), {OMP_SCHEDULE_DEV_, >>> OMP_SCHEDULE_CHUNK_SIZE_DEV_}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_initial_icvs_none.run_sched_var, >>> &gomp_initial_icvs_none.run_sched_chunk_size}, &parse_schedule}, >>> + >>> + {ENTRY ("OMP_NUM_TEAMS_DEV"), {OMP_NUM_TEAMS_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV , {&gomp_initial_icvs_dev.nteams_var, false}, >>> &parse_int}, >>> + {ENTRY ("OMP_NUM_TEAMS_ALL"), {OMP_NUM_TEAMS_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.nteams_var, false}, >>> &parse_int}, >>> + {ENTRY ("OMP_NUM_TEAMS"), {OMP_NUM_TEAMS_DEV_}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_initial_icvs_none.nteams_var, false}, &parse_int}, >>> + >>> + {ENTRY ("OMP_DYNAMIC_DEV"), {OMP_DYNAMIC_DEV_}, GOMP_ENV_VAR_SUFFIX_DEV, >>> {&gomp_initial_icvs_dev.dyn_var}, &parse_boolean}, >>> + {ENTRY ("OMP_DYNAMIC_ALL"), {OMP_DYNAMIC_DEV_}, GOMP_ENV_VAR_SUFFIX_ALL, >>> {&gomp_initial_icvs_all.dyn_var}, &parse_boolean}, >>> + {ENTRY ("OMP_DYNAMIC"), {OMP_DYNAMIC_DEV_}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_initial_icvs_none.dyn_var}, &parse_boolean}, >>> + >>> + {ENTRY ("OMP_TEAMS_THREAD_LIMIT_DEV"), {OMP_TEAMS_THREAD_LIMIT_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV, {&gomp_initial_icvs_dev.teams_thread_limit_var, >>> false}, &parse_int}, >>> + {ENTRY ("OMP_TEAMS_THREAD_LIMIT_ALL"), {OMP_TEAMS_THREAD_LIMIT_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.teams_thread_limit_var, >>> false}, &parse_int}, >>> + {ENTRY ("OMP_TEAMS_THREAD_LIMIT"), {OMP_TEAMS_THREAD_LIMIT_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_NONE, {&gomp_initial_icvs_none.teams_thread_limit_var, >>> false}, &parse_int}, >>> + >>> + {ENTRY ("OMP_THREAD_LIMIT_DEV"), {OMP_THREAD_LIMIT_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV, {&gomp_initial_icvs_dev.thread_limit_var, false, >>> (void *) UINT_MAX}, &parse_unsigned_long}, >>> + {ENTRY ("OMP_THREAD_LIMIT_ALL"), {OMP_THREAD_LIMIT_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.thread_limit_var, false, >>> (void *) UINT_MAX}, &parse_unsigned_long}, >>> + {ENTRY ("OMP_THREAD_LIMIT"), {OMP_THREAD_LIMIT_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_NONE, {&gomp_initial_icvs_none.thread_limit_var, false, >>> (void *) UINT_MAX}, &parse_unsigned_long}, >>> + >>> + {ENTRY ("OMP_NUM_THREADS_DEV"), {OMP_NUM_THREADS_DEV_, >>> OMP_NTHREADS_LIST_DEV, OMP_NTHREADS_LIST_LEN_DEV}, GOMP_ENV_VAR_SUFFIX_DEV, >>> {&gomp_initial_icvs_dev.nthreads_var, >>> &gomp_initial_icvs_dev.nthreads_var_list, >>> &gomp_initial_icvs_dev.nthreads_var_list_len}, &parse_unsigned_long_list}, >>> + {ENTRY ("OMP_NUM_THREADS_ALL"), {OMP_NUM_THREADS_DEV_, >>> OMP_NTHREADS_LIST_DEV, OMP_NTHREADS_LIST_LEN_DEV}, GOMP_ENV_VAR_SUFFIX_ALL, >>> {&gomp_initial_icvs_all.nthreads_var, >>> &gomp_initial_icvs_all.nthreads_var_list, >>> &gomp_initial_icvs_all.nthreads_var_list_len}, &parse_unsigned_long_list}, >>> + {ENTRY ("OMP_NUM_THREADS"), {OMP_NUM_THREADS_DEV_, OMP_NTHREADS_LIST_DEV, >>> OMP_NTHREADS_LIST_LEN_DEV}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_initial_icvs_none.nthreads_var, >>> &gomp_initial_icvs_none.nthreads_var_list, >>> &gomp_initial_icvs_none.nthreads_var_list_len}, &parse_unsigned_long_list}, >>> + >>> + {ENTRY ("OMP_PROC_BIND_DEV"), {OMP_PROC_BIND_DEV_, >>> OMP_PROC_BIND_LIST_DEV_, OMP_PROC_BIND_LIST_LEN_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV, {&gomp_initial_icvs_dev.bind_var, >>> &gomp_initial_icvs_dev.bind_var_list, >>> &gomp_initial_icvs_dev.bind_var_list_len}, &parse_bind_var}, >>> + {ENTRY ("OMP_PROC_BIND_ALL"), {OMP_PROC_BIND_DEV_, >>> OMP_PROC_BIND_LIST_DEV_, OMP_PROC_BIND_LIST_LEN_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.bind_var, >>> &gomp_initial_icvs_all.bind_var_list, >>> &gomp_initial_icvs_all.bind_var_list_len}, &parse_bind_var}, >>> + {ENTRY ("OMP_PROC_BIND"), {OMP_PROC_BIND_DEV_, OMP_PROC_BIND_LIST_DEV_, >>> OMP_PROC_BIND_LIST_LEN_DEV_}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_initial_icvs_none.bind_var, &gomp_initial_icvs_none.bind_var_list, >>> &gomp_initial_icvs_none.bind_var_list_len}, &parse_bind_var}, >>> + >>> + {ENTRY ("OMP_MAX_ACTIVE_LEVELS_DEV"), {OMP_MAX_ACTIVE_LEVELS_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV, {&gomp_initial_icvs_dev.max_active_levels_var, (void >>> *) true, (void *) gomp_supported_active_levels}, &parse_unsigned_long}, >>> + {ENTRY ("OMP_MAX_ACTIVE_LEVELS_ALL"), {OMP_MAX_ACTIVE_LEVELS_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.max_active_levels_var, (void >>> *) true, (void *) gomp_supported_active_levels}, &parse_unsigned_long}, >>> + {ENTRY ("OMP_MAX_ACTIVE_LEVELS"), {OMP_MAX_ACTIVE_LEVELS_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_NONE, {&gomp_initial_icvs_none.max_active_levels_var, >>> (void *) true, (void *) gomp_supported_active_levels}, &parse_unsigned_long}, >>> + >>> + {ENTRY ("OMP_WAIT_POLICY_DEV"), {OMP_WAIT_POLICY_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV, {&gomp_initial_icvs_dev.wait_policy}, >>> &parse_wait_policy}, >>> + {ENTRY ("OMP_WAIT_POLICY_ALL"), {OMP_WAIT_POLICY_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.wait_policy}, >>> &parse_wait_policy}, >>> + {ENTRY ("OMP_WAIT_POLICY"), {OMP_WAIT_POLICY_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_NONE, {&gomp_initial_icvs_none.wait_policy}, >>> &parse_wait_policy}, >>> + >>> + {ENTRY ("OMP_STACKSIZE_DEV"), {OMP_STACKSIZE_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_DEV, {&gomp_initial_icvs_dev.stacksize}, &parse_stacksize}, >>> + {ENTRY ("OMP_STACKSIZE_ALL"), {OMP_STACKSIZE_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_ALL, {&gomp_initial_icvs_all.stacksize}, &parse_stacksize}, >>> + {ENTRY ("OMP_STACKSIZE"), {OMP_STACKSIZE_DEV_}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_initial_icvs_none.stacksize}, &parse_stacksize}, >>> + >>> + {ENTRY ("OMP_CANCELLATION"), {}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_cancel_var}, &parse_boolean}, >>> + {ENTRY ("OMP_DISPLAY_AFFINITY"), {}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_display_affinity_var}, &parse_boolean}, >>> + {ENTRY ("OMP_TARGET_OFFLOAD"), {}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_target_offload_var}, &parse_target_offload}, >>> + {ENTRY ("OMP_MAX_TASK_PRIORITY"), {}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_max_task_priority_var, (void *) true}, &parse_int}, >>> + {ENTRY ("OMP_ALLOCATOR"), {}, GOMP_ENV_VAR_SUFFIX_NONE, >>> {&gomp_def_allocator}, &parse_allocator}, >>> + {ENTRY ("OMP_DEFAULT_DEVICE"), {OMP_DEFAULT_DEVICE_DEV_}, >>> GOMP_ENV_VAR_SUFFIX_NONE, {&gomp_initial_icvs_none.default_device_var, (void >>> *) true}, &parse_int}, >>> +}; >> >> This is very large, why do you need 3 entries per most of the env vars? >> Just one would be enough, with just some flag whether it accepts any >> suffixes or not. > > The tables have been optimized now. Only one entry per ICV and removed some parts. > >> Lines are too long, they'd need wrapping. >> I think the coding conventions are space after { and space before }. > > Both changed. > >> >>> +static bool >>> +get_device_num (char *env, int *dev_num, int *dev_num_len) >>> +{ >>> + int pos = 0; >>> + char dev_num_str[INT_MAX_STR_LEN+1]; >>> + >>> + *dev_num_len = 0; >>> + *dev_num = -1; >>> + if (env == NULL) >>> + return false; >>> + >>> + while (pos < INT_MAX_STR_LEN) >>> + { >>> + if (env[pos] == '\0' || env[pos] == '=') >>> + break; >>> + dev_num_str[pos] = env[pos]; >>> + pos++; >>> + } >>> + >>> + if (env[pos] != '=' || (dev_num_str[0] == '0' && pos > 1)) >>> + return false; >>> + >>> + dev_num_str[pos] = '\0'; >>> + *dev_num = (int) strtoul (dev_num_str, 0, 10); >>> + *dev_num_len = pos; >> >> Why do you copy the chars to a separate stack buffer? >> strtoul will stop on anything that isn't a number. >> So, just passing it second argument and verifying that >> it points to '=' ('\0' wouldn't be valid) would be good enough >> (or perhaps also verying it doesn't point to the start pointer >> in case caller hasn't checked yet if it starts with a digit). > > Thanks for this hint. That makes it much simpler. Changed. > >> Also, the (int) cast means it throws away important information, >> we'd treat say >> 8589934593 >> on 64-bit arches as 1, I think we want to just ignore it. >> Also, we should ignore negative values (though, for 5.2 there >> is a question if OMP_NUM_THREADS_DEV_-1=5 is valid or not and >> redundant with OMP_NUM_THREADS_DEV_4=5 (if omp_get_num_devices() == 5) >> or OMP_NUM_THREADS and which one wins. > > Negative device numbers are rejected now. Btw. it seems that defining > environment variables like "OMP_NUM_THREADS_DEV_-1" is not always valid due to > "-" (invalid identifier name). > >> >>> + return true; >>> +} >>> + >>> +/* Helper function for initialize_env to add a device specific ICV value >>> + to gomp_initial_icv_dev_list. */ >>> + >>> +static void >>> +add_device_specific_icv (int dev_num, int icv_code, void *value) >>> +{ >>> + struct gomp_icv_list *list = gomp_initial_icv_dev_list; >>> + while (list != NULL && list->device_num != dev_num) >>> + list = list->next; >>> + >>> + if (list == NULL) >>> + { >>> + list = >>> + (struct gomp_icv_list *) gomp_malloc (sizeof (struct gomp_icv_list)); >> >> Formatting, = can't be at the end of line. >> >>> +static unsigned char >>> +get_icv_flag (unsigned char flag_var) >>> +{ >>> + switch (flag_var) >>> + { >>> + case OMP_NUM_TEAMS_DEV_: >>> + return gomp_initial_icv_flags.nteams_var; >>> + case OMP_SCHEDULE_DEV_: >>> + return gomp_initial_icv_flags.run_sched_var; >>> + case OMP_SCHEDULE_CHUNK_SIZE_DEV_: >>> + return gomp_initial_icv_flags.run_sched_chunk_size; >>> + case OMP_DYNAMIC_DEV_: >>> + return gomp_initial_icv_flags.dyn_var; >>> + case OMP_TEAMS_THREAD_LIMIT_DEV_: >>> + return gomp_initial_icv_flags.teams_thread_limit_var; >>> + case OMP_THREAD_LIMIT_DEV_: >>> + return gomp_initial_icv_flags.thread_limit_var; >>> + case OMP_NUM_THREADS_DEV_: >>> + return gomp_initial_icv_flags.nthreads_var; >>> + case OMP_NTHREADS_LIST_DEV: >>> + return gomp_initial_icv_flags.nthreads_var_list; >>> + case OMP_NTHREADS_LIST_LEN_DEV: >>> + return gomp_initial_icv_flags.nthreads_var_list_len; >>> + case OMP_PROC_BIND_DEV_: >>> + return gomp_initial_icv_flags.bind_var; >>> + case OMP_PROC_BIND_LIST_DEV_: >>> + return gomp_initial_icv_flags.bind_var_list; >>> + case OMP_PROC_BIND_LIST_LEN_DEV_: >>> + return gomp_initial_icv_flags.bind_var_list_len; >>> + case OMP_MAX_ACTIVE_LEVELS_DEV_: >>> + return gomp_initial_icv_flags.max_active_levels_var; >>> + case OMP_WAIT_POLICY_DEV_: >>> + return gomp_initial_icv_flags.wait_policy; >>> + case OMP_STACKSIZE_DEV_: >>> + return gomp_initial_icv_flags.stacksize; >>> + case OMP_DEFAULT_DEVICE_DEV_: >>> + return gomp_initial_icv_flags.default_device_var; >>> + default: >>> + return OMP_NONE; >> >> Doesn't this function return a bitmask of GOMP_ENV_VAR_SUFFIX_WHATEVER >> values? OMP_NONE isn't one of them, shouldn't it return 0 instead? >> But more importantly, wouldn't it be easier if the icv_flags was just >> an array indexed by flag_var? You don't need a large switch to handle >> setting or getting it then. As you just want 4 bits per flag instead of 8, >> you could index it by flag_var >> 1 and for flag_var & 1 shift right or left >> by 4. > > The ICV flags are now defined as uint32_t. This is enough to store flags for our > current 21 ICVs. The flags for _DEV, _DEV_X, _ALL and no suffix are now > separated as we have the flags variable for each item of the initial icv list. > >> >>> static void __attribute__((constructor)) >>> initialize_env (void) >>> { >>> - unsigned long thread_limit_var; >>> - unsigned long max_active_levels_var; >>> + extern char **environ; >>> + char **env; >>> + int omp_var, dev_num = 0, dev_num_len = 0, int_value, k; >>> + bool bool_value, ignore = false; >>> + char *env_val; >>> >>> /* Do a compile time check that mkomp_h.pl did good job. */ >>> omp_check_defines (); >>> >>> - parse_schedule (); >>> - parse_boolean ("OMP_DYNAMIC", &gomp_global_icv.dyn_var); >>> - parse_boolean ("OMP_CANCELLATION", &gomp_cancel_var); >>> - parse_boolean ("OMP_DISPLAY_AFFINITY", &gomp_display_affinity_var); >>> - parse_int ("OMP_DEFAULT_DEVICE", &gomp_global_icv.default_device_var, true); >>> - parse_target_offload ("OMP_TARGET_OFFLOAD", &gomp_target_offload_var); >>> - parse_int ("OMP_MAX_TASK_PRIORITY", &gomp_max_task_priority_var, true); >>> - gomp_def_allocator = parse_allocator (); >>> - if (parse_unsigned_long ("OMP_THREAD_LIMIT", &thread_limit_var, false)) >>> - { >>> - gomp_global_icv.thread_limit_var >>> - = thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var; >>> - } >>> - parse_int_secure ("GOMP_DEBUG", &gomp_debug_var, true); >>> #ifndef HAVE_SYNC_BUILTINS >>> gomp_mutex_init (&gomp_managed_threads_lock); >>> #endif >>> gomp_init_num_threads (); >>> gomp_available_cpus = gomp_global_icv.nthreads_var; >>> - if (!parse_unsigned_long_list ("OMP_NUM_THREADS", >>> - &gomp_global_icv.nthreads_var, >>> - &gomp_nthreads_var_list, >>> - &gomp_nthreads_var_list_len)) >>> - gomp_global_icv.nthreads_var = gomp_available_cpus; >>> - parse_int ("OMP_NUM_TEAMS", &gomp_nteams_var, false); >>> - parse_int ("OMP_TEAMS_THREAD_LIMIT", &gomp_teams_thread_limit_var, false); >>> - bool ignore = false; >>> - if (parse_bind_var ("OMP_PROC_BIND", >>> - &gomp_global_icv.bind_var, >>> - &gomp_bind_var_list, >>> - &gomp_bind_var_list_len) >>> + >>> + for (env = environ; *env != 0; env++) >>> + { >>> + if (!startswith (*env, "OMP_")) >>> + continue; >> >> While the above is certainly good and quickly skips non-OpenMP env vars, >> I think >> >>> + >>> + for (omp_var = 0; omp_var < OMP_DEV_VAR_CNT; omp_var++) >>> + { >>> + if (startswith (*env, envvars_dev[omp_var].name)) >> >> walking 36 entries for each OMP_ env var and doing strncmp for each is >> expensive. >> >> Wouldn't it be better to just walk the name once, find out the suffix in >> there and the length of the part before it too, >> just a simple loop, stopping at '=' and when seeing '_', check if followed >> by "ALL=", "DEV=" or "DEV_" followed by digit. That will determine the >> kind and length of the env var name without the suffixes, so then >> you can just walk the much shortened table with just 16 entries now >> and can skip entries which don't have the computed length (the table >> includes name_len, so start by >> if (envvars[omp_var].name_len != name_len) >> continue; >> if (memcmp (*env + strlen ("OMP_"), envvars[omp_var].name, >> envvars[omp_var].name_len) != 0) >> continue; > > That's a good point. Together with the "envvars" table the parsing has been > optimized. Similarly to you proposal I just use a flag in the table and parse > the environment variable variants successively. First the basic variable name > (e.g. OMP_NUM_TEAMS) and then checking whether "=", "_DEV=", "_ALL=", or "_DEV_" > is following if allowed according to the table's flag. (Only) for a directly > following "=" a check is not necessary because host variants are always allowed > (thus GOMP_ENV_VAR_SUFFIX_NONE is also omitted in the table). > >> >>> --- a/libgomp/libgomp.h >>> +++ b/libgomp/libgomp.h >>> @@ -454,6 +454,24 @@ struct gomp_team_state >>> >>> struct target_mem_desc; >>> >>> +#define OMP_NONE 0 >>> +#define OMP_NUM_TEAMS_DEV_ 1 >>> +#define OMP_SCHEDULE_DEV_ 2 >>> +#define OMP_SCHEDULE_CHUNK_SIZE_DEV_ 3 >>> +#define OMP_DYNAMIC_DEV_ 4 >>> +#define OMP_TEAMS_THREAD_LIMIT_DEV_ 5 >>> +#define OMP_THREAD_LIMIT_DEV_ 6 >>> +#define OMP_NUM_THREADS_DEV_ 7 >>> +#define OMP_NTHREADS_LIST_DEV 8 >>> +#define OMP_NTHREADS_LIST_LEN_DEV 9 >>> +#define OMP_PROC_BIND_DEV_ 10 >>> +#define OMP_PROC_BIND_LIST_DEV_ 11 >>> +#define OMP_PROC_BIND_LIST_LEN_DEV_ 12 >>> +#define OMP_MAX_ACTIVE_LEVELS_DEV_ 13 >>> +#define OMP_WAIT_POLICY_DEV_ 14 >>> +#define OMP_STACKSIZE_DEV_ 15 >>> +#define OMP_DEFAULT_DEVICE_DEV_ 16 >> >> These aren't constans defined in OpenMP standard, so I think it would >> be better to use different prefixes, say GOMP_ICV_WHATEVER or >> GOMP_ENV_WHATEVER. OMP_NONE is very much non-descriptive of what >> it means. Why do some defines have _DEV_ suffixes and others _DEV? > They are renamed accordingly. > >> It should also be an enum rather than set of defines and I don't see a >> reason for the _DEV* suffixes. > > Agreed and changed accordingly. > >> >>> + >>> /* These are the OpenMP 4.0 Internal Control Variables described in >>> section 2.3.1. Those described as having one copy per task are >>> stored within the structure; those described as having one copy >>> @@ -473,6 +491,69 @@ struct gomp_task_icv >>> struct target_mem_desc *target_data; >>> }; >>> >>> +#define GOMP_ENV_VAR_SUFFIX_UNKNOWN 0 >>> +#define GOMP_ENV_VAR_SUFFIX_NONE 1 >>> +#define GOMP_ENV_VAR_SUFFIX_DEV 2 >>> +#define GOMP_ENV_VAR_SUFFIX_ALL 4 >>> +#define GOMP_ENV_VAR_SUFFIX_DEV_X 8 >> >> Similarly, make this an enum, and perhaps just GOMP_ENV_SUFFIX_WHATEVER ? > > Also changed. > > An updated patch is attached and tested again on x86_64-linux with nvptx and > amdgcn offloading without regression. > > Marcel > ----------------- > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 > München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, > Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955