Hi Jakub, Am 04.05.2022 um 17:12 schrieb Jakub Jelinek: > On Tue, Jan 18, 2022 at 05:10:47PM +0100, Marcel Vollweiler wrote: >> Hi, >> >> This patch considers the environment variable syntax extension for >> device-specific variants of environment variables from OpenMP 5.1 (see >> OpenMP 5.1 specification, p. 75 and p. 639). An environment variable >> (e.g. OMP_NUM_TEAMS) can have different suffixes: >> >> _DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host. >> _DEV_ (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with >> number . >> no suffix (e.g. OMP_NUM_TEAMS): affects only the host. >> >> In future OpenMP versions also suffix _ALL will be introduced (see >> discussion https://github.com/OpenMP/spec/issues/3179). This is also >> considered in this patch: >> >> _ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host. >> >> The precedence is as follows (descending). For the host: >> >> 1. no suffix >> 2. _ALL >> >> For devices: >> >> 1. _DEV_ >> 2. _DEV >> 3. _ALL >> >> That means, _DEV_ is used whenever available. Otherwise _DEV is >> used if available, and at last _ALL. If there is no value for any of the >> variable variants, default values are used as already implemented before. >> >> This patch concerns parsing (a), storing (b), output (c) and >> transmission to the device (d): >> >> (a) The actual number of devices and the numbering are not known when >> parsing the environment variables. Thus all environment variables are >> iterated and searched for device-specific ones. >> >> (b) Only configured device-specific variables are stored. Thus, linked >> lists are used. >> >> (c) The output is done in omp_display_env (see specification p. 468f). >> Global ICVs are tagged with [all], see >> https://github.com/OpenMP/spec/issues/3179. ICVs which are not global >> but aren't handled device-specific yet are tagged with [host]. >> omp_display_env outputs the initial values of the ICVs. That's why >> separate data structures are introduced (like gomp_initial_icv...). >> >> (d) Device-specific ICVs which are already user accessible on the device >> are transmitted to the device (moreover nteams-var is added and used for >> the tests). There are ICVs which values are currently set explicitly in >> the config when copying them to the device: GOMP_NTHREADS_VAR, >> GOMP_THREAD_LIMIT_VAR, GOMP_DYN_VAR (see gomp_gcn_enter_kernel in >> libgomp/config/gcn/team.c and gomp_nvptx_main in >> libgomp/config/nvptx/team.c). The corresponding environment variables >> are nevertheless parsed and stored device-specific but the transmission >> to the device is not changed. > > Just a partial review, there are many issues. > Some issues I'm mentioning just once or several times but many apply to > various other spots in the patch. > >> +/* Returns the element of the list for the specified device number. */ >> +struct gomp_icv_list* >> +gomp_get_icv_list (struct gomp_icv_list **list, int device_num) >> +{ >> + struct gomp_icv_list *l = *list; >> + while (l != NULL) >> + { >> + if (l->device_num == device_num) >> + return l; >> + l = l->next; >> + } >> + return NULL; >> +} >> + >> +void* > > Space before *. Corrected all occurrences. > >> +gomp_get_icv_value_ptr (struct gomp_icv_list **list, int device_num) >> +{ >> + struct gomp_icv_list *l = gomp_get_icv_list (list, device_num); >> + if (l == NULL) >> + return NULL; >> + return l->value; >> +} >> + >> +/* Lists for initial device-specific ICVs, i.e. ICVs that are configured for >> + particular devices (with environment variables like OMP_NUM_TEAMS_DEV_42). */ >> +struct gomp_icv_list *gomp_dyn_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_nthreads_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_nthreads_var_list_dev_list = NULL; >> +struct gomp_icv_list *gomp_nthreads_var_list_len_dev_list = NULL; >> +struct gomp_icv_list *gomp_run_sched_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list = NULL; >> +struct gomp_icv_list *gomp_nteams_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_thread_limit_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_max_active_levels_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_proc_bind_var_dev_list = NULL; >> +struct gomp_icv_list *gomp_proc_bind_var_list_dev_list = NULL; >> +struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list = NULL; >> +struct gomp_icv_list *stacksize_dev_list = NULL; >> +struct gomp_icv_list *wait_policy_dev_list = NULL; >> +struct gomp_icv_list *teams_thread_limit_var_dev_list = NULL; > > To me the above is just too big extra .data growth, we should optimize for > the common case of no OMP_* env vars or a few host cases of them. > So, I think it is ok to have the gomp_initial_icv var as is and > gomp_initial_icv_flags too. But I'd turn gomp_initial_icv_all and > gomp_initial_icv_dev into pointers to gomp_initial_icv_t, Turning gomp_initial_icv_all and gomp_initial_icv_dev into pointers does not work in the (new) table driven approach were the addresses of both are used. > and maybe instead > of the OMP_*_DEV_ linked lists for each var separately add one linked > list that contains device number, next pointer, gomp_initial_icv_t for values > and gomp_icv_flags_t used as a bitmask "is this ICV set for this ". The above lists are replaced by one list now. > >> + >> +/* Flags for non-global ICVs to store by which environment variables they are >> + affected. */ >> +struct gomp_icv_flags_t gomp_initial_icv_flags = { >> + .nthreads_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .run_sched_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .run_sched_chunk_size = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .thread_limit_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .dyn_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .max_active_levels_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .bind_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .nteams_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .stacksize = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .wait_policy = GOMP_ENV_VAR_SUFFIX_UNKNOWN, >> + .teams_thread_limit_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN >> +}; >> + > >> invalid: >> gomp_error ("Invalid value for chunk size in " >> "environment variable OMP_SCHEDULE"); > > This should be > gomp_error ("Invalid value for chunk size in " > "environment variable %s", name); > and similarly for all the other parsing routines. Corrected all occurrences. > >> +/* Helper function for omp_display_env which prints the values of run_sched_var. >> + 'device' can be 'host', 'dev', 'all' or a particular device number. */ >> +static void >> +print_schedule (enum gomp_schedule_type run_sched_var, int run_sched_chunk_size, >> + const char* device) > > Formatting, space before * not after it. Corrected all occurrences. > >> +/* Helper function for omp_display_env which prints the values of proc_bind_var. >> + 'device' can be 'host', 'dev', 'all', or a particular device number. */ >> +static void print_proc_bind (char proc_bind_var, unsigned long len, char **list, >> + const char* device) > > Space before * instead of after it. > print_proc_bind should be at the start of line, not in the middle. Changed. > >> +{ >> + fprintf (stderr, " [%s] OMP_PROC_BIND = '", device); >> + switch (proc_bind_var) >> { >> case omp_proc_bind_false: >> fputs ("FALSE", stderr); >> @@ -1324,8 +1392,8 @@ omp_display_env (int verbose) >> fputs ("SPREAD", stderr); >> break; >> } >> - for (i = 1; i < gomp_bind_var_list_len; i++) >> - switch (gomp_bind_var_list[i]) >> + for (int i = 1; i < len; i++) >> + switch ((*list)[i]) >> { >> case omp_proc_bind_master: >> fputs (",MASTER", stderr); /* TODO: Change to PRIMARY for OpenMP 5.1. */ >> @@ -1338,7 +1406,131 @@ omp_display_env (int verbose) >> break; >> } >> fputs ("'\n", stderr); >> - fputs (" OMP_PLACES = '", stderr); >> +} >> + >> +void >> +omp_display_env (int verbose) >> +{ >> + int i; >> + >> + fputs ("\nOPENMP DISPLAY ENVIRONMENT BEGIN\n", stderr); >> + >> + fputs (" _OPENMP = '201511'\n", stderr); >> + >> + fprintf (stderr, " [host] OMP_DYNAMIC = '%s'\n", >> + gomp_initial_icv.dyn_var ? "TRUE" : "FALSE"); >> + if (gomp_initial_icv_flags.dyn_var & GOMP_ENV_VAR_SUFFIX_ALL) >> + fprintf (stderr, " [all] OMP_DYNAMIC = '%s'\n", >> + gomp_initial_icv_all.dyn_var ? "TRUE" : "FALSE"); >> + if (gomp_initial_icv_flags.dyn_var & GOMP_ENV_VAR_SUFFIX_DEV) >> + fprintf (stderr, " [device] OMP_DYNAMIC = '%s'\n", >> + gomp_initial_icv_dev.dyn_var ? "TRUE" : "FALSE"); >> + struct gomp_icv_list* l_dyn_var = gomp_dyn_var_dev_list; > > * formatting. Corrected all occurrences. > >> + while (l_dyn_var != NULL) >> + { >> + fprintf (stderr, " [%d] OMP_DYNAMIC = '%s'\n", l_dyn_var->device_num, >> + *(bool*)l_dyn_var->value ? "TRUE" : "FALSE"); >> + l_dyn_var = l_dyn_var->next; >> + } > > And like mentioned elsewhere, this should be table driven, ideally using > one table for all. You really don't want to duplicate that much code. Three tables are introduced in env.c: 1) a table for parsing device specific ICVs ("envvars_dev"), 2) a table for parsing non device specific ICVs ("envvars"), 3) a table that helps setting the host ICVs depending on the _ALL and _DEV variants of the environment variables ("host_envvars"). Those tables make the parsing of environment variables and setting the corresponding ICV values more compact and clear from my point of view. However, making omp_display_env table driven seems not quite helpful, since mostly all of the ICVs have different output patterns. Nevertheless I introduced print_device_specific_icvs function to reduce the complexity of iterating the list for most of the ICVs. > >> + struct gomp_icv_list* l_nthreads_var = gomp_nthreads_var_dev_list; > > * formatting. > >> + struct gomp_icv_list* l_run_sched_var = gomp_run_sched_var_dev_list; > > * formatting (many times more). > >> +/* Helper function for parse_device_specific. Extracts the device number from >> + an environment variable name. */ >> +static void >> +get_device_num (char **env, int prefix_len, int *dev_num, int *name_len) > > Why do you pass char **env rather than say just char *env? The extra > indirection doesn't seem to be useful. Why don't you return one of the > two return values and just fill in the other one through pointer argument? Removed the extra indirection for "env". The function "get_device_num" returns a boolean value now in order to signalize success or failure. > >> +{ >> + if (env == NULL || *env == NULL) >> + { >> + *name_len = 0; >> + *dev_num = -1; >> + return; >> + } >> + >> + int eq_pos = strchr (*env, '=') - *env; > > This will misbehave if there is no = character. > There is no point to walk again the first prefix_len bytes or walk megabytes > of chars if somebody creates extremely long vars, those will never something > we care about. > So I'd just compute the longest device num we want to support > (stringify __INT_MAX__ with an extra preprocessor macro in between so that > it is expanded and compute sizeof of it, unless you hardcode 10 chars > maximum) and check that after the prefix_len there is 1 to 10 of [0-9] chars > followed by = in a loop, if not, punt, otherwise strtoul it into a number? > Maybe also verify that first digit is 0 only iff that is immediately > followed by =. The get_device_num function was adapted accordingly and a constant INT_MAX_STR_LEN with value 10 is introduced and used. I thought about stringification of __INT_MAX__ instead using the hardcoded "10" but ended only in some over-complicated macros since __INT_MAX__ is given as hex number and not as decimal. > >> + int dev_num_len = eq_pos - prefix_len; >> + char buf_dev_num[dev_num_len+1]; >> + >> + strncpy(buf_dev_num, *env + prefix_len, dev_num_len); >> + buf_dev_num[dev_num_len] = '\0'; > > No need to copy anything, VLA, etc. And strtoul is what is used elsewhere > in libgomp, we never use atoi. Replaced the strncpy with a loop over the actual expected device number and replaced atoi with strtoul. > >> + *dev_num = atoi (buf_dev_num); >> + *name_len = eq_pos; >> +} >> + >> +/* Helper function for parse_device_specific. Adds a new node to the given >> + list. */ >> +static struct gomp_icv_list* >> +add_device_specific_icv (int dev_num, size_t size, struct gomp_icv_list **list) >> +{ >> + if (list == NULL) >> + return NULL; >> + >> + struct gomp_icv_list *new_node = >> + (struct gomp_icv_list*) malloc (sizeof (struct gomp_icv_list)); > > Please always use gomp_malloc if it can't fail, malloc should be only used > if it is an optimization and caller will handle malloc returning NULL > gracefully. Changed. > >> + new_node->device_num = dev_num; >> + new_node->value = malloc (size); >> + new_node->next = *list; >> + *list = new_node; >> + >> + return new_node; >> +} >> + >> +/* Helper function for 'initialize_env' to parse device-specific environment >> + variables like 'OMP_NUM_TEAMS_DEV_42'. */ >> +static void >> +parse_device_specific () > > Should be (void) Good hint, thanks. (parse_device_specific was removed though) > >> +{ >> + extern char **environ; >> + int dev_num; >> + int name_len; >> + struct gomp_icv_list *new_node; >> + >> + for (char **env = environ; *env != 0; env++) >> + { >> + if (strncmp (*env, "OMP_SCHEDULE_DEV_", 17) == 0) > > These would be a maintainance nightmare (making sure all the computed > lengths are accurate). Please add something like > static inline bool > startswith (const char *str, const char *prefix) > { > return strncmp (str, prefix, strlen (prefix)) == 0; > } > gcc/system.h has and use it. > But even with that you really don't want to have to duplicate so much code, > e.g. all the get_device_num char name[name_len strncpy and termination > at least. So I think you want it table driven, start by checking > if (!startswith (*env, "OMP_")) > continue; > and then walk over some table with names of the env vars, their precomputed > lengths (using some macro so that you don't duplicate the strings), and then > what to do with it, where what to do could be what function to call and var > to fill in, or something you can switch on if common code can't be used. Agreed, so I introduced a table ("envvars") that contains the names, the length of the names, pointers to the parse functions etc. > >> + { >> + get_device_num (env, 17, &dev_num, &name_len); >> + char name[name_len+1]; > > Formatting, spaces around + on both sides. > But I think using a VLA is unnecessary, you can easily compute some > upper bound (length of largest supported env var name + length of largest > supported device number (we shouldn't support > INT_MAX numbers). Changed. > >> + strncpy(name, *env, name_len); > > Formatting, space before (. Changed. > >> + new_node = add_device_specific_icv (dev_num, sizeof (bool), >> + &gomp_dyn_var_dev_list); >> + *((bool*)(new_node->value)) = value; > > Formatting, space before *, space in between )(. Corrected (hopefully) all occurrences. > >> + /* Parse the environment variables and store their values in the initial >> + struct. */ >> + if (parse_schedule ("OMP_SCHEDULE", &gomp_initial_icv.run_sched_var, >> + &gomp_initial_icv.run_sched_chunk_size)) >> + { >> + gomp_initial_icv_flags.run_sched_var |= GOMP_ENV_VAR_SUFFIX_NONE; >> + gomp_initial_icv_flags.run_sched_chunk_size |= GOMP_ENV_VAR_SUFFIX_NONE; >> + } >> + if (parse_schedule ("OMP_SCHEDULE_ALL", &gomp_initial_icv_all.run_sched_var, >> + &gomp_initial_icv_all.run_sched_chunk_size)) >> + { >> + gomp_initial_icv_flags.run_sched_var |= GOMP_ENV_VAR_SUFFIX_ALL; >> + gomp_initial_icv_flags.run_sched_chunk_size |= GOMP_ENV_VAR_SUFFIX_ALL; >> + } >> + if (parse_schedule ("OMP_SCHEDULE_DEV", &gomp_initial_icv_dev.run_sched_var, >> + &gomp_initial_icv_dev.run_sched_chunk_size)) >> + { >> + gomp_initial_icv_flags.run_sched_var |= GOMP_ENV_VAR_SUFFIX_DEV; >> + gomp_initial_icv_flags.run_sched_chunk_size |= GOMP_ENV_VAR_SUFFIX_DEV; >> + } > > This really should be table driven. This is code done once during program > startup, so we don't want such code to be extra large (roughly optimize for > size, not speed). Furthermore, we should optimize for the common case that > no env vars or only very few of them are used and if anything, the host > ones. > Looking at glibc's getenv implementation, getenv is O(num_env_vars), like: > for (ep = __environ; *ep != NULL; ++ep) > { > #if _STRING_ARCH_unaligned > uint16_t ep_start = *(uint16_t *) *ep; > #else > uint16_t ep_start = (((unsigned char *) *ep)[0] > | (((unsigned char *) *ep)[1] << 8)); > #endif > > if (name_start == ep_start && !strncmp (*ep + 2, name, len) > && (*ep)[len + 2] == '=') > return &(*ep)[len + 3]; > } > I'd keep the GOMP_* env vars as is using getenv, but as we need to walk > environ completely for the OMP_*_DEV_* vars anyway, I wonder if we just > shouldn't handle all the OMP_* env vars inside of that loop. Good point. Changed accordingly, i.e. one iteration over all environment variables. Only in a few cases getenv is still used (GOMP_ and dependent OMP_ variables). > Immediately punt on non-OMP_ prefixed vars, then do table driven decision > on what middle part we have and depending on what tail (nothing, _DEV, _ALL, > _DEV_) ending it has decide where to stick it. > > For the table driven operation, some vars are regular and just parse a bool > or long etc. value, others need some extra handling code. > > Though, there is one gotcha, if we had code where we parsed some var first > and another one later and there was interdependence between the two, in > environ they can appear in any order. > >> + >> + /* Set the ICV values for the host. */ >> + if (gomp_initial_icv_flags.run_sched_var & GOMP_ENV_VAR_SUFFIX_NONE) >> + { >> + gomp_global_icv.run_sched_var = gomp_initial_icv.run_sched_var; >> + gomp_global_icv.run_sched_chunk_size = > > No = at the end of line. Changed. > >> + gomp_initial_icv.run_sched_chunk_size; >> + } > >> +enum gomp_env_var_suffix_t >> +{ >> + GOMP_ENV_VAR_SUFFIX_UNKNOWN = 0, >> + GOMP_ENV_VAR_SUFFIX_NONE = 1, >> + GOMP_ENV_VAR_SUFFIX_DEV = 2, >> + GOMP_ENV_VAR_SUFFIX_ALL = 4 >> +}; >> + >> +/* Struct that contains all ICVs for which we need to store initial values. >> + Keeping the initial values is needed for omp_display_env and also used for >> + transmitting device-specific values to the target. */ >> +struct gomp_initial_icv_t >> +{ >> + unsigned long nthreads_var; >> + unsigned long *nthreads_var_list; >> + unsigned long nthreads_var_list_len; >> + enum gomp_schedule_type run_sched_var; >> + int run_sched_chunk_size; >> + int default_device_var; >> + unsigned int thread_limit_var; >> + bool dyn_var; >> + unsigned char max_active_levels_var; >> + char bind_var; >> + char *bind_var_list; >> + unsigned long bind_var_list_len; >> + int nteams_var; >> + int teams_thread_limit_var; >> + int wait_policy; >> + unsigned long stacksize; > > Would be nice to order the struct elements to avoid padding. > Say put pointers first, then unsigned long fields, then > int/uint/enum ones and finally the bool/char ones. Changed. > >> +}; >> + >> +struct gomp_icv_flags_t >> +{ >> + enum gomp_env_var_suffix_t nthreads_var; >> + enum gomp_env_var_suffix_t run_sched_var; >> + enum gomp_env_var_suffix_t run_sched_chunk_size; >> + enum gomp_env_var_suffix_t thread_limit_var; >> + enum gomp_env_var_suffix_t dyn_var; >> + enum gomp_env_var_suffix_t max_active_levels_var; >> + enum gomp_env_var_suffix_t bind_var; >> + enum gomp_env_var_suffix_t nteams_var; >> + enum gomp_env_var_suffix_t stacksize; >> + enum gomp_env_var_suffix_t wait_policy; >> + enum gomp_env_var_suffix_t teams_thread_limit_var; > > This is unnecessarily large. You need just 3 bits for each. > Can't you make all those say 4 bit bitfields? > I know it is a GNU extension, but we don't expect libgomp to be compiled by > other compilers, do we? Changed. > >> +}; >> + >> +struct gomp_icv_list { > > Formatting consistency. All other toplevel structs have > { on the next line. Changed. > >> + int device_num; >> + void* value; > > Formatting, use void *value; instead. Changed. > >> + struct gomp_icv_list *next; >> +}; >> + > >> --- /dev/null >> +++ b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c >> @@ -0,0 +1,48 @@ >> +/* { dg-do run } */ >> + >> +#include >> +#include >> +#include >> +#include >> + >> +char const *varnames[] = { >> + "OMP_NUM_TEAMS_DEV_0", >> + "OMP_NUM_TEAMS_DEV_1", >> + "OMP_NUM_TEAMS_DEV_2", >> + "OMP_NUM_TEAMS_ALL", >> + "OMP_NUM_TEAMS_DEV", >> + "OMP_NUM_TEAMS" >> +}; >> +char const *values[] = { "42", "43", "44", "45", "46", "47" }; >> +const int cnt = 6; >> + >> +int >> +main (int argc, char *const *argv) >> +{ >> + int updated = 0; >> + for (int i = 0; i < cnt; i++) >> + { >> + if (getenv (varnames[i]) == NULL >> + || strcmp (getenv (varnames[i]), values[i]) != 0) >> + { >> + setenv (varnames[i], values[i], 1); >> + updated = 1; >> + } >> + } >> + if (updated) >> + { >> + execv (argv[0], argv); > > I'm not sure we can rely on execv on all targets that do support libgomp. > Any reason why you actually need this, rather than using > dg-set-target-env-var directive(s) and perhaps return 0; if getenv doesn't > return the expected values? Interesting topic. After some (internal) discussions I think the best way is to set the environment variables explicitely instead using dg-set-target-env-var. The reason is that dg-set-target-env-var does not work for remote testing (which seems to be a common test environment). For remote testing dejagnu immediately aborts the test case with UNSUPPORTED which is specified in the corresponding extension and makes sence from my point of view as the test assumption cannot be fulfilled (since the environment variables are not set on remote targets). It also means that whenever dg-set-target-env-var is set in the test file, the execution of the test case is not tested on remote targets. However, I understand that compatibility is essential (in this particular case to use execv only if the test is run on POSIX compliant systems) and added a test for "__unix__" (though this is not a complete guarantee for POSIX conformity as far as I understand). > >> + abort (); >> + } >> + > >> + int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices (); >> + for (int i=0; i < num_devices; i++) >> + #pragma omp target device (i) >> + if (omp_get_max_teams () != 42) >> + abort (); >> + >> + return 0; >> +} >> \ No newline at end of file > > Please avoid files without newlines at the end. Corrected all test files. An updated patch is attached and tested again on x86_64-linux with nvptx and amdgcn offloading without regression. Marcel ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955