From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 3E4FE3858C53 for ; Wed, 4 May 2022 15:12:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3E4FE3858C53 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-593-wwB_c1K9N7eViw3cfKVIsw-1; Wed, 04 May 2022 11:12:48 -0400 X-MC-Unique: wwB_c1K9N7eViw3cfKVIsw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 268A51014A60; Wed, 4 May 2022 15:12:48 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.16]) by smtp.corp.redhat.com (Postfix) with ESMTPS id BCAF1C27EB0; Wed, 4 May 2022 15:12:47 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 244FCif31696267 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 4 May 2022 17:12:44 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 244FChsT1696266; Wed, 4 May 2022 17:12:43 +0200 Date: Wed, 4 May 2022 17:12:42 +0200 From: Jakub Jelinek To: Marcel Vollweiler Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] OpenMP, libgomp: Environment variable syntax extension. Message-ID: Reply-To: Jakub Jelinek References: <392c847d-e798-2be3-a808-6888de6c90cd@codesourcery.com> MIME-Version: 1.0 In-Reply-To: <392c847d-e798-2be3-a808-6888de6c90cd@codesourcery.com> X-Scanned-By: MIMEDefang 2.85 on 10.11.54.8 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2022 15:12:53 -0000 On Tue, Jan 18, 2022 at 05:10:47PM +0100, Marcel Vollweiler wrote: > Hi, > > This patch considers the environment variable syntax extension for > device-specific variants of environment variables from OpenMP 5.1 (see > OpenMP 5.1 specification, p. 75 and p. 639). An environment variable > (e.g. OMP_NUM_TEAMS) can have different suffixes: > > _DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host. > _DEV_ (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with > number . > no suffix (e.g. OMP_NUM_TEAMS): affects only the host. > > In future OpenMP versions also suffix _ALL will be introduced (see > discussion https://github.com/OpenMP/spec/issues/3179). This is also > considered in this patch: > > _ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host. > > The precedence is as follows (descending). For the host: > > 1. no suffix > 2. _ALL > > For devices: > > 1. _DEV_ > 2. _DEV > 3. _ALL > > That means, _DEV_ is used whenever available. Otherwise _DEV is > used if available, and at last _ALL. If there is no value for any of the > variable variants, default values are used as already implemented before. > > This patch concerns parsing (a), storing (b), output (c) and > transmission to the device (d): > > (a) The actual number of devices and the numbering are not known when > parsing the environment variables. Thus all environment variables are > iterated and searched for device-specific ones. > > (b) Only configured device-specific variables are stored. Thus, linked > lists are used. > > (c) The output is done in omp_display_env (see specification p. 468f). > Global ICVs are tagged with [all], see > https://github.com/OpenMP/spec/issues/3179. ICVs which are not global > but aren't handled device-specific yet are tagged with [host]. > omp_display_env outputs the initial values of the ICVs. That's why > separate data structures are introduced (like gomp_initial_icv...). > > (d) Device-specific ICVs which are already user accessible on the device > are transmitted to the device (moreover nteams-var is added and used for > the tests). There are ICVs which values are currently set explicitly in > the config when copying them to the device: GOMP_NTHREADS_VAR, > GOMP_THREAD_LIMIT_VAR, GOMP_DYN_VAR (see gomp_gcn_enter_kernel in > libgomp/config/gcn/team.c and gomp_nvptx_main in > libgomp/config/nvptx/team.c). The corresponding environment variables > are nevertheless parsed and stored device-specific but the transmission > to the device is not changed. Just a partial review, there are many issues. Some issues I'm mentioning just once or several times but many apply to various other spots in the patch. > +/* Returns the element of the list for the specified device number. */ > +struct gomp_icv_list* > +gomp_get_icv_list (struct gomp_icv_list **list, int device_num) > +{ > + struct gomp_icv_list *l = *list; > + while (l != NULL) > + { > + if (l->device_num == device_num) > + return l; > + l = l->next; > + } > + return NULL; > +} > + > +void* Space before *. > +gomp_get_icv_value_ptr (struct gomp_icv_list **list, int device_num) > +{ > + struct gomp_icv_list *l = gomp_get_icv_list (list, device_num); > + if (l == NULL) > + return NULL; > + return l->value; > +} > + > +/* Lists for initial device-specific ICVs, i.e. ICVs that are configured for > + particular devices (with environment variables like OMP_NUM_TEAMS_DEV_42). */ > +struct gomp_icv_list *gomp_dyn_var_dev_list = NULL; > +struct gomp_icv_list *gomp_nthreads_var_dev_list = NULL; > +struct gomp_icv_list *gomp_nthreads_var_list_dev_list = NULL; > +struct gomp_icv_list *gomp_nthreads_var_list_len_dev_list = NULL; > +struct gomp_icv_list *gomp_run_sched_var_dev_list = NULL; > +struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list = NULL; > +struct gomp_icv_list *gomp_nteams_var_dev_list = NULL; > +struct gomp_icv_list *gomp_thread_limit_var_dev_list = NULL; > +struct gomp_icv_list *gomp_max_active_levels_var_dev_list = NULL; > +struct gomp_icv_list *gomp_proc_bind_var_dev_list = NULL; > +struct gomp_icv_list *gomp_proc_bind_var_list_dev_list = NULL; > +struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list = NULL; > +struct gomp_icv_list *stacksize_dev_list = NULL; > +struct gomp_icv_list *wait_policy_dev_list = NULL; > +struct gomp_icv_list *teams_thread_limit_var_dev_list = NULL; To me the above is just too big extra .data growth, we should optimize for the common case of no OMP_* env vars or a few host cases of them. So, I think it is ok to have the gomp_initial_icv var as is and gomp_initial_icv_flags too. But I'd turn gomp_initial_icv_all and gomp_initial_icv_dev into pointers to gomp_initial_icv_t, and maybe instead of the OMP_*_DEV_ linked lists for each var separately add one linked list that contains device number, next pointer, gomp_initial_icv_t for values and gomp_icv_flags_t used as a bitmask "is this ICV set for this ". > + > +/* Flags for non-global ICVs to store by which environment variables they are > + affected. */ > +struct gomp_icv_flags_t gomp_initial_icv_flags = { > + .nthreads_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .run_sched_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .run_sched_chunk_size = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .thread_limit_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .dyn_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .max_active_levels_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .bind_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .nteams_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .stacksize = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .wait_policy = GOMP_ENV_VAR_SUFFIX_UNKNOWN, > + .teams_thread_limit_var = GOMP_ENV_VAR_SUFFIX_UNKNOWN > +}; > + > invalid: > gomp_error ("Invalid value for chunk size in " > "environment variable OMP_SCHEDULE"); This should be gomp_error ("Invalid value for chunk size in " "environment variable %s", name); and similarly for all the other parsing routines. > +/* Helper function for omp_display_env which prints the values of run_sched_var. > + 'device' can be 'host', 'dev', 'all' or a particular device number. */ > +static void > +print_schedule (enum gomp_schedule_type run_sched_var, int run_sched_chunk_size, > + const char* device) Formatting, space before * not after it. > +/* Helper function for omp_display_env which prints the values of proc_bind_var. > + 'device' can be 'host', 'dev', 'all', or a particular device number. */ > +static void print_proc_bind (char proc_bind_var, unsigned long len, char **list, > + const char* device) Space before * instead of after it. print_proc_bind should be at the start of line, not in the middle. > +{ > + fprintf (stderr, " [%s] OMP_PROC_BIND = '", device); > + switch (proc_bind_var) > { > case omp_proc_bind_false: > fputs ("FALSE", stderr); > @@ -1324,8 +1392,8 @@ omp_display_env (int verbose) > fputs ("SPREAD", stderr); > break; > } > - for (i = 1; i < gomp_bind_var_list_len; i++) > - switch (gomp_bind_var_list[i]) > + for (int i = 1; i < len; i++) > + switch ((*list)[i]) > { > case omp_proc_bind_master: > fputs (",MASTER", stderr); /* TODO: Change to PRIMARY for OpenMP 5.1. */ > @@ -1338,7 +1406,131 @@ omp_display_env (int verbose) > break; > } > fputs ("'\n", stderr); > - fputs (" OMP_PLACES = '", stderr); > +} > + > +void > +omp_display_env (int verbose) > +{ > + int i; > + > + fputs ("\nOPENMP DISPLAY ENVIRONMENT BEGIN\n", stderr); > + > + fputs (" _OPENMP = '201511'\n", stderr); > + > + fprintf (stderr, " [host] OMP_DYNAMIC = '%s'\n", > + gomp_initial_icv.dyn_var ? "TRUE" : "FALSE"); > + if (gomp_initial_icv_flags.dyn_var & GOMP_ENV_VAR_SUFFIX_ALL) > + fprintf (stderr, " [all] OMP_DYNAMIC = '%s'\n", > + gomp_initial_icv_all.dyn_var ? "TRUE" : "FALSE"); > + if (gomp_initial_icv_flags.dyn_var & GOMP_ENV_VAR_SUFFIX_DEV) > + fprintf (stderr, " [device] OMP_DYNAMIC = '%s'\n", > + gomp_initial_icv_dev.dyn_var ? "TRUE" : "FALSE"); > + struct gomp_icv_list* l_dyn_var = gomp_dyn_var_dev_list; * formatting. > + while (l_dyn_var != NULL) > + { > + fprintf (stderr, " [%d] OMP_DYNAMIC = '%s'\n", l_dyn_var->device_num, > + *(bool*)l_dyn_var->value ? "TRUE" : "FALSE"); > + l_dyn_var = l_dyn_var->next; > + } And like mentioned elsewhere, this should be table driven, ideally using one table for all. You really don't want to duplicate that much code. > + struct gomp_icv_list* l_nthreads_var = gomp_nthreads_var_dev_list; * formatting. > + struct gomp_icv_list* l_run_sched_var = gomp_run_sched_var_dev_list; * formatting (many times more). > +/* Helper function for parse_device_specific. Extracts the device number from > + an environment variable name. */ > +static void > +get_device_num (char **env, int prefix_len, int *dev_num, int *name_len) Why do you pass char **env rather than say just char *env? The extra indirection doesn't seem to be useful. Why don't you return one of the two return values and just fill in the other one through pointer argument? > +{ > + if (env == NULL || *env == NULL) > + { > + *name_len = 0; > + *dev_num = -1; > + return; > + } > + > + int eq_pos = strchr (*env, '=') - *env; This will misbehave if there is no = character. There is no point to walk again the first prefix_len bytes or walk megabytes of chars if somebody creates extremely long vars, those will never something we care about. So I'd just compute the longest device num we want to support (stringify __INT_MAX__ with an extra preprocessor macro in between so that it is expanded and compute sizeof of it, unless you hardcode 10 chars maximum) and check that after the prefix_len there is 1 to 10 of [0-9] chars followed by = in a loop, if not, punt, otherwise strtoul it into a number? Maybe also verify that first digit is 0 only iff that is immediately followed by =. > + int dev_num_len = eq_pos - prefix_len; > + char buf_dev_num[dev_num_len+1]; > + > + strncpy(buf_dev_num, *env + prefix_len, dev_num_len); > + buf_dev_num[dev_num_len] = '\0'; No need to copy anything, VLA, etc. And strtoul is what is used elsewhere in libgomp, we never use atoi. > + *dev_num = atoi (buf_dev_num); > + *name_len = eq_pos; > +} > + > +/* Helper function for parse_device_specific. Adds a new node to the given > + list. */ > +static struct gomp_icv_list* > +add_device_specific_icv (int dev_num, size_t size, struct gomp_icv_list **list) > +{ > + if (list == NULL) > + return NULL; > + > + struct gomp_icv_list *new_node = > + (struct gomp_icv_list*) malloc (sizeof (struct gomp_icv_list)); Please always use gomp_malloc if it can't fail, malloc should be only used if it is an optimization and caller will handle malloc returning NULL gracefully. > + new_node->device_num = dev_num; > + new_node->value = malloc (size); > + new_node->next = *list; > + *list = new_node; > + > + return new_node; > +} > + > +/* Helper function for 'initialize_env' to parse device-specific environment > + variables like 'OMP_NUM_TEAMS_DEV_42'. */ > +static void > +parse_device_specific () Should be (void) > +{ > + extern char **environ; > + int dev_num; > + int name_len; > + struct gomp_icv_list *new_node; > + > + for (char **env = environ; *env != 0; env++) > + { > + if (strncmp (*env, "OMP_SCHEDULE_DEV_", 17) == 0) These would be a maintainance nightmare (making sure all the computed lengths are accurate). Please add something like static inline bool startswith (const char *str, const char *prefix) { return strncmp (str, prefix, strlen (prefix)) == 0; } gcc/system.h has and use it. But even with that you really don't want to have to duplicate so much code, e.g. all the get_device_num char name[name_len strncpy and termination at least. So I think you want it table driven, start by checking if (!startswith (*env, "OMP_")) continue; and then walk over some table with names of the env vars, their precomputed lengths (using some macro so that you don't duplicate the strings), and then what to do with it, where what to do could be what function to call and var to fill in, or something you can switch on if common code can't be used. > + { > + get_device_num (env, 17, &dev_num, &name_len); > + char name[name_len+1]; Formatting, spaces around + on both sides. But I think using a VLA is unnecessary, you can easily compute some upper bound (length of largest supported env var name + length of largest supported device number (we shouldn't support > INT_MAX numbers). > + strncpy(name, *env, name_len); Formatting, space before (. > + new_node = add_device_specific_icv (dev_num, sizeof (bool), > + &gomp_dyn_var_dev_list); > + *((bool*)(new_node->value)) = value; Formatting, space before *, space in between )(. > + /* Parse the environment variables and store their values in the initial > + struct. */ > + if (parse_schedule ("OMP_SCHEDULE", &gomp_initial_icv.run_sched_var, > + &gomp_initial_icv.run_sched_chunk_size)) > + { > + gomp_initial_icv_flags.run_sched_var |= GOMP_ENV_VAR_SUFFIX_NONE; > + gomp_initial_icv_flags.run_sched_chunk_size |= GOMP_ENV_VAR_SUFFIX_NONE; > + } > + if (parse_schedule ("OMP_SCHEDULE_ALL", &gomp_initial_icv_all.run_sched_var, > + &gomp_initial_icv_all.run_sched_chunk_size)) > + { > + gomp_initial_icv_flags.run_sched_var |= GOMP_ENV_VAR_SUFFIX_ALL; > + gomp_initial_icv_flags.run_sched_chunk_size |= GOMP_ENV_VAR_SUFFIX_ALL; > + } > + if (parse_schedule ("OMP_SCHEDULE_DEV", &gomp_initial_icv_dev.run_sched_var, > + &gomp_initial_icv_dev.run_sched_chunk_size)) > + { > + gomp_initial_icv_flags.run_sched_var |= GOMP_ENV_VAR_SUFFIX_DEV; > + gomp_initial_icv_flags.run_sched_chunk_size |= GOMP_ENV_VAR_SUFFIX_DEV; > + } This really should be table driven. This is code done once during program startup, so we don't want such code to be extra large (roughly optimize for size, not speed). Furthermore, we should optimize for the common case that no env vars or only very few of them are used and if anything, the host ones. Looking at glibc's getenv implementation, getenv is O(num_env_vars), like: for (ep = __environ; *ep != NULL; ++ep) { #if _STRING_ARCH_unaligned uint16_t ep_start = *(uint16_t *) *ep; #else uint16_t ep_start = (((unsigned char *) *ep)[0] | (((unsigned char *) *ep)[1] << 8)); #endif if (name_start == ep_start && !strncmp (*ep + 2, name, len) && (*ep)[len + 2] == '=') return &(*ep)[len + 3]; } I'd keep the GOMP_* env vars as is using getenv, but as we need to walk environ completely for the OMP_*_DEV_* vars anyway, I wonder if we just shouldn't handle all the OMP_* env vars inside of that loop. Immediately punt on non-OMP_ prefixed vars, then do table driven decision on what middle part we have and depending on what tail (nothing, _DEV, _ALL, _DEV_) ending it has decide where to stick it. For the table driven operation, some vars are regular and just parse a bool or long etc. value, others need some extra handling code. Though, there is one gotcha, if we had code where we parsed some var first and another one later and there was interdependence between the two, in environ they can appear in any order. > + > + /* Set the ICV values for the host. */ > + if (gomp_initial_icv_flags.run_sched_var & GOMP_ENV_VAR_SUFFIX_NONE) > + { > + gomp_global_icv.run_sched_var = gomp_initial_icv.run_sched_var; > + gomp_global_icv.run_sched_chunk_size = No = at the end of line. > + gomp_initial_icv.run_sched_chunk_size; > + } > +enum gomp_env_var_suffix_t > +{ > + GOMP_ENV_VAR_SUFFIX_UNKNOWN = 0, > + GOMP_ENV_VAR_SUFFIX_NONE = 1, > + GOMP_ENV_VAR_SUFFIX_DEV = 2, > + GOMP_ENV_VAR_SUFFIX_ALL = 4 > +}; > + > +/* Struct that contains all ICVs for which we need to store initial values. > + Keeping the initial values is needed for omp_display_env and also used for > + transmitting device-specific values to the target. */ > +struct gomp_initial_icv_t > +{ > + unsigned long nthreads_var; > + unsigned long *nthreads_var_list; > + unsigned long nthreads_var_list_len; > + enum gomp_schedule_type run_sched_var; > + int run_sched_chunk_size; > + int default_device_var; > + unsigned int thread_limit_var; > + bool dyn_var; > + unsigned char max_active_levels_var; > + char bind_var; > + char *bind_var_list; > + unsigned long bind_var_list_len; > + int nteams_var; > + int teams_thread_limit_var; > + int wait_policy; > + unsigned long stacksize; Would be nice to order the struct elements to avoid padding. Say put pointers first, then unsigned long fields, then int/uint/enum ones and finally the bool/char ones. > +}; > + > +struct gomp_icv_flags_t > +{ > + enum gomp_env_var_suffix_t nthreads_var; > + enum gomp_env_var_suffix_t run_sched_var; > + enum gomp_env_var_suffix_t run_sched_chunk_size; > + enum gomp_env_var_suffix_t thread_limit_var; > + enum gomp_env_var_suffix_t dyn_var; > + enum gomp_env_var_suffix_t max_active_levels_var; > + enum gomp_env_var_suffix_t bind_var; > + enum gomp_env_var_suffix_t nteams_var; > + enum gomp_env_var_suffix_t stacksize; > + enum gomp_env_var_suffix_t wait_policy; > + enum gomp_env_var_suffix_t teams_thread_limit_var; This is unnecessarily large. You need just 3 bits for each. Can't you make all those say 4 bit bitfields? I know it is a GNU extension, but we don't expect libgomp to be compiled by other compilers, do we? > +}; > + > +struct gomp_icv_list { Formatting consistency. All other toplevel structs have { on the next line. > + int device_num; > + void* value; Formatting, use void *value; instead. > + struct gomp_icv_list *next; > +}; > + > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/icv-5.c > @@ -0,0 +1,48 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > +#include > + > +char const *varnames[] = { > + "OMP_NUM_TEAMS_DEV_0", > + "OMP_NUM_TEAMS_DEV_1", > + "OMP_NUM_TEAMS_DEV_2", > + "OMP_NUM_TEAMS_ALL", > + "OMP_NUM_TEAMS_DEV", > + "OMP_NUM_TEAMS" > +}; > +char const *values[] = { "42", "43", "44", "45", "46", "47" }; > +const int cnt = 6; > + > +int > +main (int argc, char *const *argv) > +{ > + int updated = 0; > + for (int i = 0; i < cnt; i++) > + { > + if (getenv (varnames[i]) == NULL > + || strcmp (getenv (varnames[i]), values[i]) != 0) > + { > + setenv (varnames[i], values[i], 1); > + updated = 1; > + } > + } > + if (updated) > + { > + execv (argv[0], argv); I'm not sure we can rely on execv on all targets that do support libgomp. Any reason why you actually need this, rather than using dg-set-target-env-var directive(s) and perhaps return 0; if getenv doesn't return the expected values? > + abort (); > + } > + > + int num_devices = omp_get_num_devices () > 3 ? 3 : omp_get_num_devices (); > + for (int i=0; i < num_devices; i++) > + #pragma omp target device (i) > + if (omp_get_max_teams () != 42) > + abort (); > + > + return 0; > +} > \ No newline at end of file Please avoid files without newlines at the end. Jakub