Hi Chung-Lin! On 2019-11-05T22:35:43+0800, Chung-Lin Tang wrote: > Hi Thomas, > after your last round of review, I realized that the bulk of the compiler omp-low work was > simply a case of dumb over-engineering in the wrong direction :P > (although it did painstakingly function correctly) Hehe -- that happens. ;-) > However, the issue of ACC_DEVICE_TYPE=host not working (and hence "!openacc_host_selected" > in the testcases) Actually not just for that, but also generally for any shared-memory models that may come into existance at some point, such as CUDA Unified Memory, for example? > actually is a bit more sophisticated than I thought: > > The reason it doesn't work for the host device, is because we use the map pointer (i.e. > a hostaddrs[] entry when passed into libgomp) to point to an array descriptor to pass > the whole array information, and rely on code inside gomp_map_vars_* to setup things, > and place the final on-device address of the non-contig. array into devaddrs[], therefore > only using a single map entry (something I thought was quite clever) > > However, this broke down on the host and host-fallback devices, simply because, there > we do NOT do any gomp_map_vars processing; our current code in GOACC_parallel_keyed > simply skips it and passes the offload function the original hostaddrs[] contents. > Lacking the processing to transform the descriptor pointer into a proper array ref, > things of course segfault. > > So I think we have three options for this (which may have some interactions with say, > the "proper" host-side parallelization we eventually need to implement for OpenACC 2.7) > > (1) The simplest solution: implement a processing which searches and reverts such > non-contiguous array map entries in GOACC_parallel_keyed. > (note: I have implemented this in the current attached "v2" patch) > > (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes; > i.e. still do the proper gomp_map_vars processing for all cases. > > (3) Modify the non-contiguous array map conventions: a possible solution is to use > two maps placed together: one for the array pointer, another for the array descriptor (as > opposed to the current style of using only one map) This needs more further elaborate > compiler/runtime work. > > The first two options will pessimize host-mode performance somewhat. The third I have > some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do. I'll have to think about it some more, but variant (1) doesn't seem so bad actually, for a first take. While it's not nice to pessimize in particular directives with 'if (false)' clauses, at least it does work, the run-time overhead should not be too bad (also compared to variant (2), I suppose), and variant (3) can still be implemented later. A few comments/questions: Please reference PR76739 in your submission/ChangeLog updates. > --- gcc/c/c-typeck.c (revision 277827) > +++ gcc/c/c-typeck.c (working copy) > @@ -12868,7 +12868,7 @@ c_finish_omp_cancellation_point (location_t loc, t > static tree > handle_omp_array_sections_1 (tree c, tree t, vec &types, > bool &maybe_zero_len, unsigned int &first_non_one, > - enum c_omp_region_type ort) > + bool &non_contiguous, enum c_omp_region_type ort) > { > tree ret, low_bound, length, type; > if (TREE_CODE (t) != TREE_LIST) > @@ -13160,14 +13161,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec return error_mark_node; > } > /* If there is a pointer type anywhere but in the very first > - array-section-subscript, the array section can't be contiguous. */ > + array-section-subscript, the array section can't be contiguous. > + Note that OpenACC does accept these kinds of non-contiguous pointer > + based arrays. */ That comment update should instead be moved to the function comment before the 'handle_omp_array_sections_1' function definition, and should then also explain the new 'non_contiguous' out variable. The latter needs to be done anyway, and the former (no comment here) is easy enough to tell from the code: > if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND > && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST) > { > - error_at (OMP_CLAUSE_LOCATION (c), > - "array section is not contiguous in %qs clause", > - omp_clause_code_name[OMP_CLAUSE_CODE (c)]); > - return error_mark_node; > + if (ort == C_ORT_ACC) > + non_contiguous = true; > + else > + { > + error_at (OMP_CLAUSE_LOCATION (c), > + "array section is not contiguous in %qs clause", > + omp_clause_code_name[OMP_CLAUSE_CODE (c)]); > + return error_mark_node; > + } > } > @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi > unsigned int num = types.length (), i; > tree t, side_effects = NULL_TREE, size = NULL_TREE; > tree condition = NULL_TREE; > + tree ncarray_dims = NULL_TREE; > > if (int_size_in_bytes (TREE_TYPE (first)) <= 0) > maybe_zero_len = true; > @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi > length = fold_convert (sizetype, length); > if (low_bound == NULL_TREE) > low_bound = integer_zero_node; > + > + if (non_contiguous) > + { > + ncarray_dims = tree_cons (low_bound, length, ncarray_dims); > + continue; > + } > + > if (!maybe_zero_len && i > first_non_one) > { > if (integer_nonzerop (low_bound)) I'm not at all familiar with this array sections code, will trust your understanding that we don't need any of the processing that you're skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length expressions, and other things. > @@ -13357,6 +13374,14 @@ handle_omp_array_sections (tree c, enum c_omp_regi > size = size_binop (MULT_EXPR, size, l); > } > } > + if (non_contiguous) > + { > + int kind = OMP_CLAUSE_MAP_KIND (c); > + OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY); > + OMP_CLAUSE_DECL (c) = t; > + OMP_CLAUSE_SIZE (c) = ncarray_dims; > + return false; > + } > if (side_effects) > size = build2 (COMPOUND_EXPR, sizetype, side_effects, size); > if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION Likewise for all the code being skipped here ('return false'). > --- gcc/cp/semantics.c (revision 277827) > +++ gcc/cp/semantics.c (working copy) Analoguous to the C front end. > --- gcc/gimplify.c (revision 277827) > +++ gcc/gimplify.c (working copy) > @@ -8622,9 +8622,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se > if (OMP_CLAUSE_SIZE (c) == NULL_TREE) > OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl) > : TYPE_SIZE_UNIT (TREE_TYPE (decl)); > + if (OMP_CLAUSE_SIZE (c) > + && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST > + && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c))) Per the code above, 'OMP_CLAUSE_SIZE (c)' will always be set to something, so no point in checking that here? Isn't the 'GOMP_MAP_NONCONTIG_ARRAY_P' check alone sufficient already? And then maybe 'assert (TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST' in here: > { > + /* For non-contiguous array maps, OMP_CLAUSE_SIZE is a TREE_LIST > + of the individual array dimensions, which gimplify_expr doesn't > + handle, so skip the call to gimplify_expr here. */ > + } > - if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p, > - NULL, is_gimple_val, fb_rvalue) == GS_ERROR) > + else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p, > + NULL, is_gimple_val, fb_rvalue) == GS_ERROR) > + { > remove = true; > break; > } Again, that means we're skipping other code here; don't understand yet. Your ChangeLog update says: > * gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map kinds, > make sure bias in each dimension are put into firstprivate variables. I'm not yet seeing how that's happening. Ah, I see that ChangeLog comment is probably just a remnant from the previous version. > --- gcc/omp-low.c (revision 277827) > +++ gcc/omp-low.c (working copy) Have not yet reviewed in detail. > @@ -1367,6 +1498,38 @@ scan_sharing_clauses (tree clauses, omp_context *c > install_var_local (decl, ctx); > break; > } > + > + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP > + && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c))) > + { > + tree array_decl = OMP_CLAUSE_DECL (c); > + tree array_type = TREE_TYPE (array_decl); > + bool by_ref = (TREE_CODE (array_type) == ARRAY_TYPE > + ? true : false); > + > + /* Checking code to ensure we only have arrays at top dimension. > + This limitation might be lifted in the future. */ Please reference PR76739 here, and in PR76739 also add a comment about this limitation. (As well as any other limitations, of course.) > + if (TREE_CODE (array_type) == REFERENCE_TYPE) > + array_type = TREE_TYPE (array_type); > + tree t = array_type, prev_t = NULL_TREE; > + while (t) > + { > + if (TREE_CODE (t) == ARRAY_TYPE && prev_t) > + { > + error_at (gimple_location (ctx->stmt), "array types are" > + " only allowed at outermost dimension of" > + " non-contiguous array"); > + break; > + } > + prev_t = t; > + t = TREE_TYPE (t); > + } > + > + install_var_field (array_decl, by_ref, 3, ctx); > + install_var_local (array_decl, ctx); > + break; > + } > + Assuming this intentionally means to skip ('break' just above) the following 'if (DECL_P (decl))' and its 'else' branch, then maybe remove the 'break' just above, and instead do 'else if (DECL_P (decl))'? > if (DECL_P (decl)) > { > if (DECL_SIZE (decl) > @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o > gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn); > } > > + /* If is OpenACC construct, put non-contiguous array clauses (if any) > + in front of clause chain. The runtime can then test the first to see > + if the additional map processing for them is required. */ > + if (is_gimple_omp_oacc (stmt)) > + reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt)); Should that be deemed unsuitable for any reason, then add a new 'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays. > --- include/gomp-constants.h (revision 277827) > +++ include/gomp-constants.h (working copy) > @@ -40,6 +40,7 @@ > #define GOMP_MAP_FLAG_SPECIAL_0 (1 << 2) > #define GOMP_MAP_FLAG_SPECIAL_1 (1 << 3) > #define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4) > +#define GOMP_MAP_FLAG_SPECIAL_3 (1 << 5) > #define GOMP_MAP_FLAG_SPECIAL (GOMP_MAP_FLAG_SPECIAL_1 \ > | GOMP_MAP_FLAG_SPECIAL_0) > /* Flag to force a specific behavior (or else, trigger a run-time error). */ > @@ -127,6 +128,26 @@ enum gomp_map_kind > /* Decrement usage count and deallocate if zero. */ > GOMP_MAP_RELEASE = (GOMP_MAP_FLAG_SPECIAL_2 > | GOMP_MAP_DELETE), > + /* Mapping kinds for non-contiguous arrays. */ > + GOMP_MAP_NONCONTIG_ARRAY = (GOMP_MAP_FLAG_SPECIAL_3), > + GOMP_MAP_NONCONTIG_ARRAY_TO = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_TO), > + GOMP_MAP_NONCONTIG_ARRAY_FROM = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_FROM), > + GOMP_MAP_NONCONTIG_ARRAY_TOFROM = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_TOFROM), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO = (GOMP_MAP_NONCONTIG_ARRAY_TO > + | GOMP_MAP_FLAG_FORCE), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM = (GOMP_MAP_NONCONTIG_ARRAY_FROM > + | GOMP_MAP_FLAG_FORCE), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM = (GOMP_MAP_NONCONTIG_ARRAY_TOFROM > + | GOMP_MAP_FLAG_FORCE), > + GOMP_MAP_NONCONTIG_ARRAY_ALLOC = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_ALLOC), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_FORCE_ALLOC), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_FORCE_PRESENT), Just an idea: instead of this long list, would it maybe be better (if feasible at all?) to have a single "lead-in" mapping 'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the following (normal) mappings belong to that "non-contiguous array mode". (Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any 'GOMP_MAP_POINTER's following it.) Might that make some things simpler, or even more complicated (more internal state to keep)? > --- libgomp/oacc-parallel.c (revision 277827) > +++ libgomp/oacc-parallel.c (working copy) > +static inline void > +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs, > + unsigned short *kinds) > +{ > + for (int i = 0; i < mapnum; i++) > + { > + if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff)) > + hostaddrs[i] = *((void **)hostaddrs[i]); Can we be (or, do we make) sure that 'hostaddrs' will never be in read-only memory? And, it's permissible to alter 'hostaddrs'? Ah, other code (including 'libgomp/target.c') is doing such things, too, so it must be fine. > + else > + /* We assume all non-contiguous array map entries are placed at the > + start; first other map kind means we can exit. */ > + break; > + } > +} > --- libgomp/target.c (revision 277827) > +++ libgomp/target.c (working copy) Have not yet reviewed in detail. > @@ -533,9 +679,37 @@ gomp_map_vars_internal (struct gomp_device_descr * > const int typemask = short_mapkind ? 0xff : 0x7; > struct splay_tree_s *mem_map = &devicep->mem_map; > struct splay_tree_key_s cur_node; > - struct target_mem_desc *tgt > - = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); > - tgt->list_count = mapnum; > + struct target_mem_desc *tgt; > + > + bool process_noncontig_arrays = false; > + size_t nca_data_row_num = 0, row_start = 0; > + size_t nca_info_num = 0, nca_index; > + struct ncarray_info *nca_info = NULL; > + struct target_var_desc *row_desc; > + uintptr_t target_row_addr; > + void **host_data_rows = NULL, **target_data_rows = NULL; > + void *row; > + > + if (mapnum > 0) > + { Also add such a comment here: "We assume all non-contiguous array map entries are placed at the start". > + int kind = get_kind (short_mapkind, kinds, 0); > + process_noncontig_arrays = GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask); > + } > + > + if (process_noncontig_arrays) > + for (i = 0; i < mapnum; i++) > + { > + int kind = get_kind (short_mapkind, kinds, i); > + if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask)) > + { > + nca_data_row_num += gomp_noncontig_array_count_rows (hostaddrs[i]); > + nca_info_num += 1; > + } > + } Or, actually, can the 'if (mapnum > 0)' above and the 'for' loop here again be simplified to just one loop with 'break', like you've done in 'libgomp/oacc-parallel.c:revert_noncontig_array_map_pointers'? > + > + tgt = gomp_malloc (sizeof (*tgt) > + + sizeof (tgt->list[0]) * (mapnum + nca_data_row_num)); > + tgt->list_count = mapnum + nca_data_row_num; > tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1; > tgt->device_descr = devicep; > struct gomp_coalesce_buf cbuf, *cbufp = NULL; > @@ -735,6 +931,56 @@ gomp_map_vars_internal (struct gomp_device_descr * > } > } > > + /* For non-contiguous arrays. Each data row is one target item, separated > + from the normal map clause items, hence we order them after mapnum. */ > + if (process_noncontig_arrays) > + for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++) > + { > + int kind = get_kind (short_mapkind, kinds, i); > + if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask)) > + continue; Can instead 'break' again? > @@ -1044,8 +1299,112 @@ gomp_map_vars_internal (struct gomp_device_descr * > array++; > } > } > + > + /* Processing of non-contiguous array rows. */ > + if (process_noncontig_arrays) > + { > + for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++) > + { > + int kind = get_kind (short_mapkind, kinds, i); > + if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask)) > + continue; Likewise? It's now gotten too late; more review to follow later. Grüße Thomas