From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 129308 invoked by alias); 6 Nov 2019 22:59:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 129299 invoked by uid 89); 6 Nov 2019 22:59:54 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-15.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,KAM_STOCKGEN,SPF_PASS autolearn=ham version=3.3.1 spammy=approximately, specially, delays X-HELO: esa3.mentor.iphmx.com Received: from esa3.mentor.iphmx.com (HELO esa3.mentor.iphmx.com) (68.232.137.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 06 Nov 2019 22:59:48 +0000 IronPort-SDR: w3fLWz4jwjnrQbRVAIZbEbmGAEoMtsegcDpeOpLjYmJ4mVR6vAWTjrOjLSDI7/2Wd7d8UH1qT+ JjgOCEfLBCu9A66Ii+4KmGRgP4aTeTEWh3++5ctXseo+7O8QBrofoovJ0mOxFcgMr8mei/I+gy 83TV130EzXjBckwilaPuUA9YEmpyCp3+uiqluBksqNgbl/9feZvVGyZ3HSQab/nnmek77BElp6 ZShzI/aGUAXdAfqzg69+V9bQbqzGvjoNWR96mGaMdttlXMVUmgpb5+xDWd20UoKDjMTM9Ks2TJ az4= Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 06 Nov 2019 14:59:46 -0800 IronPort-SDR: u7VAI9z6NM0F7WDqv1IoiC/zDJplTkauqalBz0UOSld1KCCivZRrE/BCOvx4nddWN9mdB0nekc QkuX52SjtkgqhRL1KEasHt4Ub/MdA/zUfcJyNfDBBpPsWvrmbB9vg7ZY315pjQ5elAWvwc/n+S kJlFCeqDGfC2lJXnY23kTSuaLyEwOca9ODocN75SsejnbhoC7DWbWnM8HDczmUZlHFlrNqwr9d gEq41XaUdOW+xN3P9t+7HGrVUDR+2IrFPX7Lizg0Hn98TewOPTns3dVqaO6jjZ6H3mkrPTBvu4 TSk= Date: Wed, 06 Nov 2019 22:59:00 -0000 From: Julian Brown To: Thomas Schwinge CC: Bernhard Reutner-Fischer , , Tom de Vries , Chung-Lin Tang , Jakub Jelinek Subject: Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory Message-ID: <20191106225937.52675b51@squid.athome> In-Reply-To: <20190612204216.0ec83e4e@squid.athome> References: <70d27ebd-762e-59a3-082f-48fa0c687212@codesourcery.com> <20180813172151.6bfcece3@squid.athome> <524d9017-e7f0-87a2-6a62-9b23abd65ac2@codesourcery.com> <94ac317d-5e4c-7738-4b22-b59d29dc114a@codesourcery.com> <20180815174637.7f7f9666@squid.athome> <1532490C-015A-4F89-8512-FD9751C7EEA4@gmail.com> <20180816164643.01e26476@squid.athome> <20181211150811.47a032cf@squid.athome> <20190603170245.4a62a0ad@squid.athome> <20190603162300.GR19695@tucnak> <20190607150837.299df55b@squid.athome> <87muinggnx.fsf@euler.schwinge.homeip.net> <20190612204216.0ec83e4e@squid.athome> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/qS58ClCkk1OuS8tOrARVypC" Return-Path: julian@codesourcery.com X-IsSubscribed: yes X-SW-Source: 2019-11/txt/msg00448.txt.bz2 --MP_/qS58ClCkk1OuS8tOrARVypC Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Content-length: 15785 Hi! This is a new patch that takes a different approach to the last-posted version in this thread. I have combined the previous incremental patches on the og9 branch that culminated in the following patch: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01220.html =46rom that email, the following explanation was given of the previous approaches taken as to how the partitioning level for OpenACC "private" variables was calculated and represented in the compiler, and how this patch differs: - The first (by Chung-Lin Tang) recorded which variables should be made private per-gang in each front end (i.e. separately in C, C++ and Fortran) using a new attribute "oacc gangprivate". This was deemed too early; the final determination about which loops are assigned which parallelism level has not yet been made at parse time. - The second, last discussed here: https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html moved the analysis of OpenACC contexts to determine parallelism levels to omp-low.c (but kept the "oacc gangprivate" attribute and the NVPTX backend parts). However (as mentioned in that mail), this is still too early: in fact the final determination of the parallelism level for each loop (especially for loops without explicit gang/worker/vector clauses) does not happen until we reach the device compiler, in the oaccloops pass. This patch builds on the second approach, but delays fixing the parallelism level of each "private" variable (those that are addressable, and declared private using OpenACC clauses or by defining them in a scope nested within a compute region or partitioned loop) until the oaccdevlow pass. This is done by adding a new internal UNIQUE function (OACC_PRIVATE) that lists (the address of) each private variable as an argument. These new internal functions fit into the existing scheme for demarking OpenACC loops, as described in comments in the patch. Use of the "oacc gangprivate" attribute is now restricted to the NVPTX backend (and could probably be replaced with some lighter-weight mechanism as a followup). I realised I omitted to make some of the cosmetic changes Thomas highlighted below on starting to write this email, but I can do that (with suitable retesting) if desired before committing. On Wed, 12 Jun 2019 20:42:16 +0100 Julian Brown wrote: > On Wed, 12 Jun 2019 13:57:22 +0200 > Thomas Schwinge wrote: >=20 > > I understand right that this will address some aspects of PR90115 > > "OpenACC: predetermined private levels for variables declared in > > blocks" (so please mention that one in the ChangeLog updates, and > > commit log), but it doesn't address all of these aspects (and see > > also Cesar's list in > > ), > > and also not yet PR90114 "Predetermined private levels for variables > > declared in OpenACC accelerator routines"?=20=20 >=20 > There's two possible reasons for placing gang-private variables in > shared memory: correct implementation of OpenACC semantics, or > optimisation, since shared memory is faster than local memory (on > NVidia devices). Handling of private variables is intimately tied > with the execution model for gangs/workers/vectors implemented by a > particular target: for PTX, that's handled in the backend using a > broadcasting/neutering scheme. >=20 > That is sufficient for code that e.g. sets a variable in worker-single > mode and expects to use the value in worker-partitioned mode. The > difficulty (semantics-wise) comes when the user wants to do something > like an atomic operation in worker-partitioned mode and expects a > worker-single variable to be shared across each partitioned worker. > Forcing use of shared memory for such variables makes that work > properly. >=20 > It is *not* sufficient for the next level down, though -- expecting to > perform atomic operations in vector-partitioned mode on a variable > that is declared in vector-single mode, i.e. so that it is supposed to > be shared across all vector elements. AFAIK, that's not > straightforward, and we haven't attempted to implement it. >=20 > I think the original motivation for this patch was optimisation, > though -- typical code won't try to use atomics in this way. Cesar's > list of caveats that you linked to seems to support that notion. After a little further investigation, I came to the conclusion that the patch was always originally about correctness, but optimisation. But that's largely academic now. > > I guess I'm not terribly happy with the 'goacc.expand_accel_var' > > name. Using different "memories" for specially tagged DECLs seems > > to be a pretty generic concept (address spaces?), and...=20=20 >=20 > This is partly another NVPTX weirdness -- the target uses address > spaces, but only within the backend, and without using the generic > middle-end address space machinery. The other reason for using an > attribute instead of assigning an address space is that the former can > be detected by the target compiler, but will be ignored by the host > compiler. Forcing use of an address space this early would mean that > the same non-standard address space would have to make sense for both > host and offloaded code. >=20 > For AMD GCN, we do use the generic address space support, and I found > that I could re-use the "oacc gangprivate" attribute -- but not the > expand_accel_var hook (expand time is too late for that target). > Instead, another new hook "TARGET_GOACC_ADJUST_GANGPRIVATE_DECL" is > called from omp-offload.c:execute_oacc_device_lower for variables that > have the "oacc gangprivate" attribute set. Those bits haven't been > posted upstream yet, though. This patch uses both target hooks -- the TARGET_GOACC_ADJUST_PRIVATE_DECL (renamed), and TARGET_GOACC_EXPAND_ACCEL_VAR. The first can tweak the decl at oaccdevlow time, and the second at expand time. This version of the patch doesn't provide full support for gang-private variables on AMD GCN yet though, since that depends on other code that hasn't been upstreamed yet. (GCN works with the equivalent patch to this on the og9 branch though.) > > > --- a/gcc/expr.c > > > +++ b/gcc/expr.c > > > @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, > > > machine_mode tmode, exp =3D SSA_NAME_VAR (ssa_name); > > > goto expand_decl_rtl; > > >=20=20 > > > - case PARM_DECL: > > > case VAR_DECL: > > > + /* Allow accel compiler to handle specific cases of > > > variables, > > > + specifically those tagged with the "oacc gangprivate" > > > attribute, > > > + which may be intended to be placed in special memory in > > > GPUs. */ > > > + if (flag_openacc && targetm.goacc.expand_accel_var) > > > + { > > > + temp =3D targetm.goacc.expand_accel_var (exp); > > > + if (temp) > > > + return temp; > > > + } > > > + /* ... fall through ... */ > > > + > > > + case PARM_DECL:=20=20=20=20 > >=20 > > ... I'm thus confused that there isn't already a generic mechanism > > available in GCC, that we can just use instead of adding a new one > > here? Thinking about the "address spaces" stuff in 'gcc/target.def' > > -- or is that the wrong concept? (I'm not familiar with all that, > > and haven't looked closely.)=20=20 >=20 > Same point again -- the same address space would have to be supported > on the host and offload compiler. I'm happy to accept suggestions for > another name for the hook though? (Still not renamed in this version, sorry.) > > > +/* Mark addressable variables which are declared implicitly or > > > explicitly as > > > + gang private with a special attribute. These may need to have > > > their > > > + declarations altered later on in compilation (e.g. in > > > + execute_oacc_device_lower or the backend, depending on how the > > > OpenACC > > > + execution model is implemented on a given target) to ensure > > > that sharing > > > + semantics are correct. */ > > > + > > > +static void > > > +mark_oacc_gangprivate (vec *decls, omp_context *ctx) > > > +{ > > > + int i; > > > + tree decl; > > > + > > > + FOR_EACH_VEC_ELT (*decls, i, decl) > > > + { > > > + for (omp_context *thisctx =3D ctx; thisctx; thisctx =3D > > > thisctx->outer) > > > + { > > > + tree inner_decl =3D maybe_lookup_decl (decl, thisctx); > > > + if (inner_decl) > > > + { > > > + decl =3D inner_decl; > > > + break; > > > + } > > > + } > > > + if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES > > > (decl))) > > > + { > > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > > + { > > > + fprintf (dump_file, > > > + "Setting 'oacc gangprivate' attribute for > > > decl:"); > > > + print_generic_decl (dump_file, decl, TDF_SLIM); > > > + fputc ('\n', dump_file); > > > + } > > > + DECL_ATTRIBUTES (decl) > > > + =3D tree_cons (get_identifier ("oacc gangprivate"), > > > + NULL, DECL_ATTRIBUTES (decl)); > > > + } > > > + } > > > +}=20=20=20=20 > >=20 > > So I'm confused how that can be done here ('omplower'), given that > > the decision about how levels of parallelism (gang, worker, vector) > > are assigned is only done later ('oaccdevlow'), > > separately/differently per offloading target? > >=20 > > The following seems relevant: > >=20=20=20 > > > +/* Find gang-private variables in a context. */ > > > + > > > +static int > > > +process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED > > > (data)) +{ > > > + omp_context *ctx =3D (omp_context *) node->value; > > > + unsigned level_total =3D 0; > > > + omp_context *thisctx; > > > + > > > + for (thisctx =3D ctx; thisctx; thisctx =3D thisctx->outer) > > > + level_total +=3D thisctx->oacc_partitioning_levels; > > > + > > > + /* If the current context and parent contexts are distributed > > > over a > > > + total of one parallelism level, we have gang partitioning. > > > */ > > > + if (level_total =3D=3D 1) > > > + mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, > > > ctx); + > > > + return 0; > > > +}=20=20=20=20 > >=20 > > ..., but I didn't quickly manage to grok that. (I shall try harder, > > later on.) > >=20 > > But still then, this looks like it might work for the outer level > > (gang) only (because all offloading targets are expected to assign > > gang level to the outermost loop -- might that be the underlying > > assumption?), but it won't work for inner loop/privatization levels? > > (..., which I understand this patch isn't doing anything about.)=20=20 >=20 > The "oacc gangprivate" only applies to variables that are (addressable > and) private per-gang, but the attribute marking works on both > top-level "acc parallel" directives and "acc loop" directives below > that -- so long as they don't explicitly use parallelism finer than > "gang" level. It also works on variables declared private() using > OpenACC clauses in all supported languages, or those that are declared > in an appropriate C/C++ scope. >=20 > At least for loops with reductions, gang-partitioned loops have > different semantics from worker and vector-partitioned loops. So I > think in general, it must be the case that it is possible to analyse > OpenACC code "lexically" to determine which loops are gang > partitioned, and which are partitioned at finer levels. It can't be > deferred entirely to the target. It's been a while since I read those > bits of the standard, though! >=20 > But yes, in GCC, omp-low only tries to calculate the maximum > partitioning level for each loop nest. The final determination isn't > made until oaccdevlow time. That's OK if shared memory is being used > only as an optimisation, much less OK if it's a necessary part of > implementing OpenACC semantics properly. It might be more of an issue > if we tried to support "vector-shared" variables properly. So: this version moves the partitioning-level calculation for private variables out of omp-low, so this isn't an issue any more. Variables are privatized according to the "true" partitioning level of the scope inside the parallel region that they are associated with (i.e. "parallel" region, or loop). > > > + > > > +program main > > > + integer :: w, arr(0:31) > > > + > > > + !$acc parallel num_gangs(32) num_workers(32) copyout(arr) > > > + !$acc loop gang private(w)=20=20=20=20 > >=20 > > ... here. > >=20 > > (Just to make sure, a Fortran 'integer' will always be > > 'integer(kind=3D4)'?)=20=20 >=20 > No idea! I can check. That's a yes, I think. Re-tested with offloading to nvptx. OK for mainline? Thanks, Julian 2019-11-06 Julian Brown Chung-Lin Tang gcc/ * config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. Add and use LEVEL parameter. * config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter. * config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Delete. (TARGET_GOACC_ADJUST_PRIVATE_DECL): Define using renamed gcn_goacc_adjust_private_decl. * config/nvptx/nvptx.c (tree-hash-traits.h, tree-pretty-print.h): Include. (gangprivate_shared_size): New global variable. (gangprivate_shared_align): Likewise. (gangprivate_shared_sym): Likewise. (gangprivate_shared_hmap): Likewise. (nvptx_option_override): Initialize gangprivate_shared_sym, gangprivate_shared_align. (nvptx_file_end): Output gangprivate_shared_sym. (nvptx_goacc_adjust_private_decl): New function. (nvptx_goacc_expand_accel_var): New function. (nvptx_set_current_function): New function. (TARGET_GOACC_ADJUST_PRIVATE_DECL, TARGET_GOACC_EXPAND_ACCEL_VAR): Define hooks. * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR, TARGET_GOACC_ADJUST_PRIVATE_DECL): Place new documentation hooks. * doc/tm.texi: Regenerate. * expr.c (expand_expr_real_1): Expand decls using the expand_accel_var OpenACC hook if defined. * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE. * internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE. * omp-low.c (omp_context): Add oacc_addressable_var_decls field. (new_omp_context): Initialize oacc_addressable_var_decls in new omp_context. (delete_omp_context): Delete oacc_addressable_var_decls in old omp_context. (lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert private marker before fork. (lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify private marker's gimple call arguments, and pass it to lower_oacc_reductions. (oacc_record_private_var_clauses, oacc_record_vars_in_bind, make_oacc_private_marker): New functions. (lower_omp_for): Call oacc_record_private_var_clauses with "for" clauses. Call oacc_record_vars_in_bind for OpenACC contexts. Create private marker and pass to lower_oacc_head_tail. (lower_omp_target): Create private marker and pass to lower_oacc_reductions. (lower_omp_1): Call oacc_record_vars_in_bind for OpenACC bind contexts. * omp-offload.c (convert.h): Include. (oacc_loop_xform_head_tail): Treat private-variable markers like fork/join when transforming head/tail sequences. (execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE to determine partitioning level of private variables, and process any found via adjust_private_decl target hook. * target.def (expand_accel_var, adjust_private_decl): New target hooks. libgomp/ * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test. --MP_/qS58ClCkk1OuS8tOrARVypC Content-Type: text/x-patch Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="gang-local-storage-in-shm-9.diff" Content-length: 30301 commit ccbf9525701265f8522c78b13751b82adba78f62 Author: Julian Brown Date: Thu Mar 21 15:09:24 2019 -0700 Add support for gang local storage allocation in shared memory gcc/ * config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. Add and use LEVEL parameter. * config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter. * config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Delete. (TARGET_GOACC_ADJUST_PRIVATE_DECL): Define using renamed gcn_goacc_adjust_private_decl. * config/nvptx/nvptx.c (tree-hash-traits.h, tree-pretty-print.h): Include. (gangprivate_shared_size): New global variable. (gangprivate_shared_align): Likewise. (gangprivate_shared_sym): Likewise. (gangprivate_shared_hmap): Likewise. (nvptx_option_override): Initialize gangprivate_shared_sym, gangprivate_shared_align. (nvptx_file_end): Output gangprivate_shared_sym. (nvptx_goacc_adjust_private_decl): New function. (nvptx_goacc_expand_accel_var): New function. (nvptx_set_current_function): New function. (TARGET_GOACC_ADJUST_PRIVATE_DECL, TARGET_GOACC_EXPAND_ACCEL_VAR): Define hooks. * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR, TARGET_GOACC_ADJUST_PRIVATE_DECL): Place new documentation hooks. * doc/tm.texi: Regenerate. * expr.c (expand_expr_real_1): Expand decls using the expand_accel_var OpenACC hook if defined. * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE. * internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE. * omp-low.c (omp_context): Add oacc_addressable_var_decls field. (new_omp_context): Initialize oacc_addressable_var_decls in new omp_context. (delete_omp_context): Delete oacc_addressable_var_decls in old omp_context. (lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert private marker before fork. (lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify private marker's gimple call arguments, and pass it to lower_oacc_reductions. (oacc_record_private_var_clauses, oacc_record_vars_in_bind, make_oacc_private_marker): New functions. (lower_omp_for): Call oacc_record_private_var_clauses with "for" clauses. Call oacc_record_vars_in_bind for OpenACC contexts. Create private marker and pass to lower_oacc_head_tail. (lower_omp_target): Create private marker and pass to lower_oacc_reductions. (lower_omp_1): Call oacc_record_vars_in_bind for OpenACC bind contexts. * omp-offload.c (convert.h): Include. (oacc_loop_xform_head_tail): Treat private-variable markers like fork/join when transforming head/tail sequences. (execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE to determine partitioning level of private variables, and process any found via adjust_private_decl target hook. * target.def (expand_accel_var, adjust_private_decl): New target hooks. libgomp/ * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test. diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h index da7faf29c70..714d51189d9 100644 --- a/gcc/config/gcn/gcn-protos.h +++ b/gcc/config/gcn/gcn-protos.h @@ -39,7 +39,7 @@ extern rtx gcn_gen_undef (machine_mode); extern bool gcn_global_address_p (rtx); extern tree gcn_goacc_adjust_propagation_record (tree record_type, bool sender, const char *name); -extern void gcn_goacc_adjust_gangprivate_decl (tree var); +extern void gcn_goacc_adjust_private_decl (tree var, int level); extern void gcn_goacc_reduction (gcall *call); extern bool gcn_hard_regno_rename_ok (unsigned int from_reg, unsigned int to_reg); diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c index c6b6302e9ed..aa56e236134 100644 --- a/gcc/config/gcn/gcn-tree.c +++ b/gcc/config/gcn/gcn-tree.c @@ -697,8 +697,11 @@ gcn_goacc_adjust_propagation_record (tree record_type, bool sender, } void -gcn_goacc_adjust_gangprivate_decl (tree var) +gcn_goacc_adjust_private_decl (tree var, int level) { + if (level != GOMP_DIM_GANG) + return; + tree type = TREE_TYPE (var); tree lds_type = build_qualified_type (type, TYPE_QUALS_NO_ADDR_SPACE (type) diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index b5f09da173c..e41023b335c 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -6033,8 +6033,8 @@ print_operand (FILE *file, rtx x, int code) #undef TARGET_GOACC_ADJUST_PROPAGATION_RECORD #define TARGET_GOACC_ADJUST_PROPAGATION_RECORD \ gcn_goacc_adjust_propagation_record -#undef TARGET_GOACC_ADJUST_GANGPRIVATE_DECL -#define TARGET_GOACC_ADJUST_GANGPRIVATE_DECL gcn_goacc_adjust_gangprivate_decl +#undef TARGET_GOACC_ADJUST_PRIVATE_DECL +#define TARGET_GOACC_ADJUST_PRIVATE_DECL gcn_goacc_adjust_private_decl #undef TARGET_GOACC_FORK_JOIN #define TARGET_GOACC_FORK_JOIN gcn_fork_join #undef TARGET_GOACC_REDUCTION diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 90171a95784..d16125aec8f 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -74,6 +74,8 @@ #include "cfgloop.h" #include "fold-const.h" #include "intl.h" +#include "tree-hash-traits.h" +#include "tree-pretty-print.h" /* This file should be included last. */ #include "target-def.h" @@ -166,6 +168,12 @@ static unsigned vector_red_align; static unsigned vector_red_partition; static GTY(()) rtx vector_red_sym; +/* Shared memory block for gang-private variables. */ +static unsigned gangprivate_shared_size; +static unsigned gangprivate_shared_align; +static GTY(()) rtx gangprivate_shared_sym; +static hash_map gangprivate_shared_hmap; + /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; @@ -247,6 +255,10 @@ nvptx_option_override (void) vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; vector_red_partition = 0; + gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared"); + SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED); + gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + diagnose_openacc_conflict (TARGET_GOMP, "-mgomp"); diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack"); diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt"); @@ -5231,6 +5243,10 @@ nvptx_file_end (void) write_shared_buffer (asm_out_file, vector_red_sym, vector_red_align, vector_red_size); + if (gangprivate_shared_size) + write_shared_buffer (asm_out_file, gangprivate_shared_sym, + gangprivate_shared_align, gangprivate_shared_size); + if (need_softstack_decl) { write_var_marker (asm_out_file, false, true, "__nvptx_stacks"); @@ -6424,6 +6440,60 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t) return false; } +/* Implement TARGET_GOACC_ADJUST_PRIVATE_DECL. Set "oacc gangprivate" + attribute for gang-private variable declarations. */ + +void +nvptx_goacc_adjust_private_decl (tree decl, int level) +{ + if (level != GOMP_DIM_GANG) + return; + + if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Setting 'oacc gangprivate' attribute for decl:"); + print_generic_decl (dump_file, decl, TDF_SLIM); + fputc ('\n', dump_file); + } + tree id = get_identifier ("oacc gangprivate"); + DECL_ATTRIBUTES (decl) = tree_cons (id, NULL, DECL_ATTRIBUTES (decl)); + } +} + +/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR. Place "oacc gangprivate" + variables in shared memory. */ + +static rtx +nvptx_goacc_expand_accel_var (tree var) +{ + if (VAR_P (var) + && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var))) + { + unsigned int offset, *poffset; + poffset = gangprivate_shared_hmap.get (var); + if (poffset) + offset = *poffset; + else + { + unsigned HOST_WIDE_INT align = DECL_ALIGN (var); + gangprivate_shared_size + = (gangprivate_shared_size + align - 1) & ~(align - 1); + if (gangprivate_shared_align < align) + gangprivate_shared_align = align; + + offset = gangprivate_shared_size; + bool existed = gangprivate_shared_hmap.put (var, offset); + gcc_assert (!existed); + gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var)); + } + rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset); + return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr); + } + return NULL_RTX; +} + static GTY(()) tree nvptx_previous_fndecl; static void @@ -6432,6 +6502,7 @@ nvptx_set_current_function (tree fndecl) if (!fndecl || fndecl == nvptx_previous_fndecl) return; + gangprivate_shared_hmap.empty (); nvptx_previous_fndecl = fndecl; vector_red_partition = 0; oacc_bcast_partition = 0; @@ -6573,6 +6644,12 @@ nvptx_set_current_function (tree fndecl) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_GOACC_ADJUST_PRIVATE_DECL +#define TARGET_GOACC_ADJUST_PRIVATE_DECL nvptx_goacc_adjust_private_decl + +#undef TARGET_GOACC_EXPAND_ACCEL_VAR +#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var + #undef TARGET_SET_CURRENT_FUNCTION #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 915e9612208..db40f50b71c 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6156,6 +6156,19 @@ like @code{cond_add@var{m}}. The default implementation returns a zero constant of type @var{type}. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var}) +This hook, if defined, is used by accelerator target back-ends to expand +specially handled kinds of VAR_DECL expressions. A particular use is to +place variables with specific attributes inside special accelarator +memories. A return value of NULL indicates that the target does not +handle this VAR_DECL, and normal RTL expanding is resumed. +@end deftypefn + +@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_PRIVATE_DECL (tree @var{var}, @var{int}) +Tweak variable declaration for a private variable at the specified +parallelism level. +@end deftypefn + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index ac0f0494992..743cf36dd00 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4215,6 +4215,10 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_PREFERRED_ELSE_VALUE +@hook TARGET_GOACC_EXPAND_ACCEL_VAR + +@hook TARGET_GOACC_ADJUST_PRIVATE_DECL + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/expr.c b/gcc/expr.c index b54bf1d3dc5..165796b97d2 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -10043,8 +10043,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, exp = SSA_NAME_VAR (ssa_name); goto expand_decl_rtl; - case PARM_DECL: case VAR_DECL: + /* Allow accel compiler to handle specific cases of variables, + specifically those tagged with the "oacc gangprivate" attribute, + which may be intended to be placed in special memory in GPUs. */ + if (flag_openacc && targetm.goacc.expand_accel_var) + { + temp = targetm.goacc.expand_accel_var (exp); + if (temp) + return temp; + } + /* ... fall through ... */ + + case PARM_DECL: /* If a static var's type was incomplete when the decl was written, but the type is complete now, lay out the decl now. */ if (DECL_SIZE (exp) == 0 diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 549d6f1153b..2c853047cdd 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2618,6 +2618,8 @@ expand_UNIQUE (internal_fn, gcall *stmt) else gcc_unreachable (); break; + case IFN_UNIQUE_OACC_PRIVATE: + break; } if (pattern) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 7164ee5cf3c..a2810edc1b4 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3. If not see #define IFN_UNIQUE_CODES \ DEF(UNSPEC), \ DEF(OACC_FORK), DEF(OACC_JOIN), \ - DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK) + DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK), \ + DEF(OACC_PRIVATE) enum ifn_unique_kind { #define DEF(X) IFN_UNIQUE_##X diff --git a/gcc/omp-low.c b/gcc/omp-low.c index d8f058fe475..6d821e64767 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -156,6 +156,9 @@ struct omp_context /* True if there is bind clause on the construct (i.e. a loop construct). */ bool loop_p; + + /* Addressable variable decls in this context. */ + vec *oacc_addressable_var_decls; }; static splay_tree all_contexts; @@ -921,6 +924,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->cb.decl_map = new hash_map; + ctx->oacc_addressable_var_decls = new vec (); + return ctx; } @@ -1002,6 +1007,7 @@ delete_omp_context (splay_tree_value value) } delete ctx->lastprivate_conditional_map; + delete ctx->oacc_addressable_var_decls; XDELETE (ctx); } @@ -6550,8 +6556,9 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *body_p, static void lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, - gcall *fork, gcall *join, gimple_seq *fork_seq, - gimple_seq *join_seq, omp_context *ctx) + gcall *fork, gcall *private_marker, gcall *join, + gimple_seq *fork_seq, gimple_seq *join_seq, + omp_context *ctx) { gimple_seq before_fork = NULL; gimple_seq after_fork = NULL; @@ -6747,6 +6754,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, /* Now stitch things together. */ gimple_seq_add_seq (fork_seq, before_fork); + if (private_marker) + gimple_seq_add_stmt (fork_seq, private_marker); if (fork) gimple_seq_add_stmt (fork_seq, fork); gimple_seq_add_seq (fork_seq, after_fork); @@ -7462,7 +7471,7 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head, HEAD and TAIL. */ static void -lower_oacc_head_tail (location_t loc, tree clauses, +lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker, gimple_seq *head, gimple_seq *tail, omp_context *ctx) { bool inner = false; @@ -7470,6 +7479,14 @@ lower_oacc_head_tail (location_t loc, tree clauses, gimple_seq_add_stmt (head, gimple_build_assign (ddvar, integer_zero_node)); unsigned count = lower_oacc_head_mark (loc, ddvar, clauses, head, ctx); + + if (private_marker) + { + gimple_set_location (private_marker, loc); + gimple_call_set_lhs (private_marker, ddvar); + gimple_call_set_arg (private_marker, 1, ddvar); + } + tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK); tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN); @@ -7500,7 +7517,8 @@ lower_oacc_head_tail (location_t loc, tree clauses, &join_seq); lower_oacc_reductions (loc, clauses, place, inner, - fork, join, &fork_seq, &join_seq, ctx); + fork, (count == 1) ? private_marker : NULL, + join, &fork_seq, &join_seq, ctx); /* Append this level to head. */ gimple_seq_add_seq (head, fork_seq); @@ -9465,6 +9483,32 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p, } } +/* Record vars listed in private clauses in CLAUSES in CTX. This information + is used to mark up variables that should be made private per-gang. */ + +static void +oacc_record_private_var_clauses (omp_context *ctx, tree clauses) +{ + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE) + { + tree decl = OMP_CLAUSE_DECL (c); + if (VAR_P (decl) && TREE_ADDRESSABLE (decl)) + ctx->oacc_addressable_var_decls->safe_push (decl); + } +} + +/* Record addressable vars declared in BINDVARS in CTX. This information is + used to mark up variables that should be made private per-gang. */ + +static void +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars) +{ + for (tree v = bindvars; v; v = DECL_CHAIN (v)) + if (VAR_P (v) && TREE_ADDRESSABLE (v)) + ctx->oacc_addressable_var_decls->safe_push (v); +} + /* Callback for walk_gimple_seq. Find #pragma omp scan statement. */ static tree @@ -10295,6 +10339,57 @@ lower_omp_for_scan (gimple_seq *body_p, gimple_seq *dlist, gomp_for *stmt, *dlist = new_dlist; } +/* Build an internal UNIQUE function with type IFN_UNIQUE_OACC_PRIVATE listing + the addresses of variables that should be made private at the surrounding + parallelism level. Such functions appear in the gimple code stream in two + forms, e.g. for a partitioned loop: + + .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6, 1, 68); + .data_dep.6 = .UNIQUE (OACC_PRIVATE, .data_dep.6, -1, &w); + .data_dep.6 = .UNIQUE (OACC_FORK, .data_dep.6, -1); + .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6); + + or alternatively, OACC_PRIVATE can appear at the top level of a parallel, + not as part of a HEAD_MARK sequence: + + .UNIQUE (OACC_PRIVATE, 0, 0, &w); + + For such stand-alone appearances, the 3rd argument is always 0, denoting + gang partitioning. */ + +static gcall * +make_oacc_private_marker (omp_context *ctx) +{ + int i; + tree decl; + + if (ctx->oacc_addressable_var_decls->length () == 0) + return NULL; + + auto_vec args; + + args.quick_push (build_int_cst (integer_type_node, IFN_UNIQUE_OACC_PRIVATE)); + args.quick_push (integer_zero_node); + args.quick_push (integer_minus_one_node); + + FOR_EACH_VEC_ELT (*ctx->oacc_addressable_var_decls, i, decl) + { + for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer) + { + tree inner_decl = maybe_lookup_decl (decl, thisctx); + if (inner_decl) + { + decl = inner_decl; + break; + } + } + tree addr = build_fold_addr_expr (decl); + args.safe_push (addr); + } + + return gimple_build_call_internal_vec (IFN_UNIQUE, args); +} + /* Lower code for an OMP loop directive. */ static void @@ -10311,6 +10406,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) push_gimplify_context (); + oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt)); + lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx); block = make_node (BLOCK); @@ -10329,6 +10426,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) gbind *inner_bind = as_a (gimple_seq_first_stmt (omp_for_body)); tree vars = gimple_bind_vars (inner_bind); + if (is_gimple_omp_oacc (ctx->stmt)) + oacc_record_vars_in_bind (ctx, vars); gimple_bind_append_vars (new_stmt, vars); /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't keep them on the inner_bind and it's block. */ @@ -10428,6 +10527,12 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (stmt), ctx); + gcall *private_marker = NULL; + if (is_gimple_omp_oacc (ctx->stmt) + && !gimple_seq_empty_p (omp_for_body) + && !gimple_seq_empty_p (omp_for_body)) + private_marker = make_oacc_private_marker (ctx); + /* Lower the header expressions. At this point, we can assume that the header is of the form: @@ -10464,7 +10569,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) if (is_gimple_omp_oacc (ctx->stmt) && !ctx_in_oacc_kernels_region (ctx)) lower_oacc_head_tail (gimple_location (stmt), - gimple_omp_for_clauses (stmt), + gimple_omp_for_clauses (stmt), private_marker, &oacc_head, &oacc_tail, ctx); /* Add OpenACC partitioning and reduction markers just before the loop. */ @@ -12289,8 +12394,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) them as a dummy GANG loop. */ tree level = build_int_cst (integer_type_node, GOMP_DIM_GANG); + gcall *private_marker = make_oacc_private_marker (ctx); + + if (private_marker) + gimple_call_set_arg (private_marker, 2, level); + lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level, - false, NULL, NULL, &fork_seq, &join_seq, ctx); + false, NULL, private_marker, NULL, &fork_seq, + &join_seq, ctx); } gimple_seq_add_seq (&new_body, fork_seq); @@ -12546,6 +12657,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) ctx); break; case GIMPLE_BIND: + if (ctx && is_gimple_omp_oacc (ctx->stmt)) + oacc_record_vars_in_bind (ctx, + gimple_bind_vars (as_a (stmt))); lower_omp (gimple_bind_body_ptr (as_a (stmt)), ctx); maybe_remove_omp_member_access_dummy_vars (as_a (stmt)); break; diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 32eacf7863e..d8291125370 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include "stringpool.h" #include "attribs.h" #include "cfgloop.h" +#include "convert.h" /* Describe the OpenACC looping structure of a function. The entire function is held in a 'NULL' loop. */ @@ -1082,7 +1083,9 @@ oacc_loop_xform_head_tail (gcall *from, int level) = ((enum ifn_unique_kind) TREE_INT_CST_LOW (gimple_call_arg (stmt, 0))); - if (k == IFN_UNIQUE_OACC_FORK || k == IFN_UNIQUE_OACC_JOIN) + if (k == IFN_UNIQUE_OACC_FORK + || k == IFN_UNIQUE_OACC_JOIN + || k == IFN_UNIQUE_OACC_PRIVATE) *gimple_call_arg_ptr (stmt, 2) = replacement; else if (k == kind && stmt != from) break; @@ -1684,6 +1687,38 @@ execute_oacc_device_lower () case IFN_UNIQUE_OACC_TAIL_MARK: remove = true; break; + + case IFN_UNIQUE_OACC_PRIVATE: + { + HOST_WIDE_INT level + = TREE_INT_CST_LOW (gimple_call_arg (call, 2)); + if (level == -1) + break; + for (unsigned i = 3; + i < gimple_call_num_args (call); + i++) + { + tree arg = gimple_call_arg (call, i); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR); + tree decl = TREE_OPERAND (arg, 0); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM + enumeration. */ + { "gang", "worker", "vector" }; + fprintf (dump_file, "Decl UID %u has %s " + "partitioning:", DECL_UID (decl), + axes[level]); + print_generic_decl (dump_file, decl, TDF_SLIM); + fputc ('\n', dump_file); + } + if (targetm.goacc.adjust_private_decl) + targetm.goacc.adjust_private_decl (decl, level); + } + remove = true; + } + break; } break; } diff --git a/gcc/target.def b/gcc/target.def index 1f011edf88b..a046d6eddb3 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1719,6 +1719,23 @@ for allocating any storage for reductions when necessary.", void, (gcall *call), default_goacc_reduction) +DEFHOOK +(expand_accel_var, +"This hook, if defined, is used by accelerator target back-ends to expand\n\ +specially handled kinds of VAR_DECL expressions. A particular use is to\n\ +place variables with specific attributes inside special accelarator\n\ +memories. A return value of NULL indicates that the target does not\n\ +handle this VAR_DECL, and normal RTL expanding is resumed.", +rtx, (tree var), +NULL) + +DEFHOOK +(adjust_private_decl, +"Tweak variable declaration for a private variable at the specified\n\ +parallelism level.", +void, (tree var, int), +NULL) + HOOK_VECTOR_END (goacc) /* Functions relating to vectorization. */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c new file mode 100644 index 00000000000..28222c25da3 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c @@ -0,0 +1,38 @@ +#include + +int main (void) +{ + int ret; + + #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret) + { + int w = 0; + + #pragma acc loop worker + for (int i = 0; i < 32; i++) + { + #pragma acc atomic update + w++; + } + + ret = (w == 32); + } + assert (ret); + + #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret) + { + int v = 0; + + #pragma acc loop vector + for (int i = 0; i < 32; i++) + { + #pragma acc atomic update + v++; + } + + ret = (v == 32); + } + assert (ret); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c new file mode 100644 index 00000000000..a4f81a39e24 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c @@ -0,0 +1,95 @@ +#include +#include +#include +#include +#include +#include + +#if 0 +#define DEBUG(DIM, IDX, VAL) \ + fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL)) +#else +#define DEBUG(DIM, IDX, VAL) +#endif + +#define N (32*32*32) + +int +check (const char *dim, int *dist, int dimsize) +{ + int ix; + int exit = 0; + + for (ix = 0; ix < dimsize; ix++) + { + DEBUG(dim, ix, dist[ix]); + if (dist[ix] < (N) / (dimsize + 0.5) + || dist[ix] > (N) / (dimsize - 0.5)) + { + fprintf (stderr, "did not distribute to %ss (%d not between %d " + "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)), + (int) ((N) / (dimsize - 0.5))); + exit |= 1; + } + } + + return exit; +} + +int main () +{ + int ary[N]; + int ix; + int exit = 0; + int gangsize = 0, workersize = 0, vectorsize = 0; + int *gangdist, *workerdist, *vectordist; + + for (ix = 0; ix < N;ix++) + ary[ix] = -1; + +#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \ + copy(ary) copyout(gangsize, workersize, vectorsize) + { +#pragma acc loop gang worker vector + for (unsigned ix = 0; ix < N; ix++) + { + int g, w, v; + + g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG); + w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER); + v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR); + + ary[ix] = (g << 16) | (w << 8) | v; + } + + gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG); + workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER); + vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR); + } + + gangdist = (int *) alloca (gangsize * sizeof (int)); + workerdist = (int *) alloca (workersize * sizeof (int)); + vectordist = (int *) alloca (vectorsize * sizeof (int)); + memset (gangdist, 0, gangsize * sizeof (int)); + memset (workerdist, 0, workersize * sizeof (int)); + memset (vectordist, 0, vectorsize * sizeof (int)); + + /* Test that work is shared approximately equally amongst each active + gang/worker/vector. */ + for (ix = 0; ix < N; ix++) + { + int g = (ary[ix] >> 16) & 255; + int w = (ary[ix] >> 8) & 255; + int v = ary[ix] & 255; + + gangdist[g]++; + workerdist[w]++; + vectordist[v]++; + } + + exit = check ("gang", gangdist, gangsize); + exit |= check ("worker", workerdist, workersize); + exit |= check ("vector", vectordist, vectorsize); + + return exit; +} diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 new file mode 100644 index 00000000000..b9293e7d2a4 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 @@ -0,0 +1,25 @@ +! Test for "oacc gangprivate" attribute on gang-private variables + +! { dg-do run } +! { dg-additional-options "-fdump-tree-oaccdevlow-details" } +! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning: integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */ + +program main + integer :: w, arr(0:31) + + !$acc parallel num_gangs(32) num_workers(32) copyout(arr) + !$acc loop gang private(w) + do j = 0, 31 + w = 0 + !$acc loop seq + do i = 0, 31 + !$acc atomic update + w = w + 1 + !$acc end atomic + end do + arr(j) = w + end do + !$acc end parallel + + if (any (arr .ne. 32)) stop 1 +end program main diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 new file mode 100644 index 00000000000..90e06be24ff --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 @@ -0,0 +1,25 @@ +! Test for worker-private variables + +! { dg-do run } +! { dg-additional-options "-fdump-tree-oaccdevlow-details" } +! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning: integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */ + +program main + integer :: w, arr(0:31) + + !$acc parallel num_gangs(32) num_workers(32) copyout(arr) + !$acc loop gang worker private(w) + do j = 0, 31 + w = 0 + !$acc loop seq + do i = 0, 31 + !$acc atomic update + w = w + 1 + !$acc end atomic + end do + arr(j) = w + end do + !$acc end parallel + + if (any (arr .ne. 32)) stop 1 +end program main --MP_/qS58ClCkk1OuS8tOrARVypC--