From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 85420 invoked by alias); 19 Aug 2015 19:42:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 85407 invoked by uid 89); 19 Aug 2015 19:42:23 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_50,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 19 Aug 2015 19:42:19 +0000 Received: from svr-orw-fem-05.mgc.mentorg.com ([147.34.97.43]) by relay1.mentorg.com with esmtp id 1ZS9OD-0007dv-OP from Cesar_Philippidis@mentor.com ; Wed, 19 Aug 2015 12:51:13 -0700 Received: from [127.0.0.1] (147.34.91.1) by svr-orw-fem-05.mgc.mentorg.com (147.34.97.43) with Microsoft SMTP Server id 14.3.224.2; Wed, 19 Aug 2015 12:42:14 -0700 Message-ID: <55D4DC16.7020106@codesourcery.com> Date: Wed, 19 Aug 2015 19:43:00 -0000 From: Cesar Philippidis User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0 MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" , Nathan Sidwell , Jakub Jelinek Subject: [gomp4] New reduction infrastructure for OpenACC Content-Type: multipart/mixed; boundary="------------010003080905060909060607" X-SW-Source: 2015-08/txt/msg01131.txt.bz2 --------------010003080905060909060607 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-length: 4966 This patch introduces a infrastructure for reductions in OpenACC. This infrastructure consists of four internal functions, GOACC_REDUCTION_SETUP, GOACC_REDUCTION_INIT, GOACC_REDUCTION_FINI, and GOACC_REDUCTION_TEARDOWN, along with a new target hook goacc.reduction. Each internal function shares a common interface: var = ifn (*ref_to_res, local_var, level, op, lid, rid) var is the intermediate and private result of the reduction. Usually, var = local_var. *ref_to_res is a pointer to the resulting reduction. This is only non-NULL for gang reductions. All other reduction operate on local variables for which var will suffice. local_var is a local (private) copy of the reduction variable. level is the GOMP_DIM of the reduction. Each function call may only contain one dim. If a loop a combination of gang, worker and vector, then ifn must be called one per each dim. op is the reduction operation. lid is a unique loop ID. It's not 100% unique because it might get reset in different TUs. rid is the reduction ID within a loop. E.g., if a loop has two reductions associated with it, the first could be designated zero and the second one. The target hook takes in one argument, the gimple statement containing the call to the internal reduction function, and it returns true if it introduces any calls to other target functions. This was necessary for the nvptx backend, specifically for vector INIT because the thread ID is necessary. Each internal function is expanded during execute_oacc_transform using that goacc reduction target hook. This allows us to generate target-specific code while lowering it in a target-independent manner. There are a couple of significant changes in this patch over the existing OpenMP reduction implementation. The first change is that reductions no longer rely on special ganglocal mappings. Certain targets, such as nvptx gpus, have a distributed memory hierarchy. On nvptx targets, all of the processors are partitioned into blocks. Each block has a limited amount of shared memory. Because of the OpenACC spec is written, we were initially mapping nvptx's shared memory into gang-local memory. However, Nathan's worker and vector state propagator is robust enough that we were able to eliminate the ganglocal mappings altogether. While this new infrastructure allows us to eliminate the ganglocal mappings, nvptx still needs to use shared memory for worker reductions. Consider the following example where red is private: #pragma acc loop worker reduction (+:red) for (...) red++; This loop would expand to this during omp-lower: red = GOACC_REDUCTION_SETUP (NULL, red, GOMP_DIM_WORKER, '+', 0, 0); GOACC_FORK (GOMP_DIM_WORKER); red = GOACC_REDUCTION_INIT (NULL, red, GOMP_DIM_WORKER, '+', 0, 0); for (...) red++; red = GOACC_REDUCTION_FINI (NULL, red, GOMP_DIM_WORKER, '+', 0, 0); GOACC_JOIN (GOMP_DIM_WORKER); red = GOACC_REDUCTION_TEARDOWN (NULL, red, GOMP_DIM_WORKER, '+', 0, 0); For nvptx targets, SETUP and TEARDOWN are responsible for allocating and freeing shared memory. INIT is responsible for initializing the private reduction variable. This is necessary for vector reductions because we want thread 0 to contain the original value of local_var, and the other threads to be initialized to the proper value for 'op'. All of the intermediate reduction results are combined in FINI and written back to var or *ref_to_res, whichever is necessary, in TEARDOWN. I don't want to delve too much into the use of this infrastructure right now. We do have a design for that, and I intend to present more details when I post the lowering patch. The next patch will likely be the nvptx changes though. One of the reasons why we needed create this generic interface was to implement vector reductions on nvptx targets. On nvptx targets, we're mapping vectors to warps. That's fine, but warps cannot use spinlocks or the warp will deadlock. As a consequence, we can't use the existing OpenMP atomic reductions in OpenACC. The way I got around the spinlock problem in 5.0 was by allocating an array of length vector_length, and stashing all of the intermediate reductions in there. The later on, one thread would merge all of those reductions together. This new reduction infrastructure provides a more elegant solution for OpenACC reduction. And while we're still using atomic operations for gang and worker reductions, we're no longer using a global lock for workers. This api allows us to use a lock in shared memory for workers. That said, this infrastructure does provide sufficient flexibility to implement tree reductions for gangs and workers later on. It should be noted that this is not a replacement for the existing OpenMP reductions. Rather, OpenMP will continue to use lower_reduction_clauses and friends, while OpenACC will use this infrastructure. That said, OpenMP could taught to use this infrastructure. Is this patch OK for gomp-4_0-branch? Thanks, Cesar --------------010003080905060909060607 Content-Type: text/x-patch; name="reduction-infrastructure-gomp4.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="reduction-infrastructure-gomp4.diff" Content-length: 14052 2015-08-19 Cesar Philippidis gcc/ * doc/tm.texi: Regenerate. * doc/tm.texi.in: Add a placeholder for TARGET_GOACC_REDUCTION. * internal-fn.c (expand_GOACC_REDUCTION_SETUP): New function. (expand_GOACC_REDUCTION_INIT): New function. (expand_GOACC_REDUCTION_FINI): New function. (expand_GOACC_REDUCTION_TEARDOWN): New function. * internal-fn.def (DEF_INTERNAL_FN): * omp-low.c (GOACC_REDUCTION_SETUP, GOACC_REDUCTION_INIT, GOACC_REDUCTION_FINI, GOACC_REDUCTION_TEARDOWN): New internal functions. * omp-low.c (execute_oacc_transform): Expand those new internal functions. (make_pass_oacc_transform): Add TODO_cleanup_cfg to todo_flags_finish. (default_goacc_reduction_setup): New function. (default_goacc_reduction_init_fini): New function. (default_goacc_reduction_teardown): New function. (default_goacc_reduction): New function. * target.def (reduction): New goacc target hook. * targhooks.h (default_goacc_reduction): Declare diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 12e57a7..0c8ba5d 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5767,6 +5767,19 @@ pass. It should return true, if the functions should be deleted. The default hook returns true, if there is no RTL expanders for them. @end deftypefn +@deftypefn {Target Hook} bool TARGET_GOACC_REDUCTION (gimple @var{call}) +This hook is used by the oacc_transform pass to expand calls to the +internal functions @var{GOACC_REDUCTION_SETUP}, +@var{GOACC_REDUCTION_INIT}, + @var{GOACC_REDUCTION_FINI} and + @var{GOACC_REDUCTION_TEARDOWN} into a sequence of gimple instructions. + @var{call} is gimple statement containing the call to the function. This + hook removes statement @var{call} after the expanded sequence has been + inserted. This hook is also responsible for allocating any storage for + reductions when necessary. It returns @var{true} if the expanded +sequence introduces any calls to OpenACC-specific internal functions. +@end deftypefn + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 3bf98be..dfd32a7 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4253,6 +4253,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_GOACC_LOCK_UNLOCK +@hook TARGET_GOACC_REDUCTION + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index fcb9c47..9c923ed 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2051,6 +2051,39 @@ expand_GOACC_UNLOCK (gcall *ARG_UNUSED (stmt)) #endif } +/* This should get expanded in oacc_transform. */ + +static void +expand_GOACC_REDUCTION_SETUP (gcall *stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +/* This should get expanded in oacc_transform. */ + +static void +expand_GOACC_REDUCTION_INIT (gcall *stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +/* This should get expanded in oacc_transform. */ + +static void +expand_GOACC_REDUCTION_FINI (gcall *stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +/* This should get expanded in oacc_transform. */ + +static void +expand_GOACC_REDUCTION_TEARDOWN (gcall *stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 0bb8a91..6c5db37 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -89,3 +89,21 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".") argument is a loop identifer. */ DEF_INTERNAL_FN (GOACC_LOCK, ECF_NOTHROW | ECF_LEAF, "..") DEF_INTERNAL_FN (GOACC_UNLOCK, ECF_NOTHROW | ECF_LEAF, "..") + +/* REDUCTION_SETUP, REDUCTION_INIT, REDUCTION_FINI and REDUCTION_TEARDOWN + together define a generic interface to support gang, worker and vector + reductions. All of the functions take the following form + + V = goacc_reduction_foo (REF_TO_RES, LOCAL_VAR, LEVEL, OP, LID, RID) + + where REF_TO_RES is a reference to the original reduction variable for + that particular reduction, LOCAL_VAR is the intermediate reduction + variable. LEVEL corresponds to the GOMP_DIM of the reduction, OP is a + tree code of the reduction operation. LID is a unique identifier of the + loop within a TU and RID is a unique id for a reduction within a loop. + V is the resulting intermediate reduction variable returned by the + function. In general, V should equal LOCAL_VAR. */ +DEF_INTERNAL_FN (GOACC_REDUCTION_SETUP, ECF_NOTHROW, NULL) +DEF_INTERNAL_FN (GOACC_REDUCTION_INIT, ECF_NOTHROW, NULL) +DEF_INTERNAL_FN (GOACC_REDUCTION_FINI, ECF_NOTHROW, NULL) +DEF_INTERNAL_FN (GOACC_REDUCTION_TEARDOWN, ECF_NOTHROW, NULL) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index fcf037e..2049eea 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -14671,11 +14671,16 @@ execute_oacc_transform () tree attrs = get_oacc_fn_attrib (current_function_decl); int dims[GOMP_DIM_MAX]; tree purpose[GOMP_DIM_MAX]; + bool needs_rescan; if (!attrs) /* Not an offloaded function. */ return 0; + /* Offloaded targets may introduce new basic blocks, which require + dominance information to update SSA. */ + calculate_dominance_info (CDI_DOMINATORS); + { unsigned ix; tree pos = TREE_VALUE (attrs); @@ -14725,59 +14730,74 @@ execute_oacc_transform () replace_oacc_fn_attrib (current_function_decl, pos); } } - - FOR_ALL_BB_FN (bb, cfun) - for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) - { - gimple stmt = gsi_stmt (gsi); - if (!is_gimple_call (stmt)) - ; /* Nothing. */ - else if (gimple_call_builtin_p (stmt, BUILT_IN_ACC_ON_DEVICE)) - /* acc_on_device must be evaluated at compile time for - constant arguments. */ - { - gsi_next (&gsi); - oacc_xform_on_device (stmt); - continue; - } - else if (gimple_call_internal_p (stmt)) + do + { + needs_rescan = false; + + FOR_ALL_BB_FN (bb, cfun) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) { - unsigned ifn_code = gimple_call_internal_fn (stmt); - switch (ifn_code) - { - default: break; + gimple stmt = gsi_stmt (gsi); - case IFN_GOACC_DIM_POS: - case IFN_GOACC_DIM_SIZE: + if (!is_gimple_call (stmt)) + ; /* Nothing. */ + else if (gimple_call_builtin_p (stmt, BUILT_IN_ACC_ON_DEVICE)) + /* acc_on_device must be evaluated at compile time for + constant arguments. */ + { gsi_next (&gsi); - oacc_xform_dim (stmt, dims, ifn_code == IFN_GOACC_DIM_POS); + oacc_xform_on_device (stmt); continue; + } + else if (gimple_call_internal_p (stmt)) + { + unsigned ifn_code = gimple_call_internal_fn (stmt); + int retval = 0; + switch (ifn_code) + { + default: break; - case IFN_GOACC_LOCK: - case IFN_GOACC_UNLOCK: - if (targetm.goacc.lock_unlock - (stmt, dims, ifn_code == IFN_GOACC_LOCK)) - goto remove; - break; + case IFN_GOACC_DIM_POS: + case IFN_GOACC_DIM_SIZE: + gsi_next (&gsi); + oacc_xform_dim (stmt, dims, ifn_code == IFN_GOACC_DIM_POS); + continue; - case IFN_GOACC_FORK: - case IFN_GOACC_JOIN: - if (targetm.goacc.fork_join - (stmt, dims, ifn_code == IFN_GOACC_FORK)) - { - remove: - replace_uses_by (gimple_vdef (stmt), - gimple_vuse (stmt)); - gsi_remove (&gsi, true); - /* Removal will have advanced the iterator. */ + case IFN_GOACC_LOCK: + case IFN_GOACC_UNLOCK: + if (targetm.goacc.lock_unlock + (stmt, dims, ifn_code == IFN_GOACC_LOCK)) + goto remove; + break; + + case IFN_GOACC_REDUCTION_SETUP: + case IFN_GOACC_REDUCTION_INIT: + case IFN_GOACC_REDUCTION_FINI: + case IFN_GOACC_REDUCTION_TEARDOWN: + gsi_next (&gsi); + if (targetm.goacc.reduction (stmt)) + needs_rescan = true; continue; + + case IFN_GOACC_FORK: + case IFN_GOACC_JOIN: + if (targetm.goacc.fork_join + (stmt, dims, ifn_code == IFN_GOACC_FORK)) + { + remove: + replace_uses_by (gimple_vdef (stmt), + gimple_vuse (stmt)); + gsi_remove (&gsi, true); + /* Removal will have advanced the iterator. */ + continue; + } + break; } - break; } + gsi_next (&gsi); } - gsi_next (&gsi); - } + } while (needs_rescan); return 0; } @@ -14875,7 +14895,7 @@ const pass_data pass_data_oacc_transform = 0 /* Possibly PROP_gimple_eomp. */, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ - TODO_update_ssa, /* todo_flags_finish */ + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */ }; class pass_oacc_transform : public gimple_opt_pass @@ -14906,4 +14926,145 @@ make_pass_oacc_transform (gcc::context *ctxt) return new pass_oacc_transform (ctxt); } +/* Default implementation of targetm.goacc.reduction_setup. This hook + provides a baseline implementation for the internal function + GOACC_REDUCTION_SETUP for a single-threaded target. I.e. num_gangs = + num_workers = vector_length = 1. + + Given: + + V = IFN_RED_SETUP (RES_PTR, LOCAL, LEVEL, OP. LID, RID) + + Expand to: + + V = RES_PTR ? *RES_PTR : LOCAL; +*/ + +static bool +default_goacc_reduction_setup (gimple call) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (call); + tree v = gimple_call_lhs (call); + tree ref_to_res = gimple_call_arg (call, 0); + tree local_var = gimple_call_arg (call, 1); + gimple_seq seq = NULL; + + push_gimplify_context (true); + + if (!integer_zerop (ref_to_res)) + { + tree x = build_simple_mem_ref (ref_to_res); + gimplify_assign (v, x, &seq); + } + else + gimplify_assign (v, local_var, &seq); + + pop_gimplify_context (NULL); + + gsi_replace_with_seq (&gsi, seq, true); + + return false; +} + +/* Default implementation for both targetm.goacc.reduction_init and + reduction_fini. This hook provides a baseline implementation for the + internal functions GOACC_REDUCTION_INIT and GOACC_REDUCTION_FINI for a + single-threaded target. + + Given: + + V = IFN_RED_INIT (RES_PTR, LOCAL, LEVEL, OP, LID, RID) + + or + + V = IFN_RED_FINI (RES_PTR, LOCAL, LEVEL, OP, LID, RID) + + Expand to: + + V = LOCAL; +*/ + +static bool +default_goacc_reduction_init_fini (gimple call) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (call); + tree v = gimple_call_lhs (call); + tree local_var = gimple_call_arg (call, 1); + gimple g; + + g = gimple_build_assign (v, local_var); + gsi_replace (&gsi, g, true); + + return false; +} + +/* Default implementation of targetm.goacc.reduction_teardown. This hook + provides a baseline implementation for the internal function + GOACC_REDUCTION_TEARDOWN for a single-threaded target. + + Given: + + IFN_RED_TEARDOWN (RES_PTR, LOCAL, LEVEL, OP, LID, RID) + + Expand to: + + if (RES_PTR) + *RES_PTR = LOCAL; + + V = LOCAL; +*/ + +static bool +default_goacc_reduction_teardown (gimple call) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (call); + tree lhs = gimple_call_lhs (call); + tree ref_to_res = gimple_call_arg (call, 0); + tree var = gimple_call_arg (call, 1); + gimple_seq seq = NULL; + + push_gimplify_context (true); + + if (!integer_zerop (ref_to_res)) + { + tree x = build_simple_mem_ref (ref_to_res); + gimplify_assign (x, var, &seq); + } + + if (lhs != NULL_TREE) + gimplify_assign (lhs, var, &seq); + + pop_gimplify_context (NULL); + + gsi_replace_with_seq (&gsi, seq, true); + + return false; +} + +/* Default goacc.reduction early expander. */ + +bool +default_goacc_reduction (gimple call) +{ + /* Reductions modify the SSA names in complicated ways. Let update_ssa + correct it. */ + mark_virtual_operands_for_renaming (cfun); + + switch (gimple_call_internal_fn (call)) + { + case IFN_GOACC_REDUCTION_SETUP: + return default_goacc_reduction_setup (call); + + case IFN_GOACC_REDUCTION_INIT: + case IFN_GOACC_REDUCTION_FINI: + return default_goacc_reduction_init_fini (call); + + case IFN_GOACC_REDUCTION_TEARDOWN: + return default_goacc_reduction_teardown (call); + + default: + gcc_unreachable (); + } +} + #include "gt-omp-low.h" diff --git a/gcc/target.def b/gcc/target.def index fa5670a..550db6a 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1679,6 +1679,15 @@ default hook returns true, if there is no RTL expanders for them.", bool, (gimple, const int[], bool), default_goacc_lock_unlock) +DEFHOOK +(reduction, +"This hook is used by the oacc_transform pass to expand calls to the\n\ +internal functions @var{GOACC_REDUCTION_SETUP},\n\ +@var{GOACC_REDUCTION_INIT},\n\ @var{GOACC_REDUCTION_FINI} and\n\ @var{GOACC_REDUCTION_TEARDOWN} into a sequence of gimple instructions.\n\ @var{call} is gimple statement containing the call to the function. This\n\ hook removes statement @var{call} after the expanded sequence has been\n\ inserted. This hook is also responsible for allocating any storage for\n\ reductions when necessary. It returns @var{true} if the expanded\n\ +sequence introduces any calls to OpenACC-specific internal functions.", +bool, (gimple call), +default_goacc_reduction) + HOOK_VECTOR_END (goacc) /* Functions relating to vectorization. */ diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 0e7f13d..ddde78d 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -107,6 +107,7 @@ extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt, extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *); extern void default_destroy_cost_data (void *); +extern bool default_goacc_reduction (gimple); extern bool default_goacc_validate_dims (tree, int [], int); extern unsigned default_goacc_dim_limit (unsigned); extern bool default_goacc_fork_join (gimple, const int [], bool); --------------010003080905060909060607--