From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 38805 invoked by alias); 9 Dec 2015 13:19:40 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 38789 invoked by uid 89); 9 Dec 2015 13:19:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Wed, 09 Dec 2015 13:19:37 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (Postfix) with ESMTPS id B36F2C0A8485 for ; Wed, 9 Dec 2015 13:19:35 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-113-142.phx2.redhat.com [10.3.113.142]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tB9DJXQl017684 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Wed, 9 Dec 2015 08:19:35 -0500 Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id tB9DJV7H011238 for ; Wed, 9 Dec 2015 14:19:32 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id tB9DJUYH011237 for gcc-patches@gcc.gnu.org; Wed, 9 Dec 2015 14:19:30 +0100 Date: Wed, 09 Dec 2015 13:19:00 -0000 From: Jakub Jelinek To: GCC Patches Subject: Re: [hsa 5/10] OpenMP lowering/expansion changes (gridification) Message-ID: <20151209131930.GS5675@tucnak.redhat.com> Reply-To: Jakub Jelinek References: <20151207111758.GA24234@virgil.suse.cz> <20151207112243.GF24234@virgil.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151207112243.GF24234@virgil.suse.cz> User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes X-SW-Source: 2015-12/txt/msg00992.txt.bz2 On Mon, Dec 07, 2015 at 12:22:43PM +0100, Martin Jambor wrote: > it creates a copy of the entire target body and expands it slightly > differently for concurrent execution on a GPU. Note that both teams > and distribute constructs are mandatory. Moreover, currently the > distribute has to be in a combined statement with the inner for > construct. And there are quite a few other restrictions which I hope The standard calls those composite constructs, and I bet for gridification you want that restriction always, without composite distribute parallel for there are two different unrelated loops. > * builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New. > * fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New. Fortran has its own ChangeLog file. > @@ -556,9 +558,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT, > BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG, > BT_BOOL, BT_UINT, BT_PTR, BT_INT) > > -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT, > - BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR, > - BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT) > +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR, > + BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR, > + BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR) There shouldn't be an empty line in between this DEF_FUNCTION_TYPE_9 and the previous one. > @@ -221,9 +223,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT, > BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG, > BT_BOOL, BT_UINT, BT_PTR, BT_INT) > > -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT, > +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR, > BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR, > - BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT) > + BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR) > > DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG, > BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, Ditto. > --- a/gcc/gimple.def > +++ b/gcc/gimple.def > @@ -369,13 +369,17 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", GSS_OMP_PARALLEL_LAYOUT) > /* GIMPLE_OMP_TEAMS represents #pragma omp teams > BODY is the sequence of statements inside the single section. > CLAUSES is an OMP_CLAUSE chain holding the associated clauses. */ > -DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT) > +DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_TEAMS_LAYOUT) Why? > +/* GIMPLE_OMP_GPUKERNEL represents a parallel loop lowered for execution > + on a GPU. It is an artificial statement created by omp lowering. */ > +DEFGSCODE(GIMPLE_OMP_GPUKERNEL, "gimple_omp_gpukernel", GSS_OMP) Why do you call it GPUKERNEL or KERNEL_BODY when you really mean gridified body and gridified loop? I mean, what is GPU specific about it? PTX is unlikely going to use that. And kernel is a wide term. > @@ -622,8 +623,14 @@ struct GTY((tag("GSS_OMP_FOR"))) > /* [ WORD 11 ] > Pre-body evaluated before the loop body begins. */ > gimple_seq pre_body; > + > + /* [ WORD 12 ] > + If set, this statement is part of a gridified kernel, its clauses need to > + be scanned and lowered but the statement should be discarded after > + lowering. */ > + bool kernel_phony; Ugh no, flags should go into GF_OMP_*. > @@ -643,6 +660,12 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT"))) > /* [ WORD 10 ] > Shared data argument. */ > tree data_arg; > + > + /* [ WORD 11 ] */ > + /* If set, this statement is part of a gridified kernel, its clauses need to > + be scanned and lowered but the statement should be discarded after > + lowering. */ > + bool kernel_phony; > }; Likewise. As for omp-low.c changes, the file is already large enough that it would be nice if it is easy to find out what routines are for gridification purposes only, use some special prefix (grid_*, ompgrid_*, ...) for all such functions? > @@ -1761,6 +1786,8 @@ fixup_child_record_type (omp_context *ctx) > { > tree f, type = ctx->record_type; > > + if (!ctx->receiver_decl) > + return; So when is receiver_decl NULL? > @@ -2113,6 +2140,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) > } > break; > > + case OMP_CLAUSE__GRIDDIM_: > + if (ctx->outer) > + { > + scan_omp_op (&OMP_CLAUSE_GRIDDIM_SIZE (c), ctx->outer); > + scan_omp_op (&OMP_CLAUSE_GRIDDIM_GROUP (c), ctx->outer); These should be OMP_CLAUSE__GRIDDIM__{SIZE,GROUP}. See OMP_CLAUSE__SIMDUID__DECL for another similar macro. > @@ -6252,6 +6302,37 @@ gimple_build_cond_empty (tree cond) > return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE); > } > > +/* Return true if a parallel REGION is within a declare target function or > + within a target region and is not a part of a gridified kernel. */ > + > +static bool > +region_needs_kernel_p (struct omp_region *region) > +{ > + bool indirect = false; > + for (region = region->outer; region; region = region->outer) > + { > + if (region->type == GIMPLE_OMP_PARALLEL) > + indirect = true; > + else if (region->type == GIMPLE_OMP_TARGET) > + { > + gomp_target *tgt_stmt; > + tgt_stmt = as_a (last_stmt (region->entry)); gomp_target *tgt_stmt = as_a (last_stmt (region->entry)); ? > +static GTY(()) tree kernel_dim_array_type; > +static GTY(()) tree kernel_lattrs_dimnum_decl; > +static GTY(()) tree kernel_lattrs_grid_decl; > +static GTY(()) tree kernel_lattrs_group_decl; > +static GTY(()) tree kernel_launch_attributes_type; Turn this at least into either a struct or array of trees, so that it is not 5 separate GC roots? > + tree dim_arr_index_type; > + dim_arr_index_type = build_index_type (build_int_cst (integer_type_node, 2)); See above for formatting; even if you don't have the declaration one line above it, putting = in 5th column of next line will be often beneficial for the formatting: > + kernel_dim_array_type = build_array_type (uint32_type_node, > + dim_arr_index_type); > + > + kernel_launch_attributes_type = make_node (RECORD_TYPE); > + kernel_lattrs_dimnum_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL, > + get_identifier ("ndim"), > + uint32_type_node); > + DECL_CHAIN (kernel_lattrs_dimnum_decl) = NULL_TREE; > + > + kernel_lattrs_grid_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL, > + get_identifier ("grid_size"), > + kernel_dim_array_type); > + DECL_CHAIN (kernel_lattrs_grid_decl) = kernel_lattrs_dimnum_decl; > + kernel_lattrs_group_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL, > + get_identifier ("group_size"), > + kernel_dim_array_type); > + DECL_CHAIN (kernel_lattrs_group_decl) = kernel_lattrs_grid_decl; > + finish_builtin_struct (kernel_launch_attributes_type, > + "__gomp_kernel_launch_attributes", > + kernel_lattrs_group_decl, NULL_TREE); > +static tree > +get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt) > +{ > + auto_vec args; > + tree clauses = gimple_omp_target_clauses (tgt_stmt); > + tree t, c = find_omp_clause (clauses, OMP_CLAUSE_NUM_TEAMS); > + if (c) > + t = OMP_CLAUSE_NUM_TEAMS_EXPR (c); > + else > + t = integer_minus_one_node; > + t = get_target_argument_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL, > + GOMP_TARGET_ARG_NUM_TEAMS, t); > + args.quick_push (t); This is what I've talked about in review of another patch. num_teams is int, for 32-bit targets trying to encode it into 16 bits is not going to work. > + > + c = find_omp_clause (clauses, OMP_CLAUSE_THREAD_LIMIT); > + if (c) > + t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c); > + else > + t = integer_minus_one_node; > + t = get_target_argument_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL, > + GOMP_TARGET_ARG_THREAD_LIMIT, t); Ditto. > @@ -14872,6 +15392,14 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx) > par_olist = NULL; > par_ilist = NULL; > par_rlist = NULL; > + bool phony_construct = is_a (stmt) > + && gimple_omp_parallel_kernel_phony (as_a (stmt)); I'm not a big fan of the is_a mess. gimple_code (stmt) == GIMPLE_OMP_PARALLEL is what is used elsewhere. > + if (phony_construct && ctx->record_type) > + { > + gcc_checking_assert (!ctx->receiver_decl); > + ctx->receiver_decl = create_tmp_var > + (build_reference_type (ctx->record_type), ".omp_rec"); Formatting. > @@ -400,7 +401,8 @@ const char * const omp_clause_code_name[] = > "num_gangs", > "num_workers", > "vector_length", > - "tile" > + "tile", > + "griddim" The clause is "_griddim_". Jakub