From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 11318 invoked by alias); 18 Sep 2014 19:10:36 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 11290 invoked by uid 89); 18 Sep 2014 19:10:35 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.2 X-Spam-User: qpsmtpd, 2 recipients X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 18 Sep 2014 19:10:33 +0000 Received: from svr-orw-fem-03.mgc.mentorg.com ([147.34.97.39]) by relay1.mentorg.com with esmtp id 1XUh65-0002fe-Pe from Cesar_Philippidis@mentor.com ; Thu, 18 Sep 2014 12:10:29 -0700 Received: from [127.0.0.1] (147.34.91.1) by svr-orw-fem-03.mgc.mentorg.com (147.34.97.39) with Microsoft SMTP Server id 14.3.181.6; Thu, 18 Sep 2014 12:10:29 -0700 Message-ID: <541B2E23.3070102@codesourcery.com> Date: Thu, 18 Sep 2014 19:10:00 -0000 From: Cesar Philippidis User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Thomas Schwinge CC: "gcc-patches@gcc.gnu.org" , Subject: Re: [gomp4] various OpenACC/PTX built-ins and a reduction tweak References: <5418D6B6.40801@codesourcery.com> <87ha04n513.fsf@schwinge.name> In-Reply-To: <87ha04n513.fsf@schwinge.name> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-SW-Source: 2014-09/txt/msg01538.txt.bz2 On 09/18/2014 11:43 AM, Thomas Schwinge wrote: > Hi Cesar! > > On Tue, 16 Sep 2014 17:32:54 -0700, Cesar Philippidis wrote: >> The patch introduces the following OpenACC/PTX-specific built-ins: >> >> * GOACC_ntid >> * GOACC_tid >> * GOACC_nctaid >> * GOACC_ctaid >> * acc_on_device >> * GOACC_get_thread_num >> * GOACC_get_num_threads >> >> Of these functions, the only one part of the OpenACC spec is >> acc_on_device. The other functions are helpers for omp-low.c. In >> particular, I'm using GOACC_get_thread_num and GOACC_get_num_threads to >> determine the number of accelerator threads available to the reduction >> clause. Current GOACC_get_num_threads is num_gangs * vector_length, but >> value is subject to change later on. It's probably a premature to >> include the PTX built-ins right now, but I'd like to middle end of our >> internal OpenACC branch in sync with gomp-4_0-branch. > > In my opinion (and we've once only very briefly discussed this > internally), exposing concepts such as TID or CTA is wrong at this > abstraction level, for these are PTX concepts, but we'd like OpenACC > concepts here: numbers and IDs of gangs, workers, vector-length. That > said, I'm not opposed to having these committed to gomp-4_0-branch, and > we'll fix it in the following. (And -- full disclosure ;-) -- it was me > who internally added GOACC_ntid/GOACC_tid, when I needed those as > prototypes, and never got around to re-doing that properly.) > > >> This patch also allows OpenACC reductions to process the array holding >> partial reductions on the accelerator, instead of copying that array >> back to the host. Currently, this only happens when num_gangs = 1. For >> PTX targets, we're going to need to use another kernel to process the >> array of partial results because PTX lacks inter-CTA synchronization >> (we're currently mapping gangs to CTAs). That's why I was working on the >> routine clause recently. >> >> Is this OK for gomp-4_0-branch? > > If we agree to fix this up as discussed above (and I guess we have no > chance but to agree on this), I'd say so. A few comments, none of which > are blocking: That sounds fair. >> --- a/gcc/builtins.def >> +++ b/gcc/builtins.def > >> #undef DEF_GOACC_BUILTIN >> #define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ >> DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ >> - false, true, true, ATTRS, false, flag_openacc) >> + false, true, true, ATTRS, false, \ >> + (/* TODO */ true || flag_openacc)) > > Hack that, I hope, will soon be addressed properly. > >> #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ >> DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ >> false, true, true, ATTRS, false, \ >> - (flag_openmp || flag_tree_parallelize_loops)) >> + (/* TODO */ true || flag_openmp || flag_tree_parallelize_loops)) > > Likewise. > > >> --- a/gcc/omp-builtins.def >> +++ b/gcc/omp-builtins.def >> @@ -236,6 +236,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update", >> BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) >> DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams", >> BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST) >> - >> -DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads", >> - BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST) > > To avoid confusion: that has been added to gomp-4_0-branch earlier, and > is now reverted to the trunk state. > > >> --- a/gcc/omp-low.c >> +++ b/gcc/omp-low.c > >> +static bool >> +finish_reduction_on_host (omp_context *ctx) > >> +static tree >> +oacc_host_nthreads (omp_context *ctx) > >> +static void >> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp, > >> @@ -4433,61 +4519,26 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx) > >> @@ -9782,6 +9900,14 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp, > >> @@ -9913,16 +10089,15 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp, > >> @@ -10026,7 +10211,6 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp, > > Have not reviewed in detail, but that also applies to previous reduction > changes, so OK to apply on top of that. > > >> @@ -5644,9 +5739,9 @@ expand_oacc_offload (struct omp_region *region) >> tree openmp_target = get_offload_symbol_decl (); >> tree fnaddr = build_fold_addr_expr (child_fn); >> g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device, >> - fnaddr, build_fold_addr_expr (openmp_target), >> - t1, t2, t3, t4, >> - t_num_gangs, t_num_workers, t_vector_length); >> + fnaddr, build_fold_addr_expr (openmp_target), >> + t1, t2, t3, t4, >> + t_num_gangs, t_num_workers, t_vector_length); >> gimple_set_location (g, gimple_location (entry_stmt)); >> gsi_insert_before (&gsi, g, GSI_SAME_STMT); >> } > > Unintentional? > > >> @@ -6913,8 +7008,10 @@ expand_omp_for_static_nochunk (struct omp_region *region, > >> + /* Ensure nthreads is at least 1. BUILT_IN_GOACC_NTID returns 0 for a target >> + that does not have a specific expansion. */ > > Remains to be decided. > >> @@ -7317,8 +7423,10 @@ expand_omp_for_static_chunk (struct omp_region *region, > >> + /* Ensure nthreads is at least 1. BUILT_IN_GOACC_NTID returns 0 for a target >> + that does not have a specific expansion. */ >> + nthreads >> + = fold_build2 (MAX_EXPR, itype, nthreads, >> + fold_convert (TREE_TYPE (nthreads), integer_one_node)); >> + nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE, >> + true, GSI_SAME_STMT); > > Likewise. > > >> /* Initialize the reduction variables to be value of the first array >> - element. */ >> + element. FIXME: A parallel loop should use the original reduction >> + variable as the initial value. */ > > Is that > , > and where you had sent that question to the OpenACC technical mailing > list, weeks ago, which has not yet been answered as far as I know -- > might be a good idea to "ping" them? Yes. I'll ping them. I'll make the changes you suggested in a follow-up patch. Since we both touch builtins.c, I'll let your patch go in first since you already have one ready. Then I'll rebase the revised version of this patch on top of your patch. Thanks, Cesar