From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 104705 invoked by alias); 29 Jun 2016 15:31:40 -0000 Mailing-List: contact fortran-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: fortran-owner@gcc.gnu.org Received: (qmail 104625 invoked by uid 89); 29 Jun 2016 15:31:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=H*F:U*cesar, UD:php, states X-Spam-User: qpsmtpd, 2 recipients X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Wed, 29 Jun 2016 15:31:28 +0000 Received: from svr-orw-fem-03.mgc.mentorg.com ([147.34.97.39]) by relay1.mentorg.com with esmtp id 1bIHSX-0005ry-0O from Cesar_Philippidis@mentor.com ; Wed, 29 Jun 2016 08:31:25 -0700 Received: from [127.0.0.1] (147.34.91.1) by svr-orw-fem-03.mgc.mentorg.com (147.34.97.39) with Microsoft SMTP Server id 14.3.224.2; Wed, 29 Jun 2016 08:31:24 -0700 Subject: Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls To: Thomas Schwinge , Jakub Jelinek References: <5762190F.4030102@codesourcery.com> <20160617144206.GC7387@tucnak.redhat.com> <576C08D2.6070008@codesourcery.com> <87inwst78s.fsf@kepler.schwinge.homeip.net> CC: "gcc-patches@gcc.gnu.org" , Fortran List , Alexander Monakov , Ilya Verbin From: Cesar Philippidis Message-ID: <5773E9CC.2000202@codesourcery.com> Date: Wed, 29 Jun 2016 15:31:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <87inwst78s.fsf@kepler.schwinge.homeip.net> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2016-06/txt/msg00119.txt.bz2 On 06/29/2016 07:11 AM, Thomas Schwinge wrote: > Cesar, I have not yet fully digested this, but do I understand right that > you're really fixing two issues here, that are related (OpenACC routines) > but still can be addressed independently of each other? Do I understand > right that the first one, the "problems with acc routines [...] > incorrectly permitting 'acc seq' loops to call gang, worker and vector > routines" is just a Fortran front end patch? If yes, please split that > one out, so as to reduce the volume of remaining changes that remain to > be discussed. This patch addresses the following issues: 1. Issues warnings when a non-acc routine function is called inside an OpenACC offloaded region. 2. It corrects a bug what was allowing seq loops to call gang, worker and vector routines. 3. It adds supports for acc routines in fortran modules (which I noticed was missing when I added 'acc routine seq' to acc_on_device in the fortran openacc include files). I'll split these into separate patches. > On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis wrote: >> On 06/17/2016 07:42 AM, Jakub Jelinek wrote: >>> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote: >>>> The second set of changes involves teaching the gimplifier to error when >>>> it detects a function call to an non-acc routines inside an OpenACC >>>> offloaded region. > > As I understand, that's the same problem as has been discussed before > (Ilya CCed), and has recently again been filed in > "ICE in LTO1 when attempting NVPTX > offloading (-fopenacc)", and "ICE in LTO1 > with -fopenmp offloading" (Alexander CCed). Some earlier discussion > threads include: > , > , > . > >>>> Actually, I relaxed non-acc routines by excluding >>>> calls to builtin functions, including those prefixed with _gfortran_. >>>> Nvptx does have a newlib c library, and it also has a subset of >>>> libgfortran. Still, this solution is probably not optimal. >>> >>> I don't really like that, hardcoding prefixes or whatever is available >>> (you have quite some subset of libc, libm etc. available too) in the >>> compiler looks very hackish. What is wrong with complaining during >>> linking of the offloaded code? > > ACK. Jakub, do I understand you correctly, that you basically say that > every function declaration that is in scope inside offloaded regions (for > example, GCC builtin functions, or standard library functions declared in > target compiler's header files) is permitted to be called in offloaded > regions, and the offloading compiler will then either be able to resolve > these (nvptx back end knows about trigonometric functions, for example, > and a lot of functions are available in the nvptx libc), or otherwise > error out during the offloading compilation (during linking), gracefully > without terminating the target compilation (that "gracefully" bit is > currently missing -- that's for another day). That is, all such > functions are implicitly callable as OpenACC "seq" functions (which means > that they don't internally use gang/worker/vector parallelism). In > particular, all these functions do *not* need to be marked with an > explicit "#pragma acc routine seq" directive. (Functions internally > using gang/worker/vector parallelism will need to be marked > appropriately, using a "#pragma acc routine gang/worker/vector" > directive.) That's how I understand your comment above, and your earlier > comments on this topic, and also is what I think should be done. OK. I'll drop the warning changes from my patch set then unless you want to keep it. > A few random comments on the patch: > >> --- a/gcc/fortran/gfortran.h >> +++ b/gcc/fortran/gfortran.h >> @@ -303,6 +303,15 @@ enum save_state >> { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT >> }; >> >> +/* Flags to keep track of ACC routine states. */ >> +enum oacc_function >> +{ OACC_FUNCTION_NONE = 0, >> + OACC_FUNCTION_SEQ, >> + OACC_FUNCTION_GANG, >> + OACC_FUNCTION_WORKER, >> + OACC_FUNCTION_VECTOR >> +}; > > What's the purpose of OACC_FUNCTION_NONE? It's not used anywhere, as far > as I can tell? It's used by the fortran module code. It controls how parallelism gets encoded in the .mod files. >> --- a/gcc/fortran/openmp.c >> +++ b/gcc/fortran/openmp.c >> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void) >> >> /* Determine the loop level for a routine. */ >> >> -static int >> +static oacc_function >> gfc_oacc_routine_dims (gfc_omp_clauses *clauses) >> { >> int level = -1; >> + oacc_function ret = OACC_FUNCTION_SEQ; >> >> if (clauses) >> { >> unsigned mask = 0; >> >> if (clauses->gang) >> - level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level); >> + { >> + level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level); >> + ret = OACC_FUNCTION_GANG; >> + } >> if (clauses->worker) >> - level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level); >> + { >> + level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level); >> + ret = OACC_FUNCTION_WORKER; >> + } >> if (clauses->vector) >> - level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level); >> + { >> + level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level); >> + ret = OACC_FUNCTION_VECTOR; >> + } >> if (clauses->seq) >> level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level); >> > > I have not looked in detail, so maybe I'm misunderstanding what is being > done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit > together? Conceptually, if you take a look at the oacc_function attribute in a tree dump, you'll see an array with three elements. Basically, each element in that array represents a gang, worker or vector parallelism. By definition, a gang loop permits a worker and vector loop to be nested inside it. So, for a gang routine, the oacc_function attribute is constructed such that it permits gang, worker and vector level parallelism. Similarly, for a worker routine, the oacc_function attribute has the worker and vector level parallelism 'bits' set. With that in mind, setting seq to GOMP_DIM_MASK allows the loop creating that oacc_function attribute to mask out any gang, worker and vector parallelism. >> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses) >> if (level < 0) >> level = GOMP_DIM_MAX; >> >> - return level; >> + return ret; >> } > > Just from that last hunk, it seems that the assignment to "level" is a > dead store? I'll need to check this when I split out the patch. >> +static tree >> +add_attributes_to_decl (symbol_attribute sym_attr, tree list) >> +{ >> + unsigned id; >> + tree attr; >> + >> + for (id = 0; id < EXT_ATTR_NUM; id++) >> + if (sym_attr.ext_attr & (1 << id)) >> + { >> + attr = build_tree_list ( >> + get_identifier (ext_attr_list[id].middle_end_name), >> + NULL_TREE); >> + list = chainon (list, attr); >> + } >> + >> + list = add_omp_offloading_attributes (sym_attr.omp_declare_target, >> + sym_attr.oacc_function, list); >> + >> + return list; >> +} > > Something that I had noticed before, possibly related here: code in > gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++ > front ends do. Is that function what you've re-implemented here? Similar, but I broke this code out from another function to handle BUILT_IN_EXPECT. But I can revert this change now, since BUILT_IN_EXPECT will be treated as an implicit SEQ routine. >> --- a/gcc/lto-cgraph.c >> +++ b/gcc/lto-cgraph.c >> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data, >> LDPR_NUM_KNOWN); >> node->instrumentation_clone = bp_unpack_value (bp, 1); >> node->split_part = bp_unpack_value (bp, 1); >> - gcc_assert (flag_ltrans >> - || (!node->in_other_partition >> - && !node->used_from_other_partition)); >> + >> + int success = flag_ltrans || (!node->in_other_partition >> + && !node->used_from_other_partition); >> + if (!success) >> + error ("Missing %<%s%>", node->name ()); >> } >> >> /* Return string alias is alias of. */ >> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data, >> node->set_section_for_node (section); >> node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution, >> LDPR_NUM_KNOWN); >> - gcc_assert (flag_ltrans >> - || (!node->in_other_partition >> - && !node->used_from_other_partition)); >> + >> + int success = flag_ltrans || (!node->in_other_partition >> + && !node->used_from_other_partition); >> + if (!success) >> + error ("Missing %<%s%>", node->name ()); >> >> return node; >> } > > That looks similar to what I remember from earlier, simiar patches, as > referenced above. It is. I never got around to pushing that patch very strongly because I thought those link failures were legitimate compiler bugs. >> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask) >> { >> unsigned outermost = this_mask & -this_mask; >> >> - if (outermost && outermost <= outer_mask) >> + if ((outermost && outermost <= outer_mask) >> + || (this_mask && (loop->parent->flags & OLF_SEQ))) >> { >> if (noisy) >> { > >> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c >> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c >> @@ -49,7 +49,7 @@ main () >> int red = 0; >> #pragma acc parallel copy (red) >> { >> - /* Independent/seq loop tests. */ >> + /* Independent loop tests. */ >> #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" } >> for (int i = 0; i < 10; i++) >> red += gang (); >> @@ -62,6 +62,19 @@ main () >> for (int i = 0; i < 10; i++) >> red += vector (); >> >> + /* Seq loop tests. */ >> +#pragma acc loop seq reduction (+:red) /* { dg-message "containing loop" } */ >> + for (int i = 0; i < 10; i++) >> + red += gang (); /* { dg-error "incorrectly nested" } */ >> + >> +#pragma acc loop seq reduction (+:red) /* { dg-message "containing loop" } */ >> + for (int i = 0; i < 10; i++) >> + red += worker (); /* { dg-error "incorrectly nested" } */ >> + >> +#pragma acc loop seq reduction (+:red) /* { dg-message "containing loop" } */ >> + for (int i = 0; i < 10; i++) >> + red += vector (); /* { dg-error "incorrectly nested" } */ >> + >> /* Gang routine tests. */ >> #pragma acc loop gang reduction (+:red) /* { dg-message "containing loop" } */ >> for (int i = 0; i < 10; i++) > > Do these test case changes actually relate to any of the compiler changes > discussed above? Maybe to the oacc_loop_fixed_partitions cited just > above? Is that a separate issue to fix? Eh, or is that actually the fix > for your first issue, the "problems with acc routines [...] incorrectly > permitting 'acc seq' loops to call gang, worker and vector routines"? This is issue 2, and I'll break it out into a separate patch. >> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c >> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c >> @@ -1,4 +1,4 @@ >> /* { dg-do run { target lto } } */ >> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */ >> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */ >> >> #include "data-clauses-kernels.c" > >> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c >> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c >> @@ -1,2 +1,4 @@ >> +/* { dg-additional-options "-fno-exceptions" } */ >> + >> #define CONSTRUCT kernels >> #include "data-clauses.h" > >> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c >> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c >> @@ -1,4 +1,4 @@ >> /* { dg-do run { target lto } } */ >> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */ >> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */ >> >> #include "data-clauses-parallel.c" > >> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c >> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c >> @@ -1,2 +1,4 @@ >> +/* { dg-additional-options "-fno-exceptions" } */ >> + >> #define CONSTRUCT parallel >> #include "data-clauses.h" > > Hmm? I'm not sure what happened here either. Maybe adding the 'acc routine' directive to acc_on_device is preventing that function from expanding to its builtin function counterpart, which caused gcc to generate exception code? Cesar