From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jakub@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTP id 7010B385AC09
 for <gcc-patches@gcc.gnu.org>; Mon, 26 Jul 2021 14:29:12 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7010B385AC09
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-512-e6WQJN9SO2aAp3zpASL7Hg-1; Mon, 26 Jul 2021 10:29:08 -0400
X-MC-Unique: e6WQJN9SO2aAp3zpASL7Hg-1
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com
 [10.5.11.22])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D2ECC100A60E;
 Mon, 26 Jul 2021 14:29:07 +0000 (UTC)
Received: from tucnak.zalov.cz (ovpn-112-143.ams2.redhat.com [10.36.112.143])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id 45A49100760B;
 Mon, 26 Jul 2021 14:29:06 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
 by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 16QET4qc2919021
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
 Mon, 26 Jul 2021 16:29:04 +0200
Received: (from jakub@localhost)
 by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 16QET3GS2919020;
 Mon, 26 Jul 2021 16:29:03 +0200
Date: Mon, 26 Jul 2021 16:29:02 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: Kwok Cheung Yeung <kcy@codesourcery.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [WIP, OpenMP] OpenMP metadirectives support
Message-ID: <20210726142902.GW2380545@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <8d413974-0068-3a31-6ae5-d36c1be06d06@codesourcery.com>
MIME-Version: 1.0
In-Reply-To: <8d413974-0068-3a31-6ae5-d36c1be06d06@codesourcery.com>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW,
 RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jul 2021 14:29:14 -0000

On Fri, Jul 09, 2021 at 12:16:15PM +0100, Kwok Cheung Yeung wrote:
> This is a WIP implementation of metadirectives as defined in the OpenMP 5.0
> spec. I intend to add support for metadirectives as specified in OpenMP 5.1
> later (where the directive can be selected dynamically at runtime), but am
> concentrating on the static part for now. Parsing has only been implemented
> in the C frontend so far. I am especially interested in feedback regarding
> certain aspects of the implementation before I become too committed to the
> current design.

Note, there is a partial overlap with the attribute syntax changes, see below.
c-family/c-omp.c now has omp_directives table that should be updated for
changes like this and then c_omp_categorize_directive that returns some
information about the directives given a directive name (though, that name
can be one, two or three tokens long, consider e.g. target enter data
or cancellation point directives).

For metadirective, I think very special case are declarative directives in
them, I'd tend to sorry for them at least for now, I'm pretty sure many
cases with them are just unimplementable and will need to be restricted in
the standard, others can be implemented with lots of effort.
Whether it is e.g. metadirective guarding declare target ... end declare
target pair that would only conditionally set declare target and instead of
a single bit to find out if something is declare target or not we'd until
resolved need to compute it for all possibilities, or e.g. conditional
declare reduction/declare mapper where the name lookup for reduction or map
directives would be dependent on metadirective resolution later on, etc.
I'm afraid a total nightmare nobody has really thought about details for it.

> 1) When parsing each directive variant, a vector of tokens is constructed
> and populated with the tokens for a regular equivalent pragma, along with
> the tokens for its clauses and the body. The parser routine for that pragma
> type is then called with these tokens, and the entire resulting parse tree
> is stored as a sub-tree of the metadirective tree structure.
> 
> This results in the body being parsed and stored once for each directive
> variant. I believe this is necessary because the body is parsed differently
> if there is a 'for' in the directive (using c_parser_omp_for_loop) compared
> to if there is not, plus clauses in the directive (e.g. tile, collapse) can
> change how the for loop is parsed.
> 
> As an optimisation, identical body trees could be merged together, but that
> can come later.

I'm afraid it isn't just an optimization and we need to be as smart as
possible.  I'm not sure it is possible to parse everything many times,
consider e.g. labels in the blocks, nested function definitions, variable
definitions, etc.
While OpenMP requires that essentially the code must be valid if the
metadirective is replaced by any of those mentioned directives which rules
quite some weirdo corner cases, nothing prevents e.g. two or more
when directives to be standalone directives (which don't have any body and
so whatever comes after them should be left parsed for later as normal
statement sequence), one or more to be normal constructs that accept a
structured block and one or more to be e.g. looping constructs (simd, for,
distribute, taskloop or combined versions of those).
Even when issues with labels etc. are somehow solved (e.g. for structured
blocks we have the restriction that goto, break, continue, or switch into
a case/default label, etc. can't be used to enter or exit the structured
block which could mean some cases can be handled through renaming seen
labels in all but one bodies), most important is to sync on where parsing
should continue after the metadirective.
I think it would be nice if the metadirective parsing at least made quick
analysis on what kind of bodies the directives will want and can use the new
c-omp.c infrastructure or if needed extend it (e.g. separate the C_OMP_DIR_CONSTRUCT
category into C_OMP_DIR_CONSTRUCT and C_OMP_DIR_LOOPING_CONSTRUCT where
the latter would be used for those that expect some omp loop after it).
One option would be then to parse the body as the most restricted construct
(looping (and determine highest needed collapse and ordered), then construct,
then standalone) and be able to adjust what we parsed into what the
different constructs need, but another option is the separate parsing of
the code after the directive multiple times, but at least in the order of
most restricted to least restricted, remember where to stop and don't parse
it multiple times at least for directives that need the same thing.

> 
> 2) Selectors in the device set (i.e. kind, isa, arch) resolve differently
> depending on whether the program is running on a target or on the host.
> Since we don't keep multiple versions of a function for each target on the
> host compiler, resolving metadirectives with these selectors needs to be
> delayed until after LTO streaming, at which point the host or offload
> compiler can make the appropriate decision.

How is this different from declare variant?  For declare variant, it is true
I'm never trying to resolve it already during parsing of the call and that
probably should be changed, do a first attempt at that point.  Initially
I thought it typically will not be possible, but later clarification and
strong desire of LLVM/ICC etc. to do everything or almost everything already
during parsing suggests that it must be doable at least in some cases.
E.g. we have restrictions that requires directive on which some decision
could be dependent must appear only lexically before it or not at all, etc.
So, similarly to that, metadirective ideally should see if something is
impossible already during parsing (dunno if it should mean we wouldn't parse
the body in that case, that would mean worse diagnostics), then repeat the
checks during gimplification like declare variant is resolved there, then
repeat again after IPA.  Would be probably best if for metadirectives that
resolve to executable directives we represent it by something like a magic
IFN that is told everything needed to decide and can share as much code as
possible with the declare variant decisions.

It is true other compilers implement offloading quite differently from GCC,
by repeating all of preprocessing, parsing etc. for the offloading target,
so they can decide some metadirective/declare variant decisions earlier than
we can.  On the other side that approach has also quite some disadvantages,
it is much harder to ensure ABI compatibility between the host and offload
code if one can use #ifdefs and whatever to change layout of everything in
between.

For the checks during parsing, we'll need a different way how to track which
directives are currently active (or defer anything with construct
selectors till gimplification).  It is true that resolving that during
parsing goes against the goal to parse as many bodies together as possible,
so we need to pick one or the other.  Parsing what follows for all
standalone directives isn't a problem of course, but if the metadirective
has one when with for and another with simd, then parsing the loop just once
would be a problem if there is metadirective in the body that wants to
decide whether it is in for or simd and wants that decision be done during
parsing.

> One negative of this is that the metadirective Gimple representation lasts
> beyond the OMP expand stage, when generally we would expect all OMP
> directives to have been expanded to something else.
> 
> 3) In the OpenMP examples (version 5.0.1), section 9.7, the example
> metadirective.3.c does not work as expected.
> 
> #pragma omp declare target
> void exp_pi_diff(double *d, double my_pi){
>    #pragma omp metadirective \
>                when( construct={target}: distribute parallel for ) \
>                default( parallel for simd)
> ...
> int main()
> {
>    ...
>    #pragma omp target teams map(tofrom: d[0:N])
>    exp_pi_diff(d,my_pi);
>    ...
>    exp_pi_diff(d,my_pi);

The spec says in this case that the target construct is added to the
construct set because of the function appearing in between omp declare target
and omp end declare target, so the above is something that resolves
statically to distribute parallel for.
It is true that in OpenMP 5.1 the earlier
For functions within a declare target block, the target trait is added to the beginning of the
set as c 1 for any versions of the function that are generated for target regions so the total size
of the set is increased by 1.
has been mistakenly replaced with:
For device routines, the target trait is added to the beginning of the set as c 1 for any versions of
the procedure that are generated for target regions so the total size of the set is increased by 1.
by that has been corrected in 5.2:
C/C++:
For functions that are declared in a code region that is delimited by a declare target directive and
its paired end directive, the target trait is added to the beginning of the set as c 1 for any target
variants that result from the directive so the total size of the set is increased by one.
Fortran:
If a declare target directive appears in the specification part of a procedure or in the
specification part of a procedure interface body, the target trait is added to the beginning of the
set as c 1 for any target variants that result from the directive so the total size of the set is
increased by one.

So, it is really a static decision that can be decided already during
parsing.
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1505,6 +1505,7 @@ OBJS = \
>  	omp-general.o \
>  	omp-low.o \
>  	omp-oacc-kernels-decompose.o \
> +        omp-expand-metadirective.o \

Spaces instead of tab.

> @@ -1312,12 +1312,14 @@ static const struct omp_pragma_def omp_pragmas[] = {
>    { "allocate", PRAGMA_OMP_ALLOCATE },
>    { "atomic", PRAGMA_OMP_ATOMIC },
>    { "barrier", PRAGMA_OMP_BARRIER },
> +  { "begin", PRAGMA_OMP_BEGIN },
>    { "cancel", PRAGMA_OMP_CANCEL },
>    { "cancellation", PRAGMA_OMP_CANCELLATION_POINT },
>    { "critical", PRAGMA_OMP_CRITICAL },
>    { "depobj", PRAGMA_OMP_DEPOBJ },
> -  { "end", PRAGMA_OMP_END_DECLARE_TARGET },
> +  { "end", PRAGMA_OMP_END },
>    { "flush", PRAGMA_OMP_FLUSH },
> +  { "metadirective", PRAGMA_OMP_METADIRECTIVE },
>    { "requires", PRAGMA_OMP_REQUIRES },
>    { "section", PRAGMA_OMP_SECTION },
>    { "sections", PRAGMA_OMP_SECTIONS },

Please update for this also the omp_directives array.

> +enum pragma_kind
> +c_pp_lookup_pragma_by_name (const char *name)
> +{
> +  const int n_omp_pragmas = sizeof (omp_pragmas) / sizeof (*omp_pragmas);
> +  const int n_omp_pragmas_simd = sizeof (omp_pragmas_simd)
> +				 / sizeof (*omp_pragmas_simd);
> +
> +  void *result = bsearch (name, omp_pragmas, n_omp_pragmas,
> +			  sizeof (*omp_pragmas),
> +			  c_pp_lookup_pragma_by_name_1);
> +  if (!result)
> +    result = bsearch (name, omp_pragmas_simd, n_omp_pragmas_simd,
> +		      sizeof (*omp_pragmas_simd),
> +		      c_pp_lookup_pragma_by_name_1);
> +
> +  if (result)
> +    {
> +      const struct omp_pragma_def *def
> +	= (const struct omp_pragma_def *) result;
> +
> +      return (enum pragma_kind) def->id;
> +    }
> +
> +  return PRAGMA_NONE;
> +}

I think this should be dropped and c_omp_categorize_directive should
be used instead of it.

Please add a function comment to show the grammar.  See e.g.
c_parser_omp_declare.

>  
> +static void
> +c_parser_omp_begin (c_parser *parser, bool *if_p)
> +{
> +  location_t loc = c_parser_peek_token (parser)->location;
> +  c_parser_consume_pragma(parser);

Space before (.

> +  if (c_parser_peek_token (parser)->type == CPP_NAME)
> +    {
> +      const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
> +
> +      if (strcmp (p, "metadirective") == 0)
> +	{
> +	  char p_name[sizeof "#pragma omp teams distribute parallel for simd"];
> +	  omp_clause_mask mask (0);
> +
> +	  c_parser_consume_token (parser);
> +	  c_parser_omp_metadirective (loc, parser, p_name, mask, NULL, if_p,
> +				      true);

metadirective, by not being itself combinable, doesn't need this p_name
and mask stuff.  That is used only for combined/composite construct when
the p_name and mask need to be computed dynamically based on the exact
parsing.  The begin metadirective vs. metadirective difference is a boolean
one that can be either passed as bool, or if the pragma token is passed
one could look at its pragma kind.

> +	  return;
> +	}
> +    }
> +
> +  error_at (loc, "expected %<begin metadirective%>");

"expected %<metadirective%>" ?  #pragma omp begin already appeared...

> +  c_parser_skip_to_pragma_eol (parser);
> +}
> +
> +static void
> +c_parser_omp_end (c_parser *parser)

Similarly with the function comment.

	Jakub