public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Tobias Burnus <Tobias_Burnus@mentor.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [Patch] OpenMP: Move omp requires checks to libgomp
Date: Thu, 9 Jun 2022 13:40:19 +0200	[thread overview]
Message-ID: <YqHcI46gvVbB+E/G@tucnak> (raw)
In-Reply-To: <07fec82a-41cf-fdc5-6307-c068dd95ef1a@mentor.com>

On Wed, Jun 08, 2022 at 05:56:02AM +0200, Tobias Burnus wrote:
> gcc/c/ChangeLog:
> 
> 	* c-parser.cc (c_parser_declaration_or_fndef): Set
> 	OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
> 	"omp declare target" attribute.
> 	(c_parser_omp_target_data): Set	OMP_REQUIRES_TARGET_USED in
> 	omp_requires_mask.
> 	(c_parser_omp_target_enter_data): Likewise.
> 	(c_parser_omp_target_exit_data): Likewise.
> 	(c_parser_omp_requires): Remove sorry.
> 
> gcc/cp/ChangeLog:
> 
> 	* parser.cc (cp_parser_simple_declaration): Set
> 	OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
> 	"omp declare target" attribute.
> 	(cp_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
> 	omp_requires_mask.
> 	(cp_parser_omp_target_enter_data): Likewise.
> 	(cp_parser_omp_target_exit_data): Likewise.
> 	(cp_parser_omp_requires): Remove sorry.
> 
> gcc/fortran/ChangeLog:
> 
> 	* openmp.cc (gfc_match_omp_requires): Remove "not implemented yet".
> 	* parse.cc: Include "tree.h" and "omp-general.h".
> 	(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.
> 
> gcc/ChangeLog:
> 
> 	* omp-general.h (omp_runtime_api_call): New prototype.
> 	* omp-general.cc (omp_runtime_api_call): Added device_api_only arg
> 	and moved from ...
> 	* omp-low.cc (omp_runtime_api_call): ... here.
> 	(scan_omp_1_stmt): Update call.
> 	* gimplify.cc (gimplify_call_expr): Call omp_runtime_api_call.
> 	* omp-offload.cc (omp_finish_file): Add code to create OpenMP requires
> 	mask variable in .gnu.gomp_requires section, if needed.
> 
> include/ChangeLog:
> 
> 	* gomp-constants.h (GOMP_REQUIRES_UNIFIED_ADDRESS,
> 	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY,
> 	GOMP_REQUIRES_REVERSE_OFFLOAD): New.
> 
> libgcc/ChangeLog:
> 
> 	* offloadstuff.c (__requires_mask_table, __requires_mask_table_end):
> 	New symbols to mark start and end of the .gnu.gomp_requires section.
> 
> 
> libgomp/ChangeLog:
> 
> 	* libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add
> 	omp_requires_mask arg.
> 	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise;
> 	return -1 when device available but omp_requires_mask != 0.
> 	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
> 	* oacc-host.c (host_get_num_devices, host_openacc_get_property):
> 	Update call.
> 	* oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1,
> 	goacc_attach_host_thread_to_device, acc_get_num_devices,
> 	acc_set_device_num, get_property_any): Likewise.
> 	* target.c: (__requires_mask_table, __requires_mask_table_end):
> 	Declare weak extern symbols.
> 	(gomp_requires_to_name): New.
> 	(gomp_target_init): Add code to check .gnu._gomp_requires section
> 	mask values for inconsistencies; warn when requirements makes an
> 	existing device unsupported.
> 	* testsuite/libgomp.c-c++-common/requires-1-aux.c: New test.
> 	* testsuite/libgomp.c-c++-common/requires-1.c: New test.
> 	* testsuite/libgomp.c-c++-common/requires-2-aux.c: New test.
> 	* testsuite/libgomp.c-c++-common/requires-2.c: New test.
> 
> liboffloadmic/ChangeLog:
> 
> 	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices):
> 	Return -1 when device available but omp_requires_mask != 0.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/gomp/requires-4.c: Update dg-*.
> 	* c-c++-common/gomp/target-device-ancestor-2.c: Likewise.
> 	* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
> 	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
> 	* c-c++-common/gomp/target-device-ancestor-5.c: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move post-FE
> 	checks to ...
> 	* gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file.

> +      if (flag_openmp
> +         && lookup_attribute ("omp declare target",
> +                              DECL_ATTRIBUTES (current_function_decl)))
> +       omp_requires_mask
> +         = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);

I must admit it is unclear what the
"must appear lexically before any device constructs or device routines."
restriction actually means for device routines.
Is that lexically before definition of such device routines, or even their
declarations?

It wouldn't surprise me if some library packages started eventually adding
declare target directives in some headers around external declarations,
should that be the point after which we don't allow requires directives?

On the other side, for the definitions, we don't need to know when parsing
the definition whether it is a device routine.

void
foo (void)
{
}
#pragma omp declare target to (foo)

And yet another question: is
void bar (void);
#pragma omp declare target device_type (host) to (bar)
void
bar (void)
{
}
a device routine or not?

The above patch snippet is I believe for function definitions that were
arked as declare target before the definition somehow (another decl for
it merged with the new one or in between the begin/end).  And is true
even for device_type (host), to rule that out you'd need to check for
"omp declare target host" attribute not being present.
I'm not against the above snippet perhaps adjusted for device_type(host),
but IMHO we want clarifications from omp-lang.

> @@ -20915,6 +20921,10 @@ c_parser_omp_teams (location_t loc, c_parser *parser,
>  static tree
>  c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
>  {
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = c_parser_omp_all_clauses (parser, OMP_TARGET_DATA_CLAUSE_MASK,
>  				"#pragma omp target data");

target update is also a device construct and the above snippet hasn't been
added for it, ditto for interop which we don't implement yet.
But, my preference would be instead of adding these snippets to
c_parser_omp_target_{data,enter_data,exit_data,update} etc. move it from
c_parser_omp_target to c_parser_omp_all_clauses:
  if (flag_openmp
      && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE)) != 0)
    omp_requires_mask
      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
(somewhere at the start of the function), because the definition of device
constructs is exactly like that:
"device construct	An OpenMP construct that accepts the device clause."

> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index da2f370cdca..6e26d123370 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -15389,6 +15389,11 @@ cp_parser_simple_declaration (cp_parser* parser,
>  	  /* Otherwise, we're done with the list of declarators.  */
>  	  else
>  	    {
> +	      if (flag_openmp && lookup_attribute ("omp declare target",
> +						   DECL_ATTRIBUTES (decl)))
> +		omp_requires_mask
> +		  = (enum omp_requires) (omp_requires_mask
> +					 | OMP_REQUIRES_TARGET_USED);
>  	      pop_deferring_access_checks ();
>  	      cp_finalize_omp_declare_simd (parser, &odsd);
>  	      return;

Ditto.

> @@ -44287,6 +44292,10 @@ cp_parser_omp_teams (cp_parser *parser, cp_token *pragma_tok,
>  static tree
>  cp_parser_omp_target_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
>  {
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = cp_parser_omp_all_clauses (parser, OMP_TARGET_DATA_CLAUSE_MASK,
>  				 "#pragma omp target data", pragma_tok);
> @@ -44390,6 +44399,10 @@ cp_parser_omp_target_enter_data (cp_parser *parser, cp_token *pragma_tok,
>        return true;
>      }
>  
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = cp_parser_omp_all_clauses (parser, OMP_TARGET_ENTER_DATA_CLAUSE_MASK,
>  				 "#pragma omp target enter data", pragma_tok);
> @@ -44481,6 +44494,10 @@ cp_parser_omp_target_exit_data (cp_parser *parser, cp_token *pragma_tok,
>        return true;
>      }
>  
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = cp_parser_omp_all_clauses (parser, OMP_TARGET_EXIT_DATA_CLAUSE_MASK,
>  				 "#pragma omp target exit data", pragma_tok);

Ditto.

For Fortran, is the above mostly not needed because requires need to be in
the specification part and device constructs are executable and appear in
the part after it?  Do we allow requires in BLOCK's specification part?

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -3644,6 +3644,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
>  	  return GS_OK;
>  	}
>      }
> +  if (fndecl && flag_openmp && omp_runtime_api_call (fndecl, true))
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
>  
>    /* Remember the original function pointer type.  */
>    fnptrtype = TREE_TYPE (CALL_EXPR_FN (*expr_p));

I'm sure device APIs were discussed, but I must be blind and I can't find it
in either 5.0, 5.1 or 5.2.  All I see is device constructs or device routines
in those places where I'd also look for device related OpenMP runtime
library APIs.  Though, if some routine calls omp_get_num_devices (),
certainly the library at that point needs to know
reverse_offload/unified_shared_memory/etc. requires because that determines
how many devices it has.  So, what have I missed (aka on which place in the
standard the above snippet is based on)?
Perhaps I had in mind by "device routines" the OpenMP runtime APIs related
to devices, but unfortunately we have a different glossary for that term:
"device routine	A function (for C/C+ and Fortran) or subroutine (for Fortran)
		that can be executed on a target device, as part of a target region."

> +      /* Now likewise but for device API. */

Two spaces after .

> +      /* Now omp_* calls that are available as omp_* and omp_*_; however, the
> +	 DECL_NAME is always omp_* without tailing underscore. Non device.  */

Likewise.

> +      /* And device APIs. */
> +      "get_device_num",
> +      "get_initial_device",
> +      "is_initial_device",  /* Even if it does not require init'ed devices. */
> +      NULL,
> +      /* And finally calls available as omp_*, omp_*_ and omp_*_8_; however,
> +	 as DECL_NAME only omp_* and omp_*_8 appear. For non device.  */

Ditto 3x.

> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -397,6 +397,27 @@ omp_finish_file (void)
>    unsigned num_funcs = vec_safe_length (offload_funcs);
>    unsigned num_vars = vec_safe_length (offload_vars);
>  
> +  if (flag_openmp && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0)
> +    {
> +      if (targetm_common.have_named_sections)
> +	{
> +	  const char *requires_section = ".gnu.gomp_requires";
> +	  tree maskvar = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +				     get_identifier (".gomp_requires_mask"),
> +				     unsigned_type_node);
> +	  SET_DECL_ALIGN (maskvar, TYPE_ALIGN (unsigned_type_node));

Don't we want also DECL_USER_ALIGN (maskvar) = 1; so that
we never try to increase its alignment?

Is it an allocated section, or should it be better non-allocated and then
dealt with by mkoffload?

Shouldn't the vars in that section be const, so that it is a read-only
section?

Is unsigned_type_node what we want (say wouldn't be just unsigned_char_node
be enough, currently we just need 3 bits).

Also, wonder if for HAVE_GAS_SHF_MERGE && flag_merge_constants
we shouldn't try to make that section mergeable.  If it goes away during
linking and is replaced by something, then it doesn't matter, but otherwise,
as we don't record which TU had what flags, all we care about is that
there were some TUs which used device construct/routines (and device APIs?)
and used bitmask 7, other TUs that used bitmask 3 and others that used
bitmask 4.

> +	  TREE_STATIC (maskvar) = 1;
> +	  DECL_INITIAL (maskvar)
> +	    = build_int_cst (unsigned_type_node,
> +			     ((unsigned int) omp_requires_mask
> +			      & (OMP_REQUIRES_UNIFIED_ADDRESS
> +				 | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +				 | OMP_REQUIRES_REVERSE_OFFLOAD)));
> +	  set_decl_section_name (maskvar, requires_section);
> +	  varpool_node::finalize_decl (maskvar);
> +	}
> +    }
> +
>    if (num_funcs == 0 && num_vars == 0)
>      return;
>  
> @@ -442,6 +463,14 @@ omp_finish_file (void)
>      }
>    else
>      {
> +#ifndef ACCEL_COMPILER
> +      if (flag_openmp
> +	  && (omp_requires_mask & OMP_REQUIRES_TARGET_USED)
> +	  && (omp_requires_mask & (OMP_REQUIRES_UNIFIED_ADDRESS
> +				   | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +				   | OMP_REQUIRES_REVERSE_OFFLOAD)))
> +	sorry ("OpenMP device offloading is not supported for this target");
> +#endif

I don't understand this snippet.  Without named sections on the host,
I bet we simply don't support offloading at all,
the record_offload_symbol target hook is only non-trivially defined
for nvptx and nvptx isn't typical host for OpenMP offloading,
because we don't remember it anywhere.

> @@ -32,61 +29,4 @@ integer :: a, b, c
> -
> -
> -end
> \ No newline at end of file

Please avoid this in all files (unless it was there
previously and you are fixing it).

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
> @@ -330,6 +330,12 @@ enum gomp_map_kind
>  #define GOMP_DEPEND_MUTEXINOUTSET	4
>  #define GOMP_DEPEND_INOUTSET		5
>  
> +/* Flag values for requires-directive features, must match corresponding
> +   OMP_REQUIRES_* values in gcc/omp-general.h.  */
> +#define GOMP_REQUIRES_UNIFIED_ADDRESS       0x10
> +#define GOMP_REQUIRES_UNIFIED_SHARED_MEMORY 0x20
> +#define GOMP_REQUIRES_REVERSE_OFFLOAD       0x80

They don't have to much those, we can translate them
(and translating them to 1/2/4 might be a good idea).

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -125,7 +125,7 @@ extern void GOMP_PLUGIN_fatal (const char *, ...)
>  extern const char *GOMP_OFFLOAD_get_name (void);
>  extern unsigned int GOMP_OFFLOAD_get_caps (void);
>  extern int GOMP_OFFLOAD_get_type (void);
> -extern int GOMP_OFFLOAD_get_num_devices (void);
> +extern int GOMP_OFFLOAD_get_num_devices (unsigned int);

I wonder if we shouldn't rename it when we change the arguments,
so that if one mixes an older plugin with newer libgomp or vice versa
this is easily caught.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -36,6 +36,7 @@
>  # include <inttypes.h>  /* For PRIu64.  */
>  #endif
>  #include <string.h>
> +#include <stdio.h>  /* For snprintf. */
>  #include <assert.h>
>  #include <errno.h>
>  
> @@ -98,6 +99,13 @@ static int num_devices;
>  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>  static int num_devices_openmp;
>  
> +/* Start/end of .gnu.gomp.requires section of program, defined in

Isn't it .gnu.gomp_requires ?

> +   crtoffloadbegin/end.o.  */
> +__attribute__((weak))
> +extern const unsigned int __requires_mask_table[];
> +__attribute__((weak))
> +extern const unsigned int __requires_mask_table_end[];

I must say it is unclear to me how this works.
It will only find one such array, say in the executable,
or if the executable doesn't have it, in one of the shared libraries.

I think we want some solution that will work with OpenMP code
at least in the executable and some shared libraries it is linked against
(obviously another case is when a library with certain #pragma omp requires
is dlopened after we've finalized the number of devices, bet the options
in that case are either warn or fatal error).

The choices could be e.g. make __requires_mask_table{,_end} .hidden
and in the crtoffloadbegin.o (or end) unconditionally call some new libgomp
routine to register the table, but the disadvantage of that is that we could
have many of those register calls even when there is nothing to register
(sure, the .ctor in crtoffloadbegin.o (or end) could compare the 2 addresses
and not call anything if the table is empty but there would be still code
executed during initialization of the library).

Yet another possibility is linker plugin case.  We already use it for the
case where we actually have some offloading LTO bytecode, transform it into
a data section and register with GOMP_offload_register*.
So, if we could e.g. at the same time also process those requires arrays,
diagnose at link time if multiple TUs with that set disagree on the mask
value and in the target data provide that mask to the library, that would
be much nicer.
And the masks either could be gathered from .gnu.gomp_requires or it can be
somehow encoded in the offloading LTO or its associated data.
What is important though is that it will work even if we actually don't have
any "omp declare target" functions or variables in the TU or the whole
executable or library, just the requires mask.  But that can be dealt with
e.g. by forcing the LTO sections even for that case or so.

	Jakub


  reply	other threads:[~2022-06-09 11:40 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-13 15:07 [PATCH, OpenMP 5.0] More implementation of the requires directive Chung-Lin Tang
2021-01-13 15:27 ` Jakub Jelinek
2021-03-25 11:18 ` Thomas Schwinge
2022-03-29 13:42 ` Andrew Stubbs
2022-06-08  3:56 ` [Patch] OpenMP: Move omp requires checks to libgomp Tobias Burnus
2022-06-09 11:40   ` Jakub Jelinek [this message]
2022-06-09 12:46     ` Tobias Burnus
2022-06-09 14:19       ` Jakub Jelinek
2022-06-29 14:33         ` [Patch][v4] " Tobias Burnus
2022-06-29 17:02           ` Jakub Jelinek
2022-06-29 18:10             ` Tobias Burnus
2022-06-29 20:18               ` Jakub Jelinek
2022-07-01 13:06                 ` [Patch][v5] " Tobias Burnus
2022-07-01 14:34                   ` Jakub Jelinek
2022-07-01 16:31                     ` Tobias Burnus
2022-07-01 16:55                       ` Jakub Jelinek
2022-07-01 21:08                         ` Tobias Burnus
2022-07-04  8:31                           ` Jakub Jelinek
2022-07-07 13:26                           ` Fix one issue in OpenMP 'requires' directive diagnostics (was: [Patch][v5] OpenMP: Move omp requires checks to libgomp) Thomas Schwinge
2022-07-07 13:56                             ` Tobias Burnus
2022-07-08  6:59                               ` Thomas Schwinge
2022-07-06 10:42                   ` Restore 'GOMP_offload_unregister_ver' functionality " Thomas Schwinge
2022-07-06 13:59                     ` Tobias Burnus
2022-07-06 21:08                       ` Thomas Schwinge
2022-08-17 11:45                       ` Jakub Jelinek
2023-09-15  9:41                   ` [Patch][v5] OpenMP: Move omp requires checks to libgomp Thomas Schwinge
2022-07-07  8:37           ` Adjust 'libgomp.c-c++-common/requires-3.c' (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp) Thomas Schwinge
2022-07-07  9:02             ` Tobias Burnus
2022-07-07  8:42           ` Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing " Thomas Schwinge
2022-07-07  9:36             ` Tobias Burnus
2022-07-07 10:42               ` Thomas Schwinge
2022-07-06 10:30   ` Define 'OMP_REQUIRES_[...]', 'GOMP_REQUIRES_[...]' in a single place (was: [Patch] " Thomas Schwinge
2022-07-06 13:40     ` Tobias Burnus
2022-07-06 11:04   ` Fix Intel MIC 'mkoffload' for OpenMP 'requires' " Thomas Schwinge
2022-07-06 11:29     ` Tobias Burnus
2022-07-06 12:38       ` Thomas Schwinge
2022-07-06 13:30         ` Tobias Burnus
2022-07-07 10:46           ` Thomas Schwinge
2022-07-06 14:19     ` Tobias Burnus
2024-03-07 12:38   ` nvptx: 'cuDeviceGetCount' failure is fatal " Thomas Schwinge
2024-03-07 14:28     ` nvptx: 'cuDeviceGetCount' failure is fatal Tobias Burnus
2024-03-08 15:58       ` Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YqHcI46gvVbB+E/G@tucnak \
    --to=jakub@redhat.com \
    --cc=Tobias_Burnus@mentor.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).