public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Thomas Schwinge <thomas@codesourcery.com>
To: Kwok Cheung Yeung <kcy@codesourcery.com>
Cc: Tobias Burnus <tobias@codesourcery.com>,
	<gcc-patches@gcc.gnu.org>, "Jakub Jelinek" <jakub@redhat.com>
Subject: Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
Date: Thu, 9 Nov 2023 13:24:38 +0100	[thread overview]
Message-ID: <87wmurru61.fsf@euler.schwinge.homeip.net> (raw)
In-Reply-To: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com>

Hi Kwok!

Nice work!  A few comments:

On 2023-11-03T19:53:28+0000, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
> Subject: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
>
> This adds support for the 'indirect' clause in the 'declare target'
> directive.  Functions declared as indirect may be called via function
> pointers passed from the host in offloaded code.
>
> Virtual calls to member functions via the object pointer in C++ are
> currently not supported in target regions.

Similar to how you have it here:

> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc
> @@ -51,6 +51,7 @@ struct id_map
>  };
>
>  static id_map *func_ids, **funcs_tail = &func_ids;
> +static id_map *ind_func_ids, **ind_funcs_tail = &ind_func_ids;
>  static id_map *var_ids, **vars_tail = &var_ids;
>
>  /* Files to unlink.  */
> @@ -302,6 +303,11 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
|                 else if (startswith (input + i, "FUNC_MAP "))
|                   {
>                     output_fn_ptr = true;
>                     record_id (input + i + 9, &funcs_tail);
>                   }
> +               else if (startswith (input + i, "IND_FUNC_MAP "))
> +                 {
> +                   output_fn_ptr = true;
> +                   record_id (input + i + 13, &ind_funcs_tail);
> +                 }
>                 else
>                   abort ();
>                 /* Skip to next line. */

..., please also here:

> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -5919,7 +5919,11 @@ nvptx_record_offload_symbol (tree decl)
>       /* OpenMP offloading does not set this attribute.  */
>       tree dims = attr ? TREE_VALUE (attr) : NULL_TREE;
>
> -     fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
> +     fprintf (asm_out_file, "//:");
> +     if (lookup_attribute ("omp declare target indirect",
> +                           DECL_ATTRIBUTES (decl)))
> +       fprintf (asm_out_file, "IND_");
> +     fprintf (asm_out_file, "FUNC_MAP \"%s\"",
>                IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));

... maintain separate 'if' branches for 'FUNC_MAP' vs. 'IND_FUNC_MAP', so
that we're able to easily locate those with 'grep', for example.

Also, assuming that the order of appearance of 'IND_FUNC_MAP' does matter
as it does for 'FUNC_MAP', don't we have to handle the former in
nvptx-tools 'as', too?  For example, see
<https://github.com/MentorEmbedded/nvptx-tools/pull/29>
"Ensure :VAR_MAP and :FUNC_MAP are output in order", in particular
nvptx-tools commit aa3404ad5a496cda5d79a50bedb1344fd63e8763
"Ensure :VAR_MAP and :FUNC_MAP are output in order, part II [#29]".
Please check that, and submit on
<https://github.com/MentorEmbedded/nvptx-tools/pulls>, if necessary.

If yes, maybe that's another nudge towards:
"Instead of 'K_comment', a point could be made to have these be
represented as their own 'Kind'."  That is, '//:' be its own token kind,
and handled generically, instead of '//:VAR_MAP', '//:FUNC_MAP',
'//:IND_FUNC_MAP' specially/only.  I shall then look into that, later.

> --- a/gcc/lto-cgraph.cc
> +++ b/gcc/lto-cgraph.cc
> @@ -68,6 +68,7 @@ enum LTO_symtab_tags
>    LTO_symtab_edge,
>    LTO_symtab_indirect_edge,
>    LTO_symtab_variable,
> +  LTO_symtab_indirect_function,
>    LTO_symtab_last_tag
>  };

I did wonder if that new tag should have "offload" in its name, as that's
the only case where it's used?  But then I noticed that here
('output_offload_tables'):

> @@ -1111,6 +1112,18 @@ output_offload_tables (void)
>                              (*offload_vars)[i]);
>      }
>
> +  for (unsigned i = 0; i < vec_safe_length (offload_ind_funcs); i++)
> +    {
> +      symtab_node *node = symtab_node::get ((*offload_ind_funcs)[i]);
> +      if (!node)
> +     continue;
> +      node->force_output = true;
> +      streamer_write_enum (ob->main_stream, LTO_symtab_tags,
> +                        LTO_symtab_last_tag, LTO_symtab_indirect_function);
> +      lto_output_fn_decl_ref (ob->decl_state, ob->main_stream,
> +                           (*offload_ind_funcs)[i]);
> +    }
> +

..., and correspondingly here ('input_offload_tables'):

> @@ -1863,6 +1877,19 @@ input_offload_tables (bool do_force_output)
>               varpool_node::get (var_decl)->force_output = 1;
>             tmp_decl = var_decl;
>           }
> +       else if (tag == LTO_symtab_indirect_function)
> +         {
> +           tree fn_decl
> +             = lto_input_fn_decl_ref (ib, file_data);
> +           vec_safe_push (offload_ind_funcs, fn_decl);
> +
> +           /* Prevent IPA from removing fn_decl as unreachable, since there
> +              may be no refs from the parent function to child_fn in offload
> +              LTO mode.  */
> +           if (do_force_output)
> +             cgraph_node::get (fn_decl)->mark_force_output ();
> +           tmp_decl = fn_decl;
> +         }
>         else if (tag == LTO_symtab_edge)
>           {
>             static bool error_emitted = false;

..., we're currently using 'LTO_symtab_unavail_node' for 'offload_funcs'
and 'LTO_symtab_variable' for 'offload_vars', also with "LTO"
(non-"offload") tags, so 'LTO_symtab_indirect_function' isn't any worse.

Maybe, though, we should generally have separate tags for offloading use?
Possibly aliasing (in value) the LTO ones -- but maybe actually not, to
improve "type safety".  I shall look into that, later.

> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -350,6 +350,9 @@ enum omp_clause_code {
>    /* OpenMP clause: doacross ({source,sink}:vec).  */
>    OMP_CLAUSE_DOACROSS,
>
> +  /* OpenMP clause: indirect [(constant-integer-expression)].  */
> +  OMP_CLAUSE_INDIRECT,
> +
>    /* Internal structure to hold OpenACC cache directive's variable-list.
>       #pragma acc cache (variable-list).  */
>    OMP_CLAUSE__CACHE_,

In this position here, isn't 'OMP_CLAUSE_INDIRECT' applicable to the
'OMP_CLAUSE_RANGE_CHECK' in 'gcc/tree.h:OMP_CLAUSE_SIZE' and
'gcc/tree.h:OMP_CLAUSE_DECL':

    #define OMP_CLAUSE_SIZE(NODE)                                               \
      OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),      \
                                              OMP_CLAUSE_FROM,          \
                                              OMP_CLAUSE__CACHE_), 1)

    #define OMP_CLAUSE_DECL(NODE)                                       \
      OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),      \
                                              OMP_CLAUSE_PRIVATE,       \
                                              OMP_CLAUSE__SCANTEMP_), 0)

That's probably not intentional?  In that case, maybe simply move it at
the end of the clause list?  (..., and generally then match that ordering
in any 'switch'es, as applicable, and likewise position
'gcc/tree.h:OMP_CLAUSE_INDIRECT_EXPR' correspondingly.)

I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be
necessary in the following places:

  - 'gcc/c-family/c-omp.cc:c_omp_split_clauses'
  - 'gcc/cp/pt.cc:tsubst_omp_clauses',
  - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
    'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
  - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
  - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
    'gcc/tree-nested.cc:convert_local_omp_clauses'
  - 'gcc/tree-pretty-print.cc:dump_omp_clause'

Please verify, and add handling as well as test cases as necessary, or,
as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
'default: gcc_unreachable ();' etc., if indeed that clause is not
expected there.

In this file here:

> --- /dev/null
> +++ b/libgomp/config/accel/target-indirect.c
> @@ -0,0 +1,126 @@
> +/* Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   Contributed by Siemens.
> +
> +   This file is part of the GNU Offloading and Multi Processing Library
> +   (libgomp).
> +
> +   Libgomp is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include "libgomp.h"
> +
> +#define splay_tree_prefix indirect
> +#define splay_tree_c
> +#include "splay-tree.h"
> +
> +volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
> +
> +/* Use a splay tree to lookup the target address instead of using a
> +   linear search.  */
> +#define USE_SPLAY_TREE_LOOKUP
> +
> +#ifdef USE_SPLAY_TREE_LOOKUP
> +
> +static struct indirect_splay_tree_s indirect_map;
> +static indirect_splay_tree_node indirect_array = NULL;
> +
> +/* Build the splay tree used for host->target address lookups.  */
> +
> +void
> +build_indirect_map (void)
> +{
> +  size_t num_ind_funcs = 0;
> +  volatile void **map_entry;
> +  static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
> +
> +  if (!GOMP_INDIRECT_ADDR_MAP)
> +    return;
> +
> +  gomp_mutex_lock (&lock);
> +
> +  if (!indirect_array)
> +    {
> +      /* Count the number of entries in the NULL-terminated address map.  */
> +      for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> +        map_entry += 2, num_ind_funcs++);
> +
> +      /* Build splay tree for address lookup.  */
> +      indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
> +      indirect_splay_tree_node array = indirect_array;
> +      map_entry = GOMP_INDIRECT_ADDR_MAP;
> +
> +      for (int i = 0; i < num_ind_funcs; i++, array++)
> +     {
> +       indirect_splay_tree_key k = &array->key;
> +       k->host_addr = (uint64_t) *map_entry++;
> +       k->target_addr = (uint64_t) *map_entry++;
> +       array->left = NULL;
> +       array->right = NULL;
> +       indirect_splay_tree_insert (&indirect_map, array);
> +     }
> +    }
> +
> +  gomp_mutex_unlock (&lock);
> +}
> +
> +void *
> +GOMP_target_map_indirect_ptr (void *ptr)
> +{
> +  /* NULL pointers always resolve to NULL.  */
> +  if (!ptr)
> +    return ptr;
> +
> +  assert (indirect_array);
> +
> +  struct indirect_splay_tree_key_s k;
> +  indirect_splay_tree_key node = NULL;
> +
> +  k.host_addr = (uint64_t) ptr;
> +  node = indirect_splay_tree_lookup (&indirect_map, &k);
> +
> +  return node ? (void *) node->target_addr : ptr;
> +}
> +
> +#else
> +
> +void
> +build_indirect_map (void)
> +{
> +}
> +
> +void *
> +GOMP_target_map_indirect_ptr (void *ptr)
> +{
> +  /* NULL pointers always resolve to NULL.  */
> +  if (!ptr)
> +    return ptr;
> +
> +  assert (GOMP_INDIRECT_ADDR_MAP);
> +
> +  for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> +       map_entry += 2)
> +    if (*map_entry == ptr)
> +      return (void *) *(map_entry + 1);
> +
> +  return ptr;
> +}
> +
> +#endif

..., I'm curious why certain variables are declared 'volatile'?  Is that
really the right approach for whatever exactly the (concurrency?)
requirements here are?

> --- a/libgomp/config/gcn/team.c
> +++ b/libgomp/config/gcn/team.c
> @@ -30,6 +30,7 @@

> +extern void build_indirect_map (void);

Why not generally have a prototype for this (new
'libgomp/config/accel/target-indirect.h', or maybe just
'libgomp/libgomp.h'?)?

> @@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void)
>  {
>    int threadid = __builtin_gcn_dim_pos (1);
>

Shouldn't this:

> +  /* Initialize indirect function support.  */
> +  build_indirect_map ();
> +

... be called inside here:

>    if (threadid == 0)
>      {

..., so that it's only executed by one thread?

Also, for my understanding: why is 'build_indirect_map' done at kernel
invocation time (here) instead of at image load time?

> --- a/libgomp/config/nvptx/team.c
> +++ b/libgomp/config/nvptx/team.c
> @@ -35,6 +35,7 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));

> +extern void build_indirect_map (void);

Likewise to 'libgomp/config/gcn/team.c'.

> @@ -52,6 +53,10 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
>    int tid, ntids;
>    asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
>    asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
> +
> +  /* Initialize indirect function support.  */
> +  build_indirect_map ();
> +
>    if (tid == 0)
>      {

Likewise to 'libgomp/config/gcn/team.c'.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +
> +#define N 256
> +
> +#pragma omp begin declare target indirect
> +int foo(void) { return 5; }
> +int bar(void) { return 8; }
> +int baz(void) { return 11; }
> +#pragma omp end declare target
> +
> +int main (void)
> +{
> +  int i, x = 0, expected = 0;
> +  int (*fn_ptr[N])(void);
> +
> +  for (i = 0; i < N; i++)
> +    {
> +      switch (i % 3)
> +     {
> +     case 0: fn_ptr[i] = &foo;
> +     case 1: fn_ptr[i] = &bar;
> +     case 2: fn_ptr[i] = &baz;
> +     }
> +      expected += (*fn_ptr[i]) ();
> +    }
> +
> +#pragma omp target teams distribute parallel for reduction(+: x) \
> +             map (to: fn_ptr) map (tofrom: x)
> +  for (int i = 0; i < N; i++)
> +    x += (*fn_ptr[i]) ();
> +
> +  return x - expected;
> +}

    [...]/libgomp.c-c++-common/declare-target-indirect-2.c: In function ‘main’:
    [...]/libgomp.c-c++-common/declare-target-indirect-2.c:20:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
       20 |         case 0: fn_ptr[i] = &foo;
          |                 ~~~~~~~~~~^~~~~~
    [...]/libgomp.c-c++-common/declare-target-indirect-2.c:21:9: note: here
       21 |         case 1: fn_ptr[i] = &bar;
          |         ^~~~
    [...]/libgomp.c-c++-common/declare-target-indirect-2.c:21:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
       21 |         case 1: fn_ptr[i] = &bar;
          |                 ~~~~~~~~~~^~~~~~
    [...]/libgomp.c-c++-common/declare-target-indirect-2.c:22:9: note: here
       22 |         case 2: fn_ptr[i] = &baz;
          |         ^~~~

..., so I suppose that's effectively testing 'fn_ptr[i] = &baz;' only for
all 'i's?


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

  parent reply	other threads:[~2023-11-09 12:24 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-08 13:13 Kwok Cheung Yeung
2023-10-17 13:12 ` Tobias Burnus
2023-10-17 13:34   ` Jakub Jelinek
2023-10-17 14:41     ` Tobias Burnus
2023-11-03 19:53   ` Kwok Cheung Yeung
2023-11-06  8:48     ` Tobias Burnus
2023-11-07 21:37       ` Joseph Myers
2023-11-07 21:51         ` Jakub Jelinek
2023-11-07 21:59           ` Kwok Cheung Yeung
2023-11-09 12:24     ` Thomas Schwinge [this message]
2023-11-09 16:00       ` Tobias Burnus
2023-11-13 10:59         ` Thomas Schwinge
2023-11-13 11:47           ` Tobias Burnus
2024-04-11 10:10             ` Thomas Schwinge
2024-01-03 14:47       ` [committed] " Kwok Cheung Yeung
2024-01-03 15:54       ` Kwok Cheung Yeung
2024-01-22 20:33     ` [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Kwok Cheung Yeung
2024-01-24  7:06       ` rep.dot.nop
2024-01-29 17:48         ` [PATCH v2] " Kwok Cheung Yeung
2024-03-08 13:40           ` Thomas Schwinge
2024-03-14 11:38           ` Tobias Burnus
2024-01-22 20:41     ` [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive Kwok Cheung Yeung
2024-01-23 19:14       ` Tobias Burnus
2024-02-05 21:37         ` [PATCH v2] " Kwok Cheung Yeung
2024-02-06  9:03           ` Tobias Burnus
2024-02-06  9:50             ` Kwok Cheung Yeung
2024-02-12  8:51               ` Tobias Burnus
2024-02-15 21:37                 ` [COMMITTED] libgomp: Update documentation for indirect calls in target regions Kwok Cheung Yeung

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wmurru61.fsf@euler.schwinge.homeip.net \
    --to=thomas@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=kcy@codesourcery.com \
    --cc=tobias@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).