public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tobias Burnus <tobias@codesourcery.com>
To: Thomas Schwinge <thomas@codesourcery.com>,
	Kwok Cheung Yeung <kcy@codesourcery.com>
Cc: <gcc-patches@gcc.gnu.org>, Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
Date: Mon, 13 Nov 2023 12:47:04 +0100	[thread overview]
Message-ID: <7ed57b08-3895-49b8-8b10-642609f0a8d2@codesourcery.com> (raw)
In-Reply-To: <875y25udf9.fsf@euler.schwinge.homeip.net>

Hi Thomas,

On 13.11.23 11:59, Thomas Schwinge wrote:
>>>     - 'gcc/cp/pt.cc:tsubst_omp_clauses',
>>>     - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
>>>       'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
>>>     - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
>>>     - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
>>>       'gcc/tree-nested.cc:convert_local_omp_clauses'
>>>     - 'gcc/tree-pretty-print.cc:dump_omp_clause'
> [...]
>> Most of those seem to relate to executable directives
> (That remark I don't understand.)

OpenMP classifies directives into categories. The following exists
(grep + count of TR12's json/directive/ files.)

       2 meta
       2 subsidiary
       2 utility
       4 informational
      11 declarative
      40 executable

where "declare target' + 'indirect' (like declare variant, declare simd) is declarative,
i.e.

"declare target directive - A declarative directive that ensures that procedures
and/or variables can be executed or accessed on a device."

For those, we usually add an attribute to the function declaration. And 'executive' is
defined as

"executable directive - A directive that appears in an executable context and results in
implementation code and/or prescribes the manner in which associate user code must execute."

Those either are turned into a libgomp call or transform some code - possibly including
calling the library. That would be, e.g. 'omp target', 'omp parallel', 'omp atomic'.

The listed functions all deal with executable code, i.e. some tree ..._EXPR, possibly associated
with a structured block (at least the scan_* only apply for block-associated directives as they
scan of usage inside that block - affecting the directive/the directive's clauses).

* * *

>> We use noclone,noinline attributes for 'declare target', thus, there
>> should be no issue on this side and regarding tsubst_omp_clauses, as the
>> clause is either present or not (either bare or with a parse-time
>> constant logical), there is not much post processing needed.
> That's not obvious to the casual reader of GCC source code, though.

True, but that's the general problem with code - without background, you
don't always understand the flow in the code and when something is called.

I think there is no good way how this can be solved; or rather, it can
be solved for a specific question but the next person looks for
something else and then has the same problem and the previous "solution"
(like comment) doesn't help.

In some cases, I think it helps to add comments, but I have the feeling
that your question is to specific (you look at a single patch) to be
really helpful here. But I am happy to be proven wrong and see useful
code changes/comments.

>> Thus, I bet that there is nothing to do for those.
>>
>>> Please verify, and add handling as well as test cases as necessary, or,
>>> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
>>> 'default: gcc_unreachable ();' etc., if indeed that clause is not
>>> expected there.
>> What's the point of having it next to default if it is gcc_unreachable?
> Instead of "bet", I suggest to document intentions: so that it's clear
> that 'OMP_CLAUSE_INDIRECT' is not meant to be seen here vs. an accidental
> omission.
Done - in this email thread, which can be found by patch archeology.
>> I mean there are several others which shouldn't be needed like
>> OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc.
> Quite possible.  :-) I certainly wouldn't object to "handling" those,
> too.
>
> Generally, in my opinion, we should usually see 'case's listed for all
> clause codes where we 'switch' on them, for example.

If there are plenty of 'default:', I am no sure it makes sense.

But in general, the question is (for a switch where most 'enum' values are used)
whether it would make more sense to remove the 'default: gcc_unreachable();'.
In that case, we do not handle the others - but the -Wswitch-enum will warn about it.
Thus, a bootstrap build will ensure that all values are covered due to -Werror=switch-enum.

Downside is that without -Werror/bootstrap, it will silently fall through but for
normal FE/ME code, it is guaranteed to be bootstrapped and will show up.

* * *

>>> Also, for my understanding: why is 'build_indirect_map' done at kernel
>>> invocation time (here) instead of at image load time?
>> The splay_tree is generated on the device itself - and we currently do
>> not start a kernel during GOMP_OFFLOAD_load_image. We could, the
>> question is whether it makes sense. (Generating the splay_tree on the
>> host for the device is a hassle and error prone as it needs to use
>> device pointers at the end.)
> Hmm.  It seems conceptually cleaner to me to set this up upfront, and
> avoids potentially slowing down every device kernel invocation (at least
> another function call, and 'gomp_mutex_lock' check).  Though, I agree
> this may be "in the noise" with regards to all the other stuff going on
> in 'gomp_gcn_enter_kernel' and elsewhere...

I think the most common case is GOMP_INDIRECT_ADDR_MAP == NULL.

The question is whether the lock should/could be moved inside  if (!indirect_array)
or not. Probably yes:
* doing an atomic load for the outer '!indirect array', work on a local array for
the build up and only assign it at the end - and just after the lock check again
whether '!indirect array'.

That way, it is lock free once build but when build there is no race.

> What I just realize, what's also unclear to me is how the current
> implementation works with regards to several images getting loaded --
> don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
> (conceptually) appending to it?

Yes, I think that will happen - but it looks as if the same issue exists
also the other code? I think that's not the first variable that has that
issue?

I think we should try to cleanup that handling, also to support calling
a device function in a shared library from a target region in the main
program, which currently also fails.

All device routines that are in normal static libraries and in the
object files of the main program should simply work thanks to offload
LTO such that there is only a single GOMP_offload_register_ver call (per
device type) and GOMP_OFFLOAD_load_image call (per device).

Likewise if the offloading is only done via a single shared library. —
Any mixing will currently fail, unfortunately. This patch just adds
another item which does not handle it properly.

(Not good but IMHO also not a showstopper for this patch.)

> In the general case, additional images may also get loaded during
> execution.  We thus need proper locking of the shared data structure, uh?
> Or, can we have separate on-device data structures per image?  (I've not
> yet thought about that in detail.)

I think we could - but in the main-program 'omp target' case that calls
a shared-library 'declare target' function means that we need to handle
multiple GOMP_offload_register_ver / load_image calls such that they can
work together.

Obviously, it gets harder if the user keeps doing dlopen() / dlclose()
of libraries containing offload code where a target/compute region is
run before, between, and after those calls (but hopefully not running
when calling dlopen/dlclose).

> Relatedly then, when images are unloaded, we also need to remove stale
> items from the table, and release resources (for example, the
> 'GOMP_OFFLOAD_alloc' for 'map_target_addr').

True. I think the general assumption is that images only get unloaded at
the very end, which matches most but not all code. Yet another work item.

I think we should open a new PR about this topic and collect work items
there.

>>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> Another thing regarding this test case: testing
> '-foffload-options=amdgcn-amdhsa=-march=gfx900' offloading on our
> amdnano4 system, I see:
> ...
> Re-running this manually a few times, I got:
>      pass: 5
>      fail (as above): 3
>      hang: 1

Looks like something that needs to be investigated.

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

  reply	other threads:[~2023-11-13 11:47 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-08 13:13 Kwok Cheung Yeung
2023-10-17 13:12 ` Tobias Burnus
2023-10-17 13:34   ` Jakub Jelinek
2023-10-17 14:41     ` Tobias Burnus
2023-11-03 19:53   ` Kwok Cheung Yeung
2023-11-06  8:48     ` Tobias Burnus
2023-11-07 21:37       ` Joseph Myers
2023-11-07 21:51         ` Jakub Jelinek
2023-11-07 21:59           ` Kwok Cheung Yeung
2023-11-09 12:24     ` Thomas Schwinge
2023-11-09 16:00       ` Tobias Burnus
2023-11-13 10:59         ` Thomas Schwinge
2023-11-13 11:47           ` Tobias Burnus [this message]
2024-04-11 10:10             ` Thomas Schwinge
2024-01-03 14:47       ` [committed] " Kwok Cheung Yeung
2024-01-03 15:54       ` Kwok Cheung Yeung
2024-01-22 20:33     ` [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Kwok Cheung Yeung
2024-01-24  7:06       ` rep.dot.nop
2024-01-29 17:48         ` [PATCH v2] " Kwok Cheung Yeung
2024-03-08 13:40           ` Thomas Schwinge
2024-03-14 11:38           ` Tobias Burnus
2024-01-22 20:41     ` [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive Kwok Cheung Yeung
2024-01-23 19:14       ` Tobias Burnus
2024-02-05 21:37         ` [PATCH v2] " Kwok Cheung Yeung
2024-02-06  9:03           ` Tobias Burnus
2024-02-06  9:50             ` Kwok Cheung Yeung
2024-02-12  8:51               ` Tobias Burnus
2024-02-15 21:37                 ` [COMMITTED] libgomp: Update documentation for indirect calls in target regions Kwok Cheung Yeung

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ed57b08-3895-49b8-8b10-642609f0a8d2@codesourcery.com \
    --to=tobias@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=kcy@codesourcery.com \
    --cc=thomas@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).