public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tom de Vries <tdevries@suse.de>
To: Chung-Lin Tang <chunglin.tang@gmail.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	Catherine Moore <clm@codesourcery.com>
Subject: Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Date: Fri, 16 Dec 2022 15:51:35 +0100	[thread overview]
Message-ID: <041a398e-5d1f-cbfa-5d7c-12f46b330143@suse.de> (raw)
In-Reply-To: <8b974d21-e288-4596-7500-277a43c92771@gmail.com>

On 9/21/22 09:45, Chung-Lin Tang wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way 
> of implementing
> barriers in libgomp on nvptx created a quite significant performance 
> drop on some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
> 
> That previous patch wasn't accepted well (admittedly, it was kind of a 
> hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
> 

Ack.

> Basically, instead of trying to have the GPU do CPU-with-OS-like things 
> that it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization 
> instructions.
> Tasks are processed after threads have joined, and only if 
> team->task_count != 0
> 
> (arguably, there might be a little bit of performance forfeited where 
> earlier arriving threads
> could've been used to process tasks ahead of other threads. But that 
> again falls into requiring
> implementing complex futex-wait/wake like behavior. Really, that kind of 
> tasking is not what target
> offloading is usually used for)
> 

Please try to add this insight somewhere as a comment in the code, f.i. 
in the header comment of bar.h.

> Implementation highlight notes:
> 1. gomp_team_barrier_wake() is now an empty function (threads never 
> "wake" in the usual manner)
> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
> 
> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>     The main synchronization is done using a 'bar.red' instruction. This 
> reduces across all threads
>     the condition (team->task_count != 0), to enable the task processing 
> down below if any thread
>     created a task. (this bar.red usage required the need of the second 
> GCC patch in this series)
> 
> This patch has been tested on x86_64/powerpc64le with nvptx offloading, 
> using libgomp, ovo, omptests,
> and sollve_vv testsuites, all without regressions. Also verified that 
> the SPEChpc 2021 521.miniswp_t
> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 
> cycle has been restored to
> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
> 

AFAIU the waiters and lock fields are longer used, so they can be removed.

Yes, LGTM, please apply (after the other one).

Thanks for addressing this.

FWIW, tested on NVIDIA RTX A2000 with driver 525.60.11.

> (also suggest backporting to GCC12 branch, if performance regression can 
> be considered a defect)
> 

That's ok, but wait a while after applying on trunk before doing that, 
say a month.

Thanks,
- Tom

> Thanks,
> Chung-Lin
> 
> libgomp/ChangeLog:
> 
> 2022-09-21  Chung-Lin Tang  <cltang@codesourcery.com>
> 
>      * config/nvptx/bar.c (generation_to_barrier): Remove.
>      (futex_wait,futex_wake,do_spin,do_wait): Remove.
>      (GOMP_WAIT_H): Remove.
>      (#include "../linux/bar.c"): Remove.
>      (gomp_barrier_wait_end): New function.
>      (gomp_barrier_wait): Likewise.
>      (gomp_barrier_wait_last): Likewise.
>      (gomp_team_barrier_wait_end): Likewise.
>      (gomp_team_barrier_wait): Likewise.
>      (gomp_team_barrier_wait_final): Likewise.
>      (gomp_team_barrier_wait_cancel_end): Likewise.
>      (gomp_team_barrier_wait_cancel): Likewise.
>      (gomp_team_barrier_cancel): Likewise.
>      * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>      prototype, add new static inline function.

  parent reply	other threads:[~2022-12-16 14:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-21  7:45 Chung-Lin Tang
2022-09-21  9:01 ` Jakub Jelinek
2022-09-21 10:02   ` Chung-Lin Tang
2022-10-17 14:29 ` Chung-Lin Tang
2022-10-31 14:18   ` [Ping x2] " Chung-Lin Tang
2022-11-07 16:34     ` [Ping x3] " Chung-Lin Tang
2022-11-21 16:24       ` [Ping x4] " Chung-Lin Tang
2022-12-05 16:21         ` [Ping x5] " Chung-Lin Tang
2022-12-12 11:13           ` [Ping x6] " Chung-Lin Tang
2022-12-16 14:51 ` Tom de Vries [this message]
2022-12-19 12:13   ` Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=041a398e-5d1f-cbfa-5d7c-12f46b330143@suse.de \
    --to=tdevries@suse.de \
    --cc=chunglin.tang@gmail.com \
    --cc=clm@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).