public inbox for gcc@gcc.gnu.org
* OpenMP auto-simd
@ 2022-03-02 15:12 Stubbs, Andrew
  2022-03-02 15:24 ` Jakub Jelinek
  2022-03-08 14:29 ` Thomas Schwinge
  0 siblings, 2 replies; 5+ messages in thread
From: Stubbs, Andrew @ 2022-03-02 15:12 UTC (permalink / raw)
  To: Jakub Jelinek, gcc

Hi Jakub, all,

Has anyone ever considered having GCC add the "simd" clause to offload (or regular) loop nests automatically?

For example, something like "-fomp-auto-simd" would transform "distribute parallel" to "distribute parallel simd" automatically. Loop nests that already contain "simd" clauses or directives would remain unchanged, most likely.
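
For illustration, a minimal C sketch of the rewrite being proposed. In C the combined constructs are spelled with "for", so the change is from "distribute parallel for" to "distribute parallel for simd". The function names and the saxpy kernel are hypothetical, "-fomp-auto-simd" is only the option name floated above and does not exist, and the second function is written by hand to show what such an option would presumably treat the first as.

```c
#include <assert.h>
#include <stddef.h>

/* As often written for SIMT-style toolchains: no "simd" anywhere. */
void saxpy_no_simd(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
#pragma omp target teams distribute parallel for \
    map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hand-written equivalent of what the hypothetical option would add. */
void saxpy_with_simd(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
#pragma omp target teams distribute parallel for simd \
    map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result; only the mapping of iterations onto threads and vector lanes may differ. Without OpenMP enabled the pragmas are ignored and both run as plain serial loops.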

The reason I ask is that other toolchains have chosen to use a "SIMT" model for GPUs, which means that OpenMP threads map to individual vector lanes and are therefore strictly scalar. The result is that the "simd" directive is irrelevant and lots of code out there isn't using it at all (so I'm told). Meanwhile, in GCC we map OpenMP threads to Nvidia warps and AMD GCN wavefronts, so it is impossible to get full performance without explicitly specifying the "simd" directive. We therefore suffer in direct comparisons.

I'm of the opinion that GCC is the one implementing OpenMP as intended, but all the same I need to explore our options here, figure out what the consequences would be, and plan a project to do what we can.

I've thought of simply enabling "-ftree-vectorize" on AMD GCN (this doesn't help NVPTX), but I think that is sub-optimal because things like the OpenMP scheduler really need to be aware of the vector size, and there are probably other ways in which parallel regions can be better formed with regard to the vectorizer. If these features don't exist right now then I have an opportunity to include them in our upcoming project.
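
One existing mechanism that ties the scheduler to the vector size, offered here only as a point of reference: since OpenMP 4.5 the "schedule" clause accepts a "simd" modifier which, on a loop associated with a simd construct, rounds the chunk size up to a multiple of an implementation-defined SIMD width. The function below is a hypothetical sketch; whether this covers what is needed for GCN is not something this thread establishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: scale every element of v by s. */
void scale(int n, float *v, float s)
{
    assert(n >= 0 && v != NULL);
    /* "simd:" asks for the chunk size (5) to be rounded up to a
       multiple of the implementation's SIMD width, so chunks handed
       to each thread line up with whole vectors. */
#pragma omp parallel for simd schedule(simd: static, 5)
    for (int i = 0; i < n; i++)
        v[i] *= s;
}
```

The modifier only has an effect when "simd" is present on the construct, which is the same dependency on explicit "simd" described above.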

Any info/suggestions/advice would be appreciated.

Thanks

Andrew


* Re: OpenMP auto-simd
  2022-03-02 15:12 OpenMP auto-simd Stubbs, Andrew
@ 2022-03-02 15:24 ` Jakub Jelinek
  2022-03-02 16:11   ` Stubbs, Andrew
  2022-03-08 14:29 ` Thomas Schwinge
  1 sibling, 1 reply; 5+ messages in thread
From: Jakub Jelinek @ 2022-03-02 15:24 UTC (permalink / raw)
  To: Stubbs, Andrew; +Cc: gcc

On Wed, Mar 02, 2022 at 03:12:30PM +0000, Stubbs, Andrew wrote:
> Has anyone ever considered having GCC add the "simd" clause to offload (or regular) loop nests automatically?
> 
> For example, something like "-fomp-auto-simd" would transform "distribute parallel" to "distribute parallel simd" automatically. Loop nests that already contain "simd" clauses or directives would remain unchanged, most likely.

I'm afraid we can't do that, at least not always.  The simd has various
restrictions on what can appear inside of the body, etc. and we shouldn't
reject valid code just because we decided to add simd automatically (even if
the user asked for those through an option).
So, it could be done only if we would do analysis that it is safe to do
that.

	Jakub
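
A small illustration of the kind of restriction referred to above, as a hypothetical sketch rather than an example taken from the thread. The loop below is valid OpenMP as written, but "critical" is not among the few constructs the specification allows inside a simd region, so blindly changing "parallel for" to "parallel for simd" would turn valid code into non-conforming code.

```c
#include <assert.h>
#include <stddef.h>

/* Valid as "parallel for"; would not be valid as "parallel for simd"
   because of the critical construct in the loop body. */
long count_matches(const int *data, int n, int key)
{
    assert(data != NULL && n >= 0);
    long count = 0;
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] == key) {
#pragma omp critical
            count++;
        }
    }
    return count;
}
```

An automatic transformation would therefore need to inspect the loop body for such constructs before adding "simd", which is the analysis mentioned above.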



* RE: OpenMP auto-simd
  2022-03-02 15:24 ` Jakub Jelinek
@ 2022-03-02 16:11   ` Stubbs, Andrew
  0 siblings, 0 replies; 5+ messages in thread
From: Stubbs, Andrew @ 2022-03-02 16:11 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc

> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: 02 March 2022 15:25
> To: Stubbs, Andrew <Andrew_Stubbs@mentor.com>
> Cc: gcc@gcc.gnu.org
> Subject: Re: OpenMP auto-simd
> 
> On Wed, Mar 02, 2022 at 03:12:30PM +0000, Stubbs, Andrew wrote:
> > Has anyone ever considered having GCC add the "simd" clause to offload (or regular) loop nests automatically?
> >
> > For example, something like "-fomp-auto-simd" would transform "distribute parallel" to "distribute parallel simd" automatically. Loop nests that already contain "simd" clauses or directives would remain unchanged, most likely.
> 
> I'm afraid we can't do that, at least not always.  The simd has various
> restrictions on what can appear inside of the body, etc. and we shouldn't
> reject valid code just because we decided to add simd automatically (even if
> the user asked for those through an option).
> So, it could be done only if we would do analysis that it is safe to do
> that.

In the general case there are undoubtedly issues, but I think the restrictions listed in the OpenMP document ought to be detectable, for at least the inline code. Is there one that is too hard, at least during the early passes? I anticipate that version 1.0 wouldn't add the directive to regions that include function calls (unless declared "simd" explicitly, perhaps), although that would be nice to have later.

For AMD GCN it's always safe to set the "force_vectorize" flag on any given loop (it's just the same as setting -ftree-vectorize for the whole program) given that the vectorizer will simply quietly fail later. For NVPTX this might be a bigger issue.

Is this really such a lost cause?

Thanks

Andrew

P.S. Ideally we'd do this the same way that other toolchains do it, such that omp_get_thread_num returns a number in the range 0..1023 rather than 0..15 (AMD) or 0..32 (NVPTX) as we do now, but I think that's just impossible with the current implementation.
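
A way to observe the thread count in question, as a rough sketch: the function name is hypothetical, and the value returned depends entirely on the compiler, runtime, and device. It falls back to 1 when OpenMP is not enabled, and to the host's thread count when no offload device is available.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of OpenMP threads in a parallel region launched
   on the default target device (or 1 without OpenMP). */
int device_threads_per_team(void)
{
    int nthreads = 1;
#ifdef _OPENMP
#pragma omp target parallel map(tofrom: nthreads)
    {
        /* Only one thread writes, to avoid a data race. */
        if (omp_get_thread_num() == 0)
            nthreads = omp_get_num_threads();
    }
#endif
    assert(nthreads >= 1);
    return nthreads;
}
```

Values of omp_get_thread_num inside such a region range from 0 to one less than the number returned here.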


* Re: OpenMP auto-simd
  2022-03-02 15:12 OpenMP auto-simd Stubbs, Andrew
  2022-03-02 15:24 ` Jakub Jelinek
@ 2022-03-08 14:29 ` Thomas Schwinge
  2022-03-08 16:47   ` Stubbs, Andrew
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Schwinge @ 2022-03-08 14:29 UTC (permalink / raw)
  To: Andrew_Stubbs, Jakub Jelinek; +Cc: gcc, Tom de Vries

Hi!

... with the usual caveat that I know much more about OpenACC than
OpenMP, and I know (at least a bit) more about nvptx than GCN...  ;-)

On 2022-03-02T15:12:30+0000, "Stubbs, Andrew" <Andrew_Stubbs@mentor.com> wrote:
> Has anyone ever considered having GCC add the "simd" clause to offload (or regular) loop nests automatically?
>
> For example, something like "-fomp-auto-simd" would transform "distribute parallel" to "distribute parallel simd" automatically. Loop nests that already contain "simd" clauses or directives would remain unchanged, most likely.
>
> The reason I ask is that other toolchains have chosen to use a "SIMT" model for GPUs, which means that OpenMP threads map to individual vector lanes and are therefore strictly scalar. The result is that the "simd" directive is irrelevant and lots of code out there isn't using it at all (so I'm told). Meanwhile, in GCC we map OpenMP threads to Nvidia warps and AMD GCN wavefronts, so it is impossible to get full performance without explicitly specifying the "simd" directive. We therefore suffer in direct comparisons.
>
> I'm of the opinion that GCC is the one implementing OpenMP as intended

I'm curious: how does one arrive at this conclusion?

For example, in addition to intra-warp thread parallelism, nvptx also
does have a few SIMD instructions: data transfer (combining two adjacent
32-bit transfers into one 64-bit transfer), and also some basic
arithmetic; I'd have to look up the details.  It's not much, but it's
something that GCC's SLP vectorizer can use.  (Tom worked on that, years
ago.)  Using that to implement OpenMP's SIMD (quite likely via
default-(SLP-)auto-vectorization), you'd then indeed get for actual
OpenMP threads what you described as the "SIMT" model above.
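
To make the SLP point concrete, here is the general shape of code an SLP vectorizer looks for: identical operations on adjacent elements within one iteration. This is a hypothetical sketch, not code from GCC or its testsuite, and it makes no claim about what GCC actually emits for nvptx with any particular options.

```c
#include <assert.h>
#include <stddef.h>

/* Each iteration handles two adjacent 32-bit floats.  The paired loads
   and stores are candidates for merging into single 64-bit accesses. */
void add_pairs(float *restrict out, const float *restrict a,
               const float *restrict b, int n_pairs)
{
    assert(out != NULL && a != NULL && b != NULL && n_pairs >= 0);
    for (int i = 0; i < n_pairs; i++) {
        out[2 * i]     = a[2 * i]     + b[2 * i];
        out[2 * i + 1] = a[2 * i + 1] + b[2 * i + 1];
    }
}
```

Under the SIMT mapping described above, each OpenMP thread would run such a loop body on one lane, and any two-element vector instructions would be used within that single thread.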

Why not change GCC to do the same, if that's the common understanding of
how OpenMP for GPUs should be done, as implemented by other compilers?


Regards
 Thomas


> but all the same I need to explore our options here, figure out what the consequences would be, and plan a project to do what we can.
>
> I've thought of simply enabling "-ftree-vectorize" on AMD GCN (this doesn't help NVPTX) but I think that is sub-optimal because things like the OpenMP scheduler really need to be aware of the vector size, and there's probably other ways in which parallel regions can be better formed with regard to the vectorizer. If these features don't exist right now then I have an opportunity to include them in our upcoming project.
>
> Any info/suggestions/advice would be appreciated.
>
> Thanks
>
> Andrew


* RE: OpenMP auto-simd
  2022-03-08 14:29 ` Thomas Schwinge
@ 2022-03-08 16:47   ` Stubbs, Andrew
  0 siblings, 0 replies; 5+ messages in thread
From: Stubbs, Andrew @ 2022-03-08 16:47 UTC (permalink / raw)
  To: Schwinge, Thomas, Jakub Jelinek; +Cc: gcc, Tom de Vries

> > I'm of the opinion that GCC is the one implementing OpenMP as intended
> 
> I'm curious: how does one arrive at this conclusion?

Basically, any implementation for which a (significant) directive becomes a no-op is either (a) not implementing the feature as intended, or (b) implementing it for a device configured differently to the one that inspired the directive. Since I'm pretty sure that (b) is not the case, I settled my opinion on (a).

Now, it may be that the original intention is flawed and the deviant implementation is superior, but that's another matter.

Andrew

