Hi!

On Thu, 28 May 2015 16:20:11 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 28, 2015 at 03:06:35PM +0100, Julian Brown wrote:
> > [...]

> I think the lowering of this already at ompexp time is premature

Yes, we're aware of this "wart".  :-|

> I think much better would be to have a function attribute (or cgraph
> flag) that would be set for functions you want to compile this way
> (plus a targetm flag that the targets want to support it that way),
> plus a flag in loop structure for the acc loop vector loops
> (perhaps the current OpenMP simd loop flags are good enough for that),
> and lower it somewhere around the vectorization pass or so.

Moving the loop lowering/expansion later is along the same lines as we've
been thinking.  Figuring out how the OpenMP simd implementation works is
another thing I wanted to look into.

> Or, what exactly do you emit for the fallback code, or for other GPGPUs
> or XeonPhi?  To me e.g. for XeonPhi or HSA this sounds like you
> want to implement the acc loop gang as a work-sharing loop among
> threads (like #pragma omp for) and #pragma acc loop vector like
> a loop that should be vectorized if at all possible (like #pragma omp simd).
> I really think it is important that OpenACC GCC support is not so strongly
> tied to one specific GPGPU

Not disagreeing, but: we have to start somewhere.  GPU offloading and all
its peculiarities are still unknown territory in GCC; we're still
learning, and shall try to converge the emerging different
implementations in the future.  Doing a completely generic implementation
(agnostic of the specific offloading device) right now is a challenging
task, hence the work on a "nvptx-specific prototype" first, to put it
this way.
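
To make the mapping you describe concrete, here is a rough sketch (not
what the current prototype emits): the same nested loop written once with
OpenACC gang/vector clauses and once with the OpenMP constructs you
suggest as the generic equivalent.  The function names scale_rows_acc and
scale_rows_omp are made up for illustration only.  Built without
-fopenacc/-fopenmp, the pragmas are ignored and both functions run as
plain sequential code.

```c
/* OpenACC form: the outer loop is distributed across gangs, the inner
   loop is marked as a vector loop.  */
void scale_rows_acc(float *a, int rows, int cols, float s)
{
#pragma acc parallel loop gang copy(a[0:rows * cols])
  for (int i = 0; i < rows; i++)
    {
#pragma acc loop vector
      for (int j = 0; j < cols; j++)
        a[i * cols + j] *= s;
    }
}

/* Suggested generic mapping: gang -> work-sharing loop among threads
   (omp for), vector -> loop to be vectorized if possible (omp simd).  */
void scale_rows_omp(float *a, int rows, int cols, float s)
{
#pragma omp parallel for
  for (int i = 0; i < rows; i++)
    {
#pragma omp simd
      for (int j = 0; j < cols; j++)
        a[i * cols + j] *= s;
    }
}
```

Both variants compute the same result; what differs is only how much of
the parallel structure is still visible to the later passes.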

That said, we of course very much welcome your continued review of our
work, and your suggestions!

> and similarly OpenMP should be usable for
> all offloading targets GCC supports.
> 
> That way, it is possible to auto-vectorize the code too, decision how
> to expand the code of offloaded function is done already separately for each
> offloading target, there is a space for optimizations on much simpler
> cfg, etc.


Grüße,
 Thomas