Hi!

On Thu, 28 May 2015 16:20:11 +0200, Jakub Jelinek wrote:
> On Thu, May 28, 2015 at 03:06:35PM +0100, Julian Brown wrote:
> > [...]

> I think the lowering of this already at ompexp time is premature

Yes, we're aware of this "wart". :-|

> I think much better would be to have a function attribute (or cgraph
> flag) that would be set for functions you want to compile this way
> (plus a targetm flag that the targets want to support it that way),
> plus a flag in loop structure for the acc loop vector loops
> (perhaps the current OpenMP simd loop flags are good enough for that),
> and lower it somewhere around the vectorization pass or so.

Moving the loop lowering/expansion later is along the same lines as we've
been thinking. Figuring out how the OpenMP simd implementation works is
another thing I wanted to look into.

> Or, what exactly do you emit for the fallback code, or for other GPGPUs
> or XeonPhi? To me e.g. for XeonPhi or HSA this sounds like you
> want to implement the acc loop gang as a work-sharing loop among
> threads (like #pragma omp for) and #pragma acc loop vector like
> a loop that should be vectorized if at all possible (like #pragma omp simd).
> I really think it is important that OpenACC GCC support is not so strongly
> tied to one specific GPGPU

Not disagreeing, but: we have to start somewhere. GPU offloading and all
its peculiarities is still entering unknown territory in GCC; we're still
learning, and shall try to converge the emerging different implementations
in the future. Doing the completely generic (agnostic of specific
offloading device) implementation right now is a challenging task, hence
the work on a "nvptx-specific prototype" first, to put it this way.

That said, we of course very much welcome your continued review of our
work, and your suggestions!

> and similarly OpenMP should be usable for
> all offloading targets GCC supports.
>
> That way, it is possible to auto-vectorize the code too, decision how
> to expand the code of offloaded function is done already separately for each
> offloading target, there is a space for optimizations on much simpler
> cfg, etc.


Grüße,
 Thomas