This patch, which was largely implemented by Chung-Lin, is a first step towards teaching the C and C++ FEs how to allocate shared memory for gang-local variables. For example:

  #pragma acc parallel
  {
    int some_array[N], some_var;
    ...
  }

With this patch, both some_array and some_var are stored in shared memory.

Shared memory is allocated for local variables in a similar fashion to worker reductions. The nvptx BE maintains a single global __gangprivate_shared variable that holds all of the local variables requiring shared memory. During RTL expansion, each decl is checked for an "oacc gangprivate" attribute, and decls carrying that attribute are remapped to a pointer within __gangprivate_shared via the new expand_accel_var target hook. That hook is also responsible for reserving shared memory for each such decl in the offloaded program. The C and C++ FEs attach "oacc gangprivate" attributes to decls immediately after they process OpenACC kernels and parallel regions.

This implementation still has a number of limitations, which will be addressed in follow-up patches at a later date:

 * Variables in private clauses on acc loops do not use shared memory yet.

 * OpenACC routines don't use shared memory, except for reductions and worker state propagation.

 * Variables local to worker loops don't use shared memory.

 * Variables local to automatically partitioned gang and worker loops don't use shared memory.

 * Shared memory is allocated globally, not on a per-function basis. We're not sure whether that matters, though.

This patch has been applied to gomp-4_0-branch.

Cesar