This patch is submitted now for review, and so that I can commit a backport to the OG11 branch, but it isn't suitable for mainline until stage 1.

The patch implements support for omp_low_lat_mem_space and omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc, omp_cgroup_mem_alloc and omp_thread_mem_alloc allocators are also configured to use this space (this is to match the current or intended behaviour in other toolchains).

The memory is drawn from the ".shared" space, which is accessible only from within the team in which it is allocated, and which effectively ceases to exist when the kernel exits. By default, 8 KiB of space is reserved for each team at launch time. This can be adjusted, at runtime, via a new environment variable "GOMP_NVPTX_LOWLAT_POOL". Reserving a larger amount may limit the number of teams that can be run in parallel (due to hardware limitations). Conversely, reducing the allocation may increase the number of teams that can be run in parallel. (I have not yet attempted to tune the default too precisely.) The actual maximum size will vary according to the available hardware and the number of variables that the compiler has placed in .shared space.

The allocator implementation is designed to add no more space overhead than omp_alloc already does (aside from rounding allocations up to a multiple of 8 bytes), thus the internal free and realloc must be told how big the original allocation was. The free algorithm maintains an in-order linked-list of free memory chunks. Memory is allocated on a first-fit basis.

If the allocation fails, the NVPTX allocator returns NULL and omp_alloc handles the fall-back. Since allocation failure is now likely to happen (low-latency memory is small), this patch also implements appropriate fall-back modes for the predefined allocators (fall-back for custom allocators already worked).
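
For reviewers who want a picture of the free-list scheme described above, here is a minimal host-side sketch. It is not the code from the patch. The names pool_init, pool_alloc, pool_free and POOL_SIZE are hypothetical, a static array stands in for the .shared pool, the 8-byte free-chunk header using 32-bit offsets is my own assumption for the illustration, and there is no locking. It only shows first-fit allocation, 8-byte rounding, an address-ordered free list, and a free that must be told the original size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8192u   /* illustrative; mirrors the 8 KiB default */
#define NIL UINT32_MAX    /* end-of-list marker */

/* Header kept inside each free chunk.  Offsets are in bytes from the
   start of the pool.  Allocated blocks carry no header at all.  */
struct chunk { uint32_t next; uint32_t size; };
_Static_assert (sizeof (struct chunk) == 8, "header must fit in 8 bytes");

static struct chunk pool[POOL_SIZE / sizeof (struct chunk)];
static uint32_t free_head = NIL;

static struct chunk *at (uint32_t off)
{
  return &pool[off / sizeof (struct chunk)];
}

/* Round up to a multiple of 8; a zero-byte request still takes 8 bytes.  */
static size_t round8 (size_t n)
{
  return n ? (n + 7) & ~(size_t) 7 : 8;
}

void pool_init (void)
{
  free_head = 0;
  at (0)->next = NIL;
  at (0)->size = POOL_SIZE;
}

/* First fit: take the first free chunk that is large enough.  */
void *pool_alloc (size_t size)
{
  if (size > POOL_SIZE)
    return NULL;
  size = round8 (size);

  uint32_t *link = &free_head;
  for (uint32_t off = *link; off != NIL; link = &at (off)->next, off = *link)
    {
      struct chunk *c = at (off);
      if (c->size < size)
        continue;
      if (c->size == size)
        *link = c->next;                 /* exact fit: unlink the chunk */
      else
        {
          /* Split: the tail of the chunk stays on the free list.  */
          uint32_t rest = off + (uint32_t) size;
          at (rest)->size = c->size - (uint32_t) size;
          at (rest)->next = c->next;
          *link = rest;
        }
      return c;
    }
  return NULL;                           /* caller handles the fall-back */
}

/* The caller must pass the size that was originally requested.  */
void pool_free (void *ptr, size_t size)
{
  size = round8 (size);
  uint32_t off = (uint32_t) ((struct chunk *) ptr - pool)
                 * (uint32_t) sizeof (struct chunk);
  assert (ptr != NULL && off + size <= POOL_SIZE);

  /* Find the insertion point that keeps the list in address order.  */
  uint32_t prev = NIL;
  uint32_t *link = &free_head;
  while (*link != NIL && *link < off)
    {
      prev = *link;
      link = &at (*link)->next;
    }
  uint32_t next = *link;

  struct chunk *c = at (off);
  c->size = (uint32_t) size;
  c->next = next;

  /* Merge with the following chunk if they are adjacent.  */
  if (next != NIL && off + c->size == next)
    {
      c->size += at (next)->size;
      c->next = at (next)->next;
    }

  /* Merge with the preceding chunk if adjacent, else link the chunk in.  */
  if (prev != NIL && prev + at (prev)->size == off)
    {
      at (prev)->size += c->size;
      at (prev)->next = c->next;
    }
  else
    *link = off;
}
```

Keeping the list in address order is what makes merging adjacent free chunks cheap in this sketch: each freed block only needs to be compared with its immediate neighbours in the list.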

In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).

OK for stage 1?

Andrew