This patch addresses the following problems with acc routines:

 * incorrectly permitting 'acc seq' loops to call gang, worker and vector routines

 * lto-wrapper errors when a function or subroutine isn't marked as 'acc routine'

The solution to the first problem is straightforward. It only required a small change to oacc_loop_fixed_partitions. The solution to the second problem is more involved, since it required changes to the Fortran FE, the gimplifier, the behavior of flag_generate_offload, and libgomp.

Starting with the Fortran changes, this patch updates the way that the Fortran FE handles the 'acc routine' attribute in modules. Before, it only recorded that a function was marked as an acc routine. With this patch, it now records the level of parallelism the routine has. This is necessary for the middle end to validate compatible parallelism between the loop calling the routine and the routine itself.

The second set of changes involves teaching the gimplifier to error when it detects a call to a non-acc routine inside an OpenACC offloaded region. Actually, I relaxed that check by excluding calls to builtin functions, including those prefixed with _gfortran_. Nvptx does have a newlib C library, and it also has a subset of libgfortran. Still, this solution is probably not optimal.

Next, I had to modify the openacc header files in libgomp to mark acc_on_device as an acc routine. Unfortunately, this meant that I had to build the openacc.mod module for gfortran with -fopenacc. But doing that caused gcc to stream offloaded code to the openacc.o object file. So, I've updated the behavior of flag_generate_offload such that a value of -1 indicates that the user specified -foffload=disable, and that will prevent gcc from streaming offloaded lto code. The alternative was to hack libtool to build libgomp with -foffload=disable.

Is this patch OK for trunk?

There are still a couple of other quirks with routines that we'll need to address with a follow-up patch.
Namely, passing scalar dummy arguments to subroutines trips up the nvptx worker and vector state propagator if the actual argument is a local variable. That's because the nvptx state propagator only forwards the pointer to the worker and vector threads, and not the actual variable itself. Consequently, those pointers dereference garbage. This is a problem with pass-by-reference in general.

Cesar
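
P.S. For illustration only (this is not taken from the patch or its testsuite), here is a minimal C sketch of the kind of routine/loop pairing the first fix is about. All function names are hypothetical. It shows a seq loop calling a seq routine and a gang loop calling a vector routine, and leaves the shape that the message describes as now being rejected in a comment.

```c
#include <assert.h>

/* Hypothetical routine with no internal parallelism.  */
#pragma acc routine seq
int add_one (int x)
{
  return x + 1;
}

/* Hypothetical routine that uses vector-level parallelism internally.  */
#pragma acc routine vector
void bump_row (int *row, int n)
{
  #pragma acc loop vector
  for (int j = 0; j < n; j++)
    row[j] += 1;
}

/* A seq loop calling a seq routine: the levels are compatible.  */
void seq_increment (int *a, int n)
{
  assert (n >= 0);
  #pragma acc parallel loop seq copy(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] = add_one (a[i]);
}

/* A gang loop calling a vector routine: vector parallelism is still
   available to the callee.  */
void bump_all (int *a, int rows, int cols)
{
  assert (rows >= 0 && cols >= 0);
  #pragma acc parallel loop gang copy(a[0:rows * cols])
  for (int i = 0; i < rows; i++)
    bump_row (&a[i * cols], cols);
}

/* The shape described above as incorrectly permitted before the patch
   (kept in a comment so that this file still builds):

     #pragma acc parallel loop seq copy(a[0:rows * cols])
     for (int i = 0; i < rows; i++)
       bump_row (&a[i * cols], cols);    <- seq loop, vector routine
*/
```

Built without -fopenacc the pragmas are ignored and both functions simply run serially on the host, which is enough to check the arithmetic. The diagnostics themselves only come into play with -fopenacc.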