Hi,

I.

This patch allows parallelization of an outer loop in an OpenACC kernels region. The testcase is based on autopar/outer-1.c.

II.

We rely on pass_lim to move the *.omp_data_i loads out of the loop nest.

For the testcase, pass_lim managed to move the load out of the inner loop, but not out of the outer loop, because the load was classified as 'MOVE_PRESERVE_EXECUTION'. By marking the *.omp_data_i load as non-trapping, it is now classified as 'MOVE_POSSIBLE' and moved out of the loop nest.

III.

The 'loops_state_set (LOOPS_NEED_FIXUP)' is a somewhat blunt and temporary fix for the oacc kernels variant of PR66846 - parloops does not always mark loops for fixup if needed.

The original PR needs an added verify_loop_structure to trigger the problem. Normally the problem is hidden by the fact that the first pass that runs on the new function is pass_fixup_cfg, which happens to fix up the loops. (The loops are fixed up because TODO_cleanup_cfg is set during pass_fixup_cfg, because the function contains a call to an ECF_CONST function: __builtin_omp_get_num_threads.)

For the oacc kernels variant, the problem triggers without adding verify_loop_structure. During pass_ipa_inline, we call loop_optimizer_init, which (given that LOOPS_NEED_FIXUP is not set) verifies the loop structure, and that verification fails. pass_fixup_cfg is not run in between the discovery of the new function and pass_ipa_inline.

IV.

I've committed this patch to gomp-4_0-branch.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested on a setup with an NVIDIA accelerator.

Thanks,
- Tom
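
As an illustration of point II, the hand-written C sketch below shows the shape of the transformation that pass_lim is relied on to perform. It is not compiler output and not part of the patch. The names struct omp_data, kernel_before and kernel_after are hypothetical, and omp_data_i stands in for the compiler-generated .omp_data_i parameter of the outlined function. The load of the array pointer happens only if the loop body executes, so hoisting it above the nest is safe only when the load is known not to trap. That is the distinction between 'MOVE_PRESERVE_EXECUTION' and 'MOVE_POSSIBLE'.

```c
#define N 100

/* Hypothetical stand-in for the data-sharing record that the outlined
   kernels function receives through .omp_data_i.  */
struct omp_data
{
  int (*x)[N];	/* Pointer to the shared array.  */
  int n;	/* Loop bound.  */
};

/* Before invariant motion: the fields are reloaded through the record
   pointer inside the loop nest.  */
void
kernel_before (struct omp_data *omp_data_i)
{
  for (int i = 0; i < omp_data_i->n; i++)
    for (int j = 0; j < omp_data_i->n; j++)
      omp_data_i->x[i][j] = i + j + 3;
}

/* After invariant motion: both loads are done once, ahead of the outer
   loop.  The load of x is now executed even if the nest runs zero
   iterations, which is only valid if the load cannot trap.  */
void
kernel_after (struct omp_data *omp_data_i)
{
  int (*x)[N] = omp_data_i->x;
  int n = omp_data_i->n;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      x[i][j] = i + j + 3;
}
```

Both functions store the same values. The second form leaves a loop nest whose body contains no loads through the record pointer.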