public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/40] OpenACC "kernels" Improvements
@ 2021-12-15 15:54 Frederik Harwath
  2021-12-15 15:54 ` [PATCH 01/40] Kernels loops annotation: C and C++ Frederik Harwath
                   ` (39 more replies)
  0 siblings, 40 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC
  To: gcc-patches; +Cc: rguenther, fortran, matz, Catherine_Moore

Hi,
this patch series implements the rework of the OpenACC "kernels"
implementation that was announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  Versions
of the patches have also recently been committed to the
devel/omp/gcc-11 branch.

The patch series contains middle-end changes that modify the "kernels"
loop handling to use Graphite for dependence analysis of loops in
"kernels" regions, as well as new optimizations and adjustments to
existing optimizations to support this analysis.  A central step is
contained in the commit titled 'openacc: Use Graphite for dependence
analysis in "kernels" regions', whose commit message contains
further explanations.  There are also front end changes (cf. the
patches by Sandra Loosemore) that prepare the loops in "kernels"
regions for the middle-end processing and lift various restrictions
on "kernels" regions.  I have included some dependencies (the patches
by Julian Brown) from the devel/omp/gcc-11 branch, which will be
re-submitted independently for review.
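
To illustrate what "dependence analysis of loops in kernels regions"
has to decide, here is a minimal sketch in C.  The function names and
data clauses are made up for this example and are not taken from the
patches; the sketch makes no claim about how the patched compiler
actually classifies these loops.  The first loop has independent
iterations, the second carries a dependence from one iteration to the
next, so it cannot simply be run in parallel.

```c
#include <stddef.h>

/* Hypothetical example: every iteration writes only out[i] and reads
   only in[i], so no iteration depends on another one.  */
void scale(float *restrict out, const float *restrict in, size_t n, float f)
{
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  {
    for (size_t i = 0; i < n; i++)
      out[i] = f * in[i];
  }
}

/* Hypothetical example: iteration i reads a[i - 1], which iteration
   i - 1 has just written.  This is a loop-carried data dependence.  */
void prefix_sum(float *a, size_t n)
{
#pragma acc kernels copy(a[0:n])
  {
    for (size_t i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

Without -fopenacc the pragmas are ignored and both functions run
sequentially on the host, which gives the reference results.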

I have bootstrapped the compiler on x86_64-linux-gnu and performed
comprehensive testing on a powerpc64le-linux-gnu target.  The patches
should apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate these patches into GCC at the
current development stage.  I hope that we can discuss some of the
changes before they can be considered for inclusion in GCC during the
next stage 1.

Best regards,
Frederik


Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
    data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in                               |   2 +
 gcc/c-family/c-common.h                       |   1 +
 gcc/c-family/c-omp.c                          | 915 +++++++++++++++--
 gcc/c-family/c.opt                            |   8 +
 gcc/c/c-decl.c                                |  28 +
 gcc/c/c-parser.c                              |   3 +
 gcc/cfgloop.c                                 |   1 +
 gcc/cfgloop.h                                 |   6 +
 gcc/cfgloopmanip.c                            |   1 +
 gcc/common.opt                                |   9 +
 gcc/config/nvptx/nvptx.c                      |   7 +
 gcc/cp/decl.c                                 |  44 +
 gcc/cp/parser.c                               |   3 +
 gcc/cp/semantics.c                            |   9 +
 gcc/doc/gimple.texi                           |   2 +
 gcc/doc/invoke.texi                           |  52 +-
 gcc/doc/passes.texi                           |   6 +-
 gcc/expr.c                                    |   1 +
 gcc/flag-types.h                              |   1 +
 gcc/fortran/gfortran.h                        |   1 +
 gcc/fortran/lang.opt                          |  12 +
 gcc/fortran/openmp.c                          | 415 ++++++++
 gcc/fortran/parse.c                           |   9 +
 gcc/fortran/trans-array.c                     | 321 ++++--
 gcc/fortran/trans-openmp.c                    |  34 +-
 gcc/gimple-loop-interchange.cc                |   2 +-
 gcc/gimple-pretty-print.c                     |   3 +
 gcc/gimple-walk.c                             |  15 +-
 gcc/gimple-walk.h                             |   6 +
 gcc/gimple.h                                  |   5 +
 gcc/gimplify.c                                | 117 +++
 gcc/graph.c                                   |  35 +-
 gcc/graphite-dependences.c                    | 220 ++--
 gcc/graphite-isl-ast-to-gimple.c              | 271 ++++-
 gcc/graphite-oacc.c                           | 688 +++++++++++++
 gcc/graphite-oacc.h                           |  55 +
 gcc/graphite-optimize-isl.c                   |  42 +-
 gcc/graphite-poly.c                           |  41 +-
 gcc/graphite-scop-detection.c                 | 651 ++++++++++--
 gcc/graphite-sese-to-poly.c                   |  90 +-
 gcc/graphite.c                                | 120 ++-
 gcc/graphite.h                                |  40 +-
 gcc/internal-fn.c                             |   4 +
 gcc/internal-fn.h                             |   4 +-
 gcc/omp-data-optimize.cc                      | 951 ++++++++++++++++++
 gcc/omp-expand.c                              | 102 +-
 gcc/omp-general.c                             |  23 +-
 gcc/omp-general.h                             |   1 +
 gcc/omp-low.c                                 | 439 ++++++--
 gcc/omp-oacc-kernels-decompose.cc             | 154 ++-
 gcc/omp-oacc-neuter-broadcast.cc              |   2 +
 gcc/omp-offload.c                             | 830 ++++++++++++---
 gcc/omp-offload.h                             |   2 +
 gcc/params.opt                                |   7 +-
 gcc/passes.c                                  |  42 +
 gcc/passes.def                                |  47 +-
 gcc/sese.c                                    |  25 +-
 gcc/sese.h                                    |  19 +
 .../c-c++-common/goacc-gomp/nesting-1.c       |  10 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |   2 +-
 .../goacc/classify-kernels-unparallelized.c   |  35 +-
 .../c-c++-common/goacc/classify-kernels.c     |  24 +-
 .../c-c++-common/goacc/classify-parallel.c    |   8 +-
 .../goacc/classify-routine-nohost.c           |  20 +-
 .../c-c++-common/goacc/classify-routine.c     |  22 +-
 .../c-c++-common/goacc/classify-serial.c      |   8 +-
 .../c-c++-common/goacc/combined-directives.c  |   2 +-
 .../device-lowering-debug-optimization.c      |  29 +
 .../goacc/device-lowering-no-loops.c          |  17 +
 .../goacc/device-lowering-no-optimization.c   |  30 +
 .../c-c++-common/goacc/if-clause-2.c          |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-1.c  |  17 +-
 .../kernels-counter-var-redundant-load.c      |  19 +-
 .../kernels-counter-vars-function-scope.c     |  10 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |  31 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  57 +-
 .../goacc/kernels-decompose-ice-1.c           |   7 +-
 .../goacc/kernels-decompose-ice-2.c           |   3 +-
 .../goacc/kernels-double-reduction-n.c        |   6 +-
 .../goacc/kernels-double-reduction.c          |   5 +-
 .../c-c++-common/goacc/kernels-loop-2.c       |  19 +-
 .../c-c++-common/goacc/kernels-loop-3.c       |   3 +
 .../goacc/kernels-loop-annotation-1.c         |  26 +
 .../goacc/kernels-loop-annotation-10.c        |  32 +
 .../goacc/kernels-loop-annotation-11.c        |  27 +
 .../goacc/kernels-loop-annotation-12.c        |  28 +
 .../goacc/kernels-loop-annotation-13.c        |  27 +
 .../goacc/kernels-loop-annotation-14.c        |  22 +
 .../goacc/kernels-loop-annotation-15.c        |  22 +
 .../goacc/kernels-loop-annotation-16.c        |  26 +
 .../goacc/kernels-loop-annotation-17.c        |  26 +
 .../goacc/kernels-loop-annotation-18.c        |  18 +
 .../goacc/kernels-loop-annotation-19.c        |  19 +
 .../goacc/kernels-loop-annotation-2.c         |  21 +
 .../goacc/kernels-loop-annotation-20.c        |  23 +
 .../goacc/kernels-loop-annotation-21.c        |  42 +
 .../goacc/kernels-loop-annotation-22.c        |  41 +
 .../goacc/kernels-loop-annotation-3.c         |  24 +
 .../goacc/kernels-loop-annotation-4.c         |  34 +
 .../goacc/kernels-loop-annotation-5.c         |  27 +
 .../goacc/kernels-loop-annotation-6.c         |  27 +
 .../goacc/kernels-loop-annotation-7.c         |  26 +
 .../goacc/kernels-loop-annotation-8.c         |  27 +
 .../goacc/kernels-loop-annotation-9.c         |  26 +
 .../c-c++-common/goacc/kernels-loop-data-2.c  |  17 +-
 .../goacc/kernels-loop-data-enter-exit-2.c    |  16 +-
 .../goacc/kernels-loop-data-enter-exit.c      |  17 +-
 .../goacc/kernels-loop-data-update.c          |  13 +-
 .../c-c++-common/goacc/kernels-loop-data.c    |  12 +-
 .../c-c++-common/goacc/kernels-loop-g.c       |  14 +-
 .../goacc/kernels-loop-mod-not-zero.c         |  10 +-
 .../c-c++-common/goacc/kernels-loop-n.c       |  10 +-
 .../c-c++-common/goacc/kernels-loop-nest.c    |  12 +-
 .../c-c++-common/goacc/kernels-loop.c         |  10 +-
 .../goacc/kernels-one-counter-var.c           |  12 +-
 .../kernels-parallel-loop-data-enter-exit.c   |  17 +-
 .../c-c++-common/goacc/kernels-reduction.c    |  10 +-
 .../c-c++-common/goacc/loop-2-kernels.c       |   6 +-
 .../c-c++-common/goacc/loop-auto-1.c          | 127 +--
 .../c-c++-common/goacc/loop-auto-2.c          |  37 +-
 .../c-c++-common/goacc/loop-auto-reductions.c |  22 +
 .../goacc/nested-reductions-2-parallel.c      | 138 +++
 .../goacc/note-parallelism-kernels-loops-1.c  |  61 ++
 .../note-parallelism-kernels-loops-parloops.c |  53 +
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +++++++++++++
 .../c-c++-common/goacc/routine-nohost-1.c     |   8 +-
 gcc/testsuite/c-c++-common/unroll-1.c         |   8 +-
 gcc/testsuite/c-c++-common/unroll-4.c         |   4 +-
 .../g++.dg/goacc/omp_data_optimize-1.C        | 169 ++++
 gcc/testsuite/g++.dg/goacc/template.C         |  18 +-
 .../gcc.dg/goacc/graphite-parameter-1.c       |  21 +
 .../gcc.dg/goacc/graphite-parameter-2.c       |  23 +
 .../gcc.dg/goacc/loop-processing-1.c          |   8 +-
 .../gcc.dg/goacc/nested-function-1.c          |   3 +-
 gcc/testsuite/gcc.dg/graphite/alias-1.c       |  22 +
 gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c    |   6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c    |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c    |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c    |   6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c    |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c    |   6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c     |   6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c     |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c     |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c      |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-38.c       |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21463.c       |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr45427.c       |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c     |   2 +-
 gcc/testsuite/gcc.dg/unroll-2.c               |   2 +-
 gcc/testsuite/gcc.dg/unroll-3.c               |   4 +-
 gcc/testsuite/gcc.dg/unroll-4.c               |   4 +-
 gcc/testsuite/gcc.dg/unroll-5.c               |   4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c         |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-profile-1.c    |   2 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/directive_unroll_1.f90        |   2 +-
 .../gfortran.dg/directive_unroll_4.f90        |   2 +-
 ...assify-kernels-unparallelized-parloops.f95 |  44 +
 .../goacc/classify-kernels-unparallelized.f95 |  27 +-
 .../gfortran.dg/goacc/classify-kernels.f95    |  21 +-
 .../gfortran.dg/goacc/classify-parallel.f95   |   6 +-
 .../goacc/classify-routine-nohost.f95         |  18 +-
 .../gfortran.dg/goacc/classify-routine.f95    |  20 +-
 .../gfortran.dg/goacc/classify-serial.f95     |   8 +-
 .../gfortran.dg/goacc/combined-directives.f90 |  19 +-
 .../gfortran.dg/goacc/common-block-3.f90      |  17 +-
 .../gfortran.dg/goacc/gang-static.f95         |  14 +-
 .../gfortran.dg/goacc/kernels-conversion.f95  |  52 +
 .../gfortran.dg/goacc/kernels-decompose-1.f95 | 186 ++--
 .../gfortran.dg/goacc/kernels-decompose-2.f95 | 113 ++-
 .../gfortran.dg/goacc/kernels-loop-2.f95      |  10 +-
 .../goacc/kernels-loop-annotation-1.f95       |  33 +
 .../goacc/kernels-loop-annotation-10.f95      |  32 +
 .../goacc/kernels-loop-annotation-11.f95      |  34 +
 .../goacc/kernels-loop-annotation-12.f95      |  39 +
 .../goacc/kernels-loop-annotation-13.f95      |  38 +
 .../goacc/kernels-loop-annotation-14.f95      |  35 +
 .../goacc/kernels-loop-annotation-15.f95      |  35 +
 .../goacc/kernels-loop-annotation-16.f95      |  34 +
 .../goacc/kernels-loop-annotation-18.f95      |  28 +
 .../goacc/kernels-loop-annotation-19.f95      |  29 +
 .../goacc/kernels-loop-annotation-2.f95       |  32 +
 .../goacc/kernels-loop-annotation-20.f95      |  26 +
 .../goacc/kernels-loop-annotation-3.f95       |  33 +
 .../goacc/kernels-loop-annotation-4.f95       |  34 +
 .../goacc/kernels-loop-annotation-5.f95       |  35 +
 .../goacc/kernels-loop-annotation-6.f95       |  34 +
 .../goacc/kernels-loop-annotation-7.f95       |  48 +
 .../goacc/kernels-loop-annotation-8.f95       |  50 +
 .../goacc/kernels-loop-annotation-9.f95       |  34 +
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |  10 +-
 .../goacc/kernels-loop-data-enter-exit-2.f95  |  12 +-
 .../goacc/kernels-loop-data-enter-exit.f95    |  12 +-
 .../goacc/kernels-loop-data-update.f95        |  12 +-
 .../gfortran.dg/goacc/kernels-loop-data.f95   |  14 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-n.f95      |  13 +-
 .../gfortran.dg/goacc/kernels-loop.f95        |   9 +-
 .../kernels-parallel-loop-data-enter-exit.f95 |  12 +-
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 .../gfortran.dg/goacc/kernels-tree.f95        |   2 +-
 .../gfortran.dg/goacc/loop-2-kernels.f95      |   6 +-
 .../goacc/loop-auto-transfer-2.f90            |  45 +
 .../goacc/loop-auto-transfer-3.f90            |  95 ++
 .../goacc/loop-auto-transfer-4.f90            | 293 ++++++
 .../gfortran.dg/goacc/nested-function-1.f90   |  12 +-
 .../goacc/nested-reductions-2-parallel.f90    | 177 ++++
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++++++++++
 .../goacc/private-explicit-kernels-1.f95      |  20 +-
 .../goacc/private-predetermined-kernels-1.f95 |  23 +-
 .../goacc/privatization-1-compute-loop.f90    |   3 -
 .../goacc/routine-module-mod-1.f90            |   4 +-
 .../goacc/routine-multiple-directives-1.f90   |  32 +-
 .../gfortran.dg/gomp/affinity-clause-1.f90    |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90          |   2 +-
 .../gfortran.dg/graphite/block-4.f90          |   2 +-
 gcc/testsuite/gfortran.dg/graphite/id-9.f     |   2 +-
 .../gfortran.dg/inline_matmul_16.f90          |   2 +
 .../gfortran.dg/inline_matmul_24.f90          |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f           |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f         |   2 +-
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 gcc/tree-chrec.c                              |   3 +
 gcc/tree-core.h                               |   4 +-
 gcc/tree-data-ref.c                           | 107 +-
 gcc/tree-data-ref.h                           |   3 +
 gcc/tree-loop-distribution.c                  |  87 --
 gcc/tree-parloops.c                           |  18 +-
 gcc/tree-pass.h                               |   3 +
 gcc/tree-pretty-print.c                       |  11 +
 gcc/tree-pretty-print.h                       |   1 +
 gcc/tree-scalar-evolution.c                   | 177 +++-
 gcc/tree-scalar-evolution.h                   |   3 +
 gcc/tree-ssa-dce.c                            |  23 +
 gcc/tree-ssa-loop-im.c                        |  57 +-
 gcc/tree-ssa-loop-ivcanon.c                   |   2 +
 gcc/tree-ssa-loop-manip.h                     |   2 +-
 gcc/tree-ssa-loop-niter.c                     |   6 +
 gcc/tree-ssa-loop.c                           | 110 ++
 gcc/tree-ssa-phiprop.c                        |   2 +
 gcc/tree-ssa-pre.c                            |  17 +
 gcc/tree.c                                    | 137 ++-
 gcc/tree.h                                    |   7 +
 .../libgomp.oacc-c++/privatized-ref-2.C       |  64 ++
 .../libgomp.oacc-c++/privatized-ref-3.C       |  64 ++
 .../acc_prof-kernels-1.c                      |  22 +-
 .../declare-vla-kernels-decompose-ice-1.c     |   4 -
 .../kernels-decompose-1.c                     |  10 +-
 .../kernels-private-vars-local-worker-1.c     |   6 +-
 .../kernels-private-vars-local-worker-2.c     |   6 +-
 .../kernels-private-vars-local-worker-3.c     |   6 +-
 .../kernels-private-vars-local-worker-4.c     |   8 +-
 .../kernels-private-vars-local-worker-5.c     |   6 +-
 .../kernels-private-vars-loop-gang-1.c        |   4 +-
 .../kernels-private-vars-loop-gang-2.c        |   4 +-
 .../kernels-private-vars-loop-gang-3.c        |   4 +-
 .../kernels-private-vars-loop-gang-4.c        |  15 +-
 .../kernels-private-vars-loop-gang-5.c        |  10 +-
 .../kernels-private-vars-loop-gang-6.c        |   4 +-
 .../kernels-private-vars-loop-vector-1.c      |   6 +-
 .../kernels-private-vars-loop-vector-2.c      |   6 +-
 .../kernels-private-vars-loop-worker-1.c      |   8 +-
 .../kernels-private-vars-loop-worker-2.c      |   6 +-
 .../kernels-private-vars-loop-worker-3.c      |   6 +-
 .../kernels-private-vars-loop-worker-4.c      |   6 +-
 .../kernels-private-vars-loop-worker-5.c      |   9 +-
 .../kernels-private-vars-loop-worker-6.c      |   6 +-
 .../kernels-private-vars-loop-worker-7.c      |   6 +-
 .../libgomp.oacc-c-c++-common/loop-auto-1.c   |  30 +-
 .../libgomp.oacc-c-c++-common/parallel-dims.c |  39 +-
 .../libgomp.oacc-c-c++-common/pr84955-1.c     |   1 -
 .../libgomp.oacc-c-c++-common/pr85381-2.c     |   8 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c     |   8 +-
 .../libgomp.oacc-c-c++-common/pr85381-4.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486.c       |   4 +-
 .../routine-nohost-1.c                        |   6 +-
 .../runtime-alias-check-1.c                   |  79 ++
 .../runtime-alias-check-2.c                   |  90 ++
 .../vector-length-128-1.c                     |   5 +-
 .../vector-length-128-2.c                     |   5 +-
 .../vector-length-128-3.c                     |   5 +-
 .../vector-length-128-4.c                     |   5 +-
 .../vector-length-128-5.c                     |   5 +-
 .../vector-length-128-6.c                     |   5 +-
 .../vector-length-128-7.c                     |   5 +-
 .../testsuite/libgomp.oacc-fortran/if-1.f90   |  32 +-
 .../kernels-acc-loop-reduction-2.f90          |  12 +-
 .../kernels-independent.f90                   |   1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90   |   1 +
 .../kernels-private-vars-loop-gang-1.f90      |   4 +-
 .../kernels-private-vars-loop-gang-2.f90      |   4 +-
 .../kernels-private-vars-loop-gang-3.f90      |   4 +-
 .../kernels-private-vars-loop-gang-6.f90      |   5 +-
 .../kernels-private-vars-loop-vector-1.f90    |   4 +-
 .../kernels-private-vars-loop-vector-2.f90    |  11 +-
 .../kernels-private-vars-loop-worker-1.f90    |   6 +-
 .../kernels-private-vars-loop-worker-2.f90    |   4 +-
 .../kernels-private-vars-loop-worker-3.f90    |   4 +-
 .../kernels-private-vars-loop-worker-4.f90    |   4 +-
 .../kernels-private-vars-loop-worker-5.f90    |   7 +-
 .../kernels-private-vars-loop-worker-6.f90    |   4 +-
 .../kernels-private-vars-loop-worker-7.f90    |   6 +-
 .../libgomp.oacc-fortran/optional-private.f90 |   2 -
 .../libgomp.oacc-fortran/pr94358-1.f90        |   7 +-
 .../libgomp.oacc-fortran/privatized-ref-1.f95 |  71 ++
 .../libgomp.oacc-fortran/routine-nohost-1.f90 |   4 +-
 313 files changed, 12131 insertions(+), 1729 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; Limited liability company; Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955


* [PATCH 01/40] Kernels loops annotation: C and C++.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases Frederik Harwath
                   ` (38 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, joseph, jason, nathan

From: Sandra Loosemore <sandra@codesourcery.com>

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default.  -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.
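
As a rough illustration of the effect described above, the following
sketch shows a loop as a user might write it and a hand-annotated
version of the same loop.  The function names and data clauses are
hypothetical, and the exact conditions under which a loop counts as a
candidate are defined by the patch itself, not by this example.

```c
/* As written by the user: a plain loop inside a kernels region.  */
void add_plain(int *restrict c, const int *restrict a,
               const int *restrict b, int n)
{
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}

/* Hand-written counterpart of the annotation the description mentions:
   the same loop with an explicit "#pragma acc loop auto".  */
void add_annotated(int *restrict c, const int *restrict a,
                   const int *restrict b, int n)
{
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}
```

With the series applied, compiling with something like
"gcc -fopenacc -Wopenacc-kernels-annotate-loops add.c" (the file name
is made up) would be expected to report loops that could not be
annotated, and -fno-openacc-kernels-annotate-loops to switch the
annotation off.  Both options are introduced by this patch and should
not be assumed to exist in released GCC versions.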

gcc/c-family/
        * c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
        * c-omp.c: Include tree-iterator.h.
        (enum annotation_state): New.
        (struct annotation_info): New.
        (do_not_annotate_loop): New.
        (do_not_annotate_loop_nest): New.
        (annotation_error): New.
        (c_finish_omp_for_internal): Split from c_finish_omp_for.  Use
        annotation_error function.  Code refactoring to avoid destructive
        changes that cannot be undone in case of error.
        (is_local_var): New.
        (lang_specific_unwrap_initializer): New.
        (annotate_for_loop): New.
        (check_and_annotate_for_loop): New.
        (annotate_loops_in_kernels_regions): New.
        (c_oacc_annotate_loops_in_kernels_regions): New.
        * c.opt (Wopenacc-kernels-annotate-loops): New.
        (fopenacc-kernels-annotate-loops): New.

gcc/c/
        * c-decl.c (c_unwrap_for_init): New.
        (finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/cp/
        * decl.c (cp_unwrap_for_init): New.
        (finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/
        * doc/invoke.texi (Option Summary): Add entries for
        -Wopenacc-kernels-annotate-loops and
        -fno-openacc-kernels-annotate-loops.
        (Warning Options): Document -Wopenacc-kernels-annotate-loops.
        (Optimization Options): Document -fno-openacc-kernels-annotate-loops.

gcc/testsuite/
        * c-c++-common/goacc/classify-kernels-unparallelized.c: Add
        -fno-openacc-kernels-annotate-loops option.
        * c-c++-common/goacc/classify-kernels.c: Likewise.
        * c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
        * c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
        * c-c++-common/goacc/kernels-double-reduction.c: Likewise.
        * c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
        * c-c++-common/goacc/kernels-loop-2.c: Likewise.
        * c-c++-common/goacc/kernels-loop-3.c: Likewise.
        * c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
        * c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
        * c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
        * c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
        * c-c++-common/goacc/kernels-loop-data.c: Likewise.
        * c-c++-common/goacc/kernels-loop-g.c: Likewise.
        * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
        * c-c++-common/goacc/kernels-loop-n.c: Likewise.
        * c-c++-common/goacc/kernels-loop-nest.c: Likewise.
        * c-c++-common/goacc/kernels-loop.c: Likewise.
        * c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
        * c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
        Likewise.
        * c-c++-common/goacc/kernels-reduction.c: Likewise.
        * c-c++-common/goacc/kernels-loop-annotation-1.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-2.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-3.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-4.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-5.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-6.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-7.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-8.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-9.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-10.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-11.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-12.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-13.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-14.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-15.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-16.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-17.c: New.
---
 gcc/c-family/c-common.h                       |   1 +
 gcc/c-family/c-omp.c                          | 799 ++++++++++++++++--
 gcc/c-family/c.opt                            |   8 +
 gcc/c/c-decl.c                                |  28 +
 gcc/cp/decl.c                                 |  44 +
 gcc/doc/invoke.texi                           |  32 +-
 .../goacc/classify-kernels-unparallelized.c   |   1 +
 .../c-c++-common/goacc/classify-kernels.c     |   3 +-
 .../kernels-counter-var-redundant-load.c      |   1 +
 .../kernels-counter-vars-function-scope.c     |   1 +
 .../goacc/kernels-double-reduction-n.c        |   1 +
 .../goacc/kernels-double-reduction.c          |   1 +
 .../c-c++-common/goacc/kernels-loop-2.c       |   1 +
 .../c-c++-common/goacc/kernels-loop-3.c       |   1 +
 .../goacc/kernels-loop-annotation-1.c         |  26 +
 .../goacc/kernels-loop-annotation-10.c        |  32 +
 .../goacc/kernels-loop-annotation-11.c        |  27 +
 .../goacc/kernels-loop-annotation-12.c        |  28 +
 .../goacc/kernels-loop-annotation-13.c        |  27 +
 .../goacc/kernels-loop-annotation-14.c        |  22 +
 .../goacc/kernels-loop-annotation-15.c        |  22 +
 .../goacc/kernels-loop-annotation-16.c        |  26 +
 .../goacc/kernels-loop-annotation-17.c        |  26 +
 .../goacc/kernels-loop-annotation-2.c         |  21 +
 .../goacc/kernels-loop-annotation-3.c         |  24 +
 .../goacc/kernels-loop-annotation-4.c         |  34 +
 .../goacc/kernels-loop-annotation-5.c         |  27 +
 .../goacc/kernels-loop-annotation-6.c         |  27 +
 .../goacc/kernels-loop-annotation-7.c         |  26 +
 .../goacc/kernels-loop-annotation-8.c         |  27 +
 .../goacc/kernels-loop-annotation-9.c         |  26 +
 .../c-c++-common/goacc/kernels-loop-data-2.c  |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.c    |   1 +
 .../goacc/kernels-loop-data-enter-exit.c      |   1 +
 .../goacc/kernels-loop-data-update.c          |   1 +
 .../c-c++-common/goacc/kernels-loop-data.c    |   1 +
 .../c-c++-common/goacc/kernels-loop-g.c       |   1 +
 .../goacc/kernels-loop-mod-not-zero.c         |   1 +
 .../c-c++-common/goacc/kernels-loop-n.c       |   1 +
 .../c-c++-common/goacc/kernels-loop-nest.c    |   1 +
 .../c-c++-common/goacc/kernels-loop.c         |   1 +
 .../goacc/kernels-one-counter-var.c           |   1 +
 .../kernels-parallel-loop-data-enter-exit.c   |   1 +
 .../c-c++-common/goacc/kernels-reduction.c    |   1 +
 44 files changed, 1322 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c
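
For orientation, here is a small sketch of the loop shapes that the
annotation code in c-omp.c distinguishes, based on a reading of the
comments in the diff.  It is not part of the patch and is not one of the
new tests; the function names are made up for illustration.  The warning
option is assumed to be spelled -Wopenacc-kernels-annotate-loops, matching
OPT_Wopenacc_kernels_annotate_loops in the code.  Compiled without
-fopenacc, the pragmas are ignored and the functions run as plain C.

```c
/* Illustrative only: hypothetical functions, not taken from the patch.  */

/* Candidate for automatic "acc loop auto" annotation: the induction
   variable i and the limit n are local integers whose address is not
   taken, and the body neither assigns to them nor leaves the loop.  */
void
scale (float *a, int n, float f)
{
#pragma acc kernels copy(a[0:n])
  {
    for (int i = 0; i < n; i++)
      a[i] *= f;
  }
}

/* Expected to be left as a plain loop: the body contains a break,
   which the patch records as as_invalid_break.  */
int
first_negative (const float *a, int n)
{
  int pos = -1;
#pragma acc kernels copyin(a[0:n]) copy(pos)
  {
    for (int i = 0; i < n; i++)
      if (a[i] < 0.0f)
        {
          pos = i;
          break;
        }
  }
  return pos;
}

/* Expected to be left as a plain loop: the body assigns to the limit
   variable n, which the patch records as as_invalid_modification.  */
int
count_until_zero (const int *a, int n)
{
  int count = 0;
#pragma acc kernels copyin(a[0:n]) copy(count)
  {
    for (int i = 0; i < n; i++)
      {
        if (a[i] == 0)
          n = i;   /* Modifies the loop limit.  */
        else
          count++;
      }
  }
  return count;
}
```

With the patch applied and -fopenacc in effect, the second and third loops
would presumably produce "loop cannot be annotated for OpenACC
parallelization" under the warning option, followed by a note naming the
reason.  -fno-openacc-kernels-annotate-loops, which the adjusted tests
above add, turns the automatic annotation off.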

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f60714e34160..f8b414401a5d 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1247,6 +1247,7 @@ extern enum omp_clause_default_kind c_omp_predetermined_sharing (tree);
 extern enum omp_clause_defaultmap_kind c_omp_predetermined_mapping (tree);
 extern tree c_omp_check_context_selector (location_t, tree);
 extern void c_omp_mark_declare_variant (location_t, tree, tree);
+extern void c_oacc_annotate_loops_in_kernels_regions (tree, tree (*) (tree));
 extern const char *c_omp_map_clause_name (tree, bool);
 extern void c_omp_adjust_map_clauses (tree, bool);

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index fad060670b65..fad50da8fbc4 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -37,7 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "bitmap.h"
 #include "gimple-fold.h"
-
+#include "tree-iterator.h"

 /* Complete a #pragma oacc wait construct.  LOC is the location of
    the #pragma.  */
@@ -918,6 +918,110 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
   return incr;
 }

+/* State of annotation traversal for FOR loops in kernels regions,
+   used to control processing and diagnostic messages that are deferred until
+   the entire loop has been scanned.  */
+enum annotation_state {
+  as_outer,
+  as_in_kernels_region,
+  as_in_kernels_loop,
+  /* The remaining state values represent conversion failures caught
+     while in as_in_kernels_loop state.  To test whether the traversal is
+     in the body of a kernels loop, use (state >= as_in_kernels_loop).  */
+  as_invalid_variable_type,
+  as_missing_initializer,
+  as_invalid_initializer,
+  as_missing_predicate,
+  as_invalid_predicate,
+  as_missing_increment,
+  as_invalid_increment,
+  as_explicit_annotation,
+  as_invalid_control_flow,
+  as_invalid_break,
+  as_invalid_return,
+  as_invalid_call,
+  as_invalid_modification
+};
+
+/* Structure used to hold state for automatic annotation of FOR loops
+   in kernels regions.  LOOP is the nearest enclosing loop, or
+   NULL_TREE if outside of a loop context.  VARS is a tree_list
+   containing the variables controlling LOOP's termination (the
+   induction variable and a possible limit variable).  STATE keeps
+   track of whether LOOP satisfies all criteria making it legal to
+   parallelize.  Otherwise, REASON is a statement that blocks
+   automatic parallelization, such as an unstructured jump or an
+   assignment to a variable in VARS, used for printing diagnostics.
+
+   These structures are chained through NEXT, which points to the
+   next-closest enclosing loop's or the kernels region's annotation info, if
+   any.  */
+
+struct annotation_info
+{
+  tree loop;
+  tree vars;
+  bool break_ok;
+  enum annotation_state state;
+  tree reason;
+  struct annotation_info *next;
+};
+
+/* Mark the current loop's INFO as not OK to annotate, recording STATE
+   and REASON for producing diagnostics later.  */
+
+static void
+do_not_annotate_loop (struct annotation_info *info,
+                     enum annotation_state state, tree reason)
+{
+  if (info->state == as_in_kernels_loop)
+    {
+      info->state = state;
+      info->reason = reason;
+    }
+}
+
+/* Mark the current loop identified by INFO and all of its ancestors (i.e.,
+   enclosing loops) as not OK to annotate.  Arguments are the same as
+   for do_not_annotate_loop.  */
+
+static void
+do_not_annotate_loop_nest (struct annotation_info *info,
+                          enum annotation_state state, tree reason)
+{
+  while (info != NULL)
+    {
+      do_not_annotate_loop (info, state, reason);
+      info = info->next;
+    }
+}
+
+/* If INFO is non-null, call do_not_annotate_loop with STATE and REASON
+   to record info for diagnosing an error later.  Otherwise emit an error now
+   at ELOCUS with message MSG and the optional arguments.  */
+
+static void annotation_error (struct annotation_info *,
+                             enum annotation_state, tree, location_t,
+                             const char *, ...) ATTRIBUTE_GCC_DIAG(5,6);
+static void
+annotation_error (struct annotation_info *info,
+                  enum annotation_state state,
+                  tree reason,
+                  location_t elocus,
+                  const char *msg, ...)
+{
+  if (info)
+    do_not_annotate_loop (info, state, reason);
+  else
+    {
+      auto_diagnostic_group d;
+      va_list ap;
+      va_start (ap, msg);
+      emit_diagnostic_valist (DK_ERROR, elocus, -1, msg, &ap);
+      va_end (ap);
+    }
+}
+
 /* Validate and generate OMP_FOR.
    DECLV is a vector of iteration variables, for each collapsed loop.

@@ -927,12 +1031,19 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  FINAL_P is true if not inside a C++ template.

-tree
-c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-                 tree orig_declv, tree initv, tree condv, tree incrv,
-                 tree body, tree pre_body, bool final_p)
+   INFO is null if called to parse an explicitly-annotated OMP for
+   loop, otherwise it holds state information for automatically
+   annotating a regular FOR loop in a kernels region.  In the former case,
+   malformed loops are hard errors; otherwise we just record the annotation
+   failure in INFO.  */
+
+static tree
+c_finish_omp_for_internal (location_t locus, enum tree_code code, tree declv,
+                          tree orig_declv, tree initv, tree condv, tree incrv,
+                          tree body, tree pre_body, bool final_p,
+                          struct annotation_info *info)
 {
   location_t elocus;
   bool fail = false;
@@ -956,12 +1067,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
       if (!INTEGRAL_TYPE_P (TREE_TYPE (decl))
          && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE)
        {
-         error_at (elocus, "invalid type for iteration variable %qE", decl);
+         annotation_error (info, as_invalid_variable_type, decl, elocus,
+                           "invalid type for iteration variable %qE", decl);
          fail = true;
        }
       else if (TYPE_ATOMIC (TREE_TYPE (decl)))
        {
-         error_at (elocus, "%<_Atomic%> iteration variable %qE", decl);
+         annotation_error (info, as_invalid_variable_type, decl, elocus,
+                           "%<_Atomic%> iteration variable %qE", decl);
          fail = true;
          /* _Atomic iterator confuses stuff too much, so we risk ICE
             trying to diagnose it further.  */
@@ -977,7 +1090,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
          init = DECL_INITIAL (decl);
          if (init == NULL)
            {
-             error_at (elocus, "%qE is not initialized", decl);
+             annotation_error (info, as_missing_initializer, decl, elocus,
+                               "%qE is not initialized", decl);
              init = integer_zero_node;
              fail = true;
            }
@@ -998,7 +1112,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,

       if (cond == NULL_TREE)
        {
-         error_at (elocus, "missing controlling predicate");
+         annotation_error (info, as_missing_predicate, NULL_TREE, elocus,
+                           "missing controlling predicate");
          fail = true;
        }
       else
@@ -1014,12 +1129,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
          if (EXPR_HAS_LOCATION (cond))
            elocus = EXPR_LOCATION (cond);

-         if (TREE_CODE (cond) == LT_EXPR
-             || TREE_CODE (cond) == LE_EXPR
-             || TREE_CODE (cond) == GT_EXPR
-             || TREE_CODE (cond) == GE_EXPR
-             || TREE_CODE (cond) == NE_EXPR
-             || TREE_CODE (cond) == EQ_EXPR)
+         enum tree_code condcode = TREE_CODE (cond);
+
+         if (condcode == LT_EXPR
+             || condcode == LE_EXPR
+             || condcode == GT_EXPR
+             || condcode == GE_EXPR
+             || condcode == NE_EXPR
+             || condcode == EQ_EXPR)
            {
              tree op0 = TREE_OPERAND (cond, 0);
              tree op1 = TREE_OPERAND (cond, 1);
@@ -1039,79 +1156,88 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
              if (TREE_CODE (op0) == NOP_EXPR
                  && decl == TREE_OPERAND (op0, 0))
                {
-                 TREE_OPERAND (cond, 0) = TREE_OPERAND (op0, 0);
-                 TREE_OPERAND (cond, 1)
-                   = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl),
-                                  TREE_OPERAND (cond, 1));
+                 op0 = TREE_OPERAND (op0, 0);
+                 op1 = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl),
+                                        op1);
                }
              else if (TREE_CODE (op1) == NOP_EXPR
                       && decl == TREE_OPERAND (op1, 0))
                {
-                 TREE_OPERAND (cond, 1) = TREE_OPERAND (op1, 0);
-                 TREE_OPERAND (cond, 0)
-                   = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl),
-                                  TREE_OPERAND (cond, 0));
+                 op1 = TREE_OPERAND (op1, 0);
+                 op0 = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl),
+                                        op0);
                }

-             if (decl == TREE_OPERAND (cond, 0))
+             if (decl == op0)
                cond_ok = true;
-             else if (decl == TREE_OPERAND (cond, 1))
+             else if (decl == op1)
                {
-                 TREE_SET_CODE (cond,
-                                swap_tree_comparison (TREE_CODE (cond)));
-                 TREE_OPERAND (cond, 1) = TREE_OPERAND (cond, 0);
-                 TREE_OPERAND (cond, 0) = decl;
+                 condcode = swap_tree_comparison (condcode);
+                 op1 = op0;
+                 op0 = decl;
                  cond_ok = true;
                }

-             if (TREE_CODE (cond) == NE_EXPR
-                 || TREE_CODE (cond) == EQ_EXPR)
+             if (condcode == NE_EXPR || condcode == EQ_EXPR)
                {
                  if (!INTEGRAL_TYPE_P (TREE_TYPE (decl)))
                    {
-                     if (code == OACC_LOOP || TREE_CODE (cond) == EQ_EXPR)
+                     if (code == OACC_LOOP || condcode == EQ_EXPR)
                        cond_ok = false;
                    }
-                 else if (operand_equal_p (TREE_OPERAND (cond, 1),
+                 else if (operand_equal_p (op1,
                                            TYPE_MIN_VALUE (TREE_TYPE (decl)),
                                            0))
-                   TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
-                                        ? GT_EXPR : LE_EXPR);
-                 else if (operand_equal_p (TREE_OPERAND (cond, 1),
+                   condcode = (condcode == NE_EXPR ? GT_EXPR : LE_EXPR);
+                 else if (operand_equal_p (op1,
                                            TYPE_MAX_VALUE (TREE_TYPE (decl)),
                                            0))
-                   TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
-                                        ? LT_EXPR : GE_EXPR);
-                 else if (code == OACC_LOOP || TREE_CODE (cond) == EQ_EXPR)
+                   condcode = (condcode == NE_EXPR ? LT_EXPR : GE_EXPR);
+                 else if (code == OACC_LOOP || condcode == EQ_EXPR)
                    cond_ok = false;
                }

-             if (cond_ok && TREE_VEC_ELT (condv, i) != cond)
+             if (cond_ok)
                {
-                 tree ce = NULL_TREE, *pce = &ce;
-                 tree type = TREE_TYPE (TREE_OPERAND (cond, 1));
-                 for (tree c = TREE_VEC_ELT (condv, i); c != cond;
-                      c = TREE_OPERAND (c, 1))
+                 /* We postponed destructive changes to canonicalize
+                    cond until we're sure it is OK.  When INFO is
+                    non-null, we are trying to transform a regular
+                    FOR_STMT to OMP_FOR, and we don't want to destroy the
+                    original condition if we aren't going to be able to
+                    do the transformation anyway.  */
+                 TREE_SET_CODE (cond, condcode);
+                 TREE_OPERAND (cond, 0) = op0;
+                 TREE_OPERAND (cond, 1) = op1;
+
+                 if (TREE_VEC_ELT (condv, i) != cond)
                    {
-                     *pce = build2 (COMPOUND_EXPR, type, TREE_OPERAND (c, 0),
-                                    TREE_OPERAND (cond, 1));
-                     pce = &TREE_OPERAND (*pce, 1);
+                     tree ce = NULL_TREE, *pce = &ce;
+                     tree type = TREE_TYPE (op1);
+                     for (tree c = TREE_VEC_ELT (condv, i); c != cond;
+                          c = TREE_OPERAND (c, 1))
+                       {
+                         *pce = build2 (COMPOUND_EXPR, type,
+                                        TREE_OPERAND (c, 0), op1);
+                         pce = &TREE_OPERAND (*pce, 1);
+                       }
+                     TREE_OPERAND (cond, 1) = ce;
+                     TREE_VEC_ELT (condv, i) = cond;
                    }
-                 TREE_OPERAND (cond, 1) = ce;
-                 TREE_VEC_ELT (condv, i) = cond;
                }
            }

          if (!cond_ok)
            {
-             error_at (elocus, "invalid controlling predicate");
+             annotation_error (info, as_invalid_predicate, cond, elocus,
+                               "invalid controlling predicate");
              fail = true;
            }
        }

       if (incr == NULL_TREE)
        {
-         error_at (elocus, "missing increment expression");
+         annotation_error (info, as_missing_increment, NULL_TREE, elocus,
+                           "missing increment expression");
          fail = true;
        }
       else
@@ -1210,9 +1336,11 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
                              if (i == NULL_TREE
                                  || !operand_equal_p (unit, i, 0))
                                {
-                                 error_at (elocus,
-                                           "increment is not constant 1 or "
-                                           "-1 for %<!=%> condition");
+                                 annotation_error (info,
+                                                   as_invalid_increment,
+                                                   incr, elocus,
+                                                   "increment is not constant 1 or "
+                                                   "-1 for %<!=%> condition");
                                  fail = true;
                                }
                            }
@@ -1228,9 +1356,10 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
                    {
                      if (!integer_onep (i) && !integer_minus_onep (i))
                        {
-                         error_at (elocus,
-                                   "increment is not constant 1 or -1 for"
-                                   " %<!=%> condition");
+                         annotation_error (info, as_invalid_increment,
+                                           incr, elocus,
+                                           "increment is not constant 1 or -1 for"
+                                           " %<!=%> condition");
                          fail = true;
                        }
                    }
@@ -1242,7 +1371,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
            }
          if (!incr_ok)
            {
-             error_at (elocus, "invalid increment expression");
+             annotation_error (info, as_invalid_increment, incr,
+                               elocus, "invalid increment expression");
              fail = true;
            }
        }
@@ -1270,6 +1400,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
     }
 }

+/* External entry point to c_finish_omp_for_internal, called from the
+   parsers.  See above for description of the arguments.  */
+
+tree
+c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
+                 tree orig_declv, tree initv, tree condv, tree incrv,
+                 tree body, tree pre_body, bool final_p)
+{
+  return c_finish_omp_for_internal (locus, code, declv,
+                                   orig_declv, initv, condv, incrv,
+                                   body, pre_body, final_p, NULL);
+}
+
+
 /* Type for passing data in between c_omp_check_loop_iv and
    c_omp_check_loop_iv_r.  */

@@ -3000,6 +3144,543 @@ c_omp_map_clause_name (tree clause, bool oacc)
   return omp_clause_code_name[OMP_CLAUSE_CODE (clause)];
 }

+/* The following functions implement automatic recognition and annotation of
+   for loops in OpenACC kernels regions.  Inside a kernels region, a nest of
+   for loops that does not contain any annotated OpenACC loops, nor break
+   or goto statements or assignments to the variables controlling loop
+   termination, is converted to an OMP_FOR node with an "acc loop auto"
+   annotation on each loop.  This feature is controlled by
+   flag_openacc_kernels_annotate_loops.  */
+
+/* Check whether DECL is the declaration of a local variable (or function
+   parameter) of integral type that does not have its address taken.  */
+
+static bool
+is_local_var (tree decl)
+{
+  return ((TREE_CODE (decl) == VAR_DECL || TREE_CODE (decl) == PARM_DECL)
+         && DECL_CONTEXT (decl) != NULL
+         && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+         && INTEGRAL_TYPE_P (TREE_TYPE (decl))
+         && !TREE_ADDRESSABLE (decl));
+}
+
+/* The initializer for a FOR_STMT is sometimes wrapped in various other
+   language-specific tree structures.  We need a hook to unwrap them.
+   This function takes a tree argument and should return either a
+   MODIFY_EXPR, VAR_DECL, or NULL_TREE.  */
+
+static tree (*lang_specific_unwrap_initializer) (tree);
+
+/* Try to annotate the given NODE, which must be a FOR_STMT, with a
+   "#pragma acc loop auto" annotation.  In practice, this means
+   building an OMP_FOR node for it.  PREV_TSI, if non-null, points to
+   the statement immediately before the loop, which may be used as the
+   loop's initialization statement.  Annotating the loop may fail, in which
+   case INFO is used to record the cause of the failure and the
+   original loop remains unchanged.  This function returns the
+   transformed loop if the transformation succeeded, the original node
+   otherwise.  */
+
+static tree
+annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi,
+                  struct annotation_info *info)
+{
+  gcc_checking_assert (TREE_CODE (node) == FOR_STMT);
+
+  location_t loc = EXPR_LOCATION (node);
+  tree cond = FOR_COND (node);
+  gcc_assert (cond);
+  tree decl = TREE_OPERAND (cond, 0);
+  gcc_assert (decl && TREE_CODE (decl) == VAR_DECL);
+  tree init = FOR_INIT_STMT (node);
+  tree prev_stmt = NULL_TREE;
+  bool unlink_prev = false;
+  bool fix_decl = false;
+
+
+  /* Both the C and C++ front ends normally put the initializer in the
+     statement list just before the FOR_STMT instead of in FOR_INIT_STMT.
+     If FOR_INIT_STMT happens to exist but isn't a MODIFY_EXPR, bail out
+     because the code below won't handle it.  */
+  if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR)
+    {
+      do_not_annotate_loop (info, as_invalid_initializer, NULL_TREE);
+      return node;
+    }
+
+  /* Examine the statement before the loop to see if it is a
+     valid initializer.  It must be either a MODIFY_EXPR or VAR_DECL,
+     possibly wrapped in language-specific structure.  */
+  if (init == NULL_TREE && prev_tsi != NULL)
+    {
+      prev_stmt = tsi_stmt (*prev_tsi);
+
+      /* Call the language-specific hook to unwrap prev_stmt.  */
+      if (prev_stmt)
+       prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt);
+
+      /* See if we have a valid MODIFY_EXPR.  */
+      if (prev_stmt
+         && TREE_CODE (prev_stmt) == MODIFY_EXPR
+         && TREE_OPERAND (prev_stmt, 0) == decl
+         && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1)))
+       {
+         init = prev_stmt;
+         unlink_prev = true;
+       }
+      else if (prev_stmt == decl
+              && !TREE_SIDE_EFFECTS (DECL_INITIAL (decl)))
+       {
+         /* If the preceding statement is the declaration of the loop
+            variable with its initialization, build an assignment
+            expression for the loop's initializer.  */
+         init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl,
+                        DECL_INITIAL (decl));
+         /* We need to remove the initializer from the decl if we
+            end up using the init we just built instead.  */
+         fix_decl = true;
+       }
+    }
+
+  if (init == NULL_TREE)
+    /* There is nothing we can do to find the correct init statement for
+       this loop, but c_finish_omp_for insists on having one and would fail
+       otherwise.  In that case, we would just return node.  Do that
+       directly, here.  */
+    {
+      do_not_annotate_loop (info, as_missing_initializer, NULL_TREE);
+      return node;
+    }
+
+  tree incr = FOR_EXPR (node);
+
+  /* The C++ frontend can wrap the increment two levels deep inside a
+     cleanup expression, but c_finish_omp_for does not care about that.  */
+  if (incr != NULL_TREE && TREE_CODE (incr) == CLEANUP_POINT_EXPR)
+    incr = TREE_OPERAND (TREE_OPERAND (incr, 0), 0);
+  tree body = FOR_BODY (node);
+
+  tree declv = make_tree_vec (1);
+  tree initv = make_tree_vec (1);
+  tree condv = make_tree_vec (1);
+  tree incrv = make_tree_vec (1);
+  TREE_VEC_ELT (declv, 0) = decl;
+  TREE_VEC_ELT (initv, 0) = init;
+  TREE_VEC_ELT (condv, 0) = cond;
+  TREE_VEC_ELT (incrv, 0) = incr;
+
+  /* Do the actual transformation.  This can still fail because
+     c_finish_omp_for has some stricter checks than we have performed up to
+     this point.  */
+  tree omp_for = c_finish_omp_for_internal (loc, OACC_LOOP, declv, NULL_TREE,
+                                           initv, condv, incrv, body,
+                                           NULL_TREE, false, info);
+  if (omp_for != NULL_TREE)
+    {
+      if (unlink_prev)
+       /* We don't need the previous statement that we consumed as an
+          initializer in the new OMP_FOR any more.  */
+       tsi_delink (prev_tsi);
+
+      if (fix_decl)
+       /* We no longer need the initializer expression on the decl of
+          the loop variable and don't want to duplicate it.  The
+          kernels conversion pass would interpret it as a stray
+          assignment in a gang-single region.  */
+       DECL_INITIAL (prev_stmt) = NULL_TREE;
+
+      /* Add an auto clause, then return the new loop.  */
+      tree auto_clause = build_omp_clause (loc, OMP_CLAUSE_AUTO);
+      OMP_CLAUSE_CHAIN (auto_clause) = OMP_FOR_CLAUSES (omp_for);
+      OMP_FOR_CLAUSES (omp_for) = auto_clause;
+      return omp_for;
+    }
+
+  return node;
+}
+
+/* Forward declaration.  */
+static tree annotate_loops_in_kernels_regions (tree *, int *, void *);
+
+/* Given a candidate FOR_STMT *NODEPTR, check its body for validity,
+   then try to annotate it with "#pragma acc loop auto", possibly
+   replacing *NODEPTR.  PREV_TSI is passed on to annotate_for_loop.
+   The INFO argument contains the traversal state at the point the loop
+   appears.  */
+
+static void
+check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi,
+                            struct annotation_info *info)
+{
+  tree node = *nodeptr;
+  gcc_assert (TREE_CODE (node) == FOR_STMT);
+
+  /* This structure describes the current loop statement.  */
+  struct annotation_info loop_info
+    = { node, NULL_TREE, false, as_in_kernels_loop, NULL_TREE, info };
+  tree cond = FOR_COND (node);
+
+  /* If we are in the body of an explicitly-annotated loop, do not add
+     annotations to this loop or any other nested loops.  */
+  if (info->state == as_explicit_annotation)
+    do_not_annotate_loop (&loop_info, as_explicit_annotation, info->reason);
+
+  /* We need to find the controlling variable for the loop in order
+     to detect whether it is modified in the body of the loop.
+     That is why we are doing some checks on the loop condition
+     that duplicate what c_finish_omp_for is doing.  */
+
+  /* The loop condition must be a comparison.  */
+  else if (cond == NULL_TREE)
+    do_not_annotate_loop (&loop_info, as_missing_predicate, NULL_TREE);
+  else if (TREE_CODE_CLASS (TREE_CODE (cond)) != tcc_comparison)
+    do_not_annotate_loop (&loop_info, as_invalid_predicate, cond);
+  else
+    {
+      /* The condition's LHS must be a local variable that does not
+        have its address taken.  Its RHS must also be such a local
+        variable or a constant.  */
+      tree induction_var = TREE_OPERAND (cond, 0);
+      tree limit_var = TREE_OPERAND (cond, 1);
+      if (!is_local_var (induction_var)
+         || (!is_local_var (limit_var)
+             && (TREE_CODE_CLASS (TREE_CODE (limit_var))
+                 != tcc_constant)))
+       do_not_annotate_loop (&loop_info, as_invalid_predicate, cond);
+      else
+       {
+         /* These variables must not be assigned to in the loop.  */
+         loop_info.vars = tree_cons (NULL_TREE, induction_var,
+                                     loop_info.vars);
+         if (TREE_CODE_CLASS (TREE_CODE (limit_var)) != tcc_constant)
+           loop_info.vars = tree_cons (NULL_TREE, limit_var, loop_info.vars);
+       }
+    }
+
+  /* Walk the body.  This will process any nested loops, so we have to do it
+     even if we have already rejected this loop as a candidate for
+     annotation.  */
+  walk_tree (&FOR_BODY (node), annotate_loops_in_kernels_regions,
+            (void *) &loop_info, NULL);
+
+  if (loop_info.state == as_in_kernels_loop)
+    {
+      /* If the traversal of the loop and all nested loops didn't hit
+        any problems, attempt the actual transformation.  If it
+        succeeds, replace this node with the annotated loop.  */
+      tree result = annotate_for_loop (node, prev_tsi, &loop_info);
+      if (result != node)
+       {
+         /* Success!  */
+         *nodeptr = result;
+         return;
+       }
+    }
+
+  /* If we got here, we have a FOR_STMT we could not convert to an
+     OMP loop.  */
+
+  if (loop_info.state == as_invalid_return)
+    /* This is diagnosed elsewhere as a hard error, so no warning is
+       needed here.  */
+    return;
+
+  /* Issue warnings about other problems.  */
+  auto_diagnostic_group d;
+  if (warning_at (EXPR_LOCATION (node),
+                 OPT_Wopenacc_kernels_annotate_loops,
+                 "loop cannot be annotated for OpenACC parallelization"))
+    {
+      location_t locus;
+      if (loop_info.reason && EXPR_HAS_LOCATION (loop_info.reason))
+       locus = EXPR_LOCATION (loop_info.reason);
+      else
+       locus = EXPR_LOCATION (node);
+      switch (loop_info.state)
+       {
+       case as_invalid_variable_type:
+         inform (locus, "invalid type for iteration variable %qE",
+                 loop_info.reason);
+         break;
+       case as_missing_initializer:
+         inform (locus, "missing iteration variable initializer");
+         break;
+       case as_invalid_initializer:
+         inform (locus, "unrecognized initializer");
+         break;
+       case as_missing_predicate:
+         inform (locus, "missing controlling predicate");
+         break;
+       case as_invalid_predicate:
+         inform (locus, "invalid controlling predicate");
+         break;
+       case as_missing_increment:
+         inform (locus, "missing increment expression");
+         break;
+       case as_invalid_increment:
+         inform (locus, "invalid increment expression");
+         break;
+       case as_explicit_annotation:
+         inform (locus, "explicit OpenACC annotation in loop nest");
+         break;
+       case as_invalid_control_flow:
+         inform (locus, "loop contains unstructured control flow");
+         break;
+       case as_invalid_break:
+         inform (locus, "loop contains %<break%> statement");
+         break;
+       case as_invalid_call:
+         inform (locus, "loop contains call to non-oacc function");
+         break;
+       case as_invalid_modification:
+         inform (locus, "invalid modification of controlling variable");
+         break;
+       default:
+         gcc_unreachable ();
+       }
+    }
+}
+
+/* Traversal function for walk_tree.  Visit the tree, finding OpenACC
+   kernels regions.  DATA is NULL if we are outside of a kernels region,
+   otherwise it is a pointer to the enclosing kernels region's
+   annotation_info struct.  If the traversal encounters a for loop inside a
+   kernels region that is a candidate for parallelization, annotate it
+   with OpenACC loop directives.  */
+
+static tree
+annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
+                                  void *data)
+{
+  tree node = *nodeptr;
+  struct annotation_info *info = (struct annotation_info *) data;
+  gcc_assert (info);
+
+  switch (TREE_CODE (node))
+    {
+    case OACC_KERNELS:
+      /* Recursively process the body of the kernels region in a new info
+        scope.  */
+      if (info->state == as_outer)
+       {
+         struct annotation_info nested_info
+           = { NULL_TREE, NULL_TREE, true,
+               as_in_kernels_region, NULL_TREE, info };
+         walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+                    (void *) &nested_info, NULL);
+         *walk_subtrees = 0;
+       }
+      break;
+
+    case OACC_LOOP:
+      /* Do not try to add automatic OpenACC annotations inside manually
+        annotated loops.  Presumably, the user avoided doing it on
+        purpose; for example, all available levels of parallelism may
+        have been used up.  */
+      {
+       struct annotation_info nested_info
+         = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
+             node, info };
+       if (info->state >= as_in_kernels_region)
+         do_not_annotate_loop_nest (info, as_explicit_annotation,
+                                    node);
+       walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+                  (void *) &nested_info, NULL);
+       *walk_subtrees = 0;
+      }
+      break;
+
+    case FOR_STMT:
+      /* Try to annotate the loop if we are in a kernels region.
+        This will do a recursive traversal of the loop body in a new
+        info scope.  */
+      if (info->state >= as_in_kernels_region)
+       {
+         check_and_annotate_for_loop (nodeptr, NULL, info);
+         *walk_subtrees = 0;
+       }
+      break;
+
+    case LABEL_EXPR:
+      /* Possibly unstructured control flow.  Unless we perform further
+        analyses, we must assume that such control flow may enter the
+        current loop.  In this case, we must not parallelize the loop.  */
+      if (info->state >= as_in_kernels_loop
+         && TREE_USED (LABEL_EXPR_LABEL (node)))
+       do_not_annotate_loop_nest (info, as_invalid_control_flow, node);
+      break;
+
+    case GOTO_EXPR:
+      /* Possibly unstructured control flow.  Unless we perform further
+        analyses, we must assume that such control flow may leave the
+        current loop.  In this case, we must not parallelize the loop.  */
+      if (info->state >= as_in_kernels_loop)
+       do_not_annotate_loop_nest (info, as_invalid_control_flow, node);
+      break;
+
+    case BREAK_STMT:
+      /* A break statement.  Whether or not this is valid depends on the
+        enclosing context.  */
+      if (info->state >= as_in_kernels_loop && !info->break_ok)
+       do_not_annotate_loop (info, as_invalid_break, node);
+      break;
+
+    case RETURN_EXPR:
+      /* A return leaves the entire loop nest.  */
+      if (info->state >= as_in_kernels_loop)
+       do_not_annotate_loop_nest (info, as_invalid_return, node);
+      break;
+
+    case CALL_EXPR:
+      /* Direct function calls to functions marked as OpenACC routines are
+        allowed.  Reject indirect calls or calls to non-routines.  */
+      if (info->state >= as_in_kernels_loop)
+       {
+         tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE;
+         if (fn != NULL_TREE && TREE_CODE (fn) == FUNCTION_DECL)
+           fn_decl = fn;
+         else if (fn != NULL_TREE && TREE_CODE (fn) == ADDR_EXPR)
+           {
+             tree fn_op = TREE_OPERAND (fn, 0);
+             if (fn_op != NULL_TREE && TREE_CODE (fn_op) == FUNCTION_DECL)
+               fn_decl = fn_op;
+           }
+         if (fn_decl == NULL_TREE)
+           do_not_annotate_loop_nest (info, as_invalid_call, node);
+         else if (!lookup_attribute ("oacc function",
+                                     DECL_ATTRIBUTES (fn_decl)))
+           do_not_annotate_loop_nest (info, as_invalid_call, node);
+       }
+      break;
+
+    case MODIFY_EXPR:
+      /* See if this assignment's LHS is one of the variables that must
+        not be modified in the loop body because they control termination
+        of the loop (or an enclosing loop in the nest).  */
+      if (info->state >= as_in_kernels_loop)
+       {
+         tree lhs = TREE_OPERAND (node, 0);
+         if (!is_local_var (lhs))
+           /* Early exit: This cannot be a variable we care about.  */
+           break;
+         /* Walk up the loop stack.  Invalidate the ones controlled by this
+            variable.  There may be several, if this variable is the common
+            iteration limit for several nested loops.  */
+         for (struct annotation_info *outer_loop = info; outer_loop != NULL;
+              outer_loop = outer_loop->next)
+           for (tree t = outer_loop->vars; t != NULL_TREE; t = TREE_CHAIN (t))
+             if (TREE_VALUE (t) == lhs)
+               {
+                 do_not_annotate_loop (outer_loop,
+                                       as_invalid_modification,
+                                       node);
+                 break;
+               }
+       }
+      break;
+
+    case SWITCH_STMT:
+      /* Needs special handling to allow break in the body.  */
+      if (info->state >= as_in_kernels_loop)
+       {
+         bool save_break_ok = info->break_ok;
+
+         walk_tree (&SWITCH_STMT_COND (node),
+                    annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         info->break_ok = true;
+         walk_tree (&SWITCH_STMT_BODY (node),
+                    annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         info->break_ok = save_break_ok;
+         *walk_subtrees = 0;
+       }
+      break;
+
+    case WHILE_STMT:
+      /* Needs special handling to allow break in the body.  */
+      if (info->state >= as_in_kernels_loop)
+       {
+         bool save_break_ok = info->break_ok;
+
+         walk_tree (&WHILE_COND (node), annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         info->break_ok = true;
+         walk_tree (&WHILE_BODY (node), annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         info->break_ok = save_break_ok;
+         *walk_subtrees = 0;
+       }
+      break;
+
+    case DO_STMT:
+      /* Needs special handling to allow break in the body.  */
+      if (info->state >= as_in_kernels_loop)
+       {
+         bool save_break_ok = info->break_ok;
+
+         walk_tree (&DO_COND (node), annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         info->break_ok = true;
+         walk_tree (&DO_BODY (node), annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         info->break_ok = save_break_ok;
+         *walk_subtrees = 0;
+       }
+      break;
+
+    case STATEMENT_LIST:
+      /* We iterate over these explicitly so that we can track the previous
+        statement in the chain.  It may be the initializer for a following
+        FOR_STMT node.  */
+      if (info->state >= as_in_kernels_region)
+       {
+         tree_stmt_iterator i = tsi_start (node);
+         tree_stmt_iterator prev, *prev_tsi = NULL;
+         while (!tsi_end_p (i))
+           {
+             tree *stmtptr = tsi_stmt_ptr (i);
+             if (TREE_CODE (*stmtptr) == FOR_STMT)
+               {
+                 check_and_annotate_for_loop (stmtptr, prev_tsi, info);
+                 *walk_subtrees = 0;
+               }
+             else
+               walk_tree (stmtptr, annotate_loops_in_kernels_regions,
+                          (void *) info, NULL);
+             prev = i;
+             prev_tsi = &prev;
+             tsi_next (&i);
+           }
+         *walk_subtrees = 0;
+       }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL_TREE;
+}
+
+/* Find for loops in OpenACC kernels regions that do not have OpenACC
+   annotations but look like they might benefit from automatic
+   parallelization.  Convert them from FOR_STMT to OMP_FOR nodes and
+   add the equivalent of "#pragma acc loop auto" annotations for them.
+   Assumes flag_openacc_kernels_annotate_loops is set.  */
+
+void
+c_oacc_annotate_loops_in_kernels_regions (tree decl,
+                                         tree (*unwrap_fn) (tree))
+{
+  struct annotation_info info
+    = { NULL_TREE, NULL_TREE, true, as_outer, NULL_TREE, NULL };
+  lang_specific_unwrap_initializer = unwrap_fn;
+  walk_tree (&DECL_SAVED_TREE (decl), annotate_loops_in_kernels_regions,
+            (void *) &info, NULL);
+}
+
 /* Used to merge map clause information in c_omp_adjust_map_clauses.  */
 struct map_clause
 {
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 06457ac739e4..a0f43d6d325f 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1074,6 +1074,10 @@ Wopenacc-parallelism
 C C++ Var(warn_openacc_parallelism) Warning
 Warn about potentially suboptimal choices related to OpenACC parallelism.

+Wopenacc-kernels-annotate-loops
+C ObjC C++ ObjC++ Warning Var(warn_openacc_kernels_annotate_loops) Init(0)
+Warn about loops in OpenACC kernels regions that cannot be parallelized.
+
 Wopenmp-simd
 C C++ Var(warn_openmp_simd) Warning LangEnabledBy(C C++,Wall)
 Warn if a simd directive is overridden by the vectorizer cost model.
@@ -1910,6 +1914,10 @@ fopenacc-dim=
 C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.

+fopenacc-kernels-annotate-loops
+C ObjC C++ ObjC++ LTO Optimization Var(flag_openacc_kernels_annotate_loops) Init(1)
+Automatically parallelize unannotated loops in OpenACC kernels regions.
+
 fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 186fa1692c16..467b3425b9a4 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -10230,6 +10230,29 @@ temp_pop_parm_decls (void)
   pop_scope ();
 }

+/* Function passed to c_oacc_annotate_loops_in_kernels_regions to do
+   language-specific unwrapping of an initializer expression.  */
+static tree
+c_unwrap_for_init (tree x)
+{
+  if (!x)
+    return NULL_TREE;
+
+  while (true)
+    switch (TREE_CODE (x))
+      {
+      case MODIFY_EXPR:
+      case VAR_DECL:
+       return x;
+
+      case DECL_EXPR:
+       x = TREE_OPERAND (x, 0);
+       break;
+
+      default:
+       return NULL_TREE;
+      }
+}

 /* Finish up a function declaration and compile that function
    all the way to assembler language output.  Then free the storage
@@ -10332,6 +10355,11 @@ finish_function (location_t end_loc)
   if (warn_unused_parameter)
     do_warn_unused_parameter (fndecl);

+  /* If requested, automatically annotate suitable loops in OpenACC kernels
+     regions with OpenACC loop annotations to allow auto-parallelization.  */
+  if (flag_openacc && flag_openacc_kernels_annotate_loops)
+    c_oacc_annotate_loops_in_kernels_regions (fndecl, c_unwrap_for_init);
+
   /* Store the end of the function, so that we get good line number
      info for the epilogue.  */
   cfun->function_end_locus = end_loc;
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 7c2a134e4061..17f14d1f6742 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -17528,6 +17528,45 @@ emit_coro_helper (tree helper)
   expand_or_defer_fn (helper);
 }

+
+/* Function passed to c_oacc_annotate_loops_in_kernels_regions to do
+   language-specific unwrapping of an initializer expression.  */
+static tree
+cp_unwrap_for_init (tree x)
+{
+  if (!x)
+    return NULL_TREE;
+
+  while (true)
+    switch (TREE_CODE (x))
+      {
+      case MODIFY_EXPR:
+      case VAR_DECL:
+       return x;
+
+      case CLEANUP_POINT_EXPR:
+       x = TREE_OPERAND (x, 0);
+       break;
+
+      case EXPR_STMT:
+       x = TREE_OPERAND (x, 0);
+       break;
+
+      case DECL_EXPR:
+       x = TREE_OPERAND (x, 0);
+       break;
+
+      case CONVERT_EXPR:
+       if (TREE_TYPE (x) != void_type_node)
+         return NULL_TREE;
+       x = TREE_OPERAND (x, 0);
+       break;
+
+      default:
+       return NULL_TREE;
+      }
+}
+
 /* Finish up a function declaration and compile that function
    all the way to assembler language output.  The free the storage
    for the function definition. INLINE_P is TRUE if we just
@@ -17832,6 +17871,11 @@ finish_function (bool inline_p)
       && !DECL_CLONED_FUNCTION_P (fndecl))
     do_warn_unused_parameter (fndecl);

+  /* If requested, automatically annotate suitable loops in OpenACC kernels
+     regions with OpenACC loop annotations to allow auto-parallelization.  */
+  if (flag_openacc && flag_openacc_kernels_annotate_loops)
+    c_oacc_annotate_loops_in_kernels_regions (fndecl, cp_unwrap_for_init);
+
   /* Genericize before inlining.  */
   if (!processing_template_decl
       && !DECL_IMMEDIATE_FUNCTION_P (fndecl)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9fb74d349203..e0f09610408c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -371,6 +371,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wnull-dereference  -Wno-odr  @gol
 -Wopenacc-parallelism  @gol
 -Wopenmp-simd  @gol
+-Wopenacc-kernels-annotate-loops  @gol
 -Wno-overflow  -Woverlength-strings  -Wno-override-init-side-effects @gol
 -Wpacked  -Wno-packed-bitfield-compat  -Wpacked-not-aligned  -Wpadded @gol
 -Wparentheses  -Wno-pedantic-ms-format @gol
@@ -533,7 +534,8 @@ Objective-C and Objective-C++ Dialects}.
 -fmerge-constants  -fmodulo-sched  -fmodulo-sched-allow-regmoves @gol
 -fmove-loop-invariants  -fmove-loop-stores  -fno-branch-count-reg @gol
 -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse @gol
--fno-guess-branch-probability  -fno-inline  -fno-math-errno  -fno-peephole @gol
+-fno-guess-branch-probability  -fno-inline  -fno-math-errno @gol
+-fno-openacc-kernels-annotate-loops  -fno-peephole @gol
 -fno-peephole2  -fno-printf-return-value  -fno-sched-interblock @gol
 -fno-sched-spec  -fno-signed-zeros @gol
 -fno-toplevel-reorder  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
@@ -8957,6 +8959,13 @@ Enabled by default.
 @cindex OpenACC accelerator programming
 Warn about potentially suboptimal choices related to OpenACC parallelism.

+@item -Wopenacc-kernels-annotate-loops
+@opindex Wopenacc-kernels-annotate-loops
+@opindex Wno-openacc-kernels-annotate-loops
+Warn about @code{for} (C/C++) or @code{DO} (Fortran) loops in OpenACC
+kernels regions that cannot be automatically annotated for
+parallelization with @option{-fopenacc-kernels-annotate-loops}.
+
 @item -Wopenmp-simd
 @opindex Wopenmp-simd
 @opindex Wno-openmp-simd
@@ -14835,6 +14844,27 @@ SIMD iterations.

 @end table

+@item -fno-openacc-kernels-annotate-loops
+@opindex fno-openacc-kernels-annotate-loops
+@opindex fopenacc-kernels-annotate-loops
+@cindex kernels regions, OpenACC
+Disable automatic parallelization of unannotated loops in OpenACC
+kernels regions.  The default is to attempt to add implicit
+@code{acc loop auto} annotations to loops in kernels regions if
+@option{-fopenacc} is enabled.
+
+Note that you can use @option{-Wopenacc-kernels-annotate-loops} to
+diagnose @code{for} loops that cannot be automatically annotated
+(@pxref{Warning Options}).  Reasons why automatic loop annotations
+cannot be applied include premature exits, calls to functions without
+an @code{acc routine} annotation, or unstructured control flow in
+the loop body.  In C and C++, the loop variable initialization, end
+test, and increment expressions must additionally conform to
+restrictions similar to those for explicitly-annotated loops, and the
+loop variable must not be otherwise modified in the body of the loop.
+An explicit @code{acc loop} annotation disables automatic annotations
+on any nested or containing loops.
+
 @end table

 @node Instrumentation Options
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index 1d12658790d1..e391184f403d 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -2,6 +2,7 @@
    OpenACC kernels.  */

 /* { dg-additional-options "-O2" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
    { dg-additional-options "-fdump-tree-parloops1-all" }
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index bdf7b4a06410..779e2b0a24db 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -1,7 +1,8 @@
 /* Check offloaded function's attributes and classification for OpenACC
-   kernels.  */
+   'kernels' (parloops version).  */

 /* { dg-additional-options "-O2" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
    { dg-additional-options "-fdump-tree-parloops1-all" }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
index 030425475495..c37152c74041 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-dom3" } */

 #include <stdlib.h>
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index c475333f1aef..b1f43029af7c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index 8f7f415b58d8..e87aab3295c7 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index c11d36fb4373..2323857fb4ad 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index acef6a1a1793..adca30bf2cd7 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 75e2bb78cea4..5f16085ff386 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c
new file mode 100644
index 000000000000..c7b5ac882195
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that all loops in the nest are annotated.  */
+
+void f (float a[16][16], float b[16][16], float c[16][16])
+{
+  int i, j, k;
+
+#pragma acc kernels copyin(a[0:16][0:16], b[0:16][0:16]) copyout(c[0:16][0:16])
+  {
+    for (i = 0; i < 16; i++) {
+      for (j = 0; j < 16; j++) {
+       float t = 0;
+       for (k = 0; k < 16; k++)
+         t += a[i][k] * b[k][j];
+       c[i][j] = t;
+      }
+    }
+  }
+
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c
new file mode 100644
index 000000000000..58b41d20e232
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c
@@ -0,0 +1,32 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a random goto in the body can't be annotated.  */
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)    /* { dg-warning "loop cannot be annotated" } */
+      {
+       if (a[i] < 0)
+         {
+           t = 0;
+           goto bad;
+         }
+       t += a[i] * b[i];
+      }
+  bad:
+    ;
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c
new file mode 100644
index 000000000000..e9d2ef48611a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a random label in the body triggers a warning.  */
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i = n - 1;
+
+#pragma acc kernels
+  {
+    goto spaghetti;
+    for (i = 0; i < n; i++)    /* { dg-warning "loop cannot be annotated" } */
+      {
+      spaghetti:
+       t += a[i] * b[i];
+      }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c
new file mode 100644
index 000000000000..ba408bc3634d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c
@@ -0,0 +1,28 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that in a situation with nested loops, a problem that prevents
+   annotation of the inner loop only still allows the outer loop to be
+   annotated.  */
+
+float f (float *a, float *b, int n)
+{
+  float t = 0;
+
+#pragma acc kernels
+  {
+    for (int i = 0; i < n; i++)
+      for (int j = 0; j <= i; j++)  /* { dg-warning "loop cannot be annotated" } */
+       {
+         if (a[i] < 0 || b[j] < 0)
+           j = i;
+         else
+           t += a[i] * b[j];
+       }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c
new file mode 100644
index 000000000000..64433e816ed4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that in a situation with nested loops, a problem that prevents
+   annotation of the outer loop only still allows the inner loop to be
+   annotated.  */
+
+float f (float *a, float *b, int n)
+{
+  float t = 0;
+
+#pragma acc kernels
+  {
+    for (int i = 0; i < n; i++)          /* { dg-warning "loop cannot be annotated" } */
+      {
+       if (a[i] < 0)
+         n = i;
+       for (int j = 0; j <= i; j++)
+         t += a[i] * b[j];
+      }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c
new file mode 100644
index 000000000000..379e6baf97c3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that an explicit annotation on an outer loop suppresses annotation
+   of inner loops, and produces a diagnostic.  */
+
+void f (float *a, float *b)
+{
+  float t = 0;
+
+#pragma acc kernels
+  {
+#pragma acc loop seq
+    for (int l = 0; l < 20; l++)
+      for (int m = 0; m < 20; m++)     /* { dg-warning "loop cannot be annotated" } */
+        b[m] = a[m];
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c
new file mode 100644
index 000000000000..9a2a7cabde5d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that an explicit annotation on an inner loop suppresses annotation
+   of outer loops, and produces a diagnostic.  */
+
+void f (float *a, float *b)
+{
+  float t = 0;
+
+#pragma acc kernels
+  {
+    for (int l = 0; l < 20; l++)       /* { dg-warning "loop cannot be annotated" } */
+#pragma acc loop seq
+      for (int m = 0; m < 20; m++)
+        b[m] = a[m];
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c
new file mode 100644
index 000000000000..075f897fad4a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a modification of the loop variable in the
+   body cannot be annotated.  */
+
+float f (float *a, float *b, int n)
+{
+  float t = 0;
+
+#pragma acc kernels
+  {
+    for (int i = 0; i < n; i++)        /* { dg-warning "loop cannot be annotated" } */
+      {
+       if (a[i] < 0 || b[i] < 0)
+         i = n;
+       else
+         t += a[i] * b[i];
+      }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c
new file mode 100644
index 000000000000..507678965b4d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a modification of the loop iteration count
+   variable in the body cannot be annotated.  */
+
+float f (float *a, float *b, int n)
+{
+  float t = 0;
+
+#pragma acc kernels
+  {
+    for (int i = 0; i < n; i++)        /* { dg-warning "loop cannot be annotated" } */
+      {
+       if (a[i] < 0 || b[i] < 0)
+         n = i;
+       else
+         t += a[i] * b[i];
+      }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c
new file mode 100644
index 000000000000..9e0a946828ff
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a variable bound can be annotated.  */
+
+float f (float *a, float *b, int n)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)
+      t += a[i] * b[i];
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c
new file mode 100644
index 000000000000..f60070e27961
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c
@@ -0,0 +1,24 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a conditional in the body can be annotated.  */
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)
+      if (a[i] > 0 && b[i] > 0)
+       t += a[i] * b[i];
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c
new file mode 100644
index 000000000000..949871cc42ec
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c
@@ -0,0 +1,34 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a switch and break in the body can be annotated.  */
+
+#define n 16
+
+float f (float *a, float *b, int state)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)
+      switch (state)
+       {
+       case 0:
+       default:
+         t += a[i] * b[i];
+         break;
+
+       case 1:
+         if (a[i] > 0 && b[i] > 0)
+           t += a[i] * b[i];
+         break;
+       }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c
new file mode 100644
index 000000000000..03dfe8fbcd40
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a continue statement in the body can be annotated.  */
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)
+      {
+       if (a[i] < 0 || b[i] < 0)
+         continue;
+       t += a[i] * b[i];
+      }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c
new file mode 100644
index 000000000000..ede6b3c8cd67
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a break statement in the body cannot be annotated.  */
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)    /* { dg-warning "loop cannot be annotated" } */
+      {
+       if (a[i] < 0 || b[i] < 0)
+         break;
+       t += a[i] * b[i];
+      }
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c
new file mode 100644
index 000000000000..20ee29989665
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with a random function call in the body cannot be
+   annotated.  */
+
+extern float g (float);
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)    /* { dg-warning "loop cannot be annotated" } */
+      t += g (a[i] * b[i]);
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c
new file mode 100644
index 000000000000..796f048d67ca
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a loop with an openacc function call in the body can be
+   annotated.  */
+
+#pragma acc routine worker
+extern float g (float);
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)
+      t += g (a[i] * b[i]);
+  }
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c
new file mode 100644
index 000000000000..048f1b09a84d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that a kernels loop with a return in the body triggers a hard
+   error.  */
+
+#define n 16
+
+float f (float *a, float *b)
+{
+  float t = 0;
+  int i;
+
+#pragma acc kernels
+  {
+    for (i = 0; i < n; i++)
+      {
+       if (a[i] < 0 || b[i] < 0)
+         return 0.0;   /* { dg-error "invalid branch" } */
+       t += a[i] * b[i];
+      }
+  }
+  return t;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
index 71800217991a..9a97de6f6e13 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
index 0c9f83312408..31e8378e3d74 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
index 0bd21b68d317..ad591551b979 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
index dd5a84146a8e..4acffef41ba1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
index a658182de904..327aa0570c9c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 73b469d70610..26c65fe742aa 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-g" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 55926230d578..8955cf29224b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index e86be1b1cdc0..d88a61dbab51 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index 2b0e186ae297..5943d56a5bbe 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index 9619d53b43d7..ad525cdbe141 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index 69539b24a78d..f799baffd8df 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
index 81b0fee5a44c..b8093b54dec8 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 5921b88920fd..105cbcf3ba2e 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

* [PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
  2021-12-15 15:54 ` [PATCH 01/40] Kernels loops annotation: C and C++ Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 03/40] Kernels loops annotation: Fortran Frederik Harwath
                   ` (37 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas

From: Sandra Loosemore <sandra@codesourcery.com>

2020-03-27  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/testsuite/
        * c-c++-common/goacc/kernels-decompose-2.c: Add
        -fno-openacc-kernels-annotate-loops.
---
 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index cdf85d4bafae..0f2d2f0a757b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,5 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
--
2.33.0

* [PATCH 03/40] Kernels loops annotation: Fortran.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
  2021-12-15 15:54 ` [PATCH 01/40] Kernels loops annotation: C and C++ Frederik Harwath
  2021-12-15 15:54 ` [PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass Frederik Harwath
                   ` (36 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, tobias, fortran, Gergö Barany

From: Sandra Loosemore <sandra@codesourcery.com>

This patch implements Fortran support for adding "acc loop auto"
annotations to DO loops in OpenACC kernels regions.  It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.
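
As a rough illustration only (not part of the patch): the annotation is
meant to make a plain loop in a kernels region behave as if the user had
written "acc loop auto" by hand.  The sketch below shows this in C terms,
reusing the loop body from kernels-loop-annotation-2.c; in Fortran the
corresponding directive would be "!$acc loop auto" placed before the DO
loop.  The function name dot_annotated is made up for this example, and
the exact form the front end emits internally may differ.

```c
/* Hypothetical name; the loop body is taken from
   c-c++-common/goacc/kernels-loop-annotation-2.c.  */
float dot_annotated (float *a, float *b, int n)
{
  float t = 0;
  int i;

#pragma acc kernels
  {
    /* Roughly what the annotation pass is expected to add on the
       user's behalf when -fopenacc-kernels-annotate-loops is active.  */
#pragma acc loop auto
    for (i = 0; i < n; i++)
      t += a[i] * b[i];
  }
  return t;
}
```

When compiled without -fopenacc the pragmas are ignored and the loop runs
serially, so the function can be checked on the host like any other
C function.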

Co-Authored-By: Gergö Barany <gergo@codesourcery.com>

gcc/fortran/
        * gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
        * lang.opt (Wopenacc-kernels-annotate-loops): New.
        (fopenacc-kernels-annotate-loops): New.
        * openmp.c: Include options.h.
        (enum annotation_state, enum annotation_result): New.
        (check_code_for_invalid_calls): New.
        (check_expr_for_invalid_calls): New.
        (check_for_invalid_calls): New.
        (annotate_do_loop): New.
        (annotate_do_loops_in_kernels): New.
        (compute_goto_targets): New.
        (gfc_oacc_annotate_loops_in_kernels_regions): New.
        * parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.

gcc/testsuite/
        * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
        -fno-openacc-kernels-annotate-loops option.
        * gfortran.dg/goacc/classify-kernels.f95: Likewise.
        * gfortran.dg/goacc/common-block-3.f90: Likewise.
        * gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop.f95: Likewise.
        * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
        Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
---
 gcc/fortran/gfortran.h                        |   1 +
 gcc/fortran/lang.opt                          |   8 +
 gcc/fortran/openmp.c                          | 364 ++++++++++++++++++
 gcc/fortran/parse.c                           |   9 +
 .../goacc/classify-kernels-unparallelized.f95 |   1 +
 .../gfortran.dg/goacc/classify-kernels.f95    |   1 +
 .../gfortran.dg/goacc/common-block-3.f90      |   1 +
 .../gfortran.dg/goacc/kernels-loop-2.f95      |   1 +
 .../goacc/kernels-loop-annotation-1.f95       |  33 ++
 .../goacc/kernels-loop-annotation-10.f95      |  32 ++
 .../goacc/kernels-loop-annotation-11.f95      |  34 ++
 .../goacc/kernels-loop-annotation-12.f95      |  39 ++
 .../goacc/kernels-loop-annotation-13.f95      |  38 ++
 .../goacc/kernels-loop-annotation-14.f95      |  35 ++
 .../goacc/kernels-loop-annotation-15.f95      |  35 ++
 .../goacc/kernels-loop-annotation-16.f95      |  34 ++
 .../goacc/kernels-loop-annotation-2.f95       |  32 ++
 .../goacc/kernels-loop-annotation-3.f95       |  33 ++
 .../goacc/kernels-loop-annotation-4.f95       |  34 ++
 .../goacc/kernels-loop-annotation-5.f95       |  35 ++
 .../goacc/kernels-loop-annotation-6.f95       |  34 ++
 .../goacc/kernels-loop-annotation-7.f95       |  48 +++
 .../goacc/kernels-loop-annotation-8.f95       |  50 +++
 .../goacc/kernels-loop-annotation-9.f95       |  34 ++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.f95  |   1 +
 .../goacc/kernels-loop-data-enter-exit.f95    |   1 +
 .../goacc/kernels-loop-data-update.f95        |   1 +
 .../gfortran.dg/goacc/kernels-loop-data.f95   |   1 +
 .../gfortran.dg/goacc/kernels-loop-n.f95      |   1 +
 .../gfortran.dg/goacc/kernels-loop.f95        |   1 +
 .../kernels-parallel-loop-data-enter-exit.f95 |   1 +
 32 files changed, 974 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f7662c59a5df..50db768ce0fc 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3545,6 +3545,7 @@ void gfc_resolve_oacc_declare (gfc_namespace *);
 void gfc_resolve_oacc_parallel_loop_blocks (gfc_code *, gfc_namespace *);
 void gfc_resolve_oacc_blocks (gfc_code *, gfc_namespace *);
 void gfc_resolve_oacc_routines (gfc_namespace *);
+void gfc_oacc_annotate_loops_in_kernels_regions (gfc_namespace *);

 /* expr.c */
 void gfc_free_actual_arglist (gfc_actual_arglist *);
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 6db01c736be1..a202c04c4a25 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -289,6 +289,10 @@ Wopenacc-parallelism
 Fortran
 ; Documented in C

+Wopenacc-kernels-annotate-loops
+Fortran
+; Documented in C
+
 Wopenmp-simd
 Fortran
 ; Documented in C
@@ -695,6 +699,10 @@ fopenacc-dim=
 Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C

+fopenacc-kernels-annotate-loops
+Fortran LTO Optimization
+; Documented in C
+
 fopenmp
 Fortran LTO
 ; Documented in C
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index dcf22ac2c2f3..243b5e0a9ac6 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "gomp-constants.h"
 #include "target-memory.h"  /* For gfc_encode_character.  */
+#include "options.h"

 /* Match an end of OpenMP directive.  End of OpenMP directive is optional
    whitespace, followed by '\n' or comment '!'.  */
@@ -9090,3 +9091,366 @@ gfc_resolve_omp_udrs (gfc_symtree *st)
   for (omp_udr = st->n.omp_udr; omp_udr; omp_udr = omp_udr->next)
     gfc_resolve_omp_udr (omp_udr);
 }
+
+
+/* The following functions implement automatic recognition and annotation of
+   DO loops in OpenACC kernels regions.  Inside a kernels region, a nest of
+   DO loops that does not contain any annotated OpenACC loops, nor EXIT
+   or GOTO statements, gets an automatic "acc loop auto" annotation
+   on each loop.
+   This feature is controlled by flag_openacc_kernels_annotate_loops.  */
+
+
+/* State of the annotation traversal for DO loops in kernels regions.  */
+enum annotation_state {
+  as_outer,
+  as_in_kernels_region,
+  as_in_kernels_loop,
+  as_in_kernels_inner_loop
+};
+
+/* Return status of annotation traversal.  */
+enum annotation_result {
+  ar_ok,
+  ar_invalid_loop,
+  ar_invalid_nest
+};
+
+/* Code walk function for check_for_invalid_calls.  */
+
+static int
+check_code_for_invalid_calls (gfc_code **codep, int *walk_subtrees,
+                             void *data ATTRIBUTE_UNUSED)
+{
+  gfc_code *code = *codep;
+  switch (code->op)
+    {
+    case EXEC_CALL:
+      /* Calls to openacc routines are permitted.  */
+      if (code->resolved_sym
+         && (code->resolved_sym->attr.oacc_routine_lop
+             != OACC_ROUTINE_LOP_NONE))
+       return 0;
+      /* Else fall through.  */
+
+    case EXEC_CALL_PPC:
+    case EXEC_ASSIGN_CALL:
+      gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                  "Subroutine call at %L prevents annotation of loop nest",
+                  &code->loc);
+      *walk_subtrees = 0;
+      return 1;
+
+    default:
+      return 0;
+    }
+}
+
+/* Expr walk function for check_for_invalid_calls.  */
+
+static int
+check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees,
+                             void *data ATTRIBUTE_UNUSED)
+{
+  gfc_expr *expr = *exprp;
+  switch (expr->expr_type)
+    {
+    case EXPR_FUNCTION:
+      if (expr->value.function.esym
+         && (expr->value.function.esym->attr.oacc_routine_lop
+             != OACC_ROUTINE_LOP_NONE))
+       return 0;
+      /* Else fall through.  */
+
+    case EXPR_COMPCALL:
+      gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                  "Function call at %L prevents annotation of loop nest",
+                  &expr->where);
+      *walk_subtrees = 0;
+      return 1;
+
+    default:
+      return 0;
+    }
+}
+
+/* Return TRUE if the DO loop CODE contains function or procedure
+   calls that ought to prohibit annotation.  This traversal is
+   separate from the main annotation tree walk because we need to walk
+   expressions as well as executable statements.  */
+
+static bool
+check_for_invalid_calls (gfc_code *code)
+{
+  gcc_assert (code->op == EXEC_DO);
+  return gfc_code_walker (&code, check_code_for_invalid_calls,
+                         check_expr_for_invalid_calls, NULL);
+}
+
+/* Annotate DO loop CODE with OpenACC "loop auto".  */
+
+static void
+annotate_do_loop (gfc_code *code, gfc_code *parent)
+{
+
+  /* A DO loop's body is another phony DO node whose next pointer starts
+     the actual body.  */
+  gcc_assert (code->op == EXEC_DO);
+  gcc_assert (code->block->op == EXEC_DO);
+
+  /* Build the "acc loop auto" annotation and add the loop as its
+     body.  */
+  gfc_omp_clauses *clauses = gfc_get_omp_clauses ();
+  clauses->par_auto = 1;
+  gfc_code *oacc_loop = gfc_get_code (EXEC_OACC_LOOP);
+  oacc_loop->block = gfc_get_code (EXEC_OACC_LOOP);
+  oacc_loop->block->next = code;
+  oacc_loop->ext.omp_clauses = clauses;
+  oacc_loop->loc = code->loc;
+  oacc_loop->block->loc = code->loc;
+
+  /* Splice the annotation into the place of the original loop.  */
+  if (parent->block == code)
+    parent->block = oacc_loop;
+  else
+    {
+      gfc_code *prev = parent->block;
+      while (prev != code && prev->next != code)
+       {
+         prev = prev->next;
+         gcc_assert (prev != NULL);
+       }
+      prev->next = oacc_loop;
+    }
+  oacc_loop->next = code->next;
+  code->next = NULL;
+}
+
+/* Recursively traverse CODE in block PARENT, finding OpenACC kernels
+   regions.  GOTO_TARGETS keeps track of statement labels that are
+   targets of gotos in the current function, while STATE keeps track
+   of the current context of the traversal.  If the traversal
+   encounters a DO loop inside a kernels region, annotate it with
+   OpenACC loop directives if appropriate.  Return the status of the
+   traversal.  */
+
+static enum annotation_result
+annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
+                             hash_set <gfc_st_label *> *goto_targets,
+                             annotation_state state)
+{
+  gfc_code *next_code = NULL;
+  enum annotation_result retval = ar_ok;
+
+  for ( ; code; code = next_code)
+    {
+      bool walk_block = true;
+      next_code = code->next;
+
+      if (state >= as_in_kernels_loop
+         && code->here && goto_targets->contains (code->here))
+       /* This statement has a label that is the target of a GOTO or some
+          other jump.  Do not try to sort out the details, just reject
+          this loop nest.  */
+       {
+         gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                      "Possible control transfer to label at %L "
+                      "prevents annotation of loop nest",
+                      &code->loc);
+         return ar_invalid_nest;
+       }
+
+      switch (code->op)
+       {
+       case EXEC_OACC_KERNELS:
+         /* Enter kernels region.  */
+         annotate_do_loops_in_kernels (code->block->next, code,
+                                       goto_targets,
+                                       as_in_kernels_region);
+         walk_block = false;
+         break;
+
+       case EXEC_OACC_PARALLEL_LOOP:
+       case EXEC_OACC_PARALLEL:
+       case EXEC_OACC_KERNELS_LOOP:
+       case EXEC_OACC_LOOP:
+         /* Do not try to add automatic OpenACC annotations inside manually
+            annotated loops.  Presumably, the user avoided doing it on
+            purpose; for example, all available levels of parallelism may
+            have been used up.  */
+         if (state >= as_in_kernels_region)
+           {
+             gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                          "Explicit loop annotation at %L "
+                          "prevents annotation of loop nest",
+                          &code->loc);
+             return ar_invalid_nest;
+           }
+         walk_block = false;
+         break;
+
+       case EXEC_DO:
+         if (state >= as_in_kernels_region)
+           {
+             /* A DO loop's body is another phony DO node whose next
+                pointer starts the actual body.  Skip the phony node.  */
+             gcc_assert (code->block->op == EXEC_DO);
+             enum annotation_result result
+               = annotate_do_loops_in_kernels (code->block->next, code,
+                                               goto_targets,
+                                               as_in_kernels_loop);
+             /* Check for function/procedure calls in the body of the
+                loop that would prevent parallelization.  Unlike in C/C++,
+                we do not have to check that there is no modification of
+                the loop variable or loop count since they are already
+                handled by the semantics of DO loops in the Fortran
+                language.  */
+             if (result != ar_invalid_nest && check_for_invalid_calls (code))
+               result = ar_invalid_nest;
+             if (result == ar_ok)
+               annotate_do_loop (code, parent);
+             else if (result == ar_invalid_nest
+                      && state >= as_in_kernels_loop)
+               /* The outer loop is invalid, too, so stop traversal.  */
+               return result;
+             walk_block = false;
+           }
+         break;
+
+       case EXEC_DO_WHILE:
+       case EXEC_DO_CONCURRENT:
+         /* Traverse the body in a special state to allow EXIT statements
+            from these loops.  */
+         if (state >= as_in_kernels_loop)
+           {
+             enum annotation_result result
+               = annotate_do_loops_in_kernels (code->block, code,
+                                               goto_targets,
+                                               as_in_kernels_inner_loop);
+             if (result == ar_invalid_nest)
+               return result;
+             else if (result != ar_ok)
+               retval = result;
+             walk_block = false;
+           }
+         break;
+
+       case EXEC_GOTO:
+       case EXEC_ARITHMETIC_IF:
+       case EXEC_STOP:
+       case EXEC_ERROR_STOP:
+         /* A jump that may leave this loop.  */
+         if (state >= as_in_kernels_loop)
+           {
+             gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                          "Possible unstructured control flow at %L "
+                          "prevents annotation of loop nest",
+                          &code->loc);
+             return ar_invalid_nest;
+           }
+         break;
+
+       case EXEC_RETURN:
+         /* A return from a kernels region is diagnosed elsewhere as a
+            hard error, so no warning is needed here.  */
+         if (state >= as_in_kernels_loop)
+           return ar_invalid_nest;
+         break;
+
+       case EXEC_EXIT:
+         if (state == as_in_kernels_loop)
+           {
+             gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                          "Exit at %L prevents annotation of loop",
+                          &code->loc);
+             retval = ar_invalid_loop;
+           }
+         break;
+
+       case EXEC_BACKSPACE:
+       case EXEC_CLOSE:
+       case EXEC_ENDFILE:
+       case EXEC_FLUSH:
+       case EXEC_INQUIRE:
+       case EXEC_OPEN:
+       case EXEC_READ:
+       case EXEC_REWIND:
+       case EXEC_WRITE:
+         /* Executing side-effecting I/O statements in parallel doesn't
+            make much sense.  If this is what users want, they can always
+            add explicit annotations on the loop nest.  */
+         if (state >= as_in_kernels_loop)
+           {
+             gfc_warning (OPT_Wopenacc_kernels_annotate_loops,
+                          "I/O statement at %L prevents annotation of loop",
+                          &code->loc);
+             return ar_invalid_nest;
+           }
+         break;
+
+       default:
+         break;
+       }
+
+      /* Visit nested statements, if any, returning early if we hit
+        any problems.  */
+      if (walk_block)
+       {
+         enum annotation_result result
+           = annotate_do_loops_in_kernels (code->block, code,
+                                           goto_targets, state);
+         if (result == ar_invalid_nest)
+           return result;
+         else if (result != ar_ok)
+           retval = result;
+       }
+    }
+  return retval;
+}
+
+/* Traverse CODE to find all the labels referenced by GOTO and similar
+   statements and store them in GOTO_TARGETS.  */
+
+static void
+compute_goto_targets (gfc_code *code, hash_set <gfc_st_label *> *goto_targets)
+{
+  for ( ; code; code = code->next)
+    {
+      switch (code->op)
+       {
+       case EXEC_ARITHMETIC_IF:
+         goto_targets->add (code->label2);
+         goto_targets->add (code->label3);
+         gcc_fallthrough ();
+
+       case EXEC_GOTO:
+       case EXEC_LABEL_ASSIGN:
+         goto_targets->add (code->label1);
+         gcc_fallthrough ();
+
+       default:
+         /* Visit nested statements, if any.  */
+         if (code->block != NULL)
+           compute_goto_targets (code->block, goto_targets);
+       }
+    }
+}
+
+/* Find DO loops in OpenACC kernels regions that do not have OpenACC
+   annotations but look like they might benefit from automatic
+   parallelization.  Add "acc loop auto" annotations for them.  Assumes
+   flag_openacc_kernels_annotate_loops is set.  */
+
+void
+gfc_oacc_annotate_loops_in_kernels_regions (gfc_namespace *ns)
+{
+  if (ns->proc_name)
+    {
+      hash_set <gfc_st_label *> goto_targets;
+      compute_goto_targets (ns->code, &goto_targets);
+      annotate_do_loops_in_kernels (ns->code, NULL, &goto_targets, as_outer);
+    }
+
+  for (ns = ns->contained; ns; ns = ns->sibling)
+    gfc_oacc_annotate_loops_in_kernels_regions (ns);
+}
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 12aa80ec45ca..04e9d2450b16 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -6912,6 +6912,15 @@ done:
   if (flag_c_prototypes || flag_c_prototypes_external)
     fprintf (stdout, "\n#ifdef __cplusplus\n}\n#endif\n");

+  /* Add annotations on loops in OpenACC kernels regions if requested.  This
+     is most easily done on this representation close to the source code.  */
+  if (flag_openacc && flag_openacc_kernels_annotate_loops)
+    {
+      gfc_current_ns = gfc_global_ns_list;
+      for (; gfc_current_ns; gfc_current_ns = gfc_current_ns->sibling)
+       gfc_oacc_annotate_loops_in_kernels_regions (gfc_current_ns);
+    }
+
   /* Do the translation.  */
   translate_all_program_units (gfc_global_ns_list);

diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 3fb48b321f2f..2ceae2088070 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -2,6 +2,7 @@
 ! OpenACC kernels.

 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 ! { dg-additional-options "-fdump-tree-ompexp" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index 6c8d298e236d..d061a241074b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -2,6 +2,7 @@
 ! kernels.

 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 ! { dg-additional-options "-fdump-tree-ompexp" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
index 5defe2ea85de..d2816c3e9364 100644
--- a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
@@ -1,4 +1,5 @@
 ! { dg-options "-fopenacc -fdump-tree-omplower" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }

 module consts
   integer, parameter :: n = 100
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index ef53324dd2a0..63774ffb5aff 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
new file mode 100644
index 000000000000..41f6307dbb17
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
@@ -0,0 +1,33 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that all loops in the nest are annotated.
+
+subroutine f (a, b, c)
+  implicit none
+
+  real, intent (in), dimension(16,16) :: a
+  real, intent (in), dimension(16,16) :: b
+  real, intent (out), dimension(16,16) :: c
+
+  integer :: i, j, k
+  real :: t
+
+!$acc kernels copyin(a(1:16,1:16), b(1:16,1:16)) copyout(c(1:16,1:16))
+
+  do i = 1, 16
+    do j = 1, 16
+      t = 0
+      do k = 1, 16
+        t = t + a(i,k) * b(k,j)
+      end do
+      c(i,j) = t;
+    end do
+  end do
+
+!$acc end kernels
+end subroutine f
+
+! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
new file mode 100644
index 000000000000..f612c5beb963
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
@@ -0,0 +1,32 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a random goto in the body can't be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    if (a(i) < 0 .or. b(i) < 0) then
+      go to 10  ! { dg-warning "Possible unstructured control flow" }
+    end if
+    t = t + a(i) * b(i)
+  end do
+
+10  f = t
+
+!$acc end kernels
+
+end function f
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
new file mode 100644
index 000000000000..d51482e4685d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
@@ -0,0 +1,34 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-std=legacy" }
+! { dg-do compile }
+
+! Test that a loop with a random label in the body cannot be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  goto 10
+
+  do i = 1, 16
+10  t = t + a(i) * b(i)  ! { dg-warning "Possible control transfer to label" }
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
new file mode 100644
index 000000000000..3c4956d70775
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
@@ -0,0 +1,39 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that in a situation with nested loops, a problem that prevents
+! annotation of the inner loop only still allows the outer loop to be
+! annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i, j
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    do j = 1, 16
+      if (a(i) < 0 .or. b(j) < 0) then
+        exit  ! { dg-warning "Exit" }
+      else
+        t = t + a(i) * b(j)
+      end if
+    end do
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
new file mode 100644
index 000000000000..3ec459f0a8df
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
@@ -0,0 +1,38 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that in a situation with nested loops, a problem that prevents
+! annotation of the outer loop only still allows the inner loop to be
+! annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i, j
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    if (a(i) < 0) then
+      exit  ! { dg-warning "Exit" }
+    end if
+    do j = 1, 16
+      t = t + a(i) * b(j)
+    end do
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
new file mode 100644
index 000000000000..91f431cca432
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
@@ -0,0 +1,35 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that an explicit annotation on an outer loop suppresses annotation
+! of inner loops, and produces a diagnostic.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i, j
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+!$acc loop seq  ! { dg-warning "Explicit loop annotation" }
+  do i = 1, 16
+    do j = 1, 16
+      t = t + a(i) * b(j)
+    end do
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
new file mode 100644
index 000000000000..570c12d3ad70
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
@@ -0,0 +1,35 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that an explicit annotation on an inner loop suppresses annotation
+! of the outer loop, and produces a diagnostic.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i, j
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    !$acc loop seq  ! { dg-warning "Explicit loop annotation" }
+    do j = 1, 16
+      t = t + a(i) * b(j)
+    end do
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
new file mode 100644
index 000000000000..6e44a304b28b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
@@ -0,0 +1,34 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that loops containing I/O statements can't be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i, j
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    do j = 1, 16
+      print *, " i =", i, " j =", j  ! { dg-warning "I/O statement" }
+      t = t + a(i) * b(j)
+    end do
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
new file mode 100644
index 000000000000..4624a05247d9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
@@ -0,0 +1,32 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a variable bound can be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (:) :: a, b
+
+  integer :: i, n
+  real :: t
+
+  t = 0.0
+  n = size (a)
+
+!$acc kernels
+
+  do i = 1, n
+    t = t + a(i) * b(i)
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
new file mode 100644
index 000000000000..daed8f7f6e9d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
@@ -0,0 +1,33 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a conditional in the body can be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    if (a(i) > 0 .and. b(i) > 0) then
+      t = t + a(i) * b(i)
+    end if
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
new file mode 100644
index 000000000000..0c4ad256b7eb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
@@ -0,0 +1,34 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a case construct in the body can be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+!$acc kernels
+
+  do i = 1, 16
+    select case (i)
+      case (1)
+        t = a(i) * b(i)
+      case default
+        t = t + a(i) * b(i)
+    end select
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
new file mode 100644
index 000000000000..1c3f87eed6e4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
@@ -0,0 +1,35 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a cycle statement in the body can be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    if (a(i) < 0 .or. b(i) < 0) then
+      cycle
+    end if
+    t = t + a(i) * b(i)
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
new file mode 100644
index 000000000000..43173a70df24
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
@@ -0,0 +1,34 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with an exit statement in the body cannot be annotated.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    if (a(i) < 0 .or. b(i) < 0) then
+      exit     ! { dg-warning "Exit" }
+    end if
+    t = t + a(i) * b(i)
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
new file mode 100644
index 000000000000..ec42213220e7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
@@ -0,0 +1,48 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a random function call in the body cannot
+! be annotated.
+
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  interface
+    function g (x)
+      real :: g
+      real, intent (in) :: x
+    end function g
+
+    subroutine h (x)
+      real, intent (in) :: x
+    end subroutine h
+  end interface
+
+  t = 0.0
+
+!$acc kernels
+  do i = 1, 16
+    t = t + g (a(i) * b(i))  ! { dg-warning "Function call" }
+  end do
+
+  do i = 1, 16
+    call h (t) ! { dg-warning "Subroutine call" }
+    t = t + a(i) * b(i)
+  end do
+
+  f = t
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
new file mode 100644
index 000000000000..9188f70d9664
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
@@ -0,0 +1,50 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a call to a declared OpenACC function/subroutine
+! can be annotated.
+
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  interface
+    function g (x)
+      !$acc routine worker
+      real :: g
+      real, intent (in) :: x
+    end function g
+
+    subroutine h (x)
+      !$acc routine worker
+      real, intent (in) :: x
+    end subroutine h
+  end interface
+
+  t = 0.0
+
+!$acc kernels
+  do i = 1, 16
+    t = t + g (a(i) * b(i))
+  end do
+
+  do i = 1, 16
+    call h (t)
+    t = t + a(i) * b(i)
+  end do
+
+  f = t
+!$acc end kernels
+
+end function f
+
+! { dg-final { scan-tree-dump-times "acc loop private\\(i\\) auto" 2 "original" } }
+
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95
new file mode 100644
index 000000000000..f5aa5a0f43b5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95
@@ -0,0 +1,34 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with a return statement in the body gives a hard
+! error.
+
+function f (a, b)
+  implicit none
+
+  real :: f
+  real, intent (in), dimension (16) :: a, b
+
+  integer :: i
+  real :: t
+
+  t = 0.0
+
+!$acc kernels
+
+  do i = 1, 16
+    if (a(i) < 0 .or. b(i) < 0) then
+      f = 0.0
+      return   ! { dg-error "invalid branch" }
+    end if
+    t = t + a(i) * b(i)
+  end do
+
+  f = t
+
+!$acc end kernels
+
+end function f
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 2f1dcd603a14..c1f6ef8df600 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 447e85d64483..313e3df7f63d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
index 4edb2889b7b1..26671064ba27 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
index fc113e1f6602..d79ed796c366 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
index 94522f586362..d8ef52af2e6a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
index b9c4aea074d7..6b7334144c87 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index 6dc7b2e0f28f..aadfcfc41448 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
index 48c20b999423..0d45c5cf4338 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
@@ -1,4 +1,5 @@
 ! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }

--
2.33.0


* [PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (2 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 03/40] Kernels loops annotation: Fortran Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 05/40] Fix bug in processing of array dimensions in data clauses Frederik Harwath
                   ` (35 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, tobias, fortran

From: Sandra Loosemore <sandra@codesourcery.com>

2020-03-27  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/testsuite/
        * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust
        line numbering.
        * gfortran.dg/goacc/classify-kernels.f95: Likewise.
        * gfortran.dg/goacc/kernels-decompose-2.f95: Add
        -fno-openacc-kernels-annotate-loops.
---
 .../gfortran.dg/goacc/classify-kernels-unparallelized.f95    | 5 +++--
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95         | 5 +++--
 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95      | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 2ceae2088070..00aac9aa94ea 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -23,8 +23,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+                  ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 24 }
      c(i) = a(f (i)) + b(f (i))
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index d061a241074b..ba815319abf2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -19,8 +19,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+                  ! { dg-message "beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 20 }
      c(i) = a(i) + b(i)
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 238482b91a49..04c998d11dad 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,5 +1,6 @@
 ! Test OpenACC 'kernels' construct decomposition.

+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Registration court: München, HRB 106955

* [PATCH 05/40] Fix bug in processing of array dimensions in data clauses.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (3 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives Frederik Harwath
                   ` (34 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, jason, nathan

From: Sandra Loosemore <sandra@codesourcery.com>

The g++ front end wraps the array length and low_bound values in
NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail.
The test case c-c++-common/goacc/kernels-loop-annotation-1.c triggered
this bug and produced bogus errors in g++ because the front end fell
through to the dynamic array code instead of recognizing the constant
bounds.

This patch was posted upstream at
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html
but has not yet been committed.  Some other fix for this problem may
have been implemented on mainline instead; check before merging this
patch.
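
To illustrate the failure mode described above, here is a minimal,
self-contained sketch.  It is a toy model only: the toy_* types and
functions are hypothetical stand-ins and are not GCC's tree
representation or the real STRIP_NOPS macro.  It shows why a check for a
constant fails on a wrapped node and succeeds once the wrapper is peeled
off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for tree codes; NOT the real GCC definitions.  */
enum toy_code { TOY_INTEGER_CST, TOY_NON_LVALUE_EXPR, TOY_VAR_DECL };

struct toy_tree
{
  enum toy_code code;
  long value;               /* Meaningful for TOY_INTEGER_CST only.  */
  struct toy_tree *operand; /* Wrapped node for TOY_NON_LVALUE_EXPR.  */
};

/* Peel value-preserving wrappers, loosely analogous to what the patch
   relies on STRIP_NOPS to do.  */
static struct toy_tree *
toy_strip_nops (struct toy_tree *t)
{
  while (t != NULL && t->code == TOY_NON_LVALUE_EXPR)
    {
      /* A wrapper without an operand would be malformed.  */
      assert (t->operand != NULL);
      t = t->operand;
    }
  return t;
}

/* The kind of test that fails when the constant is still wrapped.  */
static bool
toy_is_constant (const struct toy_tree *t)
{
  return t != NULL && t->code == TOY_INTEGER_CST;
}
```

In this model, a TOY_NON_LVALUE_EXPR node wrapping a constant 20 is not
recognized by toy_is_constant until toy_strip_nops has been applied,
which is the situation the patch addresses for length and low_bound.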

2020-03-31  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/cp/
        * semantics.c (handle_omp_array_sections_1): Call STRIP_NOPS
        on length and low_bound.
        (handle_omp_array_sections): Likewise.
---
 gcc/cp/semantics.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 2443d0327498..c2643d0a7a24 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5145,6 +5145,10 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
   if (length)
     length = mark_rvalue_use (length);
   /* We need to reduce to real constant-values for checks below.  */
+  if (length)
+    STRIP_NOPS (length);
+  if (low_bound)
+    STRIP_NOPS (low_bound);
   if (length)
     length = fold_simple (length);
   if (low_bound)
@@ -5457,6 +5461,11 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
          tree low_bound = TREE_PURPOSE (t);
          tree length = TREE_VALUE (t);

+         if (length)
+           STRIP_NOPS (length);
+         if (low_bound)
+           STRIP_NOPS (low_bound);
+
          i--;
          if (low_bound
              && TREE_CODE (low_bound) == INTEGER_CST
--
2.33.0

* [PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (4 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 05/40] Fix bug in processing of array dimensions in data clauses Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++) Frederik Harwath
                   ` (33 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: Sandra Loosemore, thomas, joseph, jason, nathan, tobias, fortran

From: Sandra Loosemore <sandra@codesourcery.com>

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/
        * tree.h (OACC_LOOP_COMBINED): New.

        gcc/c/
        * c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

        gcc/cp/
        * parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

        gcc/fortran/
        * trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
        use it to set OACC_LOOP_COMBINED.  Update all call sites.
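
As a rough illustration of the flag handling, the following standalone
sketch models the approach taken in the C and C++ parsers, where the
flag is derived from whether cclauses is non-NULL.  The toy_* names are
hypothetical; the struct is not GCC's tree node, and TOY_LOOP_COMBINED
only imitates the shape of the OACC_LOOP_COMBINED accessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement node with a single spare flag bit (hypothetical).  */
struct toy_stmt
{
  unsigned private_flag : 1;
};

/* Accessor macro in the style of OACC_LOOP_COMBINED; no node checking.  */
#define TOY_LOOP_COMBINED(NODE) ((NODE)->private_flag)

/* Model of the parser hook: a non-NULL CCLAUSES means the loop was
   parsed as part of a combined construct.  STMT may be NULL when
   parsing failed, in which case nothing is recorded.  */
static void
toy_parse_loop (struct toy_stmt *stmt, const int *cclauses)
{
  bool is_combined = (cclauses != NULL);

  if (stmt != NULL)
    TOY_LOOP_COMBINED (stmt) = is_combined;
}

/* Read the flag back from a valid statement.  */
static bool
toy_loop_is_combined (const struct toy_stmt *stmt)
{
  assert (stmt != NULL);
  return TOY_LOOP_COMBINED (stmt) != 0;
}
```

Storing the information on the node itself means later passes can query
it without having to reconstruct how the loop was parsed.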
---
 gcc/c/c-parser.c           |  3 +++
 gcc/cp/parser.c            |  3 +++
 gcc/fortran/trans-openmp.c | 34 +++++++++++++++++++++-------------
 gcc/tree.h                 |  5 +++++
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599ef..1258b48693de 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17371,6 +17371,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
                    omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -17389,6 +17390,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
   tree block = c_begin_compound_stmt (true);
   tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL,
                                     if_p);
+  if (stmt && stmt != error_mark_node)
+    OACC_LOOP_COMBINED (stmt) = is_combined;
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6a..c834d25b028f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -44580,6 +44580,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
                     omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -44598,6 +44599,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
   tree block = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
   tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL, if_p);
+  if (stmt && stmt != error_mark_node)
+    OACC_LOOP_COMBINED (stmt) = is_combined;
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (block));

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e81c5588c53c..618e106791e5 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4855,7 +4855,8 @@ typedef struct dovar_init_d {

 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
-                 gfc_omp_clauses *do_clauses, tree par_clauses)
+                 gfc_omp_clauses *do_clauses, tree par_clauses,
+                 bool combined)
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
@@ -5219,7 +5220,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
     case EXEC_OMP_DISTRIBUTE: stmt = make_node (OMP_DISTRIBUTE); break;
     case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
     case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
-    case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+    case EXEC_OACC_LOOP:
+      stmt = make_node (OACC_LOOP);
+      OACC_LOOP_COMBINED (stmt) = combined;
+      break;
     default: gcc_unreachable ();
     }

@@ -5313,7 +5317,8 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
     pblock = &block;
   else
     pushlevel ();
-  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL,
+                          true);
   protected_set_expr_location (stmt, loc);
   if (TREE_CODE (stmt) != BIND_EXPR)
     stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
@@ -6151,7 +6156,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t *pblock,
     omp_do_clauses
       = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_DO], code->loc);
   body = gfc_trans_omp_do (code, EXEC_OMP_SIMD, pblock ? pblock : &block,
-                          &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses);
+                          &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses, false);
   if (pblock == NULL)
     {
       if (TREE_CODE (body) != BIND_EXPR)
@@ -6209,7 +6214,7 @@ gfc_trans_omp_parallel_do (gfc_code *code, bool is_loop, stmtblock_t *pblock,
     }
   stmt = gfc_trans_omp_do (code, is_loop ? EXEC_OMP_LOOP : EXEC_OMP_DO,
                           new_pblock, &clausesa[GFC_OMP_SPLIT_DO],
-                          omp_clauses);
+                          omp_clauses, false);
   if (pblock == NULL)
     {
       if (TREE_CODE (stmt) != BIND_EXPR)
@@ -6496,7 +6501,8 @@ gfc_trans_omp_distribute (gfc_code *code, gfc_omp_clauses *clausesa)
     case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
       stmt = gfc_trans_omp_do (code, EXEC_OMP_SIMD, &block,
-                              &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE);
+                              &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE,
+                              false);
       if (TREE_CODE (stmt) != BIND_EXPR)
        stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
       else
@@ -6555,13 +6561,13 @@ gfc_trans_omp_teams (gfc_code *code, gfc_omp_clauses *clausesa,
     case EXEC_OMP_TEAMS_DISTRIBUTE:
       stmt = gfc_trans_omp_do (code, EXEC_OMP_DISTRIBUTE, NULL,
                               &clausesa[GFC_OMP_SPLIT_DISTRIBUTE],
-                              NULL);
+                              NULL, false);
       break;
     case EXEC_OMP_TARGET_TEAMS_LOOP:
     case EXEC_OMP_TEAMS_LOOP:
       stmt = gfc_trans_omp_do (code, EXEC_OMP_LOOP, NULL,
                               &clausesa[GFC_OMP_SPLIT_DO],
-                              NULL);
+                              NULL, false);
       break;
     default:
       stmt = gfc_trans_omp_distribute (code, clausesa);
@@ -6641,7 +6647,8 @@ gfc_trans_omp_target (gfc_code *code)
       break;
     case EXEC_OMP_TARGET_SIMD:
       stmt = gfc_trans_omp_do (code, EXEC_OMP_SIMD, &block,
-                              &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE);
+                              &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE,
+                              false);
       if (TREE_CODE (stmt) != BIND_EXPR)
        stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
       else
@@ -6712,7 +6719,8 @@ gfc_trans_omp_taskloop (gfc_code *code, gfc_exec_op op)
       break;
     case EXEC_OMP_TASKLOOP_SIMD:
       stmt = gfc_trans_omp_do (code, EXEC_OMP_SIMD, &block,
-                              &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE);
+                              &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE,
+                              false);
       if (TREE_CODE (stmt) != BIND_EXPR)
        stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
       else
@@ -6756,7 +6764,7 @@ gfc_trans_omp_master_masked_taskloop (gfc_code *code, gfc_exec_op op)
       stmt = gfc_trans_omp_do (code, EXEC_OMP_TASKLOOP, NULL,
                               code->op != EXEC_OMP_MASTER_TASKLOOP
                               ? &clausesa[GFC_OMP_SPLIT_TASKLOOP]
-                              : code->ext.omp_clauses, NULL);
+                              : code->ext.omp_clauses, NULL, false);
     }
   if (TREE_CODE (stmt) != BIND_EXPR)
     stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
@@ -7119,7 +7127,7 @@ gfc_trans_oacc_directive (gfc_code *code)
       return gfc_trans_oacc_construct (code);
     case EXEC_OACC_LOOP:
       return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses,
-                              NULL);
+                              NULL, false);
     case EXEC_OACC_UPDATE:
     case EXEC_OACC_CACHE:
     case EXEC_OACC_ENTER_DATA:
@@ -7159,7 +7167,7 @@ gfc_trans_omp_directive (gfc_code *code)
     case EXEC_OMP_SIMD:
     case EXEC_OMP_TASKLOOP:
       return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses,
-                              NULL);
+                              NULL, false);
     case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
     case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_DISTRIBUTE_SIMD:
diff --git a/gcc/tree.h b/gcc/tree.h
index 7542d97ce121..15e5147f40b0 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1524,6 +1524,11 @@ class auto_suppress_location_wrappers
 #define OMP_MASKED_COMBINED(NODE) \
   (OMP_MASKED_CHECK (NODE)->base.private_flag)

+/* True on an OACC_LOOP statement if it is part of a combined construct,
+   for example "#pragma acc kernels loop".  */
+#define OACC_LOOP_COMBINED(NODE) \
+  (OACC_LOOP_CHECK (NODE)->base.private_flag)
+
 /* Memory order for OMP_ATOMIC*.  */
 #define OMP_ATOMIC_MEMORY_ORDER(NODE) \
   (TREE_RANGE_CHECK (NODE, OMP_ATOMIC, \
--
2.33.0

* [PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++).
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (5 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran) Frederik Harwath
                   ` (32 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, joseph, jason, nathan

From: Sandra Loosemore <sandra@codesourcery.com>

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior for C and C++.
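
The rule described above can be sketched with a small standalone model.
This is an illustration under simplifying assumptions, not the c-omp.c
implementation: the toy_* names are hypothetical, each loop has at most
one nested loop, and every plain loop is assumed to be otherwise
eligible for annotation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_PLAIN_LOOP, TOY_ACC_LOOP };

struct toy_loop
{
  enum toy_kind kind;
  bool combined;          /* Only meaningful for TOY_ACC_LOOP.  */
  bool annotated;         /* Set when the loop would get "acc loop auto".  */
  struct toy_loop *inner; /* Single nested loop, or NULL.  */
};

/* Annotate the plain loops of one nest inside a kernels region and
   return how many were annotated.  An explicit loop directive that is
   not part of a combined construct inhibits annotation of the nest.  */
static int
toy_annotate_nest (struct toy_loop *outermost)
{
  int count = 0;

  assert (outermost != NULL);

  /* First pass: look for an explicit, non-combined loop directive.  */
  for (struct toy_loop *l = outermost; l != NULL; l = l->inner)
    if (l->kind == TOY_ACC_LOOP && !l->combined)
      return 0;

  /* Second pass: annotate the remaining plain loops.  */
  for (struct toy_loop *l = outermost; l != NULL; l = l->inner)
    if (l->kind == TOY_PLAIN_LOOP)
      {
        l->annotated = true;
        count++;
      }
  return count;
}
```

For a three-deep nest whose outer loop carries a combined directive,
this model annotates the two inner loops, which corresponds to the two
"acc loop auto" matches expected by kernels-loop-annotation-18.c.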

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/c-family/
        * c-omp.c (annotate_loops_in_kernels_regions): Process inner
        loops in combined "acc kernels loop" directives.

        gcc/testsuite/
        * c-c++-common/goacc/kernels-loop-annotation-18.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-19.c: New.
        * c-c++-common/goacc/combined-directives.c: Adjust expected
        patterns.
---
 gcc/c-family/c-omp.c                          | 36 ++++++++++++-------
 .../c-c++-common/goacc/combined-directives.c  |  2 +-
 .../goacc/kernels-loop-annotation-18.c        | 18 ++++++++++
 .../goacc/kernels-loop-annotation-19.c        | 19 ++++++++++
 4 files changed, 62 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index fad50da8fbc4..30757877eafe 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3477,18 +3477,30 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
       /* Do not try to add automatic OpenACC annotations inside manually
         annotated loops.  Presumably, the user avoided doing it on
         purpose; for example, all available levels of parallelism may
-        have been used up.  */
-      {
-       struct annotation_info nested_info
-         = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
-             node, info };
-       if (info->state >= as_in_kernels_region)
-         do_not_annotate_loop_nest (info, as_explicit_annotation,
-                                    node);
-       walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
-                  (void *) &nested_info, NULL);
-       *walk_subtrees = 0;
-      }
+        have been used up.  However, assume that the combined construct
+        "#pragma acc kernels loop" means to try to process the whole
+        loop nest.
+        Note that a single OACC_LOOP construct represents an entire set
+        of collapsed loops so we do not have to deal explicitly with the
+        collapse clause here, as the Fortran front end does.  */
+      if (info->state == as_in_kernels_region && OACC_LOOP_COMBINED (node))
+       {
+         walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+                    (void *) info, NULL);
+         *walk_subtrees = 0;
+       }
+      else
+       {
+         struct annotation_info nested_info
+           = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
+               node, info };
+         if (info->state >= as_in_kernels_region)
+           do_not_annotate_loop_nest (info, as_explicit_annotation,
+                                      node);
+         walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+                    (void *) &nested_info, NULL);
+         *walk_subtrees = 0;
+       }
       break;

     case FOR_STMT:
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c2a3c57b48b8..2519f23d49f0 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -110,7 +110,7 @@ test ()
 // { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop auto" 6 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop independent private.i" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
new file mode 100644
index 000000000000..89ec6447625f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that "acc kernels loop" directive causes annotation of the entire
+   loop nest.  */
+
+void f (float *a, float *b)
+{
+#pragma acc kernels loop
+  for (int k = 0; k < 20; k++)
+    for (int l = 0; l < 20; l++)
+      for (int m = 0; m < 20; m++)
+       b[m] = a[m];
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 2 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c
new file mode 100644
index 000000000000..77a3b7a9136d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that "acc kernels loop" directive causes annotation of the entire
+   loop nest in the presence of a collapse clause.  */
+
+void f (float *a, float *b)
+{
+#pragma acc kernels loop collapse(2)
+  for (int k = 0; k < 20; k++)
+    for (int l = 0; l < 20; l++)
+      for (int m = 0; m < 20; m++)
+       b[m] = a[m];
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop collapse.2." 1 "original" } } */
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
--
2.33.0

* [PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran).
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (6 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++) Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops Frederik Harwath
                   ` (31 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, tobias, fortran

From: Sandra Loosemore <sandra@codesourcery.com>

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.
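
Unlike the C/C++ side, the openmp.c change in this patch has to skip the
loops already covered by a "collapse" or "tile" clause before it
descends into the nest.  The sketch below mirrors that arithmetic in a
standalone form.  The toy_* names are hypothetical and the list type is
only a stand-in for gfc_expr_list.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one entry of a tile clause's expression list.  */
struct toy_tile
{
  struct toy_tile *next;
};

/* Number of loop levels owned by the directive itself.  A tile list
   takes precedence over collapse, an ordered count overrides both, and
   the result is at least 1.  */
static int
toy_covered_levels (int collapse, const struct toy_tile *tile, int orderedc)
{
  assert (orderedc >= 0);

  if (tile != NULL)
    {
      collapse = 0;
      for (const struct toy_tile *el = tile; el != NULL; el = el->next)
        collapse++;
    }
  if (orderedc)
    collapse = orderedc;
  if (collapse <= 0)
    collapse = 1;
  return collapse;
}
```

With collapse(2) on a three-deep nest, two levels are covered by the
directive and one inner loop is left for annotation, which is consistent
with the single "acc loop auto" match expected by
kernels-loop-annotation-19.f95.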

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/fortran/
        * openmp.c (annotate_do_loops_in_kernels): Handle
        EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
        loops in a combined "acc kernels loop" directive.

        gcc/testsuite/
        * gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
        * gfortran.dg/goacc/combined-directives.f90: Adjust expected
        patterns.
        * gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
        * gfortran.dg/goacc/private-predetermined-kernels-1.f95:
        Likewise.
---
 gcc/fortran/openmp.c                          | 50 ++++++++++++++++++-
 .../gfortran.dg/goacc/combined-directives.f90 | 19 +++++--
 .../goacc/kernels-loop-annotation-18.f95      | 28 +++++++++++
 .../goacc/kernels-loop-annotation-19.f95      | 29 +++++++++++
 .../goacc/private-explicit-kernels-1.f95      |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95 |  7 ++-
 6 files changed, 131 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 243b5e0a9ac6..b0b68b494778 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9272,7 +9272,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,

        case EXEC_OACC_PARALLEL_LOOP:
        case EXEC_OACC_PARALLEL:
-       case EXEC_OACC_KERNELS_LOOP:
        case EXEC_OACC_LOOP:
          /* Do not try to add automatic OpenACC annotations inside manually
             annotated loops.  Presumably, the user avoided doing it on
@@ -9317,6 +9316,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
            }
          break;

+       case EXEC_OACC_KERNELS_LOOP:
+         /* This is a combined "acc kernels loop" directive.  We want to
+            leave the outer loop alone but try to annotate any nested
+            loops in the body.  The expected structure nesting here is
+              EXEC_OACC_KERNELS_LOOP
+                EXEC_OACC_KERNELS_LOOP
+                  EXEC_DO
+                    EXEC_DO
+                      ...body...  */
+         if (code->block)
+           /* Might be empty?  */
+           {
+             gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP);
+             gfc_omp_clauses *clauses = code->ext.omp_clauses;
+             int collapse = clauses->collapse;
+             gfc_expr_list *tile = clauses->tile_list;
+             gfc_code *inner = code->block->next;
+
+             gcc_assert (inner->op == EXEC_DO);
+             gcc_assert (inner->block->op == EXEC_DO);
+
+             /* We need to skip over nested loops covered by "collapse" or
+                "tile" clauses.  "Tile" takes precedence
+                (see gfc_trans_omp_do).  */
+             if (tile)
+               {
+                 collapse = 0;
+                 for (gfc_expr_list *el = tile; el; el = el->next)
+                   collapse++;
+               }
+             if (clauses->orderedc)
+               collapse = clauses->orderedc;
+             if (collapse <= 0)
+               collapse = 1;
+             for (int i = 1; i < collapse; i++)
+               {
+                 gcc_assert (inner->op == EXEC_DO);
+                 gcc_assert (inner->block->op == EXEC_DO);
+                 inner = inner->block->next;
+               }
+             if (inner)
+               /* Loop might have empty body?  */
+               annotate_do_loops_in_kernels (inner->block->next,
+                                             inner, goto_targets,
+                                             as_in_kernels_region);
+           }
+         walk_block = false;
+         break;
+
        case EXEC_DO_WHILE:
        case EXEC_DO_CONCURRENT:
          /* Traverse the body in a special state to allow EXIT statements
diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 956349204f4d..562a4e40cd7d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -139,10 +139,21 @@ end subroutine test

 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 2 "gimple" } }
+
+! These are the parallel loop variants.
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 1 "gimple" } }
+
+! These are the kernels loop variants.  Here the inner loops are annotated
+! separately.
+! { dg-final { scan-tree-dump-times "acc loop private.i. worker" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. vector" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. seq" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. auto" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop auto private.j." 4 "gimple" } }
+
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
new file mode 100644
index 000000000000..e4e210a92dbb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
@@ -0,0 +1,28 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that "acc kernels loop" directive causes annotation of the entire
+! loop nest.
+
+subroutine f (a, b)
+
+  implicit none
+  real, intent (in), dimension(20) :: a
+  real, intent (out), dimension(20) :: b
+  integer :: k, l, m
+
+!$acc kernels loop
+  do k = 1, 20
+    do l = 1, 20
+      do m = 1, 20
+       b(m) = a(m);
+      end do
+    end do
+  end do
+
+end subroutine f
+
+! { dg-final { scan-tree-dump-times "acc loop auto" 2 "original" } }
+
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95
new file mode 100644
index 000000000000..5dd6e7f538a6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95
@@ -0,0 +1,29 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that "acc kernels loop" directive causes annotation of the entire
+! loop nest in the presence of a collapse clause.
+
+subroutine f (a, b)
+
+  implicit none
+  real, intent (in), dimension(20) :: a
+  real, intent (out), dimension(20) :: b
+  integer :: k, l, m
+
+!$acc kernels loop collapse(2)
+  do k = 1, 20
+    do l = 1, 20
+      do m = 1, 20
+       b(m) = a(m);
+      end do
+    end do
+  end do
+
+end subroutine f
+
+! { dg-final { scan-tree-dump-times "acc loop .*collapse.2." 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
+
diff --git a/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95 b/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95
index 5d563d226b0c..0c47045df9c8 100644
--- a/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95
@@ -73,8 +73,9 @@ program test

   !$acc kernels loop private(i2_1_c, j2_1_c) independent
   ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "original" } }
-  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "gimple" } }
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) independent" 1 "gimple" } }
   do i2_1_c = 1, 100
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j2_1_c\\)" 1 "gimple" } }
      do j2_1_c = 1, 100
      end do
   end do
@@ -130,9 +131,11 @@ program test

   !$acc kernels loop private(i3_1_c, j3_1_c, k3_1_c) independent
   ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "original" } }
-  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "gimple" } }
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) independent" 1 "gimple" } }
   do i3_1_c = 1, 100
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j3_1_c\\)" 1 "gimple" } }
      do j3_1_c = 1, 100
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(k3_1_c\\)" 1 "gimple" } }
         do k3_1_c = 1, 100
         end do
      end do
diff --git a/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95 b/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95
index 12a7854526a9..3357a20263e7 100644
--- a/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95
@@ -73,8 +73,9 @@ program test

   !$acc kernels loop independent
   ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "original" } }
-  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "gimple" } }
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) independent" 1 "gimple" } }
   do i2_1_c = 1, 100
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j2_1_c\\)" 1 "gimple" } }
      do j2_1_c = 1, 100
      end do
   end do
@@ -130,9 +131,11 @@ program test

   !$acc kernels loop independent
   ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "original" } }
-  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "gimple" } }
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) independent" 1 "gimple" } }
   do i3_1_c = 1, 100
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j3_1_c\\)" 1 "gimple" } }
      do j3_1_c = 1, 100
+  ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(k3_1_c\\)" 1 "gimple" } }
         do k3_1_c = 1, 100
         end do
      end do
--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (7 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran) Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation Frederik Harwath
                   ` (30 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: Sandra Loosemore, thomas, joseph, jason, nathan, tobias, fortran

From: Sandra Loosemore <sandra@codesourcery.com>

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematic, we can add code here to recognize
them, or handle them in the processing of the "auto" annotations.
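
As a rough illustration of the rule described above (not the actual
GCC code), the decision can be modelled in plain C.  The struct call
type and the function call_permits_annotation are hypothetical names
invented for this sketch; the flags merely stand in for the
fndecl_built_in_p test and the "oacc function" attribute lookup in
c-omp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a call site; this is not a GCC data structure.  */
struct call
{
  bool is_direct;        /* false models an indirect call (no fn_decl) */
  bool is_builtin;       /* models fndecl_built_in_p (..., BUILT_IN_NORMAL) */
  bool is_oacc_routine;  /* models the "oacc function" attribute */
};

/* Return true if the call does not inhibit annotation of the
   enclosing loop nest under the relaxed rule.  */
static bool
call_permits_annotation (const struct call *c)
{
  assert (c != NULL);

  /* Indirect calls are always rejected.  */
  if (!c->is_direct)
    return false;

  /* Direct calls are fine if the callee is a builtin or has an
     explicitly declared parallelism level.  */
  return c->is_builtin || c->is_oacc_routine;
}
```

Under this model a direct call to a builtin is accepted even without
a routine attribute, while a direct call to an ordinary unmarked
function is still rejected.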

2020-08-22  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/c-family/
        * c-omp.c (annotate_loops_in_kernels_regions): Test for
        calls to builtins.

        gcc/fortran/
        * openmp.c (check_expr_for_invalid_calls): Check for intrinsic
        functions.

        gcc/testsuite/
        * c-c++-common/goacc/kernels-loop-annotation-20.c: New.
        * gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/c-family/c-omp.c                          | 10 ++++---
 gcc/fortran/openmp.c                          |  9 ++++---
 .../goacc/kernels-loop-annotation-20.c        | 23 ++++++++++++++++
 .../goacc/kernels-loop-annotation-20.f95      | 26 +++++++++++++++++++
 4 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 30757877eafe..e7c27f45e888 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3545,8 +3545,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
       break;

     case CALL_EXPR:
-      /* Direct function calls to functions marked as OpenACC routines are
-        allowed.  Reject indirect calls or calls to non-routines.  */
+      /* Direct function calls to builtins and functions marked as
+        OpenACC routines are allowed.  Reject indirect calls or calls
+        to non-routines.  */
       if (info->state >= as_in_kernels_loop)
        {
          tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE;
@@ -3560,8 +3561,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
            }
          if (fn_decl == NULL_TREE)
            do_not_annotate_loop_nest (info, as_invalid_call, node);
-         else if (!lookup_attribute ("oacc function",
-                                     DECL_ATTRIBUTES (fn_decl)))
+         else if (!fndecl_built_in_p (fn_decl, BUILT_IN_NORMAL)
+                  && !lookup_attribute ("oacc function",
+                                        DECL_ATTRIBUTES (fn_decl)))
            do_not_annotate_loop_nest (info, as_invalid_call, node);
        }
       break;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b0b68b494778..d5d996e378d7 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9156,9 +9156,12 @@ check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees,
   switch (expr->expr_type)
     {
     case EXPR_FUNCTION:
-      if (expr->value.function.esym
-         && (expr->value.function.esym->attr.oacc_routine_lop
-             != OACC_ROUTINE_LOP_NONE))
+      /* Permit calls to Fortran intrinsic functions and to routines
+        with an explicitly declared parallelism level.  */
+      if (expr->value.function.isym
+         || (expr->value.function.esym
+             && (expr->value.function.esym->attr.oacc_routine_lop
+                 != OACC_ROUTINE_LOP_NONE)))
        return 0;
       /* Else fall through.  */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
new file mode 100644
index 000000000000..5e3f02845713
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that calls to built-in functions don't inhibit kernels loop
+   annotation.  */
+
+void foo (int n, int *input, int *out1, int *out2)
+{
+#pragma acc kernels
+  {
+    int i;
+
+    for (i = 0; i < n; i++)
+      {
+       out1[i] = __builtin_clz (input[i]);
+       out2[i] = __builtin_popcount (input[i]);
+      }
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
new file mode 100644
index 000000000000..5169a0a1676d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
@@ -0,0 +1,26 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with calls to intrinsics in the body can be annotated.
+
+subroutine f (n, input, out1, out2)
+  implicit none
+  integer :: n
+  integer, intent (in), dimension (n) :: input
+  integer, intent (out), dimension (n) :: out1, out2
+
+  integer :: i
+
+!$acc kernels
+
+  do i = 1, n
+      out1(i) = min (i, input(i))
+      out2(i) = not (input(i))
+  end do
+!$acc end kernels
+
+end subroutine f
+
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (8 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 11/40] Clean up loop variable extraction in OpenACC " Frederik Harwath
                   ` (29 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, tobias, fortran

From: Sandra Loosemore <sandra@codesourcery.com>

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file.  Now the "auto" clause appears first, instead of after "private".
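
To see why the old patterns stopped matching, here is a small sketch
that counts fixed-string matches in a dump line.  It is only a
simplified stand-in: scan-tree-dump-times matches a regexp, and
count_occurrences as well as the two sample dump lines are
assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of NEEDLE in TEXT.  */
static int
count_occurrences (const char *text, const char *needle)
{
  assert (text != NULL && needle != NULL);

  size_t len = strlen (needle);
  int count = 0;

  /* An empty needle would match everywhere; treat it as no match.  */
  if (len == 0)
    return 0;

  for (const char *p = strstr (text, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;

  return count;
}
```

With the new clause order, a line such as
"#pragma acc loop auto private(i)" contains "acc loop auto" once,
whereas with the old order, "#pragma acc loop private(i) auto", the
same fixed string does not occur at all.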

2020-08-23   Sandra Loosemore  <sandra@codesourcery.com>

        gcc/testsuite/
        * gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
        expected output.
        * gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95  | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
index 41f6307dbb17..42e751dbfb83 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
@@ -30,4 +30,4 @@ subroutine f (a, b, c)
 !$acc end kernels
 end subroutine f

-! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
index d51482e4685d..6e2e2c41172b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
index 3c4956d70775..03c4234ce7cd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
@@ -36,4 +36,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
index 3ec459f0a8df..6aeb3f2fe4d0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
@@ -35,4 +35,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
index 91f431cca432..7d1cff64a3d9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
@@ -32,4 +32,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
index 570c12d3ad70..dab0d4030d03 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95
@@ -32,4 +32,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
index 6e44a304b28b..15ef670e246d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
index 4624a05247d9..2baaa594be18 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95
@@ -29,4 +29,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
index daed8f7f6e9d..e629891e31f9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95
@@ -30,4 +30,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
index 0c4ad256b7eb..6c3300b70537 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
index 1c3f87eed6e4..52a9e7e7a85b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95
@@ -31,5 +31,5 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
index 43173a70df24..60eb245a22a9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
index ec42213220e7..438a13acee18 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95
@@ -44,5 +44,5 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
index 9188f70d9664..aa97e37c054c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95
@@ -46,5 +46,5 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private\\(i\\) auto" 2 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 2 "original" } }

--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 11/40] Clean up loop variable extraction in OpenACC kernels loop annotation.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (9 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 12/40] Relax some restrictions on the loop bound in " Frederik Harwath
                   ` (28 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, joseph, jason, nathan

From: Sandra Loosemore <sandra@codesourcery.com>

The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test.  However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable.  In particular, it's ambiguous when both operands to the
end test comparison are local variables.

This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer.  Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.
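
The new end test handling can be sketched roughly as follows.  This
is a stand-alone model, not the GCC code: struct node, struct
comparison and find_limit are hypothetical names, and comparing
pointers here plays the role of comparing tree nodes against the decl
found in the initializer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for tree nodes; not GCC data structures.  */
struct node
{
  const char *name;
};

struct comparison
{
  const struct node *op0;
  const struct node *op1;
};

/* Given the loop variable DECL identified from the initializer,
   return the other operand of COND (the loop limit), or NULL if
   DECL is not an operand of COND at all.  */
static const struct node *
find_limit (const struct comparison *cond, const struct node *decl)
{
  assert (cond != NULL && decl != NULL);

  if (cond->op0 == decl)
    return cond->op1;
  if (cond->op1 == decl)
    return cond->op0;
  return NULL;
}
```

With this approach "i < n" and "n > i" both yield n as the limit once
i is known to be the loop variable, and a condition that does not
mention the loop variable at all is rejected.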

2020-08-30  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/c-family/
        * c-omp.c (annotate_for_loop): Move initializer processing...
        (check_and_annotate_for_loop): ... to here.  Allow the loop
        variable as either operand to the condition.
---
 gcc/c-family/c-omp.c | 196 +++++++++++++++++++++----------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e7c27f45e888..e73fb5d01f7e 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3174,86 +3174,26 @@ static tree (*lang_specific_unwrap_initializer) (tree);

 /* Try to annotate the given NODE, which must be a FOR_STMT, with a
    "#pragma acc loop auto" annotation.  In practice, this means
-   building an OMP_FOR node for it.  PREV_STMT is the statement
-   immediately before the loop, which may be used as the loop's
-   initialization statement.  Annotating the loop may fail, in which
-   case INFO is used to record the cause of the failure and the
-   original loop remains unchanged.  This function returns the
-   transformed loop if the transformation succeeded, the original node
-   otherwise.  */
+   building an OMP_FOR node for it.  DECL and INIT are the
+   previously-verified iteration variable and initializer.  Annotating
+   the loop may fail, in which case INFO is used to record the cause
+   of the failure and the original loop remains unchanged.  This
+   function returns the transformed loop if the transformation
+   succeeded, the original node otherwise.  */

 static tree
-annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi,
+annotate_for_loop (tree node, tree decl, tree init,
                   struct annotation_info *info)
 {
   gcc_checking_assert (TREE_CODE (node) == FOR_STMT);

   location_t loc = EXPR_LOCATION (node);
   tree cond = FOR_COND (node);
+  tree incr = FOR_EXPR (node);
+
+  gcc_assert (decl);
   gcc_assert (cond);
-  tree decl = TREE_OPERAND (cond, 0);
   gcc_assert (decl && TREE_CODE (decl) == VAR_DECL);
-  tree init = FOR_INIT_STMT (node);
-  tree prev_stmt = NULL_TREE;
-  bool unlink_prev = false;
-  bool fix_decl = false;
-
-
-  /* Both the C and C++ front ends normally put the initializer in the
-     statement list just before the FOR_STMT instead of in FOR_INIT_STMT.
-     If FOR_INIT_STMT happens to exist but isn't a MODIFY_EXPR, bail out
-     because the code below won't handle it.  */
-  if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR)
-    {
-      do_not_annotate_loop (info, as_invalid_initializer, NULL_TREE);
-      return node;
-    }
-
-  /* Examine the statement before the loop to see if it is a
-     valid initializer.  It must be either a MODIFY_EXPR or VAR_DECL,
-     possibly wrapped in language-specific structure.  */
-  if (init == NULL_TREE && prev_tsi != NULL)
-    {
-      prev_stmt = tsi_stmt (*prev_tsi);
-
-      /* Call the language-specific hook to unwrap prev_stmt.  */
-      if (prev_stmt)
-       prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt);
-
-      /* See if we have a valid MODIFY_EXPR.  */
-      if (prev_stmt
-         && TREE_CODE (prev_stmt) == MODIFY_EXPR
-         && TREE_OPERAND (prev_stmt, 0) == decl
-         && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1)))
-       {
-         init = prev_stmt;
-         unlink_prev = true;
-       }
-      else if (prev_stmt == decl
-              && !TREE_SIDE_EFFECTS (DECL_INITIAL (decl)))
-       {
-         /* If the preceding statement is the declaration of the loop
-            variable with its initialization, build an assignment
-            expression for the loop's initializer.  */
-         init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl,
-                        DECL_INITIAL (decl));
-         /* We need to remove the initializer from the decl if we
-            end up using the init we just built instead.  */
-         fix_decl = true;
-       }
-    }
-
-  if (init == NULL_TREE)
-    /* There is nothing we can do to find the correct init statement for
-       this loop, but c_finish_omp_for insists on having one and would fail
-       otherwise.  In that case, we would just return node.  Do that
-       directly, here.  */
-    {
-      do_not_annotate_loop (info, as_missing_initializer, NULL_TREE);
-      return node;
-    }
-
-  tree incr = FOR_EXPR (node);

   /* The C++ frontend can wrap the increment two levels deep inside a
      cleanup expression, but c_finish_omp_for does not care about that.  */
@@ -3278,18 +3218,6 @@ annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi,
                                            NULL_TREE, false, info);
   if (omp_for != NULL_TREE)
     {
-      if (unlink_prev)
-       /* We don't need the previous statement that we consumed as an
-          initializer in the new OMP_FOR any more.  */
-       tsi_delink (prev_tsi);
-
-      if (fix_decl)
-       /* We no longer need the initializer expression on the decl of
-          the loop variable and don't want to duplicate it.  The
-          kernels conversion pass would interpret it as a stray
-          assignment in a gang-single region.  */
-       DECL_INITIAL (prev_stmt) = NULL_TREE;
-
       /* Add an auto clause, then return the new loop.  */
       tree auto_clause = build_omp_clause (loc, OMP_CLAUSE_AUTO);
       OMP_CLAUSE_CHAIN (auto_clause) = OMP_FOR_CLAUSES (omp_for);
@@ -3315,11 +3243,16 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi,
 {
   tree node = *nodeptr;
   gcc_assert (TREE_CODE (node) == FOR_STMT);
+  tree init = FOR_INIT_STMT (node);
+  tree cond = FOR_COND (node);
+  tree prev_stmt = NULL_TREE;
+  tree decl = NULL_TREE;
+  bool unlink_prev = false;
+  bool fix_decl = false;

   /* This structure describes the current loop statement.  */
   struct annotation_info loop_info
     = { node, NULL_TREE, false, as_in_kernels_loop, NULL_TREE, info };
-  tree cond = FOR_COND (node);

   /* If we are in the body of an explicitly-annotated loop, do not add
      annotations to this loop or any other nested loops.  */
@@ -3331,30 +3264,84 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi,
      That is why we are doing some checks on the loop condition
      that duplicate what c_finish_omp_for is doing.  */

-  /* The loop condition must be a comparison.  */
+  /* First we need to find the decl and initializer for the
+     controlling variable.  Both the C and C++ front ends normally put
+     the initializer in the statement list just before the FOR_STMT
+     instead of in FOR_INIT_STMT.  If FOR_INIT_STMT happens to exist
+     but isn't a MODIFY_EXPR, give up because the code below won't
+     handle it.  */
+
+  else if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR)
+    do_not_annotate_loop (&loop_info, as_invalid_initializer, NULL_TREE);
+
+  /* Examine the statement before the loop to see if it is a
+     valid initializer.  It must be either a MODIFY_EXPR or VAR_DECL,
+     possibly wrapped in language-specific structure.  */
+  else if (init == NULL_TREE && prev_tsi != NULL && tsi_stmt (*prev_tsi))
+    {
+      prev_stmt = tsi_stmt (*prev_tsi);
+
+      /* Call the language-specific hook to unwrap prev_stmt.  */
+      prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt);
+
+      /* See if we have a valid MODIFY_EXPR.  */
+      if (TREE_CODE (prev_stmt) == MODIFY_EXPR
+         && is_local_var (TREE_OPERAND (prev_stmt, 0))
+         && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1)))
+       {
+         decl = TREE_OPERAND (prev_stmt, 0);
+         init = prev_stmt;
+         unlink_prev = true;
+       }
+      else if (is_local_var (prev_stmt)
+              && !TREE_SIDE_EFFECTS (DECL_INITIAL (prev_stmt)))
+       {
+         /* If the preceding statement is the declaration of the loop
+            variable with its initialization, build an assignment
+            expression for the loop's initializer.  */
+         decl = prev_stmt;
+         init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl,
+                        DECL_INITIAL (decl));
+         /* We need to remove the initializer from the decl if we
+            end up using the init we just built instead.  */
+         fix_decl = true;
+       }
+    }
+
+  if (init == NULL_TREE || decl == NULL_TREE)
+    /* There is nothing we can do to find the correct init statement for
+       this loop.  */
+    do_not_annotate_loop (&loop_info, as_missing_initializer, NULL_TREE);
+
+  /* The condition must be a comparison of the decl we found in
+     the initializer against an expression that can be hoisted
+     outside the loop.  */
+  if (loop_info.state > as_in_kernels_loop)
+    /* Skip validating condition if we've already got an error.  */
+    ;
   else if (cond == NULL_TREE)
     do_not_annotate_loop (&loop_info, as_missing_predicate, NULL_TREE);
   else if (TREE_CODE_CLASS (TREE_CODE (cond)) != tcc_comparison)
     do_not_annotate_loop (&loop_info, as_invalid_predicate, cond);
   else
     {
-      /* The condition's LHS must be a local variable that does not
-        have its address taken.  Its RHS must also be such a local
-        variable or a constant.  */
-      tree induction_var = TREE_OPERAND (cond, 0);
-      tree limit_var = TREE_OPERAND (cond, 1);
-      if (!is_local_var (induction_var)
-         || (!is_local_var (limit_var)
-             && (TREE_CODE_CLASS (TREE_CODE (limit_var))
-                 != tcc_constant)))
+      tree limit_exp = NULL_TREE;
+
+      if (TREE_OPERAND (cond, 0) == decl)
+       limit_exp = TREE_OPERAND (cond, 1);
+      else if (TREE_OPERAND (cond, 1) == decl)
+       limit_exp = TREE_OPERAND (cond, 0);
+
+      if (!limit_exp
+         || (!is_local_var (limit_exp)
+             && (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant)))
        do_not_annotate_loop (&loop_info, as_invalid_predicate, cond);
       else
        {
          /* These variables must not be assigned to in the loop.  */
-         loop_info.vars = tree_cons (NULL_TREE, induction_var,
-                                     loop_info.vars);
-         if (TREE_CODE_CLASS (TREE_CODE (limit_var)) != tcc_constant)
-           loop_info.vars = tree_cons (NULL_TREE, limit_var, loop_info.vars);
+         loop_info.vars = tree_cons (NULL_TREE, decl, loop_info.vars);
+         if (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant)
+           loop_info.vars = tree_cons (NULL_TREE, limit_exp, loop_info.vars);
        }
     }

@@ -3369,11 +3356,24 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi,
       /* If the traversal of the loop and all nested loops didn't hit
         any problems, attempt the actual transformation.  If it
         succeeds, replace this node with the annotated loop.  */
-      tree result = annotate_for_loop (node, prev_tsi, &loop_info);
+      tree result = annotate_for_loop (node, decl, init, &loop_info);
       if (result != node)
        {
          /* Success!  */
          *nodeptr = result;
+
+         if (unlink_prev)
+           /* We don't need the previous statement that we consumed
+              as an initializer in the new OMP_FOR any more.  */
+           tsi_delink (prev_tsi);
+
+         if (fix_decl)
+           /* We no longer need the initializer expression on the
+              decl of the loop variable and don't want to duplicate
+              it.  The kernels conversion pass would interpret it as
+              a stray assignment in a gang-single region.  */
+           DECL_INITIAL (decl) = NULL_TREE;
+
          return;
        }
     }
--
2.33.0



* [PATCH 12/40] Relax some restrictions on the loop bound in kernels loop annotation.
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (10 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 11/40] Clean up loop variable extraction in OpenACC " Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 13/40] Fortran: Delinearize array accesses Frederik Harwath
                   ` (27 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sandra Loosemore, thomas, joseph, jason, nathan

From: Sandra Loosemore <sandra@codesourcery.com>

OpenACC loop semantics require that the loop bound be computable
before entering the loop, rather than the C/C++ semantics where the
end test is evaluated on every iteration.  Formerly the kernels loop
annotator permitted only constants and variables not modified in the
loop body in the loop bound expression.  This patch relaxes those
restrictions somewhat to allow many forms of expressions involving
such constants and variables, including calls to constant functions.
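
To illustrate the semantic difference described above, here is a small
standalone sketch (not part of the patch; the function names and the
"shrink" parameter are made up for illustration).  It contrasts a loop
whose end test is re-evaluated on every iteration with one whose bound
is computed once on entry, which is roughly what hoisting the bound
amounts to.

```c
#include <assert.h>

/* C/C++ semantics: "i < n" is re-evaluated before every iteration, so
   a store to n inside the body changes the trip count.  */
static int
trips_reevaluated (int n, int shrink)
{
  assert (shrink >= 0);
  int trips = 0;
  for (int i = 0; i < n; i++)
    {
      trips++;
      if (i == 1)
        n -= shrink;   /* The body modifies the bound.  */
    }
  return trips;
}

/* Hoisted bound: the bound is computed once before the loop is
   entered, so later stores to n no longer affect the trip count.  */
static int
trips_hoisted (int n, int shrink)
{
  assert (shrink >= 0);
  const int bound = n; /* Evaluated exactly once.  */
  int trips = 0;
  for (int i = 0; i < bound; i++)
    {
      trips++;
      if (i == 1)
        n -= shrink;   /* Has no effect on the loop any more.  */
    }
  return trips;
}
```

With shrink == 0 the bound is loop-invariant and both functions agree;
with a nonzero shrink they differ, which is why the annotation has to
reject bounds that the loop body can change.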

2020-08-30  Sandra Loosemore  <sandra@codesourcery.com>

        gcc/c-family/
        * c-omp.c (end_test_ok_for_annotation_r): New.
        (end_test_ok_for_annotation): New.
        (check_and_annotate_for_loop): Use the new helper function.

        gcc/testsuite/
        * c-c++-common/goacc/kernels-loop-annotation-21.c: New.
        * c-c++-common/goacc/kernels-loop-annotation-22.c: New.
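
As a rough illustration of the kind of check the new helper performs,
here is a toy, self-contained model.  It is not the GCC code: struct
expr, struct var_list, bound_ok and all other names are hypothetical
stand-ins for GCC trees and walk_tree, and only a few expression kinds
are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kind { K_CONST, K_LOCAL, K_GLOBAL, K_BINOP, K_CALL };

struct expr
{
  enum kind kind;
  int var_id;                /* K_LOCAL/K_GLOBAL: hypothetical id.  */
  bool is_const;             /* K_GLOBAL: read-only; K_CALL: const callee.  */
  const struct expr *op[2];  /* Operands; unused slots are NULL.  */
};

#define MAX_VARS 16
struct var_list { int ids[MAX_VARS]; size_t n; };

/* Record ID unless already present.  Return false if the list is full.  */
static bool
note_var (struct var_list *vars, int id)
{
  for (size_t i = 0; i < vars->n; i++)
    if (vars->ids[i] == id)
      return true;
  if (vars->n == MAX_VARS)
    return false;
  vars->ids[vars->n++] = id;
  return true;
}

/* Return true if E only involves constants, local variables, read-only
   globals, and calls to const functions.  Locals are collected in VARS
   so a later step can check that the loop body does not modify them.  */
static bool
bound_ok_r (const struct expr *e, struct var_list *vars)
{
  if (e == NULL)
    return true;
  switch (e->kind)
    {
    case K_CONST:
      return true;
    case K_LOCAL:
      return note_var (vars, e->var_id);
    case K_GLOBAL:
      return e->is_const;
    case K_CALL:
      if (!e->is_const)
        return false;
      break;
    case K_BINOP:
      break;
    }
  return bound_ok_r (e->op[0], vars) && bound_ok_r (e->op[1], vars);
}

/* Accept BOUND as the limit for loop variable LOOP_VAR_ID only if the
   walk succeeds and the bound does not mention the loop variable.  */
static bool
bound_ok (int loop_var_id, const struct expr *bound, struct var_list *vars)
{
  assert (vars != NULL);
  if (!bound_ok_r (bound, vars))
    return false;
  for (size_t i = 0; i < vars->n; i++)
    if (vars->ids[i] == loop_var_id)
      return false;
  return note_var (vars, loop_var_id);
}
```

In this model a bound such as "g (n)" is accepted only when g is marked
const, and "gg (i, n)" is rejected because it mentions the loop
variable, mirroring the cases in the two new tests.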
---
 gcc/c-family/c-omp.c                          | 120 ++++++++++++++++--
 .../goacc/kernels-loop-annotation-21.c        |  42 ++++++
 .../goacc/kernels-loop-annotation-22.c        |  41 ++++++
 3 files changed, 194 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e73fb5d01f7e..dc63d304ca67 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3165,6 +3165,116 @@ is_local_var (tree decl)
          && !TREE_ADDRESSABLE (decl));
 }

+/* EXP is a loop bound expression for a comparison against local
+   variable DECL.  Check whether this is potentially valid in an OpenACC loop
+   context, namely that it can be precomputed when entering the loop
+   construct per the OpenACC specification.  Local variables referenced
+   in both DECL and EXP that may not be modified in the body of the loop
+   are added to the list in INFO to be checked later.
+
+   FIXME: Ideally we would like to make this test permissive rather than
+   restrictive, and allow the later conversion of the "auto" attribute to
+   either "seq" or "independent" to make the determination using dataflow,
+   alias analysis, etc rather than a tree traversal.  But presently it does
+   not do that and always just hoists the loop bound expression.  So the
+   current implementation only considers expressions involving unmodified
+   local variables and constants, using a tree walk.  */
+
+static tree
+end_test_ok_for_annotation_r (tree *tp, int *walk_subtrees,
+                             void *data)
+{
+  tree exp = *tp;
+  struct annotation_info *info = (struct annotation_info *) data;
+
+  switch (TREE_CODE_CLASS (TREE_CODE (exp)))
+    {
+    case tcc_constant:
+      /* Constants are trivially known to be invariant.  */
+      return NULL_TREE;
+
+    case tcc_declaration:
+      if (is_local_var (exp))
+       {
+         tree t;
+         /* Add it to the list of variables that can't be modified in the
+            loop, only if not already present.  */
+         for (t = info->vars; t && TREE_VALUE (t) != exp;
+              t = TREE_CHAIN (t))
+           ;
+         if (!t)
+           info->vars = tree_cons (NULL_TREE, exp, info->vars);
+         return NULL_TREE;
+       }
+      else if (TREE_CODE (exp) == VAR_DECL && TREE_READONLY (exp))
+       return NULL_TREE;
+      else if (TREE_CODE (exp) == FUNCTION_DECL)
+       return NULL_TREE;
+      break;
+
+    case tcc_unary:
+    case tcc_binary:
+    case tcc_comparison:
+      /* Allow arithmetic expressions and comparisons provided
+        that the operands are good.  */
+      return NULL_TREE;
+
+    default:
+      /* Handle some special cases.  */
+      switch (TREE_CODE (exp))
+       {
+       case COND_EXPR:
+       case TRUTH_ANDIF_EXPR:
+       case TRUTH_ORIF_EXPR:
+       case TRUTH_AND_EXPR:
+       case TRUTH_OR_EXPR:
+       case TRUTH_XOR_EXPR:
+       case TRUTH_NOT_EXPR:
+         /* ?: and boolean operators are OK.  */
+         return NULL_TREE;
+
+       case CALL_EXPR:
+         /* Allow calls to constant functions with invariant operands.  */
+         {
+           tree fndecl = get_callee_fndecl (exp);
+           if (fndecl && TREE_READONLY (fndecl))
+             return NULL_TREE;
+         }
+         break;
+
+       case ADDR_EXPR:
+         /* We can expect addresses of things to be invariant.  */
+         return NULL_TREE;
+
+       default:
+         break;
+       }
+    }
+
+  /* Reject anything else.  */
+  *walk_subtrees = 0;
+  return exp;
+}
+
+static bool
+end_test_ok_for_annotation (tree decl, tree exp,
+                           struct annotation_info *info)
+{
+  /* Traversal returns NULL_TREE if all is well.  */
+  if (!walk_tree (&exp, end_test_ok_for_annotation_r, info, NULL))
+    {
+      /* So far, so good.  Check the decl against any variables collected
+        in the exp.  */
+      tree t;
+      for (t = info->vars; t; t = TREE_CHAIN (t))
+       if (TREE_VALUE (t) == decl)
+         return false;
+      info->vars = tree_cons (NULL_TREE, decl, info->vars);
+      return true;
+    }
+  return false;
+}
+
 /* The initializer for a FOR_STMT is sometimes wrapped in various other
    language-specific tree structures.  We need a hook to unwrap them.
    This function takes a tree argument and should return either a
@@ -3333,16 +3443,8 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi,
        limit_exp = TREE_OPERAND (cond, 0);

       if (!limit_exp
-         || (!is_local_var (limit_exp)
-             && (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant)))
+         || !end_test_ok_for_annotation (decl, limit_exp, &loop_info))
        do_not_annotate_loop (&loop_info, as_invalid_predicate, cond);
-      else
-       {
-         /* These variables must not be assigned to in the loop.  */
-         loop_info.vars = tree_cons (NULL_TREE, decl, loop_info.vars);
-         if (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant)
-           loop_info.vars = tree_cons (NULL_TREE, limit_exp, loop_info.vars);
-       }
     }

   /* Walk the body.  This will process any nested loops, so we have to do it
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c
new file mode 100644
index 000000000000..f87444ede4b4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c
@@ -0,0 +1,42 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test for rejecting annotation on loops that have various subexpressions
+   in the loop end test that are not loop-invariant.  */
+
+extern int g (int);
+extern int x;
+extern int gg (int, int) __attribute__ ((const));
+
+void f (float *a, float *b, int n)
+{
+
+  int j;
+#pragma acc kernels
+  {
+    /* Non-constant function call.  */
+    for (int i = 0; i < g(n); i++)     /* { dg-warning "loop cannot be annotated" } */
+      a[i] = b[i];
+
+    /* Global variable.  */
+    for (int i = x; i < n + x; i++)    /* { dg-warning "loop cannot be annotated" } */
+      a[i] = b[i];
+
+    /* Explicit reference to the loop variable.  */
+    for (int i = 0; i < gg (i, n); i++)        /* { dg-warning "loop cannot be annotated" } */
+      a[i] = b[i];
+
+    /* Reference to a variable that is modified in the body of the loop.  */
+    j = 0;
+    for (int i = 0; i < gg (j, n); i++)        /* { dg-warning "loop cannot be annotated" } */
+      {
+       a[i] = b[i];
+       j = i;
+      }
+
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c
new file mode 100644
index 000000000000..6a5099d2ff9d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test for accepting annotation on loops that have various forms of
+   loop-invariant expressions in their end test.  */
+
+extern const int x;
+extern int g (int) __attribute__ ((const));
+
+void f (float *a, float *b, int n)
+{
+
+  int j;
+#pragma acc kernels
+  {
+    /* Reversed form of comparison.  */
+    for (int i = 0; n >= i; i++)
+      a[i] = b[i];
+
+    /* Constant function call.  */
+    for (int i = 0; i < g(n); i++)
+      a[i] = b[i];
+
+    /* Constant global variable.  */
+    for (int i = 0; i < x; i++)
+      a[i] = b[i];
+
+    /* Complicated expression involving conditionals, etc. */
+    for (int i = 0; i < ((x == 4) ? (n << 2) : (n << 3)); i++)
+      a[i] = b[i];
+
+    /* Reference to a local variable not modified in the loop.  */
+    j = ((x == 4) ? (n << 2) : (n << 3));
+    for (int i = 0; i < j; i++)
+      a[i] = b[i];
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 5 "original" } } */
--
2.33.0


* [PATCH 13/40] Fortran: Delinearize array accesses
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (11 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 12/40] Relax some restrictions on the loop bound in " Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite Frederik Harwath
                   ` (26 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, tobias, fortran, rguenther

The Fortran front end presently linearizes accesses to
multi-dimensional arrays: it combines the indices for the various
dimensions into a series of explicit multiplies and adds, refactored
to allow CSE of invariant parts of the computation.  Unfortunately,
this representation interferes with Graphite-based loop
optimizations.  By the time the loop optimizations run, it is
difficult to recover the original multi-dimensional form of the
access, because parts of it have already been optimized away or
rewritten into a form that is not easily recognizable.  It therefore
seems better to have the Fortran front end produce delinearized
accesses to begin with, that is, a set of nested ARRAY_REFs similar
to what the C and C++ front ends already produce.  This is a
long-standing problem that has previously been discussed, e.g., in
PR 14741 and PR 61000.
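
For readers less familiar with the two representations, the following
standalone C sketch (not part of the patch) shows the same element of a
column-major array with lower bounds of 1 addressed both ways.  The
extents N1, N2, N3 and the function names are assumptions chosen for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { N1 = 4, N2 = 3, N3 = 2 };  /* Hypothetical extents of a(N1,N2,N3).  */

/* Linearized form: one flat index built from multiplies and adds,
   a(i,j,k) = base[offset + i*stride[0] + j*stride[1] + k*stride[2]].  */
static size_t
linearized_index (int i, int j, int k)
{
  assert (1 <= i && i <= N1 && 1 <= j && j <= N2 && 1 <= k && k <= N3);
  const ptrdiff_t stride[3] = { 1, N1, N1 * N2 };
  /* The offset accounts for the lower bound of 1 in every dimension.  */
  const ptrdiff_t offset = -(stride[0] + stride[1] + stride[2]);
  return (size_t) (offset + i * stride[0] + j * stride[1] + k * stride[2]);
}

static double *
linearized_ref (double *base, int i, int j, int k)
{
  return &base[linearized_index (i, j, k)];
}

/* Delinearized form: view the same storage as a nested array and use
   one subscript per dimension, outermost dimension first.  */
static double *
nested_ref (double *base, int i, int j, int k)
{
  assert (1 <= i && i <= N1 && 1 <= j && j <= N2 && 1 <= k && k <= N3);
  double (*a)[N2][N1] = (double (*)[N2][N1]) base;
  return &a[k - 1][j - 1][i - 1];
}
```

Both functions yield the same address for every valid (i, j, k); the
difference is only in how much of the multi-dimensional structure
remains visible to later analyses.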

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.

Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>

gcc/ChangeLog:

        * expr.c (get_inner_reference): Handle NOP_EXPR.

gcc/fortran/ChangeLog:

        * lang.opt: Document -param=delinearize.
        * trans-array.c (get_class_array_vptr): New function.
        (get_array_lbound): New function.
        (get_array_ubound): New function.
        (gfc_conv_array_ref): Implement main delinearization logic.
        (build_array_ref): Adjust.

gcc/testsuite/ChangeLog:

        * gfortran.dg/assumed_type_2.f90: Adjust test expectations.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
        * gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
        * gfortran.dg/graphite/block-2.f: Likewise.
        * gfortran.dg/graphite/block-3.f90: Likewise.
        * gfortran.dg/graphite/block-4.f90: Likewise.
        * gfortran.dg/graphite/id-9.f: Likewise.
        * gfortran.dg/inline_matmul_16.f90: Likewise.
        * gfortran.dg/inline_matmul_24.f90: Likewise.
        * gfortran.dg/no_arg_check_2.f90: Likewise.
        * gfortran.dg/pr32921.f: Likewise.
        * gfortran.dg/reassoc_4.f: Likewise.
        * gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
---
 gcc/expr.c                                    |   1 +
 gcc/fortran/lang.opt                          |   4 +
 gcc/fortran/trans-array.c                     | 321 +++++++++++++-----
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 .../gfortran.dg/gomp/affinity-clause-1.f90    |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90          |   2 +-
 .../gfortran.dg/graphite/block-4.f90          |   2 +-
 gcc/testsuite/gfortran.dg/graphite/id-9.f     |   2 +-
 .../gfortran.dg/inline_matmul_16.f90          |   2 +
 .../gfortran.dg/inline_matmul_24.f90          |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f           |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f         |   2 +-
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 16 files changed, 270 insertions(+), 96 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index eb33643bd770..188905b4fe4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7759,6 +7759,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
          break;

        case VIEW_CONVERT_EXPR:
+       case NOP_EXPR:
          break;

        case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index a202c04c4a25..25c5a5a32c41 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 5ceb261b6989..e84b4cb55f05 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
     }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
          && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
        vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
     }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank N of array DECL which might
+   be either a bare array or a descriptor.  This differs from
+   gfc_conv_array_lbound because it gets information for temporary array
+   objects from AR instead of the descriptor (they can differ).  */
+
+static tree
+get_array_lbound (tree decl, int n, gfc_symbol *sym,
+                 gfc_array_ref *ar, gfc_se *se)
+{
+  if (sym->attr.temporary)
+    {
+      gfc_se tmpse;
+      gfc_init_se (&tmpse, se);
+      gfc_conv_expr_type (&tmpse, ar->as->lower[n], gfc_array_index_type);
+      gfc_add_block_to_block (&se->pre, &tmpse.pre);
+      return tmpse.expr;
+    }
+  else
+    return gfc_conv_array_lbound (decl, n);
+}
+
+/* Similarly for the upper bound.  */
+static tree
+get_array_ubound (tree decl, int n, gfc_symbol *sym,
+                 gfc_array_ref *ar, gfc_se *se)
+{
+  if (sym->attr.temporary)
+    {
+      gfc_se tmpse;
+      gfc_init_se (&tmpse, se);
+      gfc_conv_expr_type (&tmpse, ar->as->upper[n], gfc_array_index_type);
+      gfc_add_block_to_block (&se->pre, &tmpse.pre);
+      return tmpse.expr;
+    }
+  else
+    return gfc_conv_array_ubound (decl, n);
+}
+

 /* Build an array reference.  se->expr already holds the array descriptor.
    This should be either a variable, indirect variable reference or component
    reference.  For arrays which do not have a descriptor, se->expr will be
    the data pointer.
-   a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]*/
+
+   There are two strategies here.  In the traditional case, multidimensional
+   arrays are explicitly linearized into a one-dimensional array, with the
+   index computed as if by
+   a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]
+
+   However, we can often get better code using the Graphite framework
+   and scalar evolutions in the middle end, which expects to see
+   multidimensional array accesses represented as nested ARRAY_REFs, similar
+   to what the C/C++ front ends produce.  Delinearization is controlled
+   by flag_delinearize_aref.  */

 void
 gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
@@ -3798,11 +3851,16 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
   tree tmp;
   tree stride;
   tree decl = NULL_TREE;
+  tree cooked_decl = NULL_TREE;
+  tree vptr = se->class_vptr;
   gfc_se indexse;
   gfc_se tmpse;
   gfc_symbol * sym = expr->symtree->n.sym;
   char *var_name = NULL;
+  tree aref = NULL_TREE;
+  tree atype = NULL_TREE;

+  /* Handle coarrays.  */
   if (ar->dimen == 0)
     {
       gcc_assert (ar->codimen || sym->attr.select_rank_temporary
@@ -3862,15 +3920,160 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
        }
     }

+  /* Per comments above, DECL is not always a declaration.  It may be
+     either a variable, indirect variable reference, or component
+     reference.  It may have array or pointer type, or it may be a
+     descriptor with RECORD_TYPE.  */
   decl = se->expr;
   if (IS_CLASS_ARRAY (sym) && sym->attr.dummy && ar->as->type != AS_DEFERRED)
     decl = sym->backend_decl;

-  cst_offset = offset = gfc_index_zero_node;
-  add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl));
+  /* A pointer array component can be detected from its field decl. Fix
+     the descriptor, mark the resulting variable decl and store it in
+     COOKED_DECL to pass to gfc_build_array_ref.  */
+  if (get_CFI_desc (sym, expr, &cooked_decl, ar))
+    cooked_decl = build_fold_indirect_ref_loc (input_location, cooked_decl);
+  if (!expr->ts.deferred && !sym->attr.codimension
+      && is_pointer_array (se->expr))
+    {
+      if (TREE_CODE (se->expr) == COMPONENT_REF)
+       cooked_decl = se->expr;
+      else if (TREE_CODE (se->expr) == INDIRECT_REF)
+       cooked_decl = TREE_OPERAND (se->expr, 0);
+      else
+       cooked_decl = se->expr;
+    }
+  else if (expr->ts.deferred
+          || (sym->ts.type == BT_CHARACTER
+              && sym->attr.select_type_temporary))
+    {
+      if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+       {
+         cooked_decl = se->expr;
+         if (TREE_CODE (cooked_decl) == INDIRECT_REF)
+           cooked_decl = TREE_OPERAND (cooked_decl, 0);
+       }
+      else
+       cooked_decl = sym->backend_decl;
+    }
+  else if (sym->ts.type == BT_CLASS)
+    {
+      if (UNLIMITED_POLY (sym))
+       {
+         gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr);
+         gfc_init_se (&tmpse, NULL);
+         gfc_conv_expr (&tmpse, class_expr);
+         if (!se->class_vptr)
+           vptr = gfc_class_vptr_get (tmpse.expr);
+         gfc_free_expr (class_expr);
+         cooked_decl = tmpse.expr;
+       }
+      else
+       cooked_decl = NULL_TREE;
+    }
+
+  /* Find the base of the array; this normally has ARRAY_TYPE.  */
+  tree base = build_fold_indirect_ref_loc (input_location,
+                                          gfc_conv_array_data (se->expr));
+  tree type = TREE_TYPE (base);

-  /* Calculate the offsets from all the dimensions.  Make sure to associate
-     the final offset so that we form a chain of loop invariant summands.  */
+  /* Handle special cases, copied from gfc_build_array_ref.  After we get
+     through this, we know TYPE definitely is an ARRAY_TYPE.  */
+  if (GFC_ARRAY_TYPE_P (type) && GFC_TYPE_ARRAY_RANK (type) == 0)
+    {
+      gcc_assert (GFC_TYPE_ARRAY_CORANK (type) > 0);
+      se->expr = fold_convert (TYPE_MAIN_VARIANT (type), base);
+      return;
+    }
+  if (TREE_CODE (type) != ARRAY_TYPE)
+    {
+      gcc_assert (cooked_decl == NULL_TREE);
+      se->expr = base;
+      return;
+    }
+
+  /* Check for cases where we cannot delinearize.  */
+
+  bool delinearize = flag_delinearize_aref;
+
+  /* There is no point in trying to delinearize 1-dimensional arrays.  */
+  if (ar->dimen == 1)
+    delinearize = false;
+
+  if (delinearize
+      && (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr))
+         || (DECL_P (se->expr)
+             && DECL_LANG_SPECIFIC (se->expr)
+             && GFC_DECL_SAVED_DESCRIPTOR (se->expr))))
+    {
+      /* Descriptor arrays that may not be contiguous cannot
+        be delinearized without using the stride in the descriptor,
+        which generally involves introducing a division operation.
+        That's unlikely to produce optimal code, so avoid doing it.  */
+      tree desc = se->expr;
+      if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+       desc = GFC_DECL_SAVED_DESCRIPTOR (se->expr);
+      tree tmptype = TREE_TYPE (desc);
+      if (POINTER_TYPE_P (tmptype))
+       tmptype = TREE_TYPE (tmptype);
+      enum gfc_array_kind akind = GFC_TYPE_ARRAY_AKIND (tmptype);
+      if (akind != GFC_ARRAY_ASSUMED_SHAPE_CONT
+         && akind != GFC_ARRAY_ASSUMED_RANK_CONT
+         && akind != GFC_ARRAY_ALLOCATABLE
+         && akind != GFC_ARRAY_POINTER_CONT)
+       delinearize = false;
+    }
+
+  /* See gfc_build_array_ref in trans.c.  If we have a cooked_decl or
+     vptr, then we most likely have to do pointer arithmetic using a
+     linearized array offset.  */
+  if (delinearize && cooked_decl)
+    delinearize = false;
+  else if (delinearize && get_class_array_vptr (se->expr, vptr))
+    delinearize = false;
+
+  if (!delinearize)
+    {
+      /* Initialize the offset from the array descriptor.  This accounts
+        for the array base being something other than zero.  */
+      cst_offset = offset = gfc_index_zero_node;
+      add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl));
+    }
+  else
+    {
+      /* If we are delinearizing, build up the nested array type using the
+        dimension information we have for each rank.  */
+      atype = TREE_TYPE (type);
+      for (n = 0; n < ar->dimen; n++)
+       {
+         /* We're working from the outermost nested array reference inward
+            in this step.  ATYPE is the element type for the access in
+            this rank; build the new array type based on the bounds
+            information and store it back into ATYPE for the next rank's
+            processing.  */
+         tree lbound = get_array_lbound (decl, n, sym, ar, se);
+         tree ubound = get_array_ubound (decl, n, sym, ar, se);
+         tree dimen = build_range_type (TREE_TYPE (lbound),
+                                        lbound, ubound);
+         atype = build_array_type (atype, dimen);
+
+         /* Emit a DECL_EXPR for the array type so the gimplification of
+            its type sizes works correctly.  */
+         if (! TYPE_NAME (atype))
+           TYPE_NAME (atype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
+                                           NULL_TREE, atype);
+         gfc_add_expr_to_block (&se->pre,
+                                build1 (DECL_EXPR, atype,
+                                        TYPE_NAME (atype)));
+       }
+
+      /* Cast base to the innermost array type.  */
+      if (DECL_P (base))
+       TREE_ADDRESSABLE (base) = 1;
+      aref = build1 (NOP_EXPR, atype, base);
+    }
+
+  /* Process indices in reverse order.  */
   for (n = ar->dimen - 1; n >= 0; n--)
     {
       /* Calculate the index for this dimension.  */
@@ -3888,16 +4091,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
          indexse.expr = save_expr (indexse.expr);

          /* Lower bound.  */
-         tmp = gfc_conv_array_lbound (decl, n);
-         if (sym->attr.temporary)
-           {
-             gfc_init_se (&tmpse, se);
-             gfc_conv_expr_type (&tmpse, ar->as->lower[n],
-                                 gfc_array_index_type);
-             gfc_add_block_to_block (&se->pre, &tmpse.pre);
-             tmp = tmpse.expr;
-           }
-
+         tmp = get_array_lbound (decl, n, sym, ar, se);
          cond = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
                                  indexse.expr, tmp);
          msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' "
@@ -3912,16 +4106,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
             arrays.  */
          if (n < ar->dimen - 1 || ar->as->type != AS_ASSUMED_SIZE)
            {
-             tmp = gfc_conv_array_ubound (decl, n);
-             if (sym->attr.temporary)
-               {
-                 gfc_init_se (&tmpse, se);
-                 gfc_conv_expr_type (&tmpse, ar->as->upper[n],
-                                     gfc_array_index_type);
-                 gfc_add_block_to_block (&se->pre, &tmpse.pre);
-                 tmp = tmpse.expr;
-               }
-
+             tmp = get_array_ubound (decl, n, sym, ar, se);
              cond = fold_build2_loc (input_location, GT_EXPR,
                                      logical_type_node, indexse.expr, tmp);
              msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' "
@@ -3934,65 +4119,41 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
            }
        }

-      /* Multiply the index by the stride.  */
-      stride = gfc_conv_array_stride (decl, n);
-      tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
-                            indexse.expr, stride);
-
-      /* And add it to the total.  */
-      add_to_offset (&cst_offset, &offset, tmp);
-    }
-
-  if (!integer_zerop (cst_offset))
-    offset = fold_build2_loc (input_location, PLUS_EXPR,
-                             gfc_array_index_type, offset, cst_offset);
-
-  /* A pointer array component can be detected from its field decl. Fix
-     the descriptor, mark the resulting variable decl and pass it to
-     build_array_ref.  */
-  decl = NULL_TREE;
-  if (get_CFI_desc (sym, expr, &decl, ar))
-    decl = build_fold_indirect_ref_loc (input_location, decl);
-  if (!expr->ts.deferred && !sym->attr.codimension
-      && is_pointer_array (se->expr))
-    {
-      if (TREE_CODE (se->expr) == COMPONENT_REF)
-       decl = se->expr;
-      else if (TREE_CODE (se->expr) == INDIRECT_REF)
-       decl = TREE_OPERAND (se->expr, 0);
-      else
-       decl = se->expr;
-    }
-  else if (expr->ts.deferred
-          || (sym->ts.type == BT_CHARACTER
-              && sym->attr.select_type_temporary))
-    {
-      if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+      if (!delinearize)
        {
-         decl = se->expr;
-         if (TREE_CODE (decl) == INDIRECT_REF)
-           decl = TREE_OPERAND (decl, 0);
+         /* Multiply the index by the stride.  */
+         stride = gfc_conv_array_stride (decl, n);
+         tmp = fold_build2_loc (input_location, MULT_EXPR,
+                                gfc_array_index_type,
+                                indexse.expr, stride);
+
+         /* And add it to the total.  */
+         add_to_offset (&cst_offset, &offset, tmp);
        }
       else
-       decl = sym->backend_decl;
-    }
-  else if (sym->ts.type == BT_CLASS)
-    {
-      if (UNLIMITED_POLY (sym))
        {
-         gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr);
-         gfc_init_se (&tmpse, NULL);
-         gfc_conv_expr (&tmpse, class_expr);
-         if (!se->class_vptr)
-           se->class_vptr = gfc_class_vptr_get (tmpse.expr);
-         gfc_free_expr (class_expr);
-         decl = tmpse.expr;
+         /* Peel off a layer of array nesting from ATYPE to get
+            the result type of the new ARRAY_REF.  */
+         atype = TREE_TYPE (atype);
+         aref = build4 (ARRAY_REF, atype, aref, indexse.expr,
+                        NULL_TREE, NULL_TREE);
        }
-      else
-       decl = NULL_TREE;
     }

-  se->expr = build_array_ref (se->expr, offset, decl, se->class_vptr);
+  if (!delinearize)
+    {
+      /* Build a linearized array reference using the offset from all
+        dimensions.  */
+      if (!integer_zerop (cst_offset))
+       offset = fold_build2_loc (input_location, PLUS_EXPR,
+                                 gfc_array_index_type, offset, cst_offset);
+      se->class_vptr = vptr;
+      vptr = get_class_array_vptr (se->expr, vptr);
+      se->expr = gfc_build_array_ref (base, offset, cooked_decl, vptr);
+    }
+ else
+   /* Return the outermost ARRAY_REF we already built.  */
+   se->expr = aref;
 }


diff --git a/gcc/testsuite/gfortran.dg/assumed_type_2.f90 b/gcc/testsuite/gfortran.dg/assumed_type_2.f90
index 5d3cd7eaece9..07be87ef1eb6 100644
--- a/gcc/testsuite/gfortran.dg/assumed_type_2.f90
+++ b/gcc/testsuite/gfortran.dg/assumed_type_2.f90
@@ -147,12 +147,12 @@ end

 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }

-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index a3ad591f926c..d8d14c42be01 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -7,7 +7,7 @@ program main
    integer :: a(100,100), b(100,100)
    integer :: i, j, d

-   !$acc kernels ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+   !$acc kernels ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
    do i=1,100
      do j=1,100
        a(i,j) = 1
diff --git a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
index 13bdd36d0b4d..51c6013565a1 100644
--- a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
@@ -22,7 +22,7 @@ end

 ! { dg-final { scan-tree-dump-times "D\\.\[0-9\]+ = .integer.kind=4.. __builtin_cosf ..real.kind=4.. a \\+ 1.0e\\+0\\);" 2 "original" } }

-! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &b\\\[.* <?i>? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &d\\\[\\(.*jj \\* 5 \\+ .* <?i>?\\) \\+ -6\\\]\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &b\\\[.* <?i>? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &\\(\\(integer\\(kind.*?d\\).*?$" 1 "original" } }

 ! { dg final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D.3938:5:1\\):\\*\\(c_char \\*\\) &b\\\[\\(.* <?i>? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &d\\\[\\(\\(integer\\(kind=8\\)\\) i \\+ -1\\) \\* 6\\\]\\)"  1 "original" } }

diff --git a/gcc/testsuite/gfortran.dg/graphite/block-2.f b/gcc/testsuite/gfortran.dg/graphite/block-2.f
index bea8ddeb8267..266da378c5d9 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-2.f
+++ b/gcc/testsuite/gfortran.dg/graphite/block-2.f
@@ -1,5 +1,11 @@
 ! { dg-do compile }
 ! { dg-additional-options "-std=legacy" }
+
+! ldist introduces a __builtin_memset for the first loop and hence
+! breaks the testcase's assumption regarding the number of SCoPs
+! because Graphite cannot deal with the call.
+! { dg-additional-options "-fdisable-tree-ldist" }
+
       SUBROUTINE MATRIX_MUL_UNROLLED (A, B, C, L, M, N)
       DIMENSION A(L,M), B(M,N), C(L,N)

@@ -18,5 +24,4 @@
       RETURN
       END

-! Disabled for now as it requires delinearization.
-! { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" } }
diff --git a/gcc/testsuite/gfortran.dg/graphite/block-3.f90 b/gcc/testsuite/gfortran.dg/graphite/block-3.f90
index 452de7349050..0edca92bb894 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-3.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/block-3.f90
@@ -12,6 +12,6 @@ enddo

 end subroutine matrix_multiply

-! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } }
 ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } }

diff --git a/gcc/testsuite/gfortran.dg/graphite/block-4.f90 b/gcc/testsuite/gfortran.dg/graphite/block-4.f90
index 42af5b62444e..f2aed98bcf82 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-4.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/block-4.f90
@@ -15,6 +15,6 @@ enddo

 end subroutine matrix_multiply

-! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } }
 ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } }

diff --git a/gcc/testsuite/gfortran.dg/graphite/id-9.f b/gcc/testsuite/gfortran.dg/graphite/id-9.f
index c93937088972..885a9dfaa1bb 100644
--- a/gcc/testsuite/gfortran.dg/graphite/id-9.f
+++ b/gcc/testsuite/gfortran.dg/graphite/id-9.f
@@ -8,7 +8,7 @@
                   do l=1,3
                      do k=1,l
                      enddo
-                     bar(k,l)=bar(k,l)+(v3b-1.d0)
+                     bar(k,l)=bar(k,l)+(v3b-1.d0) ! { dg-bogus ".*iteration 2 invokes undefined behavior" "TODO" { xfail *-*-* }   }
                   enddo
             enddo
             do m=1,ne
diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_16.f90 b/gcc/testsuite/gfortran.dg/inline_matmul_16.f90
index 580cb1ac9393..2a7f63b9c963 100644
--- a/gcc/testsuite/gfortran.dg/inline_matmul_16.f90
+++ b/gcc/testsuite/gfortran.dg/inline_matmul_16.f90
@@ -1,5 +1,7 @@
 ! { dg-do run }
 ! { dg-options "-ffrontend-optimize -fdump-tree-optimized -Wrealloc-lhs -finline-matmul-limit=1000 -O" }
+! { dg-additional-options "--param delinearize=0" } TODO
+
 ! PR 66094: Check functionality for MATMUL(TRANSPOSE(A),B)) for two-dimensional arrays
 program main
   implicit none
diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
index 3168d5f10064..8d84f3cdb01b 100644
--- a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
+++ b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
@@ -39,4 +39,4 @@ program testMATMUL
       call abort()
     end if
 end program testMATMUL
-! { dg-final { scan-tree-dump-times "gamma5\\\[__var_1_do \\* 4 \\+ __var_2_do\\\]|gamma5\\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } }
+! { dg-final { scan-tree-dump-times "gamma5.*\\\[NON_LVALUE_EXPR <__var_1_do>\\\]\\\[NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
index 3570b9719ebb..0900dd82646f 100644
--- a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
+++ b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
@@ -129,12 +129,12 @@ end

 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }

-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/pr32921.f b/gcc/testsuite/gfortran.dg/pr32921.f
index 0661208edde5..853438609c43 100644
--- a/gcc/testsuite/gfortran.dg/pr32921.f
+++ b/gcc/testsuite/gfortran.dg/pr32921.f
@@ -45,4 +45,4 @@

       RETURN
       END
-! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } }
+! { dg-final { scan-tree-dump-times "ubound" 4 "lim2" } }
diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f b/gcc/testsuite/gfortran.dg/reassoc_4.f
index fdcb46e835cf..2368b76aecb2 100644
--- a/gcc/testsuite/gfortran.dg/reassoc_4.f
+++ b/gcc/testsuite/gfortran.dg/reassoc_4.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200" }
+! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200 --param delinearize=0" }
       subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight)
       integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1
       real*8 w(3,3),vo(3,3),anisox(3,3,3,3),s(60,60),weight
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 08965cc5e202..6c469b1964c6 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -3,6 +3,7 @@
 ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
 ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } }
 ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
+! { dg-additional-options "--param delinearize=0" } TODO

 ******* RESID COMPUTES THE RESIDUAL:  R = V - AU
 *
--
2.33.0



* [PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (12 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 13/40] Fortran: Delinearize array accesses Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 15/40] graphite: Extend SCoP detection dump output Frederik Harwath
                   ` (25 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, rguenther

The OpenACC device lowering pass must run after the Graphite pass so
that Graphite can be used for automatic parallelization of kernels
regions in the future. Experimentation has shown that, performance-wise,
it is best to run pass_oacc_device_lower together with the related
passes pass_oacc_loop_designation and pass_omp_oacc_neuter_broadcast
early after pass_graphite in pass_tree_loop, at least as long as the
other tree loop passes are not adjusted. In particular, device
lowering should happen before pass_vectorize to enable vectorization,
which is crucial for GCN offloading. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, some passes that previously were executed only once before
pass_tree_loop must also be executed on the offloading functions after
device lowering.  To ensure that pass_oacc_device_lower still executes
if pass_tree_loop does not (no loops, no optimizations), two further
copies of the OpenACC lowering passes are added to the pipeline: one
in pass_tree_no_loop, which runs if there are no loops, and one in the
new pass_no_loop_optimizations, which runs if no optimization is
performed (-O0) or if -Og is in effect.
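
As a rough illustration of the paragraph above, the following
self-contained C++ sketch models which pass group ends up running the
OpenACC lowering passes for a given function.  It is not GCC code:
FnState and oacc_lowering_site are hypothetical names, and the
condition used for pass_tree_loop is a simplifying assumption.  Only
the "!optimize || optimize_debug" test is taken from the gate of the
new pass in passes.c.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the state the real gates inspect.
struct FnState
{
  int optimize;         // -O level; 0 means no optimization
  bool optimize_debug;  // true when -Og is in effect
  bool tree_loop_opts;  // models -ftree-loop-optimize
  bool has_loops;       // the function contains at least one loop
};

// Return the name of the pass group whose copy of the OpenACC lowering
// passes would run for FN in this simplified model.
inline std::string
oacc_lowering_site (const FnState &fn)
{
  assert (fn.optimize >= 0);

  // Same condition as the gate of pass_no_loop_optimizations in the patch.
  if (!fn.optimize || fn.optimize_debug)
    return "pass_no_loop_optimizations";

  // Assumption: pass_tree_loop runs only if loop optimizations are
  // enabled and the function actually has loops.
  if (fn.tree_loop_opts && fn.has_loops)
    return "pass_tree_loop";

  // Otherwise the copy inside pass_tree_no_loop takes over.
  return "pass_tree_no_loop";
}
```

Under these assumptions, a function with loops compiled at -O2 is
lowered inside pass_tree_loop, whereas the same function compiled at
-O0 or -Og is lowered inside pass_no_loop_optimizations.  The three
sites are mutually exclusive, so lowering happens once per function.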

gcc/ChangeLog:

        * omp-general.c (oacc_get_fn_dim_size): Return 0 on
        missing "dims".
        * omp-oacc-neuter-broadcast.cc:
        Make pass_omp_oacc_neuter_broadcast clonable.
        * omp-offload.c (pass_oacc_loop_designation::clone): New
        member function.
        (pass_oacc_device_lower::clone): Likewise.
        * passes.c (pass_data_no_loop_optimizations): New pass_data.
        (class pass_no_loop_optimizations): New pass.
        (make_pass_no_loop_optimizations): New function.
        * passes.def: Move pass_oacc_loop_designation,
        pass_omp_oacc_neuter_broadcast and pass_oacc_device_lower
        into pass_tree_loop, and add copies to pass_tree_no_loop
        and to the new pass_no_loop_optimizations.  Add copies of
        passes pass_lower_complex, pass_ccp, pass_complete_unrolli,
        pass_backprop, pass_phiprop, pass_fix_loops after the
        OpenACC passes in pass_tree_loop.
        * tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
        New member function.
        (pass_complete_unrolli::clone): Likewise.
        * tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
        (pass_tree_loop_init::clone): Likewise.
        (pass_tree_loop_done::clone): Likewise.
        * tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.
        * tree-pass.h (make_pass_oacc_only): New declaration.
        (make_pass_oacc_functions_only): New declaration.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
        expected output to pass name changes due to the pass
        reordering and cloning.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

        * gcc.dg/goacc/loop-processing-1.c: Adjust expected output
        to pass name changes due to the pass reordering and cloning.
        * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
        * c-c++-common/goacc/classify-kernels.c: Likewise.
        * c-c++-common/goacc/classify-parallel.c: Likewise.
        * c-c++-common/goacc/classify-routine.c: Likewise.
        * c-c++-common/goacc/routine-nohost-1.c: Likewise.
        * c-c++-common/unroll-1.c: Likewise.
        * c-c++-common/unroll-4.c: Likewise.
        * gcc.dg/tree-ssa/backprop-1.c: Likewise.
        * gcc.dg/tree-ssa/backprop-2.c: Likewise.
        * gcc.dg/tree-ssa/backprop-3.c: Likewise.
        * gcc.dg/tree-ssa/backprop-4.c: Likewise.
        * gcc.dg/tree-ssa/backprop-5.c: Likewise.
        * gcc.dg/tree-ssa/backprop-6.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-1.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-3.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-9.c: Likewise.
        * gcc.dg/tree-ssa/ldist-17.c: Likewise.
        * gcc.dg/tree-ssa/loop-38.c: Likewise.
        * gcc.dg/tree-ssa/pr21463.c: Likewise.
        * gcc.dg/tree-ssa/pr45427.c: Likewise.
        * gcc.dg/tree-ssa/pr61743-1.c: Likewise.
        * gcc.dg/unroll-2.c: Likewise.
        * gcc.dg/unroll-3.c: Likewise.
        * gcc.dg/unroll-4.c: Likewise.
        * gcc.dg/unroll-5.c: Likewise.
        * gcc.dg/vect/vect-profile-1.c: Likewise.
        * gcc.dg/tree-ssa/loopclosedphi.c: Likewise.
        * gcc.dg/tree-ssa/pr59597.c: Likewise.
        * gcc.dg/vect/bb-slp-59.c: Likewise.
        * c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
        * c-c++-common/goacc/device-lowering-no-loops.c: New test.
        * c-c++-common/goacc/device-lowering-no-optimization.c: New test.
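
The dump name adjustments listed above (for instance "oaccloops"
becoming "oaccloops1" in the goacc tests) follow from passes now
having several instances in the pipeline.  The sketch below is a
minimal model of that effect and not GCC's pass manager: Pass,
OaccLoops and Pipeline are hypothetical types, and the numbering rule
(no suffix for a single instance, otherwise 1, 2, 3 in scheduling
order) is a simplified assumption about how instances are told apart.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of an optimization pass; not GCC's opt_pass.
struct Pass
{
  std::string name;
  int instance = 0;  // 1-based position among same-named instances

  explicit Pass (std::string n) : name (std::move (n)) {}
  virtual ~Pass () = default;

  // Passes that may be scheduled more than once override this.
  virtual std::unique_ptr<Pass> clone () const { return nullptr; }
};

struct OaccLoops : Pass
{
  OaccLoops () : Pass ("oaccloops") {}
  std::unique_ptr<Pass> clone () const override
  {
    return std::make_unique<OaccLoops> ();
  }
};

class Pipeline
{
  std::vector<Pass *> order_;                  // scheduling order
  std::vector<std::unique_ptr<Pass>> clones_;  // owns the extra instances

public:
  // Schedule PROTO at the next position.  The first use takes PROTO
  // itself; every later use needs a fresh object obtained via clone ().
  void schedule (Pass &proto)
  {
    int seen = 0;
    for (Pass *q : order_)
      if (q->name == proto.name)
        ++seen;

    Pass *p = &proto;
    if (seen > 0)
      {
        clones_.push_back (proto.clone ());
        assert (clones_.back () && "a pass scheduled twice needs clone ()");
        p = clones_.back ().get ();
      }
    p->instance = seen + 1;
    order_.push_back (p);
  }

  // Dump names as a test's scan-tree-dump directive would refer to them.
  std::vector<std::string> dump_names () const
  {
    std::vector<std::string> out;
    for (Pass *p : order_)
      {
        int total = 0;
        for (Pass *q : order_)
          if (q->name == p->name)
            ++total;
        out.push_back (total == 1 ? p->name
                                  : p->name + std::to_string (p->instance));
      }
    return out;
  }
};
```

In this model, scheduling the same pass at a second position is what
turns the previously unnumbered dump name into a numbered one, which
is why tests that scan a particular dump have to name the instance
they mean.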

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/omp-general.c                             |  8 +-
 gcc/omp-oacc-neuter-broadcast.cc              |  2 +
 gcc/omp-offload.c                             |  6 ++
 gcc/passes.c                                  | 42 ++++++++
 gcc/passes.def                                | 44 ++++++++-
 .../goacc/classify-kernels-unparallelized.c   |  8 +-
 .../c-c++-common/goacc/classify-kernels.c     |  8 +-
 .../c-c++-common/goacc/classify-parallel.c    |  8 +-
 .../c-c++-common/goacc/classify-routine.c     | 22 ++---
 .../device-lowering-debug-optimization.c      | 29 ++++++
 .../goacc/device-lowering-no-loops.c          | 17 ++++
 .../goacc/device-lowering-no-optimization.c   | 30 ++++++
 .../c-c++-common/goacc/routine-nohost-1.c     |  6 +-
 gcc/testsuite/c-c++-common/unroll-1.c         |  8 +-
 gcc/testsuite/c-c++-common/unroll-4.c         |  4 +-
 .../gcc.dg/goacc/loop-processing-1.c          |  5 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c    |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c    |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c    |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c     |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c     |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c     |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c      |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-38.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21463.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr45427.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c       |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c     |  2 +-
 gcc/testsuite/gcc.dg/unroll-2.c               |  2 +-
 gcc/testsuite/gcc.dg/unroll-3.c               |  4 +-
 gcc/testsuite/gcc.dg/unroll-4.c               |  4 +-
 gcc/testsuite/gcc.dg/unroll-5.c               |  4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c         |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-profile-1.c    |  2 +-
 gcc/tree-pass.h                               |  2 +
 gcc/tree-ssa-loop-ivcanon.c                   |  2 +
 gcc/tree-ssa-loop.c                           | 99 +++++++++++++++++++
 gcc/tree-ssa-phiprop.c                        |  2 +
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |  2 +-
 .../vector-length-128-1.c                     |  2 +-
 .../vector-length-128-2.c                     |  3 +-
 .../vector-length-128-3.c                     |  2 +-
 .../vector-length-128-4.c                     |  2 +-
 .../vector-length-128-5.c                     |  2 +-
 .../vector-length-128-6.c                     |  2 +-
 .../vector-length-128-7.c                     |  2 +-
 50 files changed, 363 insertions(+), 88 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c

diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 445275524134..27a1bc8092c8 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -2954,7 +2954,13 @@ oacc_get_fn_dim_size (tree fn, int axis)
   while (axis--)
     dims = TREE_CHAIN (dims);

-  int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+  tree v = TREE_VALUE (dims);
+  /* TODO With 'pass_oacc_device_lower' moved "later", this is necessary to
+     avoid ICE for some OpenACC 'kernels' ("parloops") constructs.  */
+  if (v == NULL_TREE)
+    return 0;
+
+  int size = TREE_INT_CST_LOW (v);

   return size;
 }
diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index e43338f3abf2..94ecdc4d4e9a 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -1992,6 +1992,8 @@ public:
       return execute_omp_oacc_neuter_broadcast ();
     }

+  opt_pass * clone () { return new pass_omp_oacc_neuter_broadcast (m_ctxt); }
+
 }; // class pass_omp_oacc_neuter_broadcast

 } // anon namespace
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 833f7ddea58f..e99aaac0e515 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2444,6 +2444,8 @@ public:
       return execute_oacc_loop_designation ();
     }

+  opt_pass * clone () { return new pass_oacc_loop_designation (m_ctxt); }
+
 }; // class pass_oacc_loop_designation

 const pass_data pass_data_oacc_device_lower =
@@ -2467,12 +2469,16 @@ public:
   {}

   /* opt_pass methods: */
+  /* TODO If this were gated on something like '!(fun->curr_properties &
+     PROP_gimple_oaccdevlow)', then we could easily have several instances
+     in the pass pipeline? */
   virtual bool gate (function *) { return flag_openacc; };

   virtual unsigned int execute (function *)
     {
       return execute_oacc_device_lower ();
     }
+  opt_pass * clone () { return new pass_oacc_device_lower (m_ctxt); }

 }; // class pass_oacc_device_lower

diff --git a/gcc/passes.c b/gcc/passes.c
index 64550b00b43c..4a1f4a4b5900 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -620,6 +620,48 @@ make_pass_all_optimizations_g (gcc::context *ctxt)

 namespace {

+const pass_data pass_data_no_loop_optimizations =
+{
+  GIMPLE_PASS, /* type */
+  "*no_loop_optimizations", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_OPTIMIZE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+/* This pass runs if loop optimizations are disabled
+   at the current optimization level. */
+
+class pass_no_loop_optimizations : public gimple_opt_pass
+{
+public:
+  pass_no_loop_optimizations (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_no_loop_optimizations, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool
+  gate (function *)
+  {
+    return !optimize || optimize_debug;
+  }
+
+}; // class pass_no_loop_optimizations
+
+} // anon namespace
+
+static gimple_opt_pass *
+make_pass_no_loop_optimizations (gcc::context *ctxt)
+{
+  return new pass_no_loop_optimizations (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_rest_of_compilation =
 {
   RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index 0f541454e7f1..5b9bb422d281 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -183,9 +183,6 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
-  NEXT_PASS (pass_oacc_loop_designation);
-  NEXT_PASS (pass_omp_oacc_neuter_broadcast);
-  NEXT_PASS (pass_oacc_device_lower);
   NEXT_PASS (pass_omp_device_lower);
   NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_adjust_alignment);
@@ -292,6 +289,35 @@ along with GCC; see the file COPYING3.  If not see
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_parallelize_loops, false /* oacc_kernels_p */);
          NEXT_PASS (pass_expand_omp_ssa);
+
+         /* Interrupt pass_tree_loop for OpenACC device lowering. */
+         NEXT_PASS (pass_oacc_only);
+         PUSH_INSERT_PASSES_WITHIN (pass_oacc_only)
+           NEXT_PASS (pass_tree_loop_done);
+           NEXT_PASS (pass_oacc_loop_designation);
+           NEXT_PASS (pass_omp_oacc_neuter_broadcast);
+           NEXT_PASS (pass_oacc_device_lower);
+
+           NEXT_PASS (pass_oacc_functions_only);
+           PUSH_INSERT_PASSES_WITHIN (pass_oacc_functions_only)
+               /* Repeat some passes on OpenACC functions after device lowering. */
+               /* Lower complex instructions arising from OpenACC
+               reductions. */
+               NEXT_PASS (pass_lower_complex);
+               /* Those passes are necessary here to allow the loop vectorizer to
+               work on the offloading functions which is important for AMD GCN
+               offloading. */
+               NEXT_PASS (pass_ccp, true /* nonzero_p */);
+               NEXT_PASS (pass_complete_unrolli);
+               NEXT_PASS (pass_backprop);
+               NEXT_PASS (pass_phiprop);
+               NEXT_PASS (pass_fix_loops);
+           POP_INSERT_PASSES ()
+
+          /* Continue pass_tree_loop after OpenACC device lowering. */
+         NEXT_PASS (pass_tree_loop_init);
+         POP_INSERT_PASSES ()
+
          NEXT_PASS (pass_ch_vect);
          NEXT_PASS (pass_if_conversion);
          /* pass_vectorize must immediately follow pass_if_conversion.
@@ -311,15 +337,21 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_loop_prefetch);
          /* Run IVOPTs after the last pass that uses data-reference analysis
             as that doesn't handle TARGET_MEM_REFs.  */
+
          NEXT_PASS (pass_iv_optimize);
          NEXT_PASS (pass_lim);
          NEXT_PASS (pass_tree_loop_done);
       POP_INSERT_PASSES ()
+
+
       /* Pass group that runs when pass_tree_loop is disabled or there
          are no loops in the function.  */
       NEXT_PASS (pass_tree_no_loop);
       PUSH_INSERT_PASSES_WITHIN (pass_tree_no_loop)
          NEXT_PASS (pass_slp_vectorize);
+         NEXT_PASS (pass_oacc_loop_designation);
+         NEXT_PASS (pass_omp_oacc_neuter_broadcast);
+         NEXT_PASS (pass_oacc_device_lower);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_simduid_cleanup);
       NEXT_PASS (pass_lower_vector_ssa);
@@ -397,6 +429,12 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_local_pure_const);
       NEXT_PASS (pass_modref);
   POP_INSERT_PASSES ()
+  NEXT_PASS (pass_no_loop_optimizations);
+  PUSH_INSERT_PASSES_WITHIN (pass_no_loop_optimizations)
+      NEXT_PASS (pass_oacc_loop_designation);
+      NEXT_PASS (pass_omp_oacc_neuter_broadcast);
+      NEXT_PASS (pass_oacc_device_lower);
+  POP_INSERT_PASSES ()
   NEXT_PASS (pass_tm_init);
   PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
       NEXT_PASS (pass_tm_mark);
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index e391184f403d..338676aa20ff 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -6,7 +6,7 @@
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
    { dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -39,6 +39,6 @@ void KERNELS ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index 779e2b0a24db..37e2a57455d1 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -6,7 +6,7 @@
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
    { dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -35,6 +35,6 @@ void KERNELS ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
index 9056aa69dad6..82b70ae280cd 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -27,6 +27,6 @@ void PARALLEL ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
index f7f0454009bf..cd539370dbbf 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -29,14 +29,14 @@ void ROUTINE ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' doesn't have 'nohost' clause" 1 "oaccloops" { target c } } }
-   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops" { target { c++ && { ! offloading_enabled } } } } }
-   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops" { target { c++ && offloading_enabled } } } }
-   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' not discarded" 1 "oaccloops" { target c } } }
-   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' not discarded" 1 "oaccloops" { target { c++ && { ! offloading_enabled } } } } }
-   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' not discarded" 1 "oaccloops" { target { c++ && offloading_enabled } } } }
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' doesn't have 'nohost' clause" 1 "oaccloops1" { target c } } }
+   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops1" { target { c++ && { ! offloading_enabled } } } } }
+   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops1" { target { c++ && offloading_enabled } } } }
+   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' not discarded" 1 "oaccloops1" { target c } } }
+   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' not discarded" 1 "oaccloops1" { target { c++ && { ! offloading_enabled } } } } }
+   { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' not discarded" 1 "oaccloops1" { target { c++ && offloading_enabled } } } }
    TODO See PR101551 for 'offloading_enabled' differences.
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)void ROUTINE \\(\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)void ROUTINE \\(\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
new file mode 100644
index 000000000000..5bf37cc61580
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
@@ -0,0 +1,29 @@
+/* Verify that OpenACC device lowering executes with "-Og". The actual logic in
+   the test function does not matter. */
+
+/* { dg-additional-options "-Og -fdump-tree-oaccdevlow" } */
+
+int main()
+{
+  int i, j;
+  int ina[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      ina[j * 32 + i] = (i == j) ? 2 : 0;
+
+  acc = 0;
+#pragma acc parallel loop copy(acc, ina, out)
+      for (j = 0; j < 32; j++)
+        {
+#pragma acc loop reduction(+:acc)
+         for (i = 0; i < 32; i++)
+              acc += ina[i];
+
+         out[j] = acc;
+        }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow3" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
new file mode 100644
index 000000000000..193b5620de1d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
@@ -0,0 +1,17 @@
+/* Verify that OpenACC device lowering executes even if there are no OpenACC
+   loops. */
+
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+int main()
+{
+  int x;
+#pragma acc parallel copy(x)
+  {
+    asm volatile("");
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow2" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
new file mode 100644
index 000000000000..69e2b22d73ba
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
@@ -0,0 +1,30 @@
+/* Verify that OpenACC device lowering executes with "-O0".  The actual
+   logic in the test function does not matter. */
+
+/* { dg-additional-options "-O0 -fdump-tree-oaccdevlow" } */
+
+int main()
+{
+
+  int i, j;
+  int ina[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      ina[j * 32 + i] = (i == j) ? 2 : 0;
+
+  acc = 0;
+#pragma acc parallel loop copy(acc, ina, out)
+      for (j = 0; j < 32; j++)
+        {
+#pragma acc loop reduction(+:acc)
+         for (i = 0; i < 32; i++)
+              acc += ina[i];
+
+         out[j] = acc;
+        }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow3" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
index 59ebb2bc5a9f..4f9a3a333570 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
@@ -13,7 +13,7 @@ int THREE(void)
 #pragma acc routine nohost
 extern int THREE(void);

-/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*THREE[^']*' has 'nohost' clause\.$} 1 oaccloops } } */
+/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*THREE[^']*' has 'nohost' clause\.$} 1 "oaccloops*" } } */


 #pragma acc routine nohost
@@ -30,7 +30,7 @@ extern void NOTHING(void);

 #pragma acc routine (NOTHING) nohost

-/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*NOTHING[^']*' has 'nohost' clause\.$} 1 oaccloops } } */
+/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*NOTHING[^']*' has 'nohost' clause\.$} 1 "oaccloops*" } } */


 extern float ADD(float, float);
@@ -47,4 +47,4 @@ extern float ADD(float, float);

 #pragma acc routine (ADD) nohost

-/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*ADD[^']*' has 'nohost' clause\.$} 1 oaccloops } } */
+/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*ADD[^']*' has 'nohost' clause\.$} 1 "oaccloops*" } } */
diff --git a/gcc/testsuite/c-c++-common/unroll-1.c b/gcc/testsuite/c-c++-common/unroll-1.c
index fe7f4f31912c..8e57a44be231 100644
--- a/gcc/testsuite/c-c++-common/unroll-1.c
+++ b/gcc/testsuite/c-c++-common/unroll-1.c
@@ -1,5 +1,5 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fdump-rtl-loop2_unroll-details" } */

 extern void bar (int);

@@ -10,12 +10,12 @@ void test (void)
   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 8; ++i)
     bar(i);
-  /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli1" } } */

   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 7; ++i)
     bar(i);
-  /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli1" } } */

   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 15; ++i)
diff --git a/gcc/testsuite/c-c++-common/unroll-4.c b/gcc/testsuite/c-c++-common/unroll-4.c
index 1c1988174ba7..fe7f9e10626e 100644
--- a/gcc/testsuite/c-c++-common/unroll-4.c
+++ b/gcc/testsuite/c-c++-common/unroll-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli1-details" } */

 extern void bar (int);

@@ -17,6 +17,6 @@ void test (void)
   for (unsigned long i = 1; i <= j; ++i)
     bar(i);

-  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli1" } } */
   /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
 }
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index 78b9aed89beb..c191125b7951 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -1,5 +1,4 @@
-/* Make sure that OpenACC loop processing happens.  */
-/* { dg-additional-options "-O2 -fdump-tree-oaccloops" } */
+/* { dg-additional-options "-O2 -fdump-tree-oaccloops" } */

 extern int place ();

@@ -15,4 +14,4 @@ void vector_1 (int *ary, int size)
   }
 }

-/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops" } } */
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops*" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
index 302fdb570b63..b6b11bf30afa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple case of non-looping code in which both uses ignore
    the sign and both definitions are sign ops.  */
@@ -18,5 +18,5 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <x} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <x} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
index d54fd36e2fb3..bef921be500b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple case of non-looping code in which both uses ignore
    the sign but only one definition is a sign op.  */
@@ -18,4 +18,4 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
index a244b4af2ac2..1b76ce05cbef 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple case of non-looping code in which one use ignores
    the sign but another doesn't.  */
@@ -18,4 +18,4 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 0 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 0 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
index 54355009c744..02223fd9f23b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple reduction loop in which all inputs are sign ops and
    the consumer doesn't care about the sign.  */
@@ -17,5 +17,5 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 3 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 3 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
index e4f0f856ff6b..9dd04408b3a8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a loop that does both a multiplication and addition.  The addition
    should prevent any sign ops from being removed.  */
@@ -17,4 +17,4 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 0 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 0 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index 31f05716f149..1d17c7328036 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -fdump-tree-backprop1-details" }  */

 void start (void *);
 void end (void *);
@@ -26,5 +26,5 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
index bcafbfe86b50..110c6cd8635e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O3 -fdump-tree-cunrolli1-details" } */
 int a[2];
 void
 test(int c)
@@ -9,5 +9,5 @@ test(int c)
     a[i]=5;
 }
 /* Array bounds says the loop will not roll much.  */
-/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */
-/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli1"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
index e25c638ac514..f8ab47cebf08 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 int a[1];
 void
 test(int c)
@@ -12,4 +12,4 @@ test(int c)
 }
 /* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */

-/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
index 886dc147ad1a..f93db92ab384 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdisable-tree-evrp" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fdisable-tree-evrp" } */
 void abort (void);
 int q (void);
 int a[10];
@@ -20,4 +20,4 @@ t (int n)
     }
   return sum;
 }
-/* { dg-final { scan-tree-dump-times "Removed pointless exit:" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "Removed pointless exit:" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
index b3617f685a1d..86c84606ce51 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli" } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli1" } */

 typedef int mad_fixed_t;
 struct mad_pcm
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
index 7ca1e4709751..f8f04ffaa456 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 int a[10];
 int b[11];
 int q (void);
@@ -15,4 +15,4 @@ t(int n)
        sum+=b[i];
   return sum;
 }
-/* { dg-final { scan-tree-dump "Loop 1 iterates at most 11 times" "cunrolli" } } */
+/* { dg-final { scan-tree-dump "Loop 1 iterates at most 11 times" "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
index d71b757fbca5..482c19ea1485 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
@@ -18,4 +18,4 @@ t6 (int qz, int wh)
     qz = jl * wh;
 }

-/* { dg-final { scan-tree-dump-times "Replacing" 2 "loopdone"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "loopdone2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
index ed0829a038c4..c6f1226d6834 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-phiprop-details" } */
+/* { dg-options "-O -fdump-tree-phiprop1-details" } */

 struct f
 {
@@ -16,4 +16,4 @@ int g(int i, int c, struct f *ff, int g)
   return *t;
 }

-/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop" } } */
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
index 2f86f02a30ce..3e8a13cd40c0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */

 extern void abort (void);
 int __attribute__((noinline,noclone))
@@ -25,4 +25,4 @@ int main()
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "bounded by 0x0\[^0-9a-f\]" 0 "cunrolli"} } */
+/* { dg-final { scan-tree-dump-times "bounded by 0x0\[^0-9a-f\]" 0 "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index 0f66aae87bba..98d639bc24dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdisable-tree-cunrolli -fdump-tree-threadfull1-details" } */
+/* { dg-options "-Ofast -fdisable-tree-cunrolli1 -fdump-tree-threadfull1-details" } */

 typedef unsigned short u16;
 typedef unsigned char u8;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
index 669d357045cb..069df138bcbe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
@@ -50,4 +50,4 @@ int foo1 (e_u8 a[4][N], int b1, int b2, e_u8 b[M+1][4][N])

 /* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunroll" } } */
 /* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 2 "cunroll" } } */
-/* { dg-final { scan-tree-dump-not "completely unrolled" "cunrolli" } } */
+/* { dg-final { scan-tree-dump-not "completely unrolled" "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-2.c b/gcc/testsuite/gcc.dg/unroll-2.c
index 8baceaac1699..f94174f0f1d3 100644
--- a/gcc/testsuite/gcc.dg/unroll-2.c
+++ b/gcc/testsuite/gcc.dg/unroll-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details=stderr -fno-peel-loops -fno-tree-vrp  -fdisable-tree-cunroll -fenable-tree-cunrolli" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details=stderr -fno-peel-loops -fno-tree-vrp  -fdisable-tree-cunroll -fenable-tree-cunrolli1" } */

 /* Blank lines can occur in the output of
    -fdump-tree-cunrolli-details=stderr.  */
diff --git a/gcc/testsuite/gcc.dg/unroll-3.c b/gcc/testsuite/gcc.dg/unroll-3.c
index 10bf59b9a2e7..0284378b9c5c 100644
--- a/gcc/testsuite/gcc.dg/unroll-3.c
+++ b/gcc/testsuite/gcc.dg/unroll-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunrolli=foo -fenable-tree-cunrolli=foo" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunrolli1=foo -fenable-tree-cunrolli1=foo" } */

 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@ int foo2(void)
   return 1;
 }

-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-4.c b/gcc/testsuite/gcc.dg/unroll-4.c
index 17f194212279..d62e2e7afa0a 100644
--- a/gcc/testsuite/gcc.dg/unroll-4.c
+++ b/gcc/testsuite/gcc.dg/unroll-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli=foo -fdisable-tree-cunrolli=foo2" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli1=foo -fdisable-tree-cunrolli1=foo2" } */

 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@ int foo2(void)
   return 1;
 }

-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-5.c b/gcc/testsuite/gcc.dg/unroll-5.c
index f3bdebe9882f..c81467cd4202 100644
--- a/gcc/testsuite/gcc.dg/unroll-5.c
+++ b/gcc/testsuite/gcc.dg/unroll-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli=foo2 -fdisable-tree-cunrolli=foo" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli1=foo2 -fdisable-tree-cunrolli1=foo" } */

 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@ int foo2(void)
   return 1;
 }

-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
index 815b44e1f7cf..2f7c17d803eb 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
@@ -22,5 +22,5 @@ void foo (void)
 /* We should be able to vectorize the cycle in one SLP attempt including
    both load groups and do only one permutation.  */
 /* { dg-final { scan-tree-dump-times "transform load" 2 "slp1" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone2" } } */
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-profile-1.c b/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
index 922f965806f9..a8b3ffb87d06 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-additional-options "-fdump-tree-vect-details-blocks -fdisable-tree-cunrolli" } */
+/* { dg-additional-options "-fdump-tree-vect-details-blocks -fdisable-tree-cunrolli1" } */

 /* At least one of these should correspond to a full vector.  */

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index e807ad855efd..ebaa3c86694f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -489,6 +489,8 @@ extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_only (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_functions_only (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_nonnull_compare (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index be533b03a85b..2d4145ce5b8e 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1583,6 +1583,7 @@ public:

   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_complete_unroll (m_ctxt); }

 }; // class pass_complete_unroll

@@ -1642,6 +1643,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *) { return optimize >= 2; }
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_complete_unrolli (m_ctxt); }

 }; // class pass_complete_unrolli

diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 1bbf2f1fb2c8..8d5572033f7b 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -70,6 +70,8 @@ public:
   virtual bool gate (function *) { return flag_tree_loop_optimize; }

   virtual unsigned int execute (function *fn);
+
+  opt_pass * clone () { return new pass_fix_loops (m_ctxt); }
 }; // class pass_fix_loops

 unsigned int
@@ -136,6 +138,8 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fn) { return gate_loop (fn); }

+
+  opt_pass * clone () { return new pass_tree_loop (m_ctxt); }
 }; // class pass_tree_loop

 } // anon namespace
@@ -200,6 +204,97 @@ make_pass_oacc_kernels (gcc::context *ctxt)
 {
   return new pass_oacc_kernels (ctxt);
 }
+/* A superpass that runs its subpasses on OpenACC functions only.  */
+
+namespace {
+
+const pass_data pass_data_oacc_functions_only =
+{
+  GIMPLE_PASS, /* type */
+  "*oacc_fns_only", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_functions_only: public gimple_opt_pass
+{
+public:
+  pass_oacc_functions_only (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_functions_only, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) {
+    if (!flag_openacc)
+      return false;
+
+    if (!oacc_get_fn_attrib (fn->decl))
+      return false;
+
+    return true;
+  }
+
+}; // class pass_oacc_functions_only
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_functions_only (gcc::context *ctxt)
+{
+  return new pass_oacc_functions_only (ctxt);
+}
+
+/* A superpass that runs its subpasses only if compiling for OpenACC.  */
+
+namespace {
+
+const pass_data pass_data_oacc_only =
+{
+  GIMPLE_PASS, /* type */
+  "*oacc_only", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_only: public gimple_opt_pass
+{
+public:
+  pass_oacc_only (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_only, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) {
+    if (!flag_openacc)
+      return false;
+
+    if (!oacc_get_fn_attrib (fn->decl))
+      return false;
+
+    return true;
+  }
+
+}; // class pass_oacc_only
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_only (gcc::context *ctxt)
+{
+  return new pass_oacc_only (ctxt);
+}
+
+

 /* The ipa oacc superpass.  */

@@ -343,6 +438,8 @@ public:
   /* opt_pass methods: */
   virtual unsigned int execute (function *);

+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
+
 }; // class pass_tree_loop_init

 unsigned int
@@ -556,6 +653,8 @@ public:
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }

+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
+
 }; // class pass_tree_loop_done

 } // anon namespace
diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 78b0461c839d..f138f766286b 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -479,6 +479,8 @@ public:
   virtual bool gate (function *) { return flag_tree_phiprop; }
   virtual unsigned int execute (function *);

+  opt_pass * clone () { return new pass_phiprop (m_ctxt); }
+
 }; // class pass_phiprop

 unsigned int
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
index 17cc9bd663e5..4438f6c24fed 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
@@ -7,5 +7,5 @@

 #include "pr85486.c"

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
index 5158bb5eb89e..c0a29c7556f9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
@@ -34,5 +34,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
index a3e44ebfbcb4..326f6d8dc31a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
@@ -35,5 +35,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
index a85400d09c50..efc9297acdee 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
@@ -38,5 +38,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
index 24c078f377c3..1c83ec0cc18d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
@@ -36,5 +36,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
index fcca9f593bb2..f2391dca7272 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
@@ -37,5 +37,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
index 0807eab7eee4..8ddaaf592cc1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
@@ -37,5 +37,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
index 4a8c1bf549e9..97abbfc20986 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
@@ -36,5 +36,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=8, vectors=128" } */
--
2.33.0
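
Note on the clone () overrides and the renamed dump files: the sketch below is a standalone illustration, not GCC code.  It assumes, based only on the testsuite changes above ("oaccloops" becoming "oaccloops1" or "oaccloops*"), that a pass scheduled more than once gets an instance number appended to its dump name.  The types Pass and LoopPass and the helpers schedule and dump_name are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass base class; not GCC's real opt_pass.
struct Pass
{
  explicit Pass (std::string name) : name (std::move (name)) {}
  virtual ~Pass () = default;

  // A pass that may be scheduled more than once must be able to create
  // a fresh copy of itself.
  virtual std::unique_ptr<Pass> clone () const = 0;

  std::string name;
  int instance = 0;   // 0: only instance; otherwise 1, 2, ...
};

struct LoopPass : Pass
{
  LoopPass () : Pass ("oaccloops") {}

  std::unique_ptr<Pass> clone () const override
  {
    return std::make_unique<LoopPass> ();
  }
};

// Schedule COUNT instances of PROTOTYPE, numbering them when COUNT > 1.
std::vector<std::unique_ptr<Pass>>
schedule (const Pass &prototype, int count)
{
  std::vector<std::unique_ptr<Pass>> pipeline;
  for (int i = 1; i <= count; ++i)
    {
      std::unique_ptr<Pass> p = prototype.clone ();
      p->instance = count > 1 ? i : 0;
      pipeline.push_back (std::move (p));
    }
  return pipeline;
}

// Dump name used by this sketch: pass name plus instance number, if any.
std::string
dump_name (const Pass &p)
{
  return p.instance ? p.name + std::to_string (p.instance) : p.name;
}
```

Under that assumption, a test that scans the "oaccloops" dump has to name a specific instance or use a pattern once the pass appears twice in the pipeline.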

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

* [PATCH 15/40] graphite: Extend SCoP detection dump output
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (13 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2022-05-16 12:49   ` Tobias Burnus
  2021-12-15 15:54 ` [PATCH 16/40] graphite: Rename isl_id_for_ssa_name Frederik Harwath
                   ` (24 subsequent siblings)
  39 siblings, 1 reply; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

Extend the dump output to make it easier for GCC developers to
understand why Graphite refuses to include a loop in a SCoP.

gcc/ChangeLog:

        * graphite-scop-detection.c (scop_detection::can_represent_loop):
        Output reason for failure to dump file.
        (scop_detection::harmful_loop_in_region): Likewise.
        (scop_detection::graphite_can_represent_expr): Likewise.
        (scop_detection::stmt_has_simple_data_refs_p): Likewise.
        (scop_detection::stmt_simple_for_scop_p): Likewise.
        (print_sese_loop_numbers): New function.
        (scop_detection::build_scop_depth): Use it to print the loops of a
        discarded SCoP.
        (scop_detection::add_scop): Print the loops in the added SCoP.
---
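
Illustration only (not part of the patch): the main change to can_represent_loop is to split one long "&&" chain into separate checks that each report why they failed.  The sketch below shows that pattern in standalone C++.  LoopInfo, rejection_reason and the DebugPrinter here are hypothetical simplifications; the real code queries GCC's loop structures and uses the existing debug_printer.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical summary of the properties checked for a loop.
struct LoopInfo
{
  bool natural;            // not entered through an irreducible edge
  bool niter_known;        // the iteration count could be computed
  bool niter_no_overflow;  // the iteration count cannot overflow
};

// Minimal streaming printer modelled on the patch's debug_printer;
// it prints nothing when no dump file is set.
struct DebugPrinter
{
  FILE *dump_file = nullptr;

  friend DebugPrinter &
  operator<< (DebugPrinter &out, const char *s)
  {
    if (out.dump_file)
      fprintf (out.dump_file, "%s", s);
    return out;
  }
};

// Return the reason LOOP is rejected, or nullptr if it passes all checks.
const char *
rejection_reason (const LoopInfo &loop)
{
  if (!loop.natural)
    return "Loop is not a natural loop.";
  if (!loop.niter_known)
    return "Loop niter unknown.";
  if (!loop.niter_no_overflow)
    return "Loop niter can overflow.";
  return nullptr;
}

// One check per condition, each with its own message, instead of a
// single boolean expression that fails silently.
bool
can_represent_loop (const LoopInfo &loop, DebugPrinter &dp)
{
  if (const char *reason = rejection_reason (loop))
    {
      dp << "[can_represent_loop-fail] " << reason << "\n";
      return false;
    }
  return true;
}
```

The result is the same as with the original "&&" chain; the only difference is that the first failing condition is named in the dump.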
 gcc/graphite-scop-detection.c | 188 +++++++++++++++++++++++++++++-----
 1 file changed, 165 insertions(+), 23 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
     fprintf (output.dump_file, "%d", i);
     return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
     fprintf (output.dump_file, "%s", s);
     return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+    print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+    return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+    print_generic_expr (output.dump_file, t, TDF_SLIM);
+    return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool printed = false;
+  FOR_EACH_LOOP (loop, 0)
+    if (loop_in_sese_p (loop, sese))
+      {
+        fprintf (file, "%d, ", loop->num);
+        printed = true;
+      }
+  if (printed)
+    fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
       if (! next
          || harmful_loop_in_region (next))
        {
-         if (s)
-           add_scop (s);
+          if (next)
+            DEBUG_PRINT (
+                dp << "[scop-detection] Discarding SCoP on loops ";
+                print_sese_loop_numbers (dump_file, next);
+                dp << " because of harmful loops\n";);
+          if (s)
+            add_scop (s);
          build_scop_depth (loop);
          s = invalid_sese;
        }
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
       || !single_pred_p (loop->latch)
       || exit->src != single_pred (loop->latch)
       || !empty_block_p (loop->latch))
-    return false;
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+      return false;
+    }
+
+  bool edge_irreducible
+      = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+    {
+      DEBUG_PRINT (
+          dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+      return false;
+    }
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+                                                          single_exit (loop),
+                                                          &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-    && number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-    && niter_desc.control.no_overflow
-    && (niter = number_of_latch_executions (loop))
-    && !chrec_contains_undetermined (niter)
-    && graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+    {
+      DEBUG_PRINT (
+          dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+             << "Condition: " << niter_desc.assumptions << "\n");
+      return false;
+    }
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+      return false;
+    }
+  if (!niter_desc.control.no_overflow)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+      return false;
+    }
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+                  << "Loop niter chrec contains undetermined coefficients.\n");
+      return false;
+    }
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+                  << "Loop niter expression cannot be represented: "
+                  << niter << "\n");
+      return false;
+    }
+
+  return true;
 }

 /* Return true when BEGIN is the preheader edge of a loop with a single exit
@@ -640,6 +726,16 @@ scop_detection::add_scop (sese_l s)

   scops.safe_push (s);
   DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s));
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      loop_p loop;
+      fprintf (dump_file, "Loops in SCoP: ");
+      FOR_EACH_LOOP (loop, 0)
+      if (loop_in_sese_p (loop, s))
+        fprintf (dump_file, "%d ", loop->num);
+      fprintf (dump_file, "\n");
+    }
 }

 /* Return true when a statement in SCOP cannot be represented by Graphite.  */
@@ -665,7 +761,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const

       /* The basic block should not be part of an irreducible loop.  */
       if (bb->flags & BB_IRREDUCIBLE_LOOP)
-       return true;
+       {
+          DEBUG_PRINT (dp << "[scop-detection-fail] Found bb in irreducible "
+                      "loop.\n");
+          return true;
+        }

       /* Check for unstructured control flow: CFG not generated by structured
         if-then-else.  */
@@ -676,7 +776,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
          FOR_EACH_EDGE (e, ei, bb->succs)
            if (!dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)
                && !dominated_by_p (CDI_DOMINATORS, e->dest, bb))
-             return true;
+             {
+                DEBUG_PRINT (dp << "[scop-detection-fail] Found unstructured "
+                                   "control flow.\n");
+                return true;
+              }
        }

       /* Collect all loops in the current region.  */
@@ -688,7 +792,10 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
       for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
           !gsi_end_p (gsi); gsi_next (&gsi))
        if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb))
-         return true;
+         {
+           DEBUG_PRINT (dp << "[scop-detection-fail] Found harmful statement.\n");
+           return true;
+         }

       for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb);
           dom;
@@ -731,9 +838,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
          && ! loop_nest_has_data_refs (loop))
        {
          DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-                      << "does not have any data reference.\n");
+                      << " does not have any data reference.\n");
          return true;
        }
+
+      DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
     }

   return false;
@@ -922,7 +1031,21 @@ scop_detection::graphite_can_represent_expr (sese_l scop, loop_p loop,
                                             tree expr)
 {
   tree scev = cached_scalar_evolution_in_region (scop, loop, expr);
-  return graphite_can_represent_scev (scop, scev);
+  bool can_represent = graphite_can_represent_scev (scop, scev);
+
+  if (!can_represent)
+    {
+      if (dump_file)
+       {
+          fprintf (dump_file, "[graphite_can_represent_expr] Cannot represent "
+                  "scev \"");
+          print_generic_expr (dump_file, scev, TDF_SLIM);
+          fprintf (dump_file, "\" of expression ");
+          print_generic_expr (dump_file, expr, TDF_SLIM);
+          fprintf (dump_file, " in loop %d\n", loop->num);
+        }
+    }
+  return can_represent;
 }

 /* Return true if the data references of STMT can be represented by Graphite.
@@ -938,7 +1061,11 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)

   auto_vec<data_reference_p> drs;
   if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs))
-    return false;
+    {
+      DEBUG_PRINT (dp <<
+                  "[stmt_has_simple_data_refs_p] Unanalyzable statement.\n");
+      return false;
+    }

   int j;
   data_reference_p dr;
@@ -946,7 +1073,12 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
     {
       for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i)
        if (! graphite_can_represent_scev (scop, DR_ACCESS_FN (dr, i)))
-         return false;
+         {
+            DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] Cannot "
+                               "represent access function SCEV: "
+                            << DR_ACCESS_FN (dr, i) << "\n");
+            return false;
+          }
     }

   return true;
@@ -1027,14 +1159,24 @@ scop_detection::stmt_simple_for_scop_p (sese_l scop, gimple *stmt,
        for (unsigned i = 0; i < 2; ++i)
          {
            tree op = gimple_op (stmt, i);
-           if (!graphite_can_represent_expr (scop, loop, op)
-               /* We can only constrain on integer type.  */
-               || ! INTEGRAL_TYPE_P (TREE_TYPE (op)))
+           if (!graphite_can_represent_expr (scop, loop, op))
+             {
+               DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+                                             "[scop-detection-fail] "
+                                             "Graphite cannot represent cond "
+                                             "stmt operator expression.\n"));
+               DEBUG_PRINT (dp << op << "\n");
+
+               return false;
+             }
+
+             if (! INTEGRAL_TYPE_P (TREE_TYPE (op)))
              {
-               DEBUG_PRINT (dp << "[scop-detection-fail] "
-                               << "Graphite cannot represent stmt:\n";
-                            print_gimple_stmt (dump_file, stmt, 0,
-                                               TDF_VOPS | TDF_MEMSYMS));
+               DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+                                             "[scop-detection-fail] "
+                                             "Graphite cannot represent cond "
+                                             "statement operator. "
+                                             "Type must be integral.\n"));
                return false;
              }
          }
--
2.33.0


* [PATCH 16/40] graphite: Rename isl_id_for_ssa_name
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (14 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 15/40] graphite: Extend SCoP detection dump output Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2022-05-16 12:49   ` Tobias Burnus
  2021-12-15 15:54 ` [PATCH 17/40] graphite: Fix minor mistakes in comments Frederik Harwath
                   ` (23 subsequent siblings)
  39 siblings, 1 reply; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.

gcc/ChangeLog:

        * graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
        (isl_id_for_parameter): ... this new function name.
        (build_scop_context): Adjust function use.
---
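
Illustration only (not part of the patch): a standalone sketch of how the "P_<version>" parameter names are formed and why a 14-byte buffer is enough.  The function parameter_name is hypothetical; its argument stands in for SSA_NAME_VERSION, and the sketch assumes a 32-bit unsigned int and uses "%u" where the patch uses "%d".

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the "P_<version>" name for a parameter.  VERSION stands in for
// SSA_NAME_VERSION of the parameter's SSA name.
std::string
parameter_name (unsigned int version)
{
  // "P_" (2 chars) + at most 10 digits for a 32-bit value + NUL = 13,
  // so the 14 bytes used in the patch are sufficient.
  char name[14];
  std::snprintf (name, sizeof (name), "P_%u", version);
  return name;
}
```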
 gcc/graphite-sese-to-poly.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 99ea0327b1a7..204d382ed4cc 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -898,15 +899,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
     space = isl_space_set_dim_id (space, isl_dim_param, i,
-                                  isl_id_for_ssa_name (scop, e));
+                                  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-    add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+    add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */
--
2.33.0


* [PATCH 17/40] graphite: Fix minor mistakes in comments
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (15 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 16/40] graphite: Rename isl_id_for_ssa_name Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2022-05-16 12:49   ` Tobias Burnus
  2021-12-15 15:54 ` [PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c Frederik Harwath
                   ` (22 subsequent siblings)
  39 siblings, 1 reply; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

gcc/ChangeLog:

        * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
        a reference to a variable which does not exist.
        * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
        in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c      | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 1ad68a1d4735..0712d85b67a6 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
-     in strage ways when inserting the stmts from it into different basic
+     in strange ways when inserting the stmts from it into different basic
      blocks one at a time.  */
   auto_vec<gimple *, 3> stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 204d382ed4cc..33d6a98327b8 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -649,14 +649,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
                 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
      the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
                                    alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
-     data reference DR.  */
+     the reference.  */
   isl_constraint *c
     = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);
--
2.33.0


* [PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (16 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 17/40] graphite: Fix minor mistakes in comments Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 19/40] graphite: Add runtime alias checking Frederik Harwath
                   ` (21 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, rguenther

Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.

gcc/ChangeLog:

        * tree-loop-distribution.c (data_ref_segment_size): Remove function.
        (latch_dominated_by_data_ref): Likewise.
        (compute_alias_check_pairs): Likewise.
        * tree-data-ref.c (data_ref_segment_size): New function,
        copied from tree-loop-distribution.c.
        (latch_dominated_by_data_ref): Likewise.
        (compute_alias_check_pairs): Likewise.
        * tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
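
Illustration only (not part of the patch): the arithmetic performed by data_ref_segment_size and its callers in compute_alias_check_pairs, written with plain 64-bit integers instead of sizetype trees.  The function names and parameters below are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstdint>

// Segment length covered by a reference with stride STEP over NITERS
// iterations: STEP * (NITERS - 1), in wrapping unsigned arithmetic.
std::uint64_t
segment_size (std::int64_t step, std::uint64_t niters)
{
  return static_cast<std::uint64_t> (step) * (niters - 1);
}

// The moved code uses the number of latch executions plus one when the
// source block of the loop's single exit is dominated by the reference's
// statement, and the plain number of latch executions otherwise.
std::uint64_t
segment_size_for_ref (std::int64_t step, std::uint64_t latch_executions,
                      bool ref_dominates_exit_src)
{
  std::uint64_t niters
    = ref_dominates_exit_src ? latch_executions + 1 : latch_executions;
  return segment_size (step, niters);
}
```

For example, with a 4-byte step and 9 latch executions, a reference that dominates the exit test is accessed 10 times and spans 4 * 9 = 36 bytes between its first and last access; otherwise the computed span is 4 * 8 = 32 bytes.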
 gcc/tree-data-ref.c          | 87 ++++++++++++++++++++++++++++++++++++
 gcc/tree-data-ref.h          |  3 ++
 gcc/tree-loop-distribution.c | 87 ------------------------------------
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 46f4ffedb483..6a3659dc490c 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2636,6 +2636,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
     dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+                      fold_convert (sizetype, niters),
+                      size_one_node);
+  return size_binop (MULT_EXPR,
+                    fold_convert (sizetype, DR_STEP (dr)),
+                    fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+                        gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+                          vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+    {
+      ddr_p ddr = (*alias_ddrs)[i];
+      struct data_reference *dr_a = DDR_A (ddr);
+      struct data_reference *dr_b = DDR_B (ddr);
+      tree seg_length_a, seg_length_b;
+
+      if (latch_dominated_by_data_ref (loop, dr_a))
+       seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+      else
+       seg_length_a = data_ref_segment_size (dr_a, niters);
+
+      if (latch_dominated_by_data_ref (loop, dr_b))
+       seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+      else
+       seg_length_b = data_ref_segment_size (dr_b, niters);
+
+      unsigned HOST_WIDE_INT access_size_a
+       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+      unsigned HOST_WIDE_INT access_size_b
+       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+      unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+      unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+      dr_with_seg_len_pair_t dr_with_seg_len_pair
+       (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+        /* ??? Would WELL_ORDERED be safe?  */
+        dr_with_seg_len_pair_t::REORDERED);
+
+      comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+    }
+
+  if (tree_fits_uhwi_p (niters))
+    factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file,
+            "Improved number of alias checks from %d to %d\n",
+            alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
    overlapping of address ranges represented by a list of data references
    pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f2..4929b059ddea 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -582,6 +582,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *,
                                           poly_uint64);
+
+extern void compute_alias_check_pairs (class loop *, vec<ddr_p> *,
+                                      vec<dr_with_seg_len_pair_t> *);
 extern void create_runtime_alias_checks (class loop *,
                                         const vec<dr_with_seg_len_pair_t> *,
                                         tree*);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 583c01a42d86..ed6f2c2974f1 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2582,93 +2582,6 @@ loop_distribution::break_alias_scc_partitions (struct graph *rdg,
     }
 }

-/* Compute and return an expression whose value is the segment length which
-   will be accessed by DR in NITERS iterations.  */
-
-static tree
-data_ref_segment_size (struct data_reference *dr, tree niters)
-{
-  niters = size_binop (MINUS_EXPR,
-                      fold_convert (sizetype, niters),
-                      size_one_node);
-  return size_binop (MULT_EXPR,
-                    fold_convert (sizetype, DR_STEP (dr)),
-                    fold_convert (sizetype, niters));
-}
-
-/* Return true if LOOP's latch is dominated by statement for data reference
-   DR.  */
-
-static inline bool
-latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
-{
-  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
-                        gimple_bb (DR_STMT (dr)));
-}
-
-/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
-   data dependence relations ALIAS_DDRS.  */
-
-static void
-compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
-                          vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
-{
-  unsigned int i;
-  unsigned HOST_WIDE_INT factor = 1;
-  tree niters_plus_one, niters = number_of_latch_executions (loop);
-
-  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
-  niters = fold_convert (sizetype, niters);
-  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Creating alias check pairs:\n");
-
-  /* Iterate all data dependence relations and compute alias check pairs.  */
-  for (i = 0; i < alias_ddrs->length (); i++)
-    {
-      ddr_p ddr = (*alias_ddrs)[i];
-      struct data_reference *dr_a = DDR_A (ddr);
-      struct data_reference *dr_b = DDR_B (ddr);
-      tree seg_length_a, seg_length_b;
-
-      if (latch_dominated_by_data_ref (loop, dr_a))
-       seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
-      else
-       seg_length_a = data_ref_segment_size (dr_a, niters);
-
-      if (latch_dominated_by_data_ref (loop, dr_b))
-       seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
-      else
-       seg_length_b = data_ref_segment_size (dr_b, niters);
-
-      unsigned HOST_WIDE_INT access_size_a
-       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
-      unsigned HOST_WIDE_INT access_size_b
-       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
-      unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
-      unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
-
-      dr_with_seg_len_pair_t dr_with_seg_len_pair
-       (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
-        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
-        /* ??? Would WELL_ORDERED be safe?  */
-        dr_with_seg_len_pair_t::REORDERED);
-
-      comp_alias_pairs->safe_push (dr_with_seg_len_pair);
-    }
-
-  if (tree_fits_uhwi_p (niters))
-    factor = tree_to_uhwi (niters);
-
-  /* Prune alias check pairs.  */
-  prune_runtime_alias_test_list (comp_alias_pairs, factor);
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file,
-            "Improved number of alias checks from %d to %d\n",
-            alias_ddrs->length (), comp_alias_pairs->length ());
-}
-
 /* Given data dependence relations in ALIAS_DDRS, generate runtime alias
    checks and version LOOP under condition of these runtime alias checks.  */

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 19/40] graphite: Add runtime alias checking
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (17 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 20/40] openacc: Use Graphite for dependence analysis in "kernels" regions Frederik Harwath
                   ` (20 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
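
For illustration only, the effect of such a check can be pictured at
the source level as loop versioning.  The sketch below is not the code
that Graphite emits; the names "ranges_overlap" and "add_one_versioned"
and the simple byte-range overlap test are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if the byte ranges [a, a + len_a) and
   [b, b + len_b) overlap.  Addresses are compared as integers.  */
static int
ranges_overlap (const void *a, size_t len_a, const void *b, size_t len_b)
{
  uintptr_t lo_a = (uintptr_t) a;
  uintptr_t lo_b = (uintptr_t) b;
  return lo_a < lo_b + len_b && lo_b < lo_a + len_a;
}

/* Hypothetical example: compute out[i] = in[i] + 1 for 0 <= i < n.
   Return 1 if the version guarded by the alias check ran and 0 if the
   function fell back to the original loop.  */
static int
add_one_versioned (int *out, const int *in, size_t n)
{
  assert (out != NULL && in != NULL);

  if (!ranges_overlap (out, n * sizeof *out, in, n * sizeof *in))
    {
      /* Aliasing has been ruled out at runtime: a compiler would be
         free to reorder or parallelize the iterations of this copy.  */
      for (size_t i = 0; i < n; i++)
        out[i] = in[i] + 1;
      return 1;
    }

  /* Possible aliasing: keep the original iteration order.  */
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] + 1;
  return 0;
}
```

In this sketch both branches contain the same loop; the point is only
that the first copy is the one that may be transformed, because the
check guarantees that the two ranges are disjoint.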

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

        * common.opt: Add fgraphite-runtime-alias-checks.
        * graphite-isl-ast-to-gimple.c
        (generate_alias_cond): New function.
        (graphite_regenerate_ast_isl): Use from here.
        * graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
        (free_scop): and release here.
        * graphite-scop-detection.c (dr_defs_outside_region): New function.
        (dr_well_analyzed_for_runtime_alias_check_p): New function.
        (graphite_runtime_alias_check_p): New function.
        (build_alias_set): Record unhandled alias ddrs for later alias check
        creation if flag_graphite_runtime_alias_checks is true instead
        of failing.
        * graphite.h (struct scop): Add field unhandled_alias_ddrs.
        * sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

        * gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt                          |   4 +
 gcc/graphite-isl-ast-to-gimple.c        |  60 ++++++
 gcc/graphite-poly.c                     |   2 +
 gcc/graphite-scop-detection.c           | 241 +++++++++++++++++++++---
 gcc/graphite.h                          |   4 +
 gcc/sese.h                              |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 328 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 1a5b9bfcca91..b6c46ab63e34 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1673,6 +1673,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 0712d85b67a6..073b471775de 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
     }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+                       && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Generated runtime alias check: ");
+      print_generic_expr (dump_file, alias_cond, dump_flags);
+      fprintf (dump_file, "\n");
+    }
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
    Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+      && scop->unhandled_alias_ddrs.length () > 0)
+    {
+      /* SCoP detection has failed to handle the aliasing between some data
+        references of the SCoP statically. Generate an alias check that selects
+        the newly generated version of the SCoP in the true-branch of the
+        conditional if aliasing can be ruled out at runtime and the original
+        version of the SCoP, otherwise. */
+
+      loop_p loop
+          = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+                              scop->scop_info->region.exit->src->loop_father);
+      tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+      tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+      set_ifsese_condition (region->if_region, non_alias_cond);
+
+      /* The loop-nest vec is shared by all DDRs. */
+      DDR_LOOP_NEST (scop->unhandled_alias_ddrs[0]).release ();
+
+      unsigned int i;
+      struct data_dependence_relation *ddr;
+
+      FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr)
+       if (ddr)
+         free_dependence_relation (ddr);
+      scop->unhandled_alias_ddrs.truncate (0);
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "[codegen] isl AST to Gimple succeeded.\n");
+
   t.translate_isl_ast (context_loop, root_node, e, ip);
   if (! t.codegen_error_p ())
     {
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 1dfc28e6caea..a7aabcb33c99 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -255,6 +255,7 @@ new_scop (edge entry, edge exit)
   scop_set_region (s, region);
   s->pbbs.create (3);
   s->drs.create (3);
+  s->unhandled_alias_ddrs.create (1);
   s->dependence = NULL;
   return s;
 }
@@ -272,6 +273,7 @@ free_scop (scop_p scop)

   scop->pbbs.release ();
   scop->drs.release ();
+  scop->unhandled_alias_ddrs.release ();

   isl_set_free (scop->param_context);
   scop->param_context = NULL;
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 46c470210d05..924004e3f3c4 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -1542,6 +1542,125 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
   return new_gimple_poly_bb (bb, drs, reads, writes);
 }

+/* Checks if all parts of DR are defined outside of REGION.  This allows an
+   alias check involving DR to be placed in front of the region. */
+
+static opt_result
+dr_defs_outside_region (const sese_l &region, data_reference_p dr)
+{
+  static const char *pre
+      = "cannot create alias check for SCoP. Data reference's";
+  static const char *suf = "uses definitions from SCoP.\n";
+  opt_result res = opt_result::success ();
+
+  if (has_operands_from_region_p (DR_BASE_OBJECT (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s base %s", pre, suf);
+  else if (has_operands_from_region_p (DR_INIT (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s constant offset %s", pre,
+                                  suf);
+  else if (has_operands_from_region_p (DR_STEP (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s step %s", pre, suf);
+  else if (has_operands_from_region_p (DR_OFFSET (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s loop-invariant offset %s",
+                                  pre, suf);
+  else if (has_operands_from_region_p (DR_BASE_ADDRESS (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s base address %s", pre,
+                                  suf);
+  else
+    for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i)
+      if (has_operands_from_region_p (DR_ACCESS_FN (dr, i), region))
+        {
+          res = opt_result::failure_at (
+              DR_STMT (dr), "%s %d-th access function %s", pre, i + 1, suf);
+          break;
+        }
+
+  return res;
+}
+
+/* Check that all constituents of DR that are used by the
+   "compute_alias_check_pairs" function have been analyzed as required. */
+
+static opt_result
+dr_well_analyzed_for_runtime_alias_check_p (data_reference_p dr)
+{
+  static const char* error =
+    "data-reference not well-analyzed for runtime check.";
+  gimple* stmt = DR_STMT (dr);
+  opt_result res = opt_result::success ();
+
+  if (! DR_BASE_ADDRESS (dr))
+    res = opt_result::failure_at (stmt, "%s no base address.\n", error);
+  else if (! DR_OFFSET (dr))
+    res = opt_result::failure_at (stmt, "%s no offset.\n", error);
+  else if (! DR_INIT (dr))
+    res = opt_result::failure_at (stmt, "%s no init.\n", error);
+  else if (! DR_STEP (dr))
+    res = opt_result::failure_at (stmt, "%s no step.\n", error);
+  else if (! tree_fits_uhwi_p (DR_STEP (dr)))
+    res = opt_result::failure_at (stmt, "%s step too large.\n", error);
+
+  if (!res)
+    DEBUG_PRINT (dump_data_reference (dump_file, dr));
+
+  return res;
+}
+
+/* Return TRUE if it is possible to create a runtime alias check for
+   data-references DR1 and DR2 from LOOP and place it in front of REGION. */
+
+static opt_result
+graphite_runtime_alias_check_p (data_reference_p dr1, data_reference_p dr2,
+                                class loop *loop, const sese_l &region)
+{
+  gcc_checking_assert (loop);
+  gcc_checking_assert (dr1);
+  gcc_checking_assert (dr2);
+
+  if (dump_file)
+    {
+      fprintf (dump_file,
+               "Attempting runtime alias check creation for DRs:\n");
+      dump_data_reference (dump_file, dr1);
+      dump_data_reference (dump_file, dr2);
+    }
+
+  if (!optimize_loop_for_speed_p (loop))
+    return opt_result::failure_at (DR_STMT (dr1),
+                                   "runtime alias check not supported when"
+                                   " optimizing for size.\n");
+
+  /* Verify that we have enough information about the data-references and
+     context loop to construct a runtime alias check expression with
+     "compute_alias_check_pairs". */
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    return opt_result::failure_at (DR_STMT (dr1),
+                                  "failed to obtain number of iterations of "
+                                  "loop %d.\n", loop->num);
+
+  opt_result ok = dr_well_analyzed_for_runtime_alias_check_p (dr1);
+  if (!ok)
+    return ok;
+
+  ok = dr_well_analyzed_for_runtime_alias_check_p (dr2);
+  if (!ok)
+    return ok;
+
+  /* The runtime alias check would be placed before REGION and hence it cannot
+     use definitions made within REGION. */
+
+  ok = dr_defs_outside_region (region, dr1);
+  if (!ok)
+    return ok;
+
+  ok = dr_defs_outside_region (region, dr2);
+  if (!ok)
+    return ok;
+
+  return opt_result::success ();
+}
+
 /* Compute alias-sets for all data references in DRS.  */

 static bool
@@ -1549,7 +1668,7 @@ build_alias_set (scop_p scop)
 {
   int num_vertices = scop->drs.length ();
   struct graph *g = new_graph (num_vertices);
-  dr_info *dr1, *dr2;
+  dr_info *dri1, *dri2;
   int i, j;
   int *all_vertices;

@@ -1557,33 +1676,110 @@ build_alias_set (scop_p scop)
     = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
                        scop->scop_info->region.exit->src->loop_father);

-  FOR_EACH_VEC_ELT (scop->drs, i, dr1)
-    for (j = i+1; scop->drs.iterate (j, &dr2); j++)
-      if (dr_may_alias_p (dr1->dr, dr2->dr, nest))
-       {
-         /* Dependences in the same alias set need to be handled
-            by just looking at DR_ACCESS_FNs.  */
-         if (DR_NUM_DIMENSIONS (dr1->dr) == 0
-             || DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr)
-             || ! operand_equal_p (DR_BASE_OBJECT (dr1->dr),
-                                   DR_BASE_OBJECT (dr2->dr),
-                                   OEP_ADDRESS_OF)
-             || ! types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (dr1->dr)),
-                                      TREE_TYPE (DR_BASE_OBJECT (dr2->dr))))
-           {
-             free_graph (g);
-             return false;
-           }
-         add_edge (g, i, j);
-         add_edge (g, j, i);
-       }
+  gcc_checking_assert (nest);
+
+  vec<loop_p> nest_vec;
+  nest_vec.create (1);
+  if (flag_graphite_runtime_alias_checks)
+    nest_vec.safe_push (nest);
+
+  FOR_EACH_VEC_ELT (scop->drs, i, dri1)
+    {
+      data_reference_p dr1 = dri1->dr;
+
+      for (j = i + 1; scop->drs.iterate (j, &dri2); j++)
+        {
+
+          data_reference_p dr2 = dri2->dr;
+          if (!(DR_IS_READ (dr1) && DR_IS_READ (dr2))
+              && dr_may_alias_p (dr1, dr2, nest))
+            {
+              /* Dependences in the same alias set need to be handled
+                 by just looking at DR_ACCESS_FNs.  */
+              bool dimension_zero = DR_NUM_DIMENSIONS (dr1) == 0;
+              bool different_dimensions
+                  = DR_NUM_DIMENSIONS (dr1) != DR_NUM_DIMENSIONS (dr2);
+              bool different_base_objects = !operand_equal_p (
+                  DR_BASE_OBJECT (dr1), DR_BASE_OBJECT (dr2), OEP_ADDRESS_OF);
+              bool incompatible_types
+                  = !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (dr1)),
+                                         TREE_TYPE (DR_BASE_OBJECT (dr2)));
+              bool ddr_can_be_handled
+                  = !(dimension_zero || different_dimensions
+                      || different_base_objects || incompatible_types);
+
+              if (!ddr_can_be_handled)
+                {
+                  DEBUG_PRINT (
+                      dp << "[build_alias_set] "
+                            "Cannot handle aliasing between data references:\n";
+                      print_gimple_stmt (dump_file, dr1->stmt, 2, TDF_DETAILS);
+                      print_gimple_stmt (dump_file, dr2->stmt, 2, TDF_DETAILS);
+                      dp << "\n");
+                  if (dimension_zero)
+                    DEBUG_PRINT (dp << "DR1 has dimension 0.\n");
+                  if (different_base_objects)
+                    DEBUG_PRINT (dp << "DRs have different base objects.\n");
+                  if (different_dimensions)
+                    DEBUG_PRINT (dp << "DRs have different dimensions.\n");
+                  if (incompatible_types)
+                    DEBUG_PRINT (dp <<
+                                "DRs have incompatible base object types.\n");
+                }
+
+              if (ddr_can_be_handled)
+                {
+                  add_edge (g, i, j);
+                  add_edge (g, j, i);
+                  continue;
+                }
+
+              loop_p common_loop
+                  = find_common_loop ((DR_STMT (dr1))->bb->loop_father,
+                                      (DR_STMT (dr2))->bb->loop_father);
+              edge scop_entry = scop->scop_info->region.entry;
+              dr1 = create_data_ref (scop_entry, common_loop, DR_REF (dr1),
+                                     DR_STMT (dr1), DR_IS_READ (dr1),
+                                     DR_IS_CONDITIONAL_IN_STMT (dr1));
+              dr2 = create_data_ref (scop_entry, common_loop, DR_REF (dr2),
+                                     DR_STMT (dr2), DR_IS_READ (dr2),
+                                     DR_IS_CONDITIONAL_IN_STMT (dr2));
+
+              if (flag_graphite_runtime_alias_checks
+                  && graphite_runtime_alias_check_p (dr1, dr2, nest,
+                                                     scop->scop_info->region))
+                {
+                  ddr_p ddr = initialize_data_dependence_relation (dr1, dr2,
+                                                                   nest_vec);
+                  scop->unhandled_alias_ddrs.safe_push (ddr);
+                }
+              else
+                {
+                  if (flag_graphite_runtime_alias_checks)
+                    {
+                      unsigned int i;
+                      struct data_dependence_relation *ddr;
+
+                      FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr)
+                      if (ddr)
+                        free_dependence_relation (ddr);
+                      scop->unhandled_alias_ddrs.truncate (0);
+                    }
+
+                  nest_vec.release ();
+                  free_graph (g);
+                  return false;
+                }
+            }
+      }
+    }

   all_vertices = XNEWVEC (int, num_vertices);
   for (i = 0; i < num_vertices; i++)
     all_vertices[i] = i;

   scop->max_alias_set
-    = graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1;
+      = graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1;
   free (all_vertices);

   for (i = 0; i < g->n_vertices; i++)
@@ -1703,7 +1899,6 @@ gather_bbs::after_dom_children (basic_block bb)
     }
 }

-
 /* Compute sth like an execution order, dominator order with first executing
    edges that stay inside the current loop, delaying processing exit edges.  */

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 6464d2f50ce7..03febfa39986 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -368,6 +368,10 @@ struct scop
   /* The maximum alias set as assigned to drs by build_alias_sets.  */
   unsigned max_alias_set;

+  /* A set of ddrs that were rejected by build_alias_set during scop detection
+     and that must be handled by other means (runtime checking). */
+  vec<ddr_p> unhandled_alias_ddrs;
+
   /* All the basic blocks in this scop that contain memory references
      and that will be represented as statements in the polyhedral
      representation.  */
diff --git a/gcc/sese.h b/gcc/sese.h
index cd19e6010196..c51ea68bfb47 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -153,6 +153,24 @@ defined_in_sese_p (tree name, const sese_l &r)
   return stmt_in_sese_p (SSA_NAME_DEF_STMT (name), r);
 }

+/* Returns true if EXPR has operands that are defined in REGION.  */
+
+static bool
+has_operands_from_region_p (tree expr, const sese_l &region)
+{
+  if (!expr || is_gimple_min_invariant (expr))
+    return false;
+
+  if (TREE_CODE (expr) == SSA_NAME)
+    return defined_in_sese_p (expr, region);
+
+  for (int i = 0; i < TREE_OPERAND_LENGTH (expr); i++)
+    if (has_operands_from_region_p (TREE_OPERAND (expr, i), region))
+      return true;
+
+  return false;
+}
+
 /* Returns true when LOOP is in REGION.  */

 static inline bool
diff --git a/gcc/testsuite/gcc.dg/graphite/alias-1.c b/gcc/testsuite/gcc.dg/graphite/alias-1.c
new file mode 100644
index 000000000000..ee80dae1df33
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/alias-1.c
@@ -0,0 +1,22 @@
+/* This test demonstrates a loop nest that Graphite cannot handle
+   because of aliasing. It should be possible to handle this loop nest
+   by creating a runtime alias check like in the very similar test
+   alias-0-runtime-check.c. However, since Graphite analyses each data
+   reference with respect to the innermost loop that contains it,
+   the variable "i" remains uninstantiated (in contrast to "j"), and
+   consequently the alias check cannot be placed outside of
+   the SCoP since "i" is not defined there. */
+
+/* { dg-options "-O2 -fgraphite-identity -fgraphite-runtime-alias-checks -fdump-tree-graphite-details" } */
+
+void sum(int *x, int *y, unsigned *sum)
+{
+  unsigned i,j;
+  *sum = 0;
+
+  for (i = 0; i < 10000; i=i+1)
+    for (j = 0; j < 22222; j=j+1)
+      *sum +=  x[i] + y[j];
+}
+
+/* { dg-final { scan-tree-dump "number of SCoPs: 1" "graphite" { xfail *-*-* } } } */
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 20/40] openacc: Use Graphite for dependence analysis in "kernels" regions
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (18 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 19/40] graphite: Add runtime alias checking Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 21/40] openacc: Add "can_be_parallel" flag info to "graph" dumps Frederik Harwath
                   ` (19 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, pass_omp_oacc_kernels_decompose
first introduces a new internal representation for "kernels" regions
that are to be analyzed by Graphite.  This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc-kernels=decompose-parloops".  The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions.  This replaces the
specialized handling for "kernels" regions that was previously used
and which was limited in many ways.

Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure.  Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs.  The data-dependence analysis step in
Graphite has so far been tied to the code generation step.  This
commit introduces a separate data-dependence analysis step that avoids
the code generation.  This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.
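
As an illustration of the constructs mentioned above, the following
sketch shows a "kernels" region whose loop carries "private" and
"reduction" clauses.  The function name "scaled_sum" and the data
clauses are assumptions chosen for this example; it is not taken from
the testsuite.  Without -fopenacc the pragmas are ignored and the code
runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: store a * x[i] in y[i] and return the sum of
   the stored values.  */
static double
scaled_sum (const double *restrict x, double *restrict y, size_t n, double a)
{
  assert (x != NULL && y != NULL);

  double sum = 0.0;
  double tmp;

#pragma acc kernels copyin(x[0:n]) copyout(y[0:n])
  {
#pragma acc loop private(tmp) reduction(+:sum)
    for (size_t i = 0; i < n; i++)
      {
        /* "private": every iteration works on its own copy of tmp, so
           the writes to tmp do not create a loop-carried dependence.  */
        tmp = a * x[i];
        y[i] = tmp;
        /* "reduction": the updates of sum may be combined in any
           order, so they do not prevent parallelization either.  */
        sum += tmp;
      }
  }

  return sum;
}
```

Without the two clauses, a dependence analysis that looks only at the
memory accesses would see tmp and sum written in every iteration and
would have to treat the loop as sequential.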

gcc/ChangeLog:

        * Makefile.in: Add graphite-oacc.o.
        * cfgloop.c (alloc_loop): Set can_be_parallel_valid_p to false.
        * cfgloop.h: Add can_be_parallel_valid_p field.
        * cfgloopmanip.c (copy_loop_info): Add assert.
        * config/nvptx/nvptx.c (nvptx_goacc_reduction_setup): Add assert.
        * doc/invoke.texi: Adjust param openacc-kernels description.
        * doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
        * flag-types.h (enum openacc_kernels): Add
        OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
        * gimple-pretty-print.c (dump_gimple_omp_target): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        * gimple.h (enum gf_mask): Add
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
        widen GF_OMP_TARGET_KIND_MASK.
        (is_gimple_omp_oacc): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (is_gimple_omp_offloaded): Likewise.
        * graphite-dependences.c (scop_get_reads_and_writes): Handle
        "kills" and "reduction" PDRs.
        (apply_schedule_on_deps): Add dump output for intermediate
        steps of the dependence computation to enable understanding
        of unexpected dependences.
        (carries_deps): Likewise.
        (scop_get_dependences): Handle "kill" operations and add dump
        output.
        * graphite-isl-ast-to-gimple.c (visit_schedule_loop_node): New function.
        (graphite_oacc_analyze_scop): New function.
        * graphite-optimize-isl.c (optimize_isl): Remove "static" and
        add argument to identify OpenACC use; don't fail on unchanged
        schedule in this case.
        * graphite-poly.c (new_poly_dr): Handle "kills".
        (print_pdr): Likewise.
        (new_gimple_poly_bb): Likewise.
        (free_gimple_poly_bb): Likewise.
        (new_scop): Handle "reduction", "private", and "firstprivate"
        hash sets.
        (free_scop): Likewise.
        (print_isl_space): New function.
        (debug_isl_space): New function.
        * graphite-scop-detection.c (scop_detection::can_represent_loop):
        Don't fail if niter is 0 in OpenACC functions.
        (scop_detection::add_scop): Don't reject regions with only one
        loop in OpenACC functions.
        (ignored_oacc_internal_call_p): New function.
        (scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
        (stmt_has_side_effects): Ignore internal OpenACC function calls.
        (add_write): Likewise.
        (add_read): Likewise.
        (add_kill): New function.
        (add_kills): New function.
        (add_oacc_kills): New function.
        (try_generate_gimple_bb): Kill false dependences for OpenACC
        "private"/"firstprivate" vars.
        (gather_bbs::gather_bbs): Determine OpenACC
        "private"/"firstprivate" vars in region.
        (gather_bbs::before_dom_children): Add assert.
        (determine_openacc_reductions): New function.
        (build_scops): Determine OpenACC "reduction" vars in SCoP.
        * graphite-sese-to-poly.c (oacc_ifn_call_extract): New declaration.
        (oacc_internal_call_p): New function.
        (build_poly_dr): Ignore internal OpenACC function calls,
        handle "reduction" refs.
        (build_poly_sr): Likewise; handle "kill" operations.
        * graphite.c (graphite_transform_loops): Accept functions with
        only a single loop.
        (oacc_enable_graphite_p): New function.
        (gate_graphite_transforms): Enable pass on OpenACC functions.
        * graphite.h (enum poly_dr_type): Add PDR_KILL.
        (struct poly_dr): Add "is_reduction" field.
        (new_poly_dr): Add argument to declaration.
        (pdr_kill_p): New function.
        (print_isl_space): New declaration.
        (debug_isl_space): New declaration.
        (struct scop): Add fields "reductions_vars",
        "oacc_firstprivate_vars", and "oacc_private_scalars".
        (optimize_isl): New declaration.
        (graphite_oacc_analyze_scop): New declaration.
        * internal-fn.c (expand_UNIQUE): Handle
        IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
        * internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE.
        * omp-expand.c (struct omp_region): Adjust comment.
        (expand_omp_for): Add asserts about expected "kernels" region types.
        (mark_loops_in_oacc_kernels_region): Likewise.
        (expand_omp_target): Likewise; handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (build_omp_regions_1): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (omp_make_gimple_edges): Likewise.
        * omp-general.c (oacc_get_kernels_attrib): New function.
        (oacc_get_fn_dim_size): Allow argument to be NULL.
        * omp-general.h (oacc_get_kernels_attrib): New declaration.
        * omp-low.c (struct omp_context): Add fields
        "oacc_firstprivate_vars" and "oacc_private_scalars".
        (was_originally_oacc_kernels): New function.
        (is_oacc_kernels_decomposed_graphite_part): New function.
        (new_omp_context): Allocate "oacc_firstprivate_vars" and
        "oacc_private_scalars" ...
        (delete_omp_context): ... and free from here.
        (oacc_record_firstprivate_var_clauses): New function.
        (oacc_record_private_scalars): New function.
        (scan_sharing_clauses): Call functions to record "private"
        scalars and "firstprivate" variables.
        (check_oacc_kernel_gwv): Add assert.
        (ctx_in_oacc_kernels_region): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (scan_omp_for): Likewise.
        (check_omp_nesting_restrictions): Likewise.
        (lower_oacc_head_mark): Likewise.
        (lower_omp_for): Likewise.
        (lower_omp_target): Create "private" and "firstprivate" marker
        call statements.
        (lower_oacc_head_tail): Adjust "private" and "firstprivate"
        marker calls.
        (lower_oacc_reductions): Emit "private" and "firstprivate"
        marker call statements.
        (make_oacc_firstprivate_vars_marker): New function.
        (make_oacc_private_scalars_marker): New function.
        * omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
        Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
        region using the new "kernels" handling.
        (make_region_seq): Adjust default region type for new
        "kernels" handling; no more exceptions, let Graphite handle everything.
        (make_region_loop_nest): Likewise; add dump output and assert.
        (adjust_nested_loop_clauses): Stop creating "auto" clauses if
        loop has "independent", "gang" etc.
        (transform_kernels_loop_clauses): Likewise.
        * omp-offload.c (oacc_extract_loop_call): New function.
        (oacc_loop_get_cfg_loop): New function.
        (can_be_parallel_str): New function.
        (oacc_loop_can_be_parallel_p): New function.
        (oacc_parallel_kernels_graphite_fun_p): New function.
        (oacc_parallel_fun_p): New function.
        (oacc_loop_transform_auto_into_independent): New function, ...
        (oacc_loop_fixed_partitions): ... called from here to transfer
        the result of Graphite's analysis to the loop.
        (execute_oacc_loop_designation): Handle OpenACC functions
        with the "parallel_kernels_graphite" attribute.
        (execute_oacc_device_lower): Handle
        IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
        * omp-offload.h (oacc_extract_loop_call): Add declaration.
        * params.opt: Add "param=openacc-kernels" value "decompose-parloops".
        * sese.c (scalar_evolution_in_region): "Redirect" SCEV
        analysis to outer loop for IFN_GOACC_LOOP calls.
        * sese.h: Add field "kill_scalar_refs".
        * tree-chrec.c (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
        like CASE_CONVERT.
        * tree-data-ref.c (dump_data_reference): Include DR_BASE_ADDRESS and
        DR_OFFSET in dump output.
        (get_references_in_stmt): Don't reject OpenACC internal function
        calls.
        (graphite_find_data_references_in_stmt): Remove unused variable.
        * tree-parloops.c (pass_parallelize_loops::execute): Disable
        pass with the new kernels handling, enable if requested explicitly.
        * tree-scalar-evolution.c (set_scev_analyze_openacc_calls):
        Set flag to enable the analysis of internal OpenACC function
        calls (use for Graphite only).
        (oacc_call_analyzable_p): New function.
        (oacc_ifn_call_extract): New function.
        (oacc_simplify): New function.
        (interpret_gimple_call): New function.
        (add_to_evolution): Simplify OpenACC internal function calls
        if applicable.
        (follow_ssa_edge_binary): Likewise.
        (follow_ssa_edge_expr): Likewise.
        (follow_copies_to_constant): Likewise.
        (analyze_initial_condition): Likewise.
        (interpret_loop_phi): Likewise.
        (interpret_rhs_expr): Likewise.
        (instantiate_scev_name): Likewise.
        (analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle
        default definitions.
        (expression_expensive_p): Consider internal OpenACC calls to
        be cheap.
        * tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
        New declaration.
        (oacc_call_analyzable_p): New declaration.
        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Mark
        lhs of internal OpenACC function calls necessary.
        * tree-ssa-loop-niter.c (oacc_call_analyzable_p): New function.
        (oacc_ifn_call_extract): New declaration.
        (interpret_gimple_call): New declaration.
        (expand_simple_operations): Handle internal OpenACC function calls.
        * tree-ssa-loop.c (gate_oacc_kernels): Disable for new
        "kernels" handling.
        * graphite-oacc.c: New file.
        * graphite-oacc.h: New file.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/classify-kernels.c: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
        * gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
        * gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/Makefile.in                               |   1 +
 gcc/cfgloop.c                                 |   1 +
 gcc/cfgloop.h                                 |   6 +
 gcc/cfgloopmanip.c                            |   1 +
 gcc/config/nvptx/nvptx.c                      |   7 +
 gcc/doc/invoke.texi                           |  20 +-
 gcc/doc/passes.texi                           |   6 +-
 gcc/flag-types.h                              |   1 +
 gcc/gimple-pretty-print.c                     |   3 +
 gcc/gimple.h                                  |   5 +
 gcc/graphite-dependences.c                    | 220 ++++--
 gcc/graphite-isl-ast-to-gimple.c              |  93 ++-
 gcc/graphite-oacc.c                           | 688 ++++++++++++++++++
 gcc/graphite-oacc.h                           |  55 ++
 gcc/graphite-optimize-isl.c                   |   7 +-
 gcc/graphite-poly.c                           |  39 +-
 gcc/graphite-scop-detection.c                 | 190 ++++-
 gcc/graphite-sese-to-poly.c                   |  65 +-
 gcc/graphite.c                                | 120 ++-
 gcc/graphite.h                                |  35 +-
 gcc/internal-fn.c                             |   4 +
 gcc/internal-fn.h                             |   4 +-
 gcc/omp-expand.c                              |  65 +-
 gcc/omp-general.c                             |  21 +-
 gcc/omp-general.h                             |   1 +
 gcc/omp-low.c                                 | 389 ++++++++--
 gcc/omp-oacc-kernels-decompose.cc             | 145 ++--
 gcc/omp-offload.c                             | 483 +++++++++++-
 gcc/omp-offload.h                             |   2 +
 gcc/params.opt                                |   7 +-
 gcc/sese.c                                    |  25 +-
 gcc/sese.h                                    |   1 +
 .../c-c++-common/goacc/classify-kernels.c     |   2 +-
 .../goacc/loop-auto-transfer-2.f90            |  47 ++
 .../goacc/loop-auto-transfer-3.f90            | 103 +++
 .../goacc/loop-auto-transfer-4.f90            | 323 ++++++++
 gcc/tree-chrec.c                              |   3 +
 gcc/tree-data-ref.c                           |  20 +-
 gcc/tree-parloops.c                           |  18 +-
 gcc/tree-scalar-evolution.c                   | 177 ++++-
 gcc/tree-scalar-evolution.h                   |   3 +
 gcc/tree-ssa-dce.c                            |  23 +
 gcc/tree-ssa-loop-niter.c                     |   6 +
 gcc/tree-ssa-loop.c                           |  11 +
 .../libgomp.oacc-c-c++-common/parallel-dims.c |   2 +
 .../kernels-independent.f90                   |   1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90   |   1 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   1 +
 48 files changed, 3123 insertions(+), 328 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 571e9c28e29d..debd8047cc85 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1433,6 +1433,7 @@ OBJS = \
        graphite-poly.o \
        graphite-scop-detection.o \
        graphite-sese-to-poly.o \
+       graphite-oacc.o \
        gtype-desc.o \
        haifa-sched.o \
        hash-map-tests.o \
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 2ba9918bfa2a..a15c2c84c3ca 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -349,6 +349,7 @@ alloc_loop (void)
   loop->exits = ggc_cleared_alloc<loop_exit> ();
   loop->exits->next = loop->exits->prev = loop->exits;
   loop->can_be_parallel = false;
+  loop->can_be_parallel_valid_p = false;
   loop->constraints = 0;
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 0f71a6bf18f2..866ea23c8369 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -213,6 +213,12 @@ public:
   /* True if the loop can be parallel.  */
   unsigned can_be_parallel : 1;

+  /* True if the can_be_parallel flag is valid, i.e. the
+     parallelizability of the loop has been analyzed.  This can be
+     used to distinguish between unparallelizable loops and a failed
+     analysis, e.g. to provide better diagnostic messages.  */
+  unsigned can_be_parallel_valid_p : 1;
+
   /* True if -Waggressive-loop-optimizations warned about this loop
      already.  */
   unsigned warned_aggressive_loop_optimizations : 1;
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index aa538a221e1f..05c381123f65 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -952,6 +952,7 @@ copy_loop_info (class loop *loop, class loop *target)
   target->simdlen = loop->simdlen;
   target->constraints = loop->constraints;
   target->can_be_parallel = loop->can_be_parallel;
+  target->can_be_parallel_valid_p = loop->can_be_parallel_valid_p;
   target->warned_aggressive_loop_optimizations
     |= loop->warned_aggressive_loop_optimizations;
   target->dont_vectorize = loop->dont_vectorize;
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 951252e598a2..faec06f2af7c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6368,7 +6368,14 @@ nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
     }

   if (lhs)
+    {
+      //TODO Earlier check for ICE as reported in <http://mid.mail-archive.com/878s9zgir3.fsf@euler.schwinge.homeip.net>.
+      //TODO Not sure if this makes too much sense to have (just) here -- should probably be moved (way) further up in the pipeline?
+      if (TREE_CODE (TREE_TYPE (lhs)) == REFERENCE_TYPE)
+       gcc_checking_assert (is_gimple_addressable (var));
+
     gimplify_assign (lhs, var, &seq);
+    }

   pop_gimplify_context (NULL);
   gsi_replace_with_seq (&gsi, seq, true);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e0f09610408c..f58cdd8724d7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14775,14 +14775,22 @@ Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.

 @item openacc-kernels
-Specify mode of OpenACC `kernels' constructs handling.
-With @option{--param=openacc-kernels=decompose}, OpenACC `kernels'
+Specify mode of OpenACC `kernels' constructs handling.  With
+@option{--param=openacc-kernels=decompose}, OpenACC `kernels'
 constructs are decomposed into parts, a sequence of compute
-constructs, each then handled individually.
-This is work in progress.
+constructs, each then handled individually.  The data dependence
+analysis that is necessary to determine if loops can be parallelized
+is performed by the Graphite pass.
+This is the default.
+With @option{--param=openacc-kernels=decompose-parloops}, OpenACC
+`kernels' constructs are decomposed into parts, a sequence of compute
+constructs, each then handled individually by the @samp{parloops}
+pass.
+This is deprecated.
 With @option{--param=openacc-kernels=parloops}, OpenACC `kernels'
-constructs are handled by the @samp{parloops} pass, en bloc.
-This is the current default.
+constructs are handled by the @samp{parloops} pass, en bloc, that is,
+without decomposing them into parts.
+This is deprecated.

 @item openacc-privatization
 Specify mode of OpenACC privatization diagnostics for
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 9046cbed2d90..2649e01cc945 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -248,9 +248,9 @@ constraints in order to generate the points-to sets.  It is located in

 This is a pass group for processing OpenACC kernels regions.  It is a
 subpass of the IPA OpenACC pass group that runs on offloaded functions
-containing OpenACC kernels loops.  It is located in
-@file{tree-ssa-loop.c} and is described by
-@code{pass_ipa_oacc_kernels}.
+containing OpenACC kernels loops if @samp{parloops}-based handling of
+kernels regions is used.  It is located in @file{tree-ssa-loop.c} and
+is described by @code{pass_ipa_oacc_kernels}.

 @item Target clone

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 7cf8c28933b2..bc118308f929 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -481,6 +481,7 @@ enum vrp_mode
 enum openacc_kernels
 {
   OPENACC_KERNELS_DECOMPOSE,
+  OPENACC_KERNELS_DECOMPOSE_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS
 };

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 1cd1597359e8..9f4dea184cf3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1784,6 +1784,9 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       kind = " oacc_parallel_kernels_gang_single";
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+      kind = " oacc_parallel_kernels_graphite";
+      break;
     case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       kind = " oacc_data_kernels";
       break;
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 3cde3cde7fee..412efff5fa44 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -185,6 +185,9 @@ enum gf_mask {
     /* A 'GF_OMP_TARGET_KIND_OACC_DATA' representing an OpenACC 'kernels'
        decomposed parts' 'data' construct.  */
     GF_OMP_TARGET_KIND_OACC_DATA_KERNELS = 16,
+    /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels'
+       construct, for Graphite to analyze.  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE = 17,
     GF_OMP_TEAMS_HOST          = 1 << 0,

     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
@@ -6652,6 +6655,7 @@ is_gimple_omp_oacc (const gimple *stmt)
        case GF_OMP_TARGET_KIND_OACC_DECLARE:
        case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+       case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
        case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
          return true;
@@ -6681,6 +6685,7 @@ is_gimple_omp_offloaded (const gimple *stmt)
        case GF_OMP_TARGET_KIND_OACC_SERIAL:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+       case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
          return true;
        default:
          return false;
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 9f2eda34add3..24b081624c72 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -38,6 +38,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "gimple-pretty-print.h"
+

 /* Add the constraints from the set S to the domain of MAP.  */

@@ -63,71 +66,108 @@ add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb)
   return constrain_domain (x, isl_set_copy (pbb->domain));
 }

-/* Returns an isl description of all memory operations in SCOP.  The memory
-   reads are returned in READS and writes in MUST_WRITES and MAY_WRITES.  */
+/* Returns an isl description of all memory operations in SCOP.  The
+   memory reads are returned in READS, writes in MUST_WRITES and
+   MAY_WRITES, and kills in KILLS.  */

 static void
 scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
                           isl_union_map *&must_writes,
-                          isl_union_map *&may_writes)
+                          isl_union_map *&may_writes,
+                          isl_union_map *&kills)
 {
   int i, j;
   poly_bb_p pbb;
   poly_dr_p pdr;

   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
+  {
+    FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
     {
-      FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr) {
-       if (pdr_read_p (pdr))
-         {
-           if (dump_file)
-             {
-               fprintf (dump_file, "Adding read to depedence graph: ");
-               print_pdr (dump_file, pdr);
-             }
-           isl_union_map *um
-             = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
-           reads = isl_union_map_union (reads, um);
-           if (dump_file)
-             {
-               fprintf (dump_file, "Reads depedence graph: ");
-               print_isl_union_map (dump_file, reads);
-             }
-         }
-       else if (pdr_write_p (pdr))
-         {
-           if (dump_file)
-             {
-               fprintf (dump_file, "Adding must write to depedence graph: ");
-               print_pdr (dump_file, pdr);
-             }
-           isl_union_map *um
-             = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
-           must_writes = isl_union_map_union (must_writes, um);
-           if (dump_file)
-             {
-               fprintf (dump_file, "Must writes depedence graph: ");
-               print_isl_union_map (dump_file, must_writes);
-             }
-         }
-       else if (pdr_may_write_p (pdr))
-         {
-           if (dump_file)
-             {
-               fprintf (dump_file, "Adding may write to depedence graph: ");
-               print_pdr (dump_file, pdr);
-             }
-           isl_union_map *um
-             = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
-           may_writes = isl_union_map_union (may_writes, um);
-           if (dump_file)
-             {
-               fprintf (dump_file, "May writes depedence graph: ");
-               print_isl_union_map (dump_file, may_writes);
-             }
-         }
-      }
+      isl_union_map *um = NULL;
+
+      if (pdr->is_reduction)
+       {
+         if (dump_file)
+           {
+              fprintf (dump_file,
+                       "Skipped reduction variable %s in statement:\n",
+                      pdr_write_p (pdr) ? "write" : "read");
+             print_gimple_stmt (dump_file, pdr->stmt, 0, dump_flags);
+             fprintf (dump_file, "\n");
+            }
+          continue;
+       }
+
+      if (pdr_read_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding %sread to dependence graph: ",
+                   pdr->is_reduction ? "reduction " : "");
+              print_pdr (dump_file, pdr);
+             isl_map* tmp = add_pdr_constraints (pdr, pbb);
+             print_isl_map (dump_file, tmp);
+             isl_map_free (tmp);
+            }
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          reads = isl_union_map_union (reads, um);
+          if (dump_file)
+           {
+              fprintf (dump_file, "Reads dependence graph: ");
+              print_isl_union_map (dump_file, reads);
+            }
+        }
+      else if (pdr_write_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding %smust write to dependence graph: ",
+                      pdr->is_reduction ? "reduction " : "");
+              print_pdr (dump_file, pdr);
+            }
+
+
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          must_writes = isl_union_map_union (must_writes, um);
+        }
+      else if (pdr_may_write_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding %smay write to dependence graph: ",
+                      pdr->is_reduction ? "reduction " : "");
+              print_pdr (dump_file, pdr);
+            }
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          may_writes = isl_union_map_union (may_writes, um);
+          if (dump_file)
+            {
+              fprintf (dump_file, "May writes dependence graph: ");
+              print_isl_union_map (dump_file, may_writes);
+            }
+        }
+      else if (pdr_kill_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding kill to dependence graph: ");
+              print_pdr (dump_file, pdr);
+            }
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          kills = isl_union_map_union (kills, um);
+          if (dump_file)
+            {
+              fprintf (dump_file, "Kills: ");
+              print_isl_union_map (dump_file, kills);
+            }
+        }
     }
+  }
 }

 /* Helper function used on each MAP of a isl_union_map.  Computes the
@@ -203,7 +243,19 @@ apply_schedule_on_deps (__isl_keep isl_union_map *schedule,
   isl_union_map *trans = extend_schedule (isl_union_map_copy (schedule));
   isl_union_map *ux = isl_union_map_copy (deps);
   ux = isl_union_map_apply_domain (ux, isl_union_map_copy (trans));
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Applied domain map to dependences:\n");
+      print_isl_union_map (dump_file, ux);
+    }
   ux = isl_union_map_apply_range (ux, trans);
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Applied range map:\n");
+      print_isl_union_map (dump_file, ux);
+    }
+
   ux = isl_union_map_coalesce (ux);

   if (!isl_union_map_is_empty (ux))
@@ -230,6 +282,12 @@ carries_deps (__isl_keep isl_union_map *schedule,
   if (x == NULL)
     return false;

+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Applied schedule on dependences:\n");
+      print_isl_map (dump_file, x);
+    }
+
   isl_space *space = isl_map_get_space (x);
   isl_map *lex = isl_map_lex_le (isl_space_range (space));
   isl_constraint *ineq = isl_inequality_alloc
@@ -244,7 +302,22 @@ carries_deps (__isl_keep isl_union_map *schedule,
   ineq = isl_constraint_set_constant_si (ineq, -1);
   lex = isl_map_add_constraint (lex, ineq);
   lex = isl_map_coalesce (lex);
+
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Lex: \n");
+      print_isl_map (dump_file, lex);
+    }
+
   x = isl_map_intersect (x, lex);
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Intersect: \n");
+      print_isl_map (dump_file, x);
+    }
+
   bool res = !isl_map_is_empty (x);

   isl_map_free (x);
@@ -265,8 +338,9 @@ scop_get_dependences (scop_p scop)
   isl_space *space = isl_set_get_space (scop->param_context);
   isl_union_map *reads = isl_union_map_empty (isl_space_copy (space));
   isl_union_map *must_writes = isl_union_map_empty (isl_space_copy (space));
-  isl_union_map *may_writes = isl_union_map_empty (space);
-  scop_get_reads_and_writes (scop, reads, must_writes, may_writes);
+  isl_union_map *may_writes = isl_union_map_empty (isl_space_copy (space));
+  isl_union_map *kills = isl_union_map_empty (space);
+  scop_get_reads_and_writes (scop, reads, must_writes, may_writes, kills);

   if (dump_file)
     {
@@ -282,10 +356,11 @@ scop_get_dependences (scop_p scop)
       fprintf (dump_file, "  [1, i0] is a 'memref' with alias set 1"
               " and first subscript access i0.\n");
       fprintf (dump_file, "  [106] is a 'scalar reference' which is the sum of"
-              " SSA_NAME_VERSION 6"
-              " and --param graphite-max-arrays-per-scop=100\n");
+              " SSA_NAME_VERSION 6 and scop->max_alias_set whose value\n is 100"
+              " in this example.\n");
       fprintf (dump_file, "-----------------------\n\n");

+      fprintf (dump_file, "max_alias_set: %d\n", scop->max_alias_set);
       fprintf (dump_file, "data references (\n");
       fprintf (dump_file, "  reads: ");
       print_isl_union_map (dump_file, reads);
@@ -293,31 +368,59 @@ scop_get_dependences (scop_p scop)
       print_isl_union_map (dump_file, must_writes);
       fprintf (dump_file, "  may_writes: ");
       print_isl_union_map (dump_file, may_writes);
+      fprintf (dump_file, "  kills: ");
+      print_isl_union_map (dump_file, kills);
       fprintf (dump_file, ")\n");
     }

   gcc_assert (scop->original_schedule);

+
   isl_union_access_info *ai;
   ai = isl_union_access_info_from_sink (isl_union_map_copy (reads));
   ai = isl_union_access_info_set_must_source (ai, isl_union_map_copy (must_writes));
   ai = isl_union_access_info_set_may_source (ai, may_writes);
+  ai = isl_union_access_info_set_kill (ai, isl_union_map_copy (kills));
   ai = isl_union_access_info_set_schedule
     (ai, isl_schedule_copy (scop->original_schedule));
   isl_union_flow *flow = isl_union_access_info_compute_flow (ai);
   isl_union_map *raw = isl_union_flow_get_must_dependence (flow);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "raw dependences (\n");
+      print_isl_union_map (dump_file, raw);
+      fprintf (dump_file, ")\n");
+    }
+
   isl_union_flow_free (flow);

   ai = isl_union_access_info_from_sink (isl_union_map_copy (must_writes));
   ai = isl_union_access_info_set_must_source (ai, must_writes);
   ai = isl_union_access_info_set_may_source (ai, reads);
+  ai = isl_union_access_info_set_kill (ai, kills);
   ai = isl_union_access_info_set_schedule
     (ai, isl_schedule_copy (scop->original_schedule));
   flow = isl_union_access_info_compute_flow (ai);

   isl_union_map *waw = isl_union_flow_get_must_dependence (flow);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "waw dependences (\n");
+      print_isl_union_map (dump_file, waw);
+      fprintf (dump_file, ")\n");
+    }
   isl_union_map *war = isl_union_flow_get_may_dependence (flow);
   war = isl_union_map_subtract (war, isl_union_map_copy (waw));
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "war dependences (\n");
+      print_isl_union_map (dump_file, war);
+      fprintf (dump_file, ")\n");
+    }
+
   isl_union_flow_free (flow);

   raw = isl_union_map_coalesce (raw);
@@ -331,6 +434,9 @@ scop_get_dependences (scop_p scop)

   if (dump_file)
     {
+      fprintf (dump_file, "(space: ");
+      print_isl_space (dump_file, space);
+      fprintf (dump_file, ")\n");
       fprintf (dump_file, "data dependences (\n");
       print_isl_union_map (dump_file, dependences);
       fprintf (dump_file, ")\n");
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 073b471775de..e820e2c32202 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -56,6 +56,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "stdlib.h"

 struct ast_build_info
 {
@@ -1456,8 +1458,8 @@ generate_entry_out_of_ssa_copies (edge false_entry,
     }
 }

-/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
-   aliasing. */
+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS
+   are free of aliasing. */

 static tree
 generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
@@ -1617,4 +1619,91 @@ graphite_regenerate_ast_isl (scop_p scop)
   return !t.codegen_error_p ();
 }

+/* A callback for traversing a schedule tree that visits the band
+   nodes of a schedule which correspond to loops.  Checks if the local
+   schedule carries any dependences and marks the corresponding CFG
+   loops as being parallelizable accordingly.  */
+
+static isl_bool
+visit_schedule_loop_node (__isl_keep isl_schedule_node *node, void *user)
+{
+  isl_bool visit_children = isl_bool_true;
+
+  if (isl_schedule_node_get_type (node) != isl_schedule_node_band)
+    return visit_children;
+
+  isl_union_map *dependences = (isl_union_map *)user;
+  isl_union_map *schedule
+      = isl_schedule_node_band_get_partial_schedule_union_map (node);
+  isl_space *space = isl_schedule_node_band_get_space (node);
+
+  isl_id *id = isl_space_get_tuple_id (space, isl_dim_out);
+  const char *name = isl_id_get_name (id);
+  /* Expect the format set by add_loop_schedule, i.e. "L_n".  */
+  gcc_checking_assert (name[0] == 'L' && name[1] == '_');
+  int loop_num = atoi (name + 2);
+  isl_id_free (id);
+
+  int dimension = isl_space_dim (space, isl_dim_out);
+  loop_p loop = get_loop (cfun, loop_num);
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "CFG loop %d:\n", loop_num);
+      print_isl_union_map (dump_file, schedule);
+      fprintf (dump_file, "Schedule dimension: %d\n", dimension);
+
+      fprintf (dump_file, "Schedule node space:\n");
+      print_isl_space (dump_file, space);
+      fprintf (dump_file, "data dependences (\n");
+      print_isl_union_map (dump_file, dependences);
+      fprintf (dump_file, ")\n");
+    }
+
+  bool has_deps = carries_deps (schedule, dependences, dimension);
+
+  loop->can_be_parallel = !has_deps;
+  loop->can_be_parallel_valid_p = true;
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      dump_user_location_t loc = find_loop_location (loop);
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+                       "loop %s data-dependences.\n",
+                      has_deps ? "has" : "has no");
+
+      fprintf (dump_file, "\n");
+    }
+
+  isl_union_map_free (schedule);
+  isl_space_free (space);
+
+
+  return visit_children;
+}
+
+/* This function performs data-dependence analysis on the SCoP without using
+   Graphite's code generation.  This is meant for OpenACC use since the code
+   generator is unable to reconstruct the OpenACC loop structure.  */
+
+bool
+graphite_oacc_analyze_scop (scop_p scop)
+{
+  timevar_push (TV_GRAPHITE_CODE_GEN);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "[graphite_oacc_analyze_scop] schedule:\n");
+      print_isl_schedule (dump_file, scop->original_schedule);
+    }
+
+  /* Analyze dependences in SCoP and mark loops as parallelizable accordingly. */
+  isl_schedule_foreach_schedule_node_top_down (
+      scop->original_schedule, visit_schedule_loop_node, scop->dependence);
+
+  timevar_pop (TV_GRAPHITE_CODE_GEN);
+
+  return true;
+}
+
 #endif  /* HAVE_isl */
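Side note on the loop-number lookup in visit_schedule_loop_node above:
atoi returns 0 when the suffix is malformed, and 0 is also a valid loop
number, so a bad name would not be noticed in a release build.  Below is a
minimal sketch of a stricter parse, assuming only the "L_n" format named in
the comment.  parse_loop_id is a hypothetical helper for illustration and
is not part of this patch.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not in the patch): parse an identifier of the
// form "L_<n>" and store n in *LOOP_NUM.  Returns false, leaving
// *LOOP_NUM untouched, if NAME does not match that format exactly.
bool
parse_loop_id (const char *name, int *loop_num)
{
  assert (loop_num != nullptr);

  if (!name || name[0] != 'L' || name[1] != '_')
    return false;

  // Require a digit right after the prefix; strtol alone would also
  // accept leading whitespace or a sign.
  const char *digits = name + 2;
  if (!std::isdigit (static_cast<unsigned char> (*digits)))
    return false;

  char *end = nullptr;
  errno = 0;
  long value = std::strtol (digits, &end, 10);

  // Reject trailing characters and values that do not fit in an int.
  if (errno == ERANGE || *end != '\0' || value > INT_MAX)
    return false;

  *loop_num = static_cast<int> (value);
  return true;
}
```

A caller could assert on the return value instead of checking only the
first two characters of the name.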
diff --git a/gcc/graphite-oacc.c b/gcc/graphite-oacc.c
new file mode 100644
index 000000000000..9b3dc7998401
--- /dev/null
+++ b/gcc/graphite-oacc.c
@@ -0,0 +1,688 @@
+/* Functions for analyzing the OpenACC loop structure from Graphite.
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "cfghooks.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfgloop.h"
+
+#include "internal-fn.h"
+#include "gimple.h"
+#include "tree-cfg.h"
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "print-tree.h"
+
+#include "gimple-ssa.h"
+#include "gimple-iterator.h"
+#include "tree-phinodes.h"
+#include "tree-ssa-operands.h"
+#include "ssa-iterators.h"
+#include "omp-general.h"
+#include "graphite-oacc.h"
+
+unsigned
+gimple_call_internal_kind (gimple *call)
+{
+  return TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+}
+
+static inline bool gimple_call_ifn_unique_p (gimple *call,
+                                             enum ifn_unique_kind kind)
+{
+  if (!gimple_call_internal_p (call, IFN_UNIQUE))
+    return false;
+
+  return kind == gimple_call_internal_kind (call);
+}
+
+static inline bool goacc_reduction_call_p (gimple *call)
+{
+  return gimple_call_internal_p (call, IFN_GOACC_REDUCTION);
+}
+
+static inline bool goacc_reduction_call_p (gimple *call,
+                                           enum ifn_goacc_reduction_kind kind)
+{
+  return gimple_call_internal_p (call, IFN_GOACC_REDUCTION)
+         && gimple_call_internal_kind (call) == kind;
+}
+
+/* Check if VAR is private in the OpenACC loop that encloses the cfg LOOP. The
+   function returns TRUE if there is an IFN_UNIQUE_OACC_PRIVATE call in the
+   head sequence that precedes the CFG loop. */
+
+bool
+is_oacc_private (tree var, loop_p loop)
+{
+  return false;
+
+  if (TREE_CODE (var) == SSA_NAME)
+    {
+      if (!SSA_NAME_VAR (var))
+        return false;
+
+      var = SSA_NAME_VAR (var);
+    }
+
+  gcc_checking_assert (TREE_CODE (var) == VAR_DECL);
+
+  if (!loop)
+    return false;
+
+  basic_block bb = loop->header;
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  while (bb != entry_bb)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      gimple *stmt = last_stmt (bb);
+      if (!stmt)
+        continue;
+
+      /* We are looking for the sequence of IFN_UNIQUE calls at the
+          head of the current OpenACC loop. */
+      if (!gimple_call_internal_p (stmt, IFN_UNIQUE))
+        continue;
+
+      enum ifn_unique_kind kind
+          = (enum ifn_unique_kind)TREE_INT_CST_LOW (gimple_call_arg (stmt, 0));
+
+      /* The head mark that starts the current OpenACC loop.
+          Private calls above here are irrelevant. Stop. */
+      if (kind == IFN_UNIQUE_OACC_HEAD_MARK && gimple_call_num_args (stmt) > 2)
+        break;
+
+      if (kind != IFN_UNIQUE_OACC_PRIVATE)
+        continue;
+
+      tree private_var = gimple_call_arg (stmt, 3);
+
+      if (TREE_CODE (private_var) == ADDR_EXPR)
+        private_var = TREE_OPERAND (private_var, 0);
+
+      if (var == private_var)
+        return true;
+    }
+
+  return false;
+}
+
+void
+oacc_add_private_var_kills (loop_p loop, vec<tree> *kills)
+{
+  gcc_checking_assert (loop);
+
+  basic_block bb = loop->header;
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  while (bb != entry_bb)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+      gimple *stmt = last_stmt (bb);
+      if (!stmt)
+        continue;
+
+      /* We are looking for the sequence of IFN_UNIQUE calls at the head of the
+         current OpenACC loop. */
+      if (!gimple_call_internal_p (stmt, IFN_UNIQUE))
+        continue;
+
+      /* The head mark that starts the current OpenACC loop.
+         Private calls above here are irrelevant. Stop. */
+      if (gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_HEAD_MARK)
+          && gimple_call_num_args (stmt) > 2)
+        break;
+
+      if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_PRIVATE))
+        continue;
+
+      tree private_var = gimple_call_arg (stmt, 3);
+
+      gcc_checking_assert (TREE_CODE (private_var) == ADDR_EXPR);
+      private_var = TREE_OPERAND (private_var, 0);
+      kills->safe_push (private_var);
+    }
+}
+
+typedef std::pair<gcall *, gcall *> gcall_pair;
+
+/* Returns a pair that contains the internal function calls that start
+   and end the head sequence of the OpenACC loop enclosing the cfg
+   loop LOOP or a pair of NULL pointers if LOOP is not enclosed in an
+   OpenACC loop. */
+
+gcall_pair
+find_oacc_head_marks (loop_p loop)
+{
+  basic_block bb = loop->header;
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  gcall *top_head_mark = NULL;
+  gcall *bottom_head_mark = NULL;
+
+  while (bb != entry_bb)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+      gimple *stmt = last_stmt (bb);
+      if (!stmt)
+        continue;
+
+      /* Look for IFN_UNIQUE calls in the head of the OpenACC loop. */
+      if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_HEAD_MARK))
+        continue;
+
+      if (!bottom_head_mark)
+        {
+          bottom_head_mark = as_a<gcall *> (stmt);
+          continue;
+        }
+
+      /* The head mark that starts the current OpenACC loop can be
+         recognized by the number of call arguments, cf. omp-low.c.  */
+      if (gimple_call_num_args (stmt) > 3)
+        {
+          top_head_mark = as_a<gcall *> (stmt);
+          break;
+        }
+    }
+
+  gcc_checking_assert ((top_head_mark && bottom_head_mark)
+                       || (!top_head_mark && !bottom_head_mark));
+
+  return gcall_pair (top_head_mark, bottom_head_mark);
+}
+
+/* Returns the internal function call that starts the tail sequence of the
+   OpenACC loop that encloses the CFG loop LOOP or NULL if LOOP is not
+   contained in an OpenACC loop. */
+
+gcall *
+find_oacc_top_tail_mark (loop_p loop)
+{
+  gcall_pair head_marks = find_oacc_head_marks (loop);
+
+  if (!head_marks.first || !head_marks.second)
+    return NULL;
+
+  tree data_dep = gimple_call_lhs (head_marks.second);
+  gcc_checking_assert (has_single_use (data_dep));
+
+  gimple *tail_mark;
+  use_operand_p use_p;
+  single_imm_use (data_dep, &use_p, &tail_mark);
+
+  return as_a<gcall *> (tail_mark);
+}
+
+/* Returns a pair containing the internal function calls that start and end the
+   tail sequence of the OpenACC loop that encloses the cfg loop LOOP or a pair
+   of NULL pointers if LOOP does not belong to an OpenACC loop. */
+
+gcall_pair
+find_oacc_tail_marks (loop_p loop)
+{
+  gcall *top_tail_mark = find_oacc_top_tail_mark (loop);
+
+  if (!top_tail_mark)
+    return gcall_pair (NULL, NULL);
+
+  tree data_dep = gimple_call_lhs (top_tail_mark);
+  gimple *stmt = top_tail_mark;
+
+  while (has_single_use (data_dep))
+    {
+      use_operand_p use_p;
+      single_imm_use (data_dep, &use_p, &stmt);
+      data_dep = gimple_call_lhs (stmt);
+
+      gcc_checking_assert (gimple_call_internal_p (stmt));
+    }
+
+  gcall *end_tail_mark = as_a<gcall *> (stmt);
+
+  gcc_checking_assert (
+      gimple_call_ifn_unique_p (end_tail_mark, IFN_UNIQUE_OACC_TAIL_MARK));
+
+  return gcall_pair (top_tail_mark, end_tail_mark);
+}
+
+/* Add all ssa names to VARS that can be reached from PHI by a
+   phi node walk. */
+
+static void
+collect_oacc_reduction_vars_phi_walk (gphi *phi, hash_set<tree> &vars)
+{
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES)
+  {
+    tree use = USE_FROM_PTR (use_p);
+    if (TREE_CODE (use) != SSA_NAME)
+      continue;
+
+    if (vars.contains (use))
+      continue;
+
+    gimple *def_stmt = SSA_NAME_DEF_STMT (use);
+    vars.add (use);
+
+    gphi *use_phi = dyn_cast<gphi *> (def_stmt);
+    if (use_phi)
+      {
+        collect_oacc_reduction_vars_phi_walk (use_phi, vars);
+
+        continue;
+      }
+  }
+}
+
+/* Returns true iff following the immediate use chain from the
+   IFN_GOACC_REDUCTION call CALL leads out of the loop that contains CALL. */
+
+static bool
+reduction_use_in_outer_loop_p (gcall *call)
+{
+  gcc_checking_assert (goacc_reduction_call_p (call));
+
+  tree data_dep = gimple_call_lhs (call);
+
+  /* The IFN_GOACC_REDUCTION calls are linked in a chain through
+     immediate uses. Move to the end of this chain. */
+  gimple *stmt = call;
+  while (has_single_use (data_dep))
+    {
+      use_operand_p use_p;
+      single_imm_use (data_dep, &use_p, &stmt);
+
+      if (!goacc_reduction_call_p (stmt))
+        return true;
+
+      data_dep = gimple_call_lhs (stmt);
+    }
+
+  gcc_checking_assert (goacc_reduction_call_p (stmt));
+
+  /* Call starting further reduction use in outer loop. */
+  if (goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_SETUP))
+    return true;
+
+  /* Reduction use ends with last internal call in present loop. */
+  if (goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_TEARDOWN))
+    return false;
+  gcc_unreachable ();
+}
+
+/* Add all ssa names to VARS that can be reached from BB by walking
+   through the phi nodes which start at the result of an OpenACC
+   reduction computation in BB. */
+
+static void
+collect_oacc_reduction_vars_in_bb (basic_block bb, hash_set<tree> &vars)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (!goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_FINI))
+        continue;
+
+      tree var = gimple_call_arg (stmt, 2);
+      gcc_checking_assert (TREE_CODE (var) == SSA_NAME);
+
+      if (vars.contains (var))
+        continue;
+
+      gimple *def_stmt = SSA_NAME_DEF_STMT (var);
+
+      if (gimple_code (def_stmt) != GIMPLE_PHI)
+        {
+          gcc_checking_assert (goacc_reduction_call_p (def_stmt));
+
+          continue;
+        }
+
+      gcc_checking_assert (
+          goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_FINI));
+      gcc_checking_assert (gimple_code (def_stmt) == GIMPLE_PHI);
+
+      if (reduction_use_in_outer_loop_p (as_a<gcall *> (stmt)))
+        vars.add (var);
+
+      collect_oacc_reduction_vars_phi_walk (static_cast<gphi *> (def_stmt),
+                                            vars);
+    }
+}
+
+/* Add all ssa names to VARS that are defined by phi nodes in the header of LOOP
+   such that at least one argument of the phi belongs to VARS. */
+
+static void
+collect_oacc_reduction_vars_in_loop_header (loop_p loop, hash_set<tree> &vars)
+{
+  for (gphi_iterator gpi = gsi_start_phis (loop->header); !gsi_end_p (gpi);
+       gsi_next (&gpi))
+    {
+      gphi *phi = const_cast<gphi *> (gpi.phi ());
+
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES)
+      {
+        tree use = USE_FROM_PTR (use_p);
+        if (vars.contains (use))
+          vars.add (gimple_phi_result (phi));
+      }
+    }
+}
+
+/* Find the ssa names that belong to an OpenACC reduction in the OpenACC loop
+   that surrounds the cfg loop LOOP and add them to VARS.  LOOP must be
+   contained in an OpenACC loop.
+
+   Since the reductions have not been and cannot be lowered before the Graphite
+   pass runs because their lowering is device dependent, Graphite needs to
+   simulate the privatization of the reduction variables by removing
+   dependences between the iteration instances of the loop and the dependences
+   arising from copying the initial value of the reduction variable in and the
+   result out.
+
+   The OpenACC lowering will copy the results of reduction computations at the
+   IFN_GOACC_REDUCTION_FINI calls.  The main reduction statement can thus be
+   identified by walking from those calls through all encountered phi nodes
+   until we reach a gimple assignment statement. The ssa name defined by this
+   statement as well as the ssa names encountered in the phis along the way are
+   recorded in VARS. In addition, the ssa name defined by each phi which uses a
+   previously identified reduction variable in LOOP's header will also be added
+   to VARS. */
+
+void
+collect_oacc_reduction_vars (loop_p loop, hash_set<tree> &vars)
+{
+  gcall_pair tail = find_oacc_tail_marks (loop);
+  bool in_openacc_loop = tail.first != NULL;
+
+  if (!in_openacc_loop)
+    return;
+
+  const gcall *top_mark = tail.first;
+  const gcall *bottom_mark = tail.second;
+
+  basic_block bb = top_mark->bb;
+  gcc_checking_assert (single_succ_p (bb));
+
+  do
+    {
+      bb = single_succ (bb);
+      collect_oacc_reduction_vars_in_bb (bb, vars);
+    }
+  while (bb != bottom_mark->bb && single_succ_p (bb));
+
+  collect_oacc_reduction_vars_in_loop_header (loop, vars);
+}
+
+static void collect_oacc_privatized_vars_phi_walk_visit_phi_uses (
+    tree var, hash_set<tree> &vars, hash_set<tree> &visited);
+
+/* Add all ssa names to VARS that can be reached from PHI by a phi node walk. */
+
+static void
+collect_oacc_privatized_vars_phi_walk (gphi *phi, hash_set<tree> &vars,
+                                       hash_set<tree> &visited)
+{
+  tree var = PHI_RESULT (phi);
+  bool existed = vars.add (var);
+  if (existed)
+    return;
+
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES)
+  {
+    tree use = USE_FROM_PTR (use_p);
+    if (TREE_CODE (use) != SSA_NAME)
+      continue;
+
+    if (visited.contains (use))
+      continue;
+
+    gimple *def_stmt = SSA_NAME_DEF_STMT (use);
+    gphi *use_phi = dyn_cast<gphi *> (def_stmt);
+    if (use_phi)
+      {
+        collect_oacc_privatized_vars_phi_walk (use_phi, vars, visited);
+        visited.add (use);
+        continue;
+      }
+
+    vars.add (use);
+
+    /* Visit the uses of USE in other phi nodes. This is used to get from loop
+       exit phis in inner loops to the loop entry phis. */
+
+    collect_oacc_privatized_vars_phi_walk_visit_phi_uses (use, vars, visited);
+    visited.add (use);
+  }
+}
+
+/* Records all uses of VAR in phis in VARS and continues the phi walk on each
+   such use. */
+
+static void
+collect_oacc_privatized_vars_phi_walk_visit_phi_uses (tree var,
+                                                      hash_set<tree> &vars,
+                                                      hash_set<tree> &visited)
+{
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
+  {
+    tree use = USE_FROM_PTR (use_p);
+    if (TREE_CODE (use) != SSA_NAME)
+      continue;
+
+    if (visited.contains (use))
+      continue;
+
+    gimple *use_stmt = USE_STMT (use_p);
+    gphi *use_phi = dyn_cast<gphi *> (use_stmt);
+
+    if (use_phi)
+      {
+        visited.add (PHI_RESULT (use_phi));
+        collect_oacc_privatized_vars_phi_walk (use_phi, vars, visited);
+        continue;
+      }
+
+    if (TREE_CODE (use) == SSA_NAME
+        && SSA_NAME_VAR (use) == SSA_NAME_VAR (var))
+      {
+        if (!vars.add (use))
+          collect_oacc_privatized_vars_phi_walk_visit_phi_uses (use, vars,
+                                                                visited);
+        continue;
+      }
+  }
+
+  return;
+}
+
+/* Return the first IFN_UNIQUE call with the given KIND that follows the tail
+   sequence of the OpenACC loop surrounding LOOP. */
+
+static gcall *
+find_ifn_unique_call_below (loop_p loop, enum ifn_unique_kind kind)
+{
+  gcall_pair tail = find_oacc_tail_marks (loop);
+  bool in_openacc_loop = tail.first != NULL;
+
+  if (!in_openacc_loop)
+    return NULL;
+
+  edge exit = single_exit (loop);
+  basic_block bb = exit->dest;
+  while ((bb = get_immediate_dominator (CDI_POST_DOMINATORS, bb)))
+    {
+      gimple *stmt = last_stmt (bb);
+
+      if (!stmt)
+        continue;
+
+      if (gimple_call_ifn_unique_p (stmt, kind))
+        return static_cast<gcall *> (stmt);
+    }
+
+  return NULL;
+}
+
+/* Return the IFN_UNIQUE_OACC_PRIVATE_SCALAR call which follows the tail
+   sequence of the OpenACC loop surrounding LOOP. */
+
+gcall *
+get_oacc_private_scalars_call (loop_p loop)
+{
+  return find_ifn_unique_call_below (loop, IFN_UNIQUE_OACC_PRIVATE_SCALAR);
+}
+
+/* Return the IFN_UNIQUE_OACC_FIRSTPRIVATE call which follows the tail
+   sequence of the OpenACC loop surrounding LOOP. */
+
+gcall *
+get_oacc_firstprivate_call (loop_p loop)
+{
+  return find_ifn_unique_call_below (loop, IFN_UNIQUE_OACC_FIRSTPRIVATE);
+}
+
+/* Find the ssa names that belong to the computation of variables that are
+   "private" in an OpenACC loop and add them to VARS.  MARKER is the internal
+   function call that lists those variables; if it is NULL, nothing is added.
+
+   The CFG loop structure of OpenACC loops does not directly reflect the
+   privatization of the variable since the original loop has been enclosed in a
+   "chunking" loop. The "private" scalar variables are alive in those two
+   outermost CFG loops and the corresponding phis must be ignored by Graphite in
+   order to recognize the parallelizability of the loop. Omp-low.c places a
+   special internal function call after the outermost loop of a parallel region
+   whose arguments list the "private" variables that are considered here. */
+
+void
+collect_oacc_privatized_vars (gcall *marker, hash_set<tree> &vars)
+{
+  if (!marker)
+    return;
+
+  gcc_checking_assert (marker->bb->loop_father->num == 0);
+
+  /* Search for phis that can be reached from the vars listed in MARKER's
+     arguments. */
+
+  const unsigned n = gimple_call_num_args (marker);
+  for (unsigned i = 1; i < n; ++i)
+    {
+      tree arg = gimple_call_arg (marker, i);
+
+      if (TREE_CODE (arg) != SSA_NAME)
+        continue;
+
+      gimple *def_stmt = SSA_NAME_DEF_STMT (arg);
+      gphi *phi = dyn_cast<gphi *> (def_stmt);
+      if (!phi)
+        {
+          /* If the argument does not point to a phi, then it must be some value
+            defined outside of any OpenACC loop nest, i.e. a parameter of the
+            loop-nest. */
+          gcc_checking_assert (!def_stmt->bb
+                               || def_stmt->bb->loop_father->num == 0);
+          continue;
+        }
+
+      hash_set<tree> visited;
+      collect_oacc_privatized_vars_phi_walk (phi, vars, visited);
+    }
+}
+
+/* Return true if LOOP is an OpenACC loop with an "auto" clause, false otherwise. */
+
+static bool
+oacc_loop_with_auto_clause_p (loop_p loop)
+{
+  gcall_pair head_marks = find_oacc_head_marks (loop);
+
+  if (!head_marks.first)
+    return false;
+
+  unsigned flags = TREE_INT_CST_LOW (gimple_call_arg (head_marks.first, 3));
+  return flags & OLF_AUTO;
+}
+
+/* Return true if FUN is an outlined OpenACC function that contains loops with
+   "auto" clauses. */
+
+static bool
+function_has_auto_loops_p (function *fun)
+{
+  gcc_checking_assert (oacc_function_p (fun));
+
+  for (auto loop : loops_list (fun, 0))
+    if (oacc_loop_with_auto_clause_p (loop))
+      return true;
+
+  return false;
+}
+
+/* Return true if Graphite might analyze outlined OpenACC functions for the kind
+   of target region for which FUN was created. The actual decision whether
+   Graphite runs on FUN may be subject to further restrictions. */
+
+bool
+graphite_analyze_oacc_target_region_type_p (function *fun)
+{
+  gcc_checking_assert (oacc_function_p (fun));
+
+  bool is_oacc_parallel
+      = lookup_attribute ("oacc parallel",
+                          DECL_ATTRIBUTES (fun->decl))
+        != NULL;
+
+  bool is_oacc_parallel_kernels_graphite
+      = lookup_attribute ("oacc parallel_kernels_graphite",
+                          DECL_ATTRIBUTES (fun->decl))
+        != NULL;
+
+  return is_oacc_parallel || is_oacc_parallel_kernels_graphite;
+}
+
+/* Return true if FUN is an outlined OpenACC function that is going to be
+   analyzed by Graphite. */
+
+bool
+graphite_analyze_oacc_function_p (function *fun)
+{
+  gcc_checking_assert (oacc_function_p (fun));
+
+  return graphite_analyze_oacc_target_region_type_p (fun)
+         && function_has_auto_loops_p (fun);
+}
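For readers who want the gist of the phi walks in graphite-oacc.c without
the GIMPLE machinery, here is a minimal standalone sketch of the traversal
done by collect_oacc_reduction_vars_phi_walk.  It is only a toy model under
stated assumptions: SSA names are plain ints, phi_map and phi_walk are
hypothetical names, and none of this is GCC API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy model, not GCC's data structures: an SSA name is an int, and
// PHIS maps each phi-defined name to the names of its arguments.
// Names missing from the map are defined by non-phi statements.
typedef std::map<int, std::vector<int> > phi_map;

// Add to VARS every name reachable from the arguments of the phi
// that defines PHI_RESULT, following phi definitions transitively.
void
phi_walk (const phi_map &phis, int phi_result, std::set<int> &vars)
{
  phi_map::const_iterator it = phis.find (phi_result);
  if (it == phis.end ())
    return;  // Not defined by a phi: the walk stops here.

  // A phi node always has at least one argument.
  assert (!it->second.empty ());

  for (int arg : it->second)
    {
      // insert () reports whether ARG is new; skipping names that are
      // already recorded is what terminates the walk on cyclic phis.
      if (!vars.insert (arg).second)
        continue;

      phi_walk (phis, arg, vars);
    }
}
```

As in the patch, the result of the starting phi itself ends up in VARS only
if a cycle of phis leads back to it, which is the usual situation for a
loop-carried reduction variable.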
diff --git a/gcc/graphite-oacc.h b/gcc/graphite-oacc.h
new file mode 100644
index 000000000000..458e8de24dac
--- /dev/null
+++ b/gcc/graphite-oacc.h
@@ -0,0 +1,55 @@
+/* Functions for analyzing the OpenACC loop structure from Graphite.
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_GRAPHITE_OACC_H
+#define GCC_GRAPHITE_OACC_H
+
+#include "stringpool.h"
+#include "omp-general.h"
+#include "attribs.h"
+#include "cfgloop.h"
+#include "tree-pretty-print.h"
+#include "print-tree.h"
+
+static inline bool oacc_function_p (function *fun)
+{
+  return oacc_get_fn_attrib (fun->decl);
+}
+
+extern bool is_oacc_private (tree var, loop_p loop);
+extern void oacc_add_private_var_kills (loop_p loop, vec<tree> *kills);
+
+extern const gcall* find_oacc_head_mark (loop_p loop, bool last = false);
+
+extern void collect_oacc_reduction_vars (loop_p loop, hash_set<tree> &vars);
+extern void collect_oacc_firstprivate_vars (loop_p loop, hash_set<tree> &vars);
+extern void collect_oacc_private_scalars (loop_p loop, hash_set<tree> &vars);
+extern void collect_oacc_privatized_vars (gcall *marker, hash_set<tree> &vars);
+
+extern gcall* get_oacc_firstprivate_call (loop_p loop);
+extern gcall* get_oacc_private_scalars_call (loop_p loop);
+
+extern bool graphite_analyze_oacc_function_p (function *fun);
+extern bool graphite_analyze_oacc_target_region_type_p (function *fun);
+
+extern gcall* get_oacc_firstprivate_call (loop_p loop);
+extern gcall* get_oacc_private_scalars_call (loop_p loop);
+
+#endif /* GCC_GRAPHITE_OACC_H */
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 6928f3e33dca..019452700a49 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -109,8 +109,8 @@ scop_get_domains (scop_p scop)
 /* Compute the schedule for SCOP based on its parameters, domain and set of
    constraints.  Then apply the schedule to SCOP.  */

-static bool
-optimize_isl (scop_p scop)
+bool
+optimize_isl (scop_p scop, bool oacc_enabled_graphite)
 {
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
@@ -196,7 +196,8 @@ optimize_isl (scop_p scop)
        print_schedule_ast (dump_file, scop->original_schedule, scop);
       isl_schedule_free (scop->transformed_schedule);
       scop->transformed_schedule = isl_schedule_copy (scop->original_schedule);
-      return flag_graphite_identity || flag_loop_parallelize_all;
+      return flag_graphite_identity || flag_loop_parallelize_all
+             || oacc_enabled_graphite;
     }

   return true;
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index a7aabcb33c99..810f7a9918bc 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -89,7 +89,8 @@ debug_iteration_domains (scop_p scop)

 void
 new_poly_dr (poly_bb_p pbb, gimple *stmt, enum poly_dr_type type,
-            isl_map *acc, isl_set *subscript_sizes)
+            isl_map *acc, isl_set *subscript_sizes,
+            bool is_reduction)
 {
   static int id = 0;
   poly_dr_p pdr = XNEW (struct poly_dr);
@@ -102,10 +103,12 @@ new_poly_dr (poly_bb_p pbb, gimple *stmt, enum poly_dr_type type,
   pdr->subscript_sizes = subscript_sizes;
   PDR_TYPE (pdr) = type;
   PBB_DRS (pbb).safe_push (pdr);
+  pdr->is_reduction = is_reduction;

   if (dump_file)
     {
-      fprintf (dump_file, "Converting dr: ");
+      fprintf (dump_file, "Converting%sdr: ",
+              is_reduction ? " reduction " : " ");
       print_pdr (dump_file, pdr);
       fprintf (dump_file, "To polyhedral representation:\n");
       fprintf (dump_file, "  - access functions: ");
@@ -181,6 +184,10 @@ print_pdr (FILE *file, poly_dr_p pdr)
       fprintf (file, "may_write \n");
       break;

+    case PDR_KILL:
+      fprintf (file, "kill \n");
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -206,13 +213,15 @@ debug_pdr (poly_dr_p pdr)

 gimple_poly_bb_p
 new_gimple_poly_bb (basic_block bb, vec<data_reference_p> drs,
-                   vec<scalar_use> reads, vec<tree> writes)
+                   vec<scalar_use> reads, vec<tree> writes,
+                   vec<tree> kills)
 {
   gimple_poly_bb_p gbb = XNEW (struct gimple_poly_bb);
   GBB_BB (gbb) = bb;
   GBB_DATA_REFS (gbb) = drs;
   gbb->read_scalar_refs = reads;
   gbb->write_scalar_refs = writes;
+  gbb->kill_scalar_refs = kills;
   GBB_CONDITIONS (gbb).create (0);
   GBB_CONDITION_CASES (gbb).create (0);

@@ -229,6 +238,7 @@ free_gimple_poly_bb (gimple_poly_bb_p gbb)
   GBB_CONDITION_CASES (gbb).release ();
   gbb->read_scalar_refs.release ();
   gbb->write_scalar_refs.release ();
+  gbb->kill_scalar_refs.release ();
   XDELETE (gbb);
 }

@@ -255,6 +265,9 @@ new_scop (edge entry, edge exit)
   scop_set_region (s, region);
   s->pbbs.create (3);
   s->drs.create (3);
+  s->reduction_vars = new hash_set<tree> (1);
+  s->oacc_firstprivate_vars = new hash_set<tree> (1);
+  s->oacc_private_scalars = new hash_set<tree> (1);
   s->unhandled_alias_ddrs.create (1);
   s->dependence = NULL;
   return s;
@@ -273,6 +286,9 @@ free_scop (scop_p scop)

   scop->pbbs.release ();
   scop->drs.release ();
+  delete scop->reduction_vars;
+  delete scop->oacc_firstprivate_vars;
+  delete scop->oacc_private_scalars;
   scop->unhandled_alias_ddrs.release ();

   isl_set_free (scop->param_context);
@@ -529,6 +545,23 @@ debug_isl_map (__isl_keep isl_map *map)
   print_isl_map (stderr, map);
 }

+
+void
+print_isl_space (FILE *f, __isl_keep isl_space *space)
+{
+  isl_printer *p = isl_printer_to_file (the_isl_ctx, f);
+  p = isl_printer_set_yaml_style (p, ISL_YAML_STYLE_BLOCK);
+  p = isl_printer_print_space (p, space);
+  p = isl_printer_print_str (p, "\n");
+  isl_printer_free (p);
+}
+
+DEBUG_FUNCTION void
+debug_isl_space (__isl_keep isl_space *space)
+{
+  print_isl_space (stderr, space);
+}
+
 void
 print_isl_union_map (FILE *f, __isl_keep isl_union_map *map)
 {
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 924004e3f3c4..234dbe0ec729 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "cfganal.h"
 #include "graphite.h"
+#include "omp-general.h"
+#include "graphite-oacc.h"
+#include "print-tree.h"
+#include "internal-fn.h"

 class debug_printer
 {
@@ -527,14 +531,13 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
 static void
 print_sese_loop_numbers (FILE *file, sese_l sese)
 {
-  loop_p loop;
   bool printed = false;
-  FOR_EACH_LOOP (loop, 0)
-  {
-    if (loop_in_sese_p (loop, sese))
-      fprintf (file, "%d, ", loop->num);
-    printed = true;
-  }
+  for (auto loop : loops_list (cfun, 0))
+    {
+      if (loop_in_sese_p (loop, sese))
+        fprintf (file, "%d, ", loop->num);
+      printed = true;
+    }
   if (printed)
     fprintf (file, "\b\b");
 }
@@ -630,7 +633,9 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
       DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
       return false;
     }
-  if (!niter_desc.control.no_overflow)
+  /* TODO The zero niter can probably be allowed in general */
+  if (!niter_desc.control.no_overflow
+      && !(oacc_function_p (cfun) && integer_zerop (niter)))
     {
       DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
       return false;
@@ -701,8 +706,7 @@ scop_detection::add_scop (sese_l s)
       s.exit = single_succ_edge (s.exit->dest);
     }

-  /* Do not add scops with only one loop.  */
-  if (region_has_one_loop (s))
+  if (!oacc_function_p (cfun) && region_has_one_loop (s))
     {
       DEBUG_PRINT (dp << "[scop-detection-fail] Discarding one loop SCoP: ";
                   print_sese (dump_file, s));
@@ -729,11 +733,10 @@ scop_detection::add_scop (sese_l s)

   if (dump_file && dump_flags & TDF_DETAILS)
     {
-      loop_p loop;
       fprintf (dump_file, "Loops in SCoP: ");
-      FOR_EACH_LOOP (loop, 0)
-      if (loop_in_sese_p (loop, s))
-        fprintf (dump_file, "%d ", loop->num);
+      for (auto loop : loops_list (cfun, 0))
+        if (loop_in_sese_p (loop, s))
+          fprintf (dump_file, "%d ", loop->num);
       fprintf (dump_file, "\n");
     }
 }
@@ -1084,6 +1087,17 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
   return true;
 }

+/* Check if STMT is an internal OpenACC function call that should be ignored
+   when Graphite checks side effects. */
+
+static inline bool
+ignored_oacc_internal_call_p (gimple *stmt)
+{
+  return is_gimple_call (stmt)
+         && (gimple_call_internal_p (stmt, IFN_UNIQUE)
+             || gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION));
+}
+
 /* GIMPLE_ASM and GIMPLE_CALL may embed arbitrary side effects.
    Calls have side-effects, except those to const or pure
    functions.  */
@@ -1091,6 +1105,9 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
 static bool
 stmt_has_side_effects (gimple *stmt)
 {
+  if (ignored_oacc_internal_call_p (stmt))
+    return false;
+
   if (gimple_has_volatile_ops (stmt)
       || (gimple_code (stmt) == GIMPLE_CALL
          && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE)))
@@ -1288,6 +1305,7 @@ scan_tree_for_params (sese_info_p s, tree e)
     case NEGATE_EXPR:
     case BIT_NOT_EXPR:
     CASE_CONVERT:
+    case VIEW_CONVERT_EXPR:
     case NON_LVALUE_EXPR:
       scan_tree_for_params (s, TREE_OPERAND (e, 0));
       break;
@@ -1362,6 +1380,9 @@ find_scop_parameters (scop_p scop)
 static void
 add_write (vec<tree> *writes, tree def)
 {
+  if (ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (def)))
+    return;
+
   writes->safe_push (def);
   DEBUG_PRINT (dp << "Adding scalar write: ";
               print_generic_expr (dump_file, def);
@@ -1370,9 +1391,27 @@ add_write (vec<tree> *writes, tree def)
                                  SSA_NAME_DEF_STMT (def), 0));
 }

+static void
+add_kill (vec<tree> *kills, tree def)
+{
+  if (ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (def)))
+    return;
+
+  kills->safe_push (def);
+  DEBUG_PRINT (dp << "Adding scalar kill: ";
+              print_generic_expr (dump_file, def);
+              dp << "\n");
+}
+
 static void
 add_read (vec<scalar_use> *reads, tree use, gimple *use_stmt)
 {
+  gcc_assert (TREE_CODE (use) == SSA_NAME);
+
+  if ((use_stmt && ignored_oacc_internal_call_p (use_stmt))
+      || ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (use)))
+    return;
+
   DEBUG_PRINT (dp << "Adding scalar read: ";
               print_generic_expr (dump_file, use);
               dp << "\nFrom stmt: ";
@@ -1428,6 +1467,58 @@ build_cross_bb_scalars_use (scop_p scop, tree use, gimple *use_stmt,
     add_read (reads, use, use_stmt);
 }

+/* Add kills for all ssa names in the set FROM to the vector KILLS.  */
+
+static void add_kills (hash_set<tree>* from, vec<tree> &kills)
+{
+  hash_set<tree>::iterator end = from->end ();
+  hash_set<tree>::iterator it = from->begin ();
+  for (; it != end; ++it)
+    {
+      tree var = *it;
+      add_kill (&kills, var);
+    }
+}
+
+/* Add kill operations for the privatized OpenACC variables that have been
+   recorded for SCOP for the basic block BB into the vector KILLS. */
+
+static void
+add_oacc_kills (scop_p scop, basic_block bb, vec<tree> &kills)
+{
+
+  loop_p loop = bb->loop_father;
+
+  /* Right now we only handle "firstprivate" and "private" variables that occur
+     on an OpenACC compute region.  Those affect only the outermost loop and
+     hence, because of the "chunking" loop created in omp-expand.c around the
+     original loop, the two outermost CFG loops.  */
+  if (loop_depth (loop) > 2)
+    return;
+
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+  {
+    if (e->src == loop->header)
+      {
+        add_kills (scop->oacc_private_scalars, kills);
+        add_kills (scop->oacc_firstprivate_vars, kills);
+        break;
+      }
+  }
+
+  FOR_EACH_EDGE (e, ei, bb->succs)
+  {
+    if (e->dest == loop->header)
+      {
+        add_kills (scop->oacc_private_scalars, kills);
+        add_kills (scop->oacc_firstprivate_vars, kills);
+        break;
+      }
+  }
+}
+
 /* Generates a polyhedral black box only if the bb contains interesting
    information.  */

@@ -1436,6 +1527,7 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
 {
   vec<data_reference_p> drs = vNULL;
   vec<tree> writes = vNULL;
+  vec<tree> kills = vNULL;
   vec<scalar_use> reads = vNULL;

   sese_l region = scop->scop_info->region;
@@ -1497,10 +1589,15 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
               gsi_next (&psi))
            {
              gphi *phi = psi.phi ();
-             tree res = gimple_phi_result (phi);
-             if (virtual_operand_p (res))
-               continue;
-             /* To simulate out-of-SSA the predecessor of edges into PHI nodes
+              tree res = gimple_phi_result (phi);
+              if (virtual_operand_p (res))
+                continue;
+
+              if (scop->oacc_private_scalars->contains (res)
+                  || scop->oacc_firstprivate_vars->contains (res))
+                continue;
+
+              /* To simulate out-of-SSA the predecessor of edges into PHI nodes
                 has a copy from the PHI argument to the PHI destination.  */
              if (! scev_analyzable_p (res, scop->scop_info->region))
                add_write (&writes, res);
@@ -1536,10 +1633,15 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
        }
     }

-  if (drs.is_empty () && writes.is_empty () && reads.is_empty ())
+  if (loop &&    /* i.e. BB belongs to SCOP. */
+      oacc_function_p (cfun))
+    add_oacc_kills (scop, bb, kills);
+
+  if (drs.is_empty () && writes.is_empty () && reads.is_empty ()
+      && kills.is_empty ())
     return NULL;

-  return new_gimple_poly_bb (bb, drs, reads, writes);
+  return new_gimple_poly_bb (bb, drs, reads, writes, kills);
 }

 /* Checks if all parts of DR are defined outside of REGION.  This allows an
@@ -1802,10 +1904,21 @@ private:
   auto_vec<gimple *, 3> conditions, cases;
   scop_p scop;
 };
-}
+
 gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo)
-  : dom_walker (direction, ALL_BLOCKS, bb_to_rpo), scop (scop)
+    : dom_walker (direction, ALL_BLOCKS, bb_to_rpo), scop (scop)
 {
+  if (oacc_function_p (cfun))
+    {
+      edge scop_entry = scop->scop_info->region.entry;
+      loop_p loop = scop_entry->dest->loop_father;
+      gcall *firstprivate_call = get_oacc_firstprivate_call (loop);
+      collect_oacc_privatized_vars (firstprivate_call,
+                                    *scop->oacc_firstprivate_vars);
+
+      gcall *private_call = get_oacc_private_scalars_call (loop);
+      collect_oacc_privatized_vars (private_call, *scop->oacc_private_scalars);
+    }
 }

 /* Call-back for dom_walk executed before visiting the dominated
@@ -1864,6 +1977,8 @@ gather_bbs::before_dom_children (basic_block bb)
   data_reference_p dr;
   FOR_EACH_VEC_ELT (gbb->data_refs, i, dr)
     {
+      gcc_checking_assert (! ignored_oacc_internal_call_p (DR_STMT (dr)));
+
       DEBUG_PRINT (dp << "Adding memory ";
                   if (dr->is_read)
                     dp << "read: ";
@@ -1899,6 +2014,8 @@ gather_bbs::after_dom_children (basic_block bb)
     }
 }

+}
+
 /* Compute sth like an execution order, dominator order with first executing
    edges that stay inside the current loop, delaying processing exit edges.  */

@@ -1921,6 +2038,21 @@ cmp_pbbs (const void *pa, const void *pb)
     return 0;
 }

+/* Analyze the OpenACC loop structure surrounding SCOP to determine the ssa
+   names that belong to OpenACC reduction computations. */
+
+static void
+determine_openacc_reductions (scop_p scop)
+{
+  for (auto loop : loops_list (cfun, 0))
+    {
+      if (!loop_in_sese_p (loop, scop->scop_info->region))
+        continue;
+
+      collect_oacc_reduction_vars (loop, *scop->reduction_vars);
+    }
+}
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
    them to SCOPS.  */

@@ -1956,11 +2088,12 @@ build_scops (vec<scop_p> *scops)
       /* Sort pbbs after execution order for initial schedule generation.  */
       scop->pbbs.qsort (cmp_pbbs);

-      if (! build_alias_set (scop))
-       {
-         DEBUG_PRINT (dp << "[scop-detection-fail] cannot handle dependences\n");
-         free_scop (scop);
-         continue;
+      if (!build_alias_set (scop))
+        {
+          DEBUG_PRINT (dp
+                      << "[scop-detection-fail] cannot handle dependences\n");
+          free_scop (scop);
+          continue;
        }

       /* Do not optimize a scop containing only PBBs that do not belong
@@ -1997,6 +2130,9 @@ build_scops (vec<scop_p> *scops)
          continue;
        }

+      if (oacc_function_p (cfun))
+        determine_openacc_reductions (scop);
+
       scops->safe_push (scop);
     }

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 33d6a98327b8..e6aced3b0004 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "gimplify-me.h"
 #include "tree-cfg.h"
+#include "graphite-oacc.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -46,6 +47,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "domwalk.h"
 #include "tree-ssa-propagate.h"
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "internal-fn.h"
 #include "graphite.h"

 /* Return an isl identifier for the polyhedral basic block PBB.  */
@@ -201,6 +205,8 @@ parameter_index_in_region (tree name, sese_info_p region)
   return -1;
 }

+tree oacc_ifn_call_extract (gimple*);
+
 /* Extract an affine expression from the tree E in the scop S.  */

 static isl_pw_aff *
@@ -604,6 +610,21 @@ pdr_add_data_dimensions (isl_set *subscript_sizes, scop_p scop,
   return isl_set_coalesce (subscript_sizes);
 }

+static inline bool
+oacc_internal_call_p (gimple *stmt)
+{
+  if (!stmt || !is_gimple_call (stmt))
+    return false;
+
+  /* graphite-scop-detection.c should filter out those calls. */
+  gcc_assert (!gimple_call_internal_p (stmt, IFN_UNIQUE));
+
+  /* Should be handled by scalar evolution analysis. */
+  gcc_assert (!gimple_call_internal_p (stmt, IFN_GOACC_LOOP));
+
+  return false;
+}
+
 /* Build data accesses for DRI.  */

 static void
@@ -640,13 +661,18 @@ build_poly_dr (dr_info &dri)
     subscript_sizes = pdr_add_data_dimensions (subscript_sizes, scop, dr);
   }

-  new_poly_dr (pbb, DR_STMT (dr), DR_IS_READ (dr) ? PDR_READ : PDR_WRITE,
-              acc, subscript_sizes);
+  if (oacc_internal_call_p (DR_STMT (dr)))
+    return;
+
+  bool is_reduction = scop->reduction_vars->contains (DR_BASE_ADDRESS (dr));
+  enum poly_dr_type dr_type = DR_IS_READ (dr) ? PDR_READ : PDR_WRITE;
+
+  new_poly_dr (pbb, DR_STMT (dr), dr_type, acc, subscript_sizes, is_reduction);
 }

 static void
 build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
-                isl_map *acc, isl_set *subscript_sizes)
+                 isl_map *acc, isl_set *subscript_sizes, bool is_reduction)
 {
   scop_p scop = PBB_SCOP (pbb);
   /* Each scalar variable has a unique alias set number starting from
@@ -663,7 +689,7 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
   c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1);

   new_poly_dr (pbb, stmt, kind, isl_map_add_constraint (acc, c),
-              subscript_sizes);
+               subscript_sizes, is_reduction);
 }

 /* Record all cross basic block scalar variables in PBB.  */
@@ -675,6 +701,7 @@ build_poly_sr (poly_bb_p pbb)
   gimple_poly_bb_p gbb = PBB_BLACK_BOX (pbb);
   vec<scalar_use> &reads = gbb->read_scalar_refs;
   vec<tree> &writes = gbb->write_scalar_refs;
+  vec<tree> &kills = gbb->kill_scalar_refs;

   isl_space *dc = isl_set_get_space (pbb->domain);
   int nb_out = 1;
@@ -689,13 +716,39 @@ build_poly_sr (poly_bb_p pbb)
   int i;
   tree var;
   FOR_EACH_VEC_ELT (writes, i, var)
+  {
+    if (oacc_internal_call_p (SSA_NAME_DEF_STMT (var)))
+      continue;
+
+    bool is_reduction = scop->reduction_vars->contains (var);
+
     build_poly_sr_1 (pbb, SSA_NAME_DEF_STMT (var), var, PDR_WRITE,
-                    isl_map_copy (acc), isl_set_copy (subscript_sizes));
+                     isl_map_copy (acc), isl_set_copy (subscript_sizes),
+                     is_reduction);
+  }
+
+  FOR_EACH_VEC_ELT (kills, i, var)
+  {
+    build_poly_sr_1 (pbb, NULL, var, PDR_KILL,
+                     isl_map_copy (acc), isl_set_copy (subscript_sizes),
+                     false);
+  }

   scalar_use *use;
   FOR_EACH_VEC_ELT (reads, i, use)
+  {
+    tree use_var = use->second;
+    gcc_checking_assert (TREE_CODE (use_var) == SSA_NAME);
+
+    if (oacc_internal_call_p (use->first)
+       || oacc_internal_call_p (SSA_NAME_DEF_STMT (use->second)))
+      continue;
+
+    bool is_reduction = scop->reduction_vars->contains (use->second);
+
     build_poly_sr_1 (pbb, use->first, use->second, PDR_READ, isl_map_copy (acc),
-                    isl_set_copy (subscript_sizes));
+                    isl_set_copy (subscript_sizes), is_reduction);
+  }

   isl_map_free (acc);
   isl_set_free (subscript_sizes);
diff --git a/gcc/graphite.c b/gcc/graphite.c
index 0060caea22ed..293d5425ff15 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "ssa.h"
 #include "fold-const.h"
 #include "gimple-iterator.h"
@@ -58,6 +60,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "tree-into-ssa.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "cgraph.h"
+#include "gimple-pretty-print.h"
+#include "print-tree.h"
+#include "tree-pretty-print.h"
+#include "internal-fn.h"
+
+static bool have_isl = true;

 /* Print global statistics to FILE.  */

@@ -416,9 +426,12 @@ graphite_transform_loops (void)
   vec<scop_p> scops = vNULL;
   isl_ctx *ctx;

-  /* If a function is parallel it was most probably already run through graphite
-     once. No need to run again.  */
-  if (parallelized_function_p (cfun->decl))
+  /* If a function is parallel it was most probably already run through
+     graphite once. No need to run again.  This is not true for OpenACC
+     functions. The function was created for offloading, but we still might have
+     to figure out which loops may be parallelized. */
+
+  if (parallelized_function_p (cfun->decl) && !oacc_function_p (cfun))
     return;

   calculate_dominance_info (CDI_DOMINATORS);
@@ -444,6 +457,7 @@ graphite_transform_loops (void)
   seir_cache = new hash_map<sese_scev_hash, tree>;

   calculate_dominance_info (CDI_POST_DOMINATORS);
+  set_scev_analyze_openacc_calls (oacc_function_p (cfun));
   build_scops (&scops);
   free_dominance_info (CDI_POST_DOMINATORS);

@@ -457,26 +471,50 @@ graphite_transform_loops (void)
       print_global_statistics (dump_file);
     }

-  FOR_EACH_VEC_ELT (scops, i, scop)
-    if (dbg_cnt (graphite_scop))
-      {
-       scop->isl_context = ctx;
-       if (!build_poly_scop (scop))
-         continue;
-
-       if (!apply_poly_transforms (scop))
-         continue;
-
-       changed = true;
-       if (graphite_regenerate_ast_isl (scop)
-           && dump_enabled_p ())
-         {
-           dump_user_location_t loc = find_loop_location
-             (scops[i]->scop_info->region.entry->dest->loop_father);
-           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-                            "loop nest optimized\n");
-         }
-      }
+  if (oacc_function_p (cfun))
+    {
+      /* OpenACC uses Graphite for dependence analysis only.
+         Code generation would need to understand the
+         OpenACC internal function calls before it could be
+         enabled. */
+
+      FOR_EACH_VEC_ELT (scops, i, scop)
+      if (dbg_cnt (graphite_scop))
+        {
+          scop->isl_context = ctx;
+          if (!build_poly_scop (scop))
+            continue;
+
+          if (!optimize_isl (scop, true))
+           continue;
+
+          graphite_oacc_analyze_scop (scop);
+          changed = true;
+        }
+      set_scev_analyze_openacc_calls (false);
+    }
+  else // Non-OpenACC-functions
+    {
+      FOR_EACH_VEC_ELT (scops, i, scop)
+      if (dbg_cnt (graphite_scop))
+        {
+          scop->isl_context = ctx;
+          if (!build_poly_scop (scop))
+            continue;
+
+          if (!apply_poly_transforms (scop))
+            continue;
+
+          changed = true;
+          if (graphite_regenerate_ast_isl (scop) && dump_enabled_p ())
+            {
+              dump_user_location_t loc = find_loop_location (
+                  scops[i]->scop_info->region.entry->dest->loop_father);
+              dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+                               "loop nest optimized\n");
+            }
+        }
+    }

   delete seir_cache;
   seir_cache = NULL;
@@ -518,6 +556,8 @@ graphite_transform_loops (void)

 #else /* If isl is not available: #ifndef HAVE_isl.  */

+static bool have_isl = false;
+
 static void
 graphite_transform_loops (void)
 {
@@ -530,7 +570,10 @@ graphite_transform_loops (void)
 static unsigned int
 graphite_transforms (struct function *fun)
 {
-  if (number_of_loops (fun) <= 1)
+
+  unsigned num_loops = number_of_loops (fun);
+  if (num_loops == 0
+      || (num_loops == 1 && !oacc_function_p (cfun)))
     return 0;

   graphite_transform_loops ();
@@ -538,14 +581,35 @@ graphite_transforms (struct function *fun)
   return 0;
 }

+/* Return TRUE if FUN is an OpenACC outlined function that should be analyzed
+   by Graphite. */
+
+static inline bool oacc_enable_graphite_p (function *fun)
+{
+  if (!flag_openacc || !oacc_get_fn_attrib (fun->decl))
+    return false;
+
+  if (!graphite_analyze_oacc_target_region_type_p (fun))
+    return false;
+
+  bool optimizing = global_options.x_optimize <= 0;
+  /* Enabling Graphite if isl is not available aborts compilation. Prefer to
+     skip it and emit a warning, unless optimizations are enabled. */
+  if (!have_isl && !optimizing)
+    warning (OPT_Wall, "Unable to analyze OpenACC regions with Graphite; isl "
+                       "is not available.");
+  return true;
+}
+
 static bool
-gate_graphite_transforms (void)
+gate_graphite_transforms (function *fun)
 {
   /* Enable -fgraphite pass if any one of the graphite optimization flags
      is turned on.  */
   if (flag_graphite_identity
       || flag_loop_parallelize_all
-      || flag_loop_nest_optimize)
+      || flag_loop_nest_optimize
+      || oacc_enable_graphite_p (fun))
     flag_graphite = 1;

   return flag_graphite != 0;
@@ -574,7 +638,7 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return gate_graphite_transforms (); }
+  virtual bool gate (function *fun) { return gate_graphite_transforms (fun); }

 }; // class pass_graphite

@@ -609,7 +673,7 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return gate_graphite_transforms (); }
+  virtual bool gate (function *fun) { return gate_graphite_transforms (fun); }
   virtual unsigned int execute (function *fun) { return graphite_transforms (fun); }

 }; // class pass_graphite_transforms
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 03febfa39986..9c508f31109f 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -42,7 +42,8 @@ enum poly_dr_type
   /* PDR_MAY_READs are represented using PDR_READS.  This does not
      limit the expressiveness.  */
   PDR_WRITE,
-  PDR_MAY_WRITE
+  PDR_MAY_WRITE,
+  PDR_KILL
 };

 struct poly_dr
@@ -61,6 +62,9 @@ struct poly_dr

   enum poly_dr_type type;

+  /* Indicates that this PDR is part of an OpenACC "reduction" computation. */
+  bool is_reduction;
+
   /* The access polyhedron contains the polyhedral space this data
      reference will access.

@@ -185,7 +189,7 @@ struct poly_dr
 #define PDR_ACCESSES(PDR) (NULL)

 void new_poly_dr (poly_bb_p, gimple *, enum poly_dr_type,
-                 isl_map *, isl_set *);
+                 isl_map *, isl_set *, bool);
 void debug_pdr (poly_dr_p);
 void print_pdr (FILE *, poly_dr_p);

@@ -211,6 +215,14 @@ pdr_may_write_p (poly_dr_p pdr)
   return PDR_TYPE (pdr) == PDR_MAY_WRITE;
 }

+/* Returns true when PDR is a "kill".  */
+
+static inline bool
+pdr_kill_p (poly_dr_p pdr)
+{
+  return PDR_TYPE (pdr) == PDR_KILL;
+}
+
 /* POLY_BB represents a blackbox in the polyhedral model.  */

 struct poly_bb
@@ -281,6 +293,8 @@ extern void print_isl_aff (FILE *, isl_aff *);
 extern void print_isl_constraint (FILE *, isl_constraint *);
 extern void print_isl_schedule (FILE *, isl_schedule *);
 extern void debug_isl_schedule (isl_schedule *);
+extern void print_isl_space (FILE *, isl_space *);
+extern void debug_isl_space (isl_space *);
 extern void print_isl_ast (FILE *, isl_ast_node *);
 extern void debug_isl_ast (isl_ast_node *);
 extern void debug_isl_set (isl_set *);
@@ -380,6 +394,18 @@ struct scop
   /* All the data references in this scop.  */
   vec<dr_info> drs;

+  /* This set contains the ssa names that are OpenACC "reduction" variables
+     in the loops from SCOP using them. */
+  hash_set<tree> *reduction_vars;
+
+  /* If SCOP is contained in an OpenACC compute region, this is the set of
+     ssa names that are "firstprivate" in this region. */
+  hash_set<tree> *oacc_firstprivate_vars;
+
+  /* If SCOP is contained in an OpenACC compute region, this is the set of
+     ssa names that are "private" in this region. */
+  hash_set<tree> *oacc_private_scalars;
+
   /* The context describes known restrictions concerning the parameters
      and relations in between the parameters.

@@ -411,7 +437,8 @@ struct scop
 extern scop_p new_scop (edge, edge);
 extern void free_scop (scop_p);
 extern gimple_poly_bb_p new_gimple_poly_bb (basic_block, vec<data_reference_p>,
-                                           vec<scalar_use>, vec<tree>);
+                                           vec<scalar_use>, vec<tree>, vec<tree>);
+extern bool optimize_isl (scop_p, bool = false);
 extern bool apply_poly_transforms (scop_p);

 /* Set the region of SCOP to REGION.  */
@@ -447,10 +474,10 @@ carries_deps (__isl_keep isl_union_map *schedule,

 extern bool build_poly_scop (scop_p);
 extern bool graphite_regenerate_ast_isl (scop_p);
+extern bool graphite_oacc_analyze_scop (scop_p);
 extern void build_scops (vec<scop_p> *);
 extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec<sese_l> &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
-
 #endif
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a63..8a96f7600f68 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3004,6 +3004,10 @@ expand_UNIQUE (internal_fn, gcall *stmt)
       else
        gcc_unreachable ();
       break;
+    case IFN_UNIQUE_OACC_PRIVATE:
+    case IFN_UNIQUE_OACC_PRIVATE_SCALAR:
+    case IFN_UNIQUE_OACC_FIRSTPRIVATE:
+      break;
     }

   if (pattern)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 19d0f849a5ad..d1028f05b0d8 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -40,7 +40,9 @@ along with GCC; see the file COPYING3.  If not see
   DEF(UNSPEC), \
     DEF(OACC_FORK), DEF(OACC_JOIN),            \
     DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK),  \
-    DEF(OACC_PRIVATE)
+    DEF(OACC_PRIVATE),  \
+    DEF(OACC_PRIVATE_SCALAR),  \
+    DEF(OACC_FIRSTPRIVATE)

 enum ifn_unique_kind {
 #define DEF(X) IFN_UNIQUE_##X
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 70957a66da83..365d167b6428 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -108,6 +108,10 @@ struct omp_region
   /* The ordered stmt if type is GIMPLE_OMP_ORDERED and it has
      a depend clause.  */
   gomp_ordered *ord_stmt;
+
+  /* True if this is nested inside an OpenACC kernels construct that
+     will be handled by the "parloops" pass.  */
+  bool inside_kernels_p;
 };

 static struct omp_region *root_omp_region;
@@ -8110,7 +8114,24 @@ expand_omp_for (struct omp_region *region, gimple *inner_stmt)
     expand_omp_simd (region, &fd);
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
     {
-      gcc_assert (!inner_stmt && !fd.non_rect);
+      struct omp_region *target_region;
+      for (target_region = region->outer; target_region;
+           target_region = target_region->outer)
+        {
+          if (target_region->type == GIMPLE_OMP_TARGET)
+            {
+              gomp_target *entry_stmt
+                  = as_a<gomp_target *> (last_stmt (target_region->entry));
+
+              if (gimple_omp_target_kind (entry_stmt)
+                  == GF_OMP_TARGET_KIND_OACC_KERNELS)
+                gcc_checking_assert (
+                    param_openacc_kernels != OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                    && param_openacc_kernels != OPENACC_KERNELS_PARLOOPS);
+            }
+        }
+
+      gcc_assert (!inner_stmt);
       expand_oacc_for (region, &fd);
     }
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_TASKLOOP)
@@ -9515,6 +9536,10 @@ static void
 mark_loops_in_oacc_kernels_region (basic_block region_entry,
                                   basic_block region_exit)
 {
+  gcc_checking_assert (param_openacc_kernels
+                           == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                       || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   class loop *outer = region_entry->loop_father;
   gcc_assert (region_exit == NULL || outer == region_exit->loop_father);

@@ -9679,24 +9704,29 @@ expand_omp_target (struct omp_region *region)

   entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
   target_kind = gimple_omp_target_kind (entry_stmt);
+  if (!(param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+        || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS))
+    gcc_checking_assert (target_kind != GF_OMP_TARGET_KIND_OACC_KERNELS);
+
   new_bb = region->entry;

   offloaded = is_gimple_omp_offloaded (entry_stmt);
   switch (target_kind)
     {
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_REGION:
     case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_ENTER_DATA:
     case GF_OMP_TARGET_KIND_EXIT_DATA:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
-    case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
     case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
     case GF_OMP_TARGET_KIND_DATA:
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
@@ -9736,6 +9766,12 @@ expand_omp_target (struct omp_region *region)
                     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
+      gcc_checking_assert (
+          param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+      mark_loops_in_oacc_kernels_region (region->entry, region->exit);
+
       DECL_ATTRIBUTES (child_fn)
        = tree_cons (get_identifier ("oacc kernels"),
                     NULL_TREE, DECL_ATTRIBUTES (child_fn));
@@ -9755,6 +9791,11 @@ expand_omp_target (struct omp_region *region)
        = tree_cons (get_identifier ("oacc parallel_kernels_gang_single"),
                     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+      DECL_ATTRIBUTES (child_fn)
+          = tree_cons (get_identifier ("oacc parallel_kernels_graphite"),
+                       NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
     default:
       /* Make sure we don't miss any.  */
       gcc_checking_assert (!(is_gimple_omp_oacc (entry_stmt)
@@ -9967,6 +10008,7 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
@@ -10448,14 +10490,15 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
                case GF_OMP_TARGET_KIND_OACC_SERIAL:
                case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
                case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
-                 break;
+                case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+                case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+                  break;
                case GF_OMP_TARGET_KIND_UPDATE:
                case GF_OMP_TARGET_KIND_ENTER_DATA:
                case GF_OMP_TARGET_KIND_EXIT_DATA:
                case GF_OMP_TARGET_KIND_DATA:
                case GF_OMP_TARGET_KIND_OACC_DATA:
                case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
-               case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
                case GF_OMP_TARGET_KIND_OACC_UPDATE:
                case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
                case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
@@ -10638,7 +10681,10 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
     {
-      return !(fun->curr_properties & PROP_gimple_eomp);
+      return !(fun->curr_properties & PROP_gimple_eomp)
+             && (!oacc_get_kernels_attrib (cfun->decl)
+                 || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                 || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
     }
   virtual unsigned int execute (function *) { return execute_expand_omp (); }
   opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
@@ -10708,6 +10754,8 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
        case GF_OMP_TARGET_KIND_OACC_SERIAL:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+       case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+       case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
          break;
        case GF_OMP_TARGET_KIND_UPDATE:
        case GF_OMP_TARGET_KIND_ENTER_DATA:
@@ -10715,7 +10763,6 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
        case GF_OMP_TARGET_KIND_DATA:
        case GF_OMP_TARGET_KIND_OACC_DATA:
        case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
-       case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
        case GF_OMP_TARGET_KIND_OACC_UPDATE:
        case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
        case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 27a1bc8092c8..1940c96a200c 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -2929,6 +2929,15 @@ oacc_get_fn_attrib (tree fn)
   return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
 }

+/* Retrieve the oacc kernels attrib and return it.  Non-oacc
+   functions will return NULL.  */
+
+tree
+oacc_get_kernels_attrib (tree fn)
+{
+  return lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn));
+}
+
 /* Return true if FN is an OpenMP or OpenACC offloading function.  */

 bool
@@ -2955,10 +2964,16 @@ oacc_get_fn_dim_size (tree fn, int axis)
     dims = TREE_CHAIN (dims);

   tree v = TREE_VALUE (dims);
-  /* TODO With 'pass_oacc_device_lower' moved "later", this is necessary to
-     avoid ICE for some OpenACC 'kernels' ("parloops") constructs.  */
+  /* TODO-kernels With 'pass_oacc_device_lower' moved "later", this is necessary
+     to avoid ICE for some OpenACC 'kernels' ("parloops") constructs.  */
   if (v == NULL_TREE)
-    return 0;
+    {
+      gcc_checking_assert (
+          param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+      return 0;
+    }

   int size = TREE_INT_CST_LOW (v);

diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 8fe744c6a7af..28584ed8d56e 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -119,6 +119,7 @@ extern int oacc_verify_routine_clauses (tree, tree *, location_t,
                                        const char *);
 extern tree oacc_build_routine_dims (tree clauses);
 extern tree oacc_get_fn_attrib (tree fn);
+extern tree oacc_get_kernels_attrib (tree fn);
 extern bool offloading_function_p (tree fn);
 extern int oacc_get_fn_dim_size (tree fn, int axis);
 extern int oacc_get_ifn_dim_arg (const gimple *stmt);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f58a191e014c..afd6061ae1e9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -154,6 +154,12 @@ struct omp_context
   /* True if this construct can be cancelled.  */
   bool cancellable;

+  /* "firstprivate" variables in this context.  */
+  hash_set<tree> *oacc_firstprivate_vars;
+
+  /* Scalar "private" variables in this context. */
+  hash_set<tree> *oacc_private_scalars;
+
   /* True if lower_omp_1 should look up lastprivate conditional in parent
      context.  */
   bool combined_into_simd_safelen1;
@@ -213,10 +219,30 @@ is_oacc_parallel_or_serial (omp_context *ctx)
 {
   enum gimple_code outer_type = gimple_code (ctx->stmt);
   return ((outer_type == GIMPLE_OMP_TARGET)
-         && ((gimple_omp_target_kind (ctx->stmt)
-              == GF_OMP_TARGET_KIND_OACC_PARALLEL)
-             || (gimple_omp_target_kind (ctx->stmt)
-                 == GF_OMP_TARGET_KIND_OACC_SERIAL)));
+          && ((gimple_omp_target_kind (ctx->stmt)
+               == GF_OMP_TARGET_KIND_OACC_PARALLEL)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_SERIAL)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)));
+}
+
+/* Return true if CTX corresponds to an oacc region that was generated from
+   an original kernels region that has been lowered to parallel regions.  */
+
+static bool
+was_originally_oacc_kernels (omp_context *ctx)
+{
+  enum gimple_code outer_type = gimple_code (ctx->stmt);
+  return ((outer_type == GIMPLE_OMP_TARGET)
+          && ((gimple_omp_target_kind (ctx->stmt)
+               == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
 }

 /* Return whether CTX represents an OpenACC 'kernels' construct.
@@ -242,10 +268,34 @@ is_oacc_kernels_decomposed_part (omp_context *ctx)
               == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
              || (gimple_omp_target_kind (ctx->stmt)
                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+             || (gimple_omp_target_kind (ctx->stmt)
+                 == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
              || (gimple_omp_target_kind (ctx->stmt)
                  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
 }

+/* Return whether CTX represents an OpenACC 'kernels' decomposed part that will
+   be analyzed by Graphite.  */
+
+static bool
+is_oacc_kernels_decomposed_graphite_part (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+         && gimple_omp_target_kind (ctx->stmt)
+                == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE;
+}
+
+
+/* Return whether CTX represents an OpenACC 'kernels' data part.  */
+
+static bool
+is_oacc_data_kernels_part (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+         && gimple_omp_target_kind (ctx->stmt)
+                == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS;
+}
+
 /* Return true if STMT corresponds to an OpenMP target region.  */
 static bool
 is_omp_target (gimple *stmt)
@@ -1011,6 +1061,9 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)

   ctx->cb.decl_map = new hash_map<tree, tree>;

+  ctx->oacc_firstprivate_vars = new hash_set<tree> ();
+  ctx->oacc_private_scalars = new hash_set<tree> ();
+
   return ctx;
 }

@@ -1093,6 +1146,8 @@ delete_omp_context (splay_tree_value value)

   delete ctx->lastprivate_conditional_map;
   delete ctx->allocate_map;
+  delete ctx->oacc_firstprivate_vars;
+  delete ctx->oacc_private_scalars;

   XDELETE (ctx);
 }
@@ -1155,6 +1210,43 @@ fixup_child_record_type (omp_context *ctx)
     = build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }

+static void
+oacc_record_firstprivate_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+      {
+        tree decl = OMP_CLAUSE_DECL (c);
+
+        if (TREE_ADDRESSABLE (decl))
+          continue;
+
+        ctx->oacc_firstprivate_vars->add (decl);
+      }
+}
+
+static void
+oacc_record_private_scalars (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+        tree decl = OMP_CLAUSE_DECL (c);
+        if (!(VAR_P (decl)
+              && !(TREE_READONLY (decl)
+                   && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))))
+          continue;
+
+        if (TREE_ADDRESSABLE (decl))
+          continue;
+        ctx->oacc_private_scalars->add (decl);
+      }
+}
+
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
    specified by CLAUSES.  */

@@ -1726,9 +1818,15 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
            break;
          /* FALLTHRU */

-       case OMP_CLAUSE_FIRSTPRIVATE:
-       case OMP_CLAUSE_PRIVATE:
-       case OMP_CLAUSE_LINEAR:
+        case OMP_CLAUSE_FIRSTPRIVATE:
+          if (is_oacc_kernels_decomposed_graphite_part (ctx))
+            oacc_record_firstprivate_var_clauses (ctx, c);
+          gcc_fallthrough ();
+        case OMP_CLAUSE_PRIVATE:
+          if (is_oacc_kernels_decomposed_graphite_part (ctx))
+            oacc_record_private_scalars (ctx, c);
+          gcc_fallthrough ();
+        case OMP_CLAUSE_LINEAR:
        case OMP_CLAUSE_IS_DEVICE_PTR:
          decl = OMP_CLAUSE_DECL (c);
          if (is_variable_sized (decl))
@@ -2591,12 +2689,21 @@ enclosing_target_ctx (omp_context *ctx)
 static bool
 ctx_in_oacc_kernels_region (omp_context *ctx)
 {
+  gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                       || param_openacc_kernels
+                              == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                       || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   for (;ctx != NULL; ctx = ctx->outer)
     {
       gimple *stmt = ctx->stmt;
-      if (gimple_code (stmt) == GIMPLE_OMP_TARGET
-         && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
-       return true;
+      if (gimple_code (stmt) != GIMPLE_OMP_TARGET)
+       continue;
+
+      int target_kind = gimple_omp_target_kind (stmt);
+      if (target_kind == GF_OMP_TARGET_KIND_OACC_KERNELS
+          || target_kind == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+        return true;
     }

   return false;
@@ -2610,6 +2717,10 @@ ctx_in_oacc_kernels_region (omp_context *ctx)
 static unsigned
 check_oacc_kernel_gwv (gomp_for *stmt, omp_context *ctx)
 {
+  gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                      || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                      || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   bool checking = true;
   unsigned outer_mask = 0;
   unsigned this_mask = 0;
@@ -2681,9 +2792,11 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
     {
       omp_context *tgt = enclosing_target_ctx (outer_ctx);

-      if (!(tgt && is_oacc_kernels (tgt)))
-       for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-         {
+      if (!tgt
+          || (is_oacc_parallel_or_serial (tgt)
+              && !was_originally_oacc_kernels (tgt)))
+        for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+          {
            tree c_op0;
            switch (OMP_CLAUSE_CODE (c))
              {
@@ -3101,26 +3214,31 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
      inside an OpenACC CTX.  */
   if (gimple_code (stmt) == GIMPLE_OMP_ATOMIC_LOAD
       || gimple_code (stmt) == GIMPLE_OMP_ATOMIC_STORE)
-    /* ..., except for the atomic codes that OpenACC shares with OpenMP.  */
+    /* ..., except for the atomic codes that OpenACC shares with OpenMP.  */
+    ;
+  else if (gimple_code (stmt) == GIMPLE_OMP_TARGET
+           && gimple_omp_target_kind (stmt)
+                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+    /* ... and except for target regions introduced for kernels.  */
+
     ;
-  else if (!(is_gimple_omp (stmt)
-            && is_gimple_omp_oacc (stmt)))
+
+  else if (!(is_gimple_omp (stmt) && is_gimple_omp_oacc (stmt)))
     {
       if (oacc_get_fn_attrib (cfun->decl) != NULL)
-       {
-         error_at (gimple_location (stmt),
-                   "non-OpenACC construct inside of OpenACC routine");
-         return false;
-       }
+        {
+          error_at (gimple_location (stmt),
+                    "non-OpenACC construct inside of OpenACC routine");
+          return false;
+        }
       else
-       for (omp_context *octx = ctx; octx != NULL; octx = octx->outer)
-         if (is_gimple_omp (octx->stmt)
-             && is_gimple_omp_oacc (octx->stmt))
-           {
-             error_at (gimple_location (stmt),
-                       "non-OpenACC construct inside of OpenACC region");
-             return false;
-           }
+        for (omp_context *octx = ctx; octx != NULL; octx = octx->outer)
+          if (is_gimple_omp (octx->stmt) && is_gimple_omp_oacc (octx->stmt))
+            {
+              error_at (gimple_location (stmt),
+                        "non-OpenACC construct inside of OpenACC region");
+              return false;
+            }
     }

   if (ctx != NULL)
@@ -3275,6 +3393,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
                  case GF_OMP_TARGET_KIND_OACC_SERIAL:
                  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
                  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+                 case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
                    ok = true;
                    break;

@@ -3774,6 +3893,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
              break;
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+           case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
            case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
              /* OpenACC 'kernels' decomposed parts.  */
              stmt_name = "kernels"; break;
@@ -3794,6 +3914,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
              ctx_stmt_name = "host_data"; break;
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+           case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
            case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
              /* OpenACC 'kernels' decomposed parts.  */
              ctx_stmt_name = "kernels"; break;
@@ -3801,10 +3922,12 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
            }

          /* OpenACC/OpenMP mismatch?  */
-         if (is_gimple_omp_oacc (stmt)
-             != is_gimple_omp_oacc (ctx->stmt))
-           {
-             error_at (gimple_location (stmt),
+          if (is_gimple_omp_oacc (stmt) != is_gimple_omp_oacc (ctx->stmt)
+              && (gimple_code (stmt) != GIMPLE_OMP_TARGET
+                  || gimple_omp_target_kind (stmt)
+                         != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE))
+            {
+              error_at (gimple_location (stmt),
                        "%s %qs construct inside of %s %qs region",
                        (is_gimple_omp_oacc (stmt)
                         ? "OpenACC" : "OpenMP"), stmt_name,
@@ -3812,7 +3935,16 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
                         ? "OpenACC" : "OpenMP"), ctx_stmt_name);
              return false;
            }
-         if (is_gimple_omp_offloaded (ctx->stmt))
+
+          if ((gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+               && gimple_omp_target_kind (ctx->stmt)
+                      == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)
+              && (gimple_code (stmt) == GIMPLE_OMP_TARGET
+                  && gimple_omp_target_kind (stmt)
+                         == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE))
+            ;
+
+          else if (is_gimple_omp_offloaded (ctx->stmt))
            {
              /* No GIMPLE_OMP_TARGET inside offloaded OpenACC CTX.  */
              if (is_gimple_omp_oacc (ctx->stmt))
@@ -7373,9 +7505,11 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *body_p,

 static void
 lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
-                      gcall *fork, gcall *private_marker, gcall *join,
-                      gimple_seq *fork_seq, gimple_seq *join_seq,
-                      omp_context *ctx)
+                       gcall *fork, gcall *private_marker,
+                       gcall *private_scalars_marker,
+                       gcall *firstprivate_marker, gcall *join,
+                       gimple_seq *fork_seq, gimple_seq *join_seq,
+                       omp_context *ctx)
 {
   gimple_seq before_fork = NULL;
   gimple_seq after_fork = NULL;
@@ -7391,7 +7525,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
        /* No 'reduction' clauses on OpenACC 'kernels'.  */
        gcc_checking_assert (!is_oacc_kernels (ctx));
        /* Likewise, on OpenACC 'kernels' decomposed parts.  */
-       gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
+        gcc_checking_assert (
+            !is_oacc_kernels_decomposed_part (ctx)
+            || is_oacc_kernels_decomposed_graphite_part (ctx));

        tree orig = OMP_CLAUSE_DECL (c);
        tree var = maybe_lookup_decl (orig, ctx);
@@ -7585,7 +7721,12 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
     gimple_seq_add_stmt (fork_seq, fork);
   gimple_seq_add_seq (fork_seq, after_fork);

+  if (private_scalars_marker)
+    gimple_seq_add_stmt (join_seq, private_scalars_marker);
+  if (firstprivate_marker)
+    gimple_seq_add_stmt (join_seq, firstprivate_marker);
   gimple_seq_add_seq (join_seq, before_join);
+
   if (join)
     gimple_seq_add_stmt (join_seq, join);
   gimple_seq_add_seq (join_seq, after_join);
@@ -8294,16 +8435,29 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
   else
     gcc_unreachable ();

-  /* In a parallel region, loops are implicitly INDEPENDENT.  */
-  if (!tgt || is_oacc_parallel_or_serial (tgt))
-    tag |= OLF_INDEPENDENT;
+  /* In a parallel region, loops without auto and seq clauses are
+     implicitly INDEPENDENT.  */
+  if ((!tgt
+       || (is_oacc_parallel_or_serial (tgt)
+           && !is_oacc_kernels_decomposed_graphite_part (tgt)))
+      && !(tag & (OLF_SEQ | OLF_AUTO)))
+    {
+      tag |= OLF_INDEPENDENT;
+    }

   /* Loops inside OpenACC 'kernels' decomposed parts' regions are expected to
      have an explicit 'seq' or 'independent' clause, and no 'auto' clause.  */
-  if (tgt && is_oacc_kernels_decomposed_part (tgt))
+  if (tgt && is_oacc_kernels_decomposed_part (tgt)
+      && !is_oacc_kernels_decomposed_graphite_part (tgt))
     {
-      gcc_assert (tag & (OLF_SEQ | OLF_INDEPENDENT));
-      gcc_assert (!(tag & OLF_AUTO));
+      tag |= OLF_INDEPENDENT;
+
+      gcc_checking_assert (
+          gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
+          /* Loops in kernels regions that will be handled by Graphite should
+             have been made 'auto' by "pass_convert_oacc_kernels". */
+          || gimple_omp_target_kind (ctx->stmt)
+                 != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE);
     }

   if (tag & OLF_TILE)
@@ -8358,7 +8512,9 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head,

 static void
 lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
-                     gimple_seq *head, gimple_seq *tail, omp_context *ctx)
+                      gcall *private_scalars_marker,
+                      gcall *firstprivate_marker, gimple_seq *head,
+                      gimple_seq *tail, omp_context *ctx)
 {
   bool inner = false;
   tree ddvar = create_tmp_var (integer_type_node, ".data_dep");
@@ -8373,6 +8529,20 @@ lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
       gimple_call_set_arg (private_marker, 1, ddvar);
     }

+  if (private_scalars_marker)
+    {
+      gimple_set_location (private_scalars_marker, loc);
+      gimple_call_set_lhs (private_scalars_marker, ddvar);
+      gimple_call_set_arg (private_scalars_marker, 1, ddvar);
+    }
+
+  if (firstprivate_marker)
+    {
+      gimple_set_location (firstprivate_marker, loc);
+      gimple_call_set_lhs (firstprivate_marker, ddvar);
+      gimple_call_set_arg (firstprivate_marker, 1, ddvar);
+    }
+
   tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK);
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);

@@ -8402,9 +8572,10 @@ lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
                              build_int_cst (integer_type_node, done),
                              &join_seq);

-      lower_oacc_reductions (loc, clauses, place, inner,
-                            fork, (count == 1) ? private_marker : NULL,
-                            join, &fork_seq, &join_seq,  ctx);
+      lower_oacc_reductions (loc, clauses, place, inner, fork,
+                             (count == 1) ? private_marker : NULL,
+                             private_scalars_marker, firstprivate_marker, join,
+                             &fork_seq, &join_seq, ctx);

       /* Append this level to head. */
       gimple_seq_add_seq (head, fork_seq);
@@ -11531,6 +11702,76 @@ lower_oacc_private_marker (omp_context *ctx)
   return gimple_build_call_internal_vec (IFN_UNIQUE, args);
 }

+/* Return an internal function call that contains a list of variables which are
+   "firstprivate" in the compute region represented by CTX.  This call is used
+   to help Graphite identify those variables.  */
+
+static gcall *
+make_oacc_firstprivate_vars_marker (omp_context *ctx)
+{
+  auto_vec<tree, 5> args;
+
+  args.quick_push (
+      build_int_cst (integer_type_node, IFN_UNIQUE_OACC_FIRSTPRIVATE));
+
+  /* TODO Change the data structure/iteration to ensure that the ordering of the
+     variables remains stable between GCC runs. */
+  hash_set<tree>::iterator end = ctx->oacc_firstprivate_vars->end ();
+  hash_set<tree>::iterator it = ctx->oacc_firstprivate_vars->begin ();
+  for (; it != end; ++it)
+    {
+      tree decl = *it;
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+       {
+         tree inner_decl = maybe_lookup_decl (decl, thisctx);
+         if (inner_decl)
+           {
+             decl = inner_decl;
+             break;
+           }
+       }
+
+      args.safe_push (decl);
+    }
+
+  return gimple_build_call_internal_vec (IFN_UNIQUE, args);
+}
+
+/* Return an internal function call that contains a list of scalar variables
+   which are "private" in the compute region represented by CTX. This call is
+   used to help Graphite identify those variables. */
+
+static gcall *
+make_oacc_private_scalars_marker (omp_context *ctx)
+{
+  auto_vec<tree, 5> args;
+
+  args.quick_push (
+      build_int_cst (integer_type_node, IFN_UNIQUE_OACC_PRIVATE_SCALAR));
+
+  /* TODO Change the data structure/iteration to ensure that the ordering of
+     the variables remains stable between GCC runs. */
+  hash_set<tree>::iterator end = ctx->oacc_private_scalars->end ();
+  hash_set<tree>::iterator it = ctx->oacc_private_scalars->begin ();
+  for (; it != end; ++it)
+    {
+      tree decl = *it;
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        {
+          tree inner_decl = maybe_lookup_decl (decl, thisctx);
+          if (inner_decl)
+            {
+              decl = inner_decl;
+              break;
+            }
+        }
+
+      args.safe_push (decl);
+    }
+
+  return gimple_build_call_internal_vec (IFN_UNIQUE, args);
+}
+
 /* Lower code for an OMP loop directive.  */

 static void
@@ -11739,11 +11980,16 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   /* Once lowered, extract the bounds and clauses.  */
   omp_extract_for_data (stmt, &fd, NULL);

-  if (is_gimple_omp_oacc (ctx->stmt)
-      && !ctx_in_oacc_kernels_region (ctx))
-    lower_oacc_head_tail (gimple_location (stmt),
-                         gimple_omp_for_clauses (stmt), private_marker,
-                         &oacc_head, &oacc_tail, ctx);
+  bool oacc_kernels_parloops = false;
+  if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+      || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)
+    oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx);
+  if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops)
+    {
+      lower_oacc_head_tail (gimple_location (stmt),
+                            gimple_omp_for_clauses (stmt), private_marker,
+                            NULL, NULL, &oacc_head, &oacc_tail, ctx);
+    }

   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
@@ -12559,6 +12805,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
       data_region = false;
       break;
     case GF_OMP_TARGET_KIND_DATA:
@@ -12751,13 +12998,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
        break;

       case OMP_CLAUSE_FIRSTPRIVATE:
-       gcc_checking_assert (offloaded);
-       if (is_gimple_omp_oacc (ctx->stmt))
-         {
+        gcc_checking_assert (offloaded || is_oacc_data_kernels_part (ctx));
+        if (is_gimple_omp_oacc (ctx->stmt))
+          {
            /* No 'firstprivate' clauses on OpenACC 'kernels'.  */
            gcc_checking_assert (!is_oacc_kernels (ctx));
-           /* Likewise, on OpenACC 'kernels' decomposed parts.  */
-           gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));

            goto oacc_firstprivate;
          }
@@ -12785,13 +13030,12 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
        break;

       case OMP_CLAUSE_PRIVATE:
+        gcc_checking_assert (offloaded || is_oacc_data_kernels_part (ctx));
        gcc_checking_assert (offloaded);
        if (is_gimple_omp_oacc (ctx->stmt))
          {
            /* No 'private' clauses on OpenACC 'kernels'.  */
            gcc_checking_assert (!is_oacc_kernels (ctx));
-           /* Likewise, on OpenACC 'kernels' decomposed parts.  */
-           gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));

            break;
          }
@@ -13066,7 +13310,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
                  }
                else if (is_gimple_reg (var))
                  {
-                   gcc_assert (offloaded);
+                   gcc_assert (offloaded || is_oacc_data_kernels_part (ctx));
                    tree avar = create_tmp_var (TREE_TYPE (var));
                    mark_addressable (avar);
                    enum gomp_map_kind map_kind = OMP_CLAUSE_MAP_KIND (c);
@@ -13846,13 +14090,26 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)

          gcall *private_marker = lower_oacc_private_marker (ctx);

-         if (private_marker)
+         gcall *firstprivate_marker = NULL;
+         gcall *private_scalars_marker = NULL;
+
+          /* The markers for "private" and "firstprivate" scalars are only used
+             to help "Graphite" identify those variables for which it has to
+             adjust some dependences. */
+          if (is_oacc_kernels_decomposed_graphite_part (ctx))
+            {
+              firstprivate_marker = make_oacc_firstprivate_vars_marker (ctx);
+              private_scalars_marker = make_oacc_private_scalars_marker (ctx);
+            }
+
+          if (private_marker)
            gimple_call_set_arg (private_marker, 2, level);

-         lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level,
-                                false, NULL, private_marker, NULL, &fork_seq,
-                                &join_seq, ctx);
-       }
+          lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level,
+                                 false, NULL, private_marker,
+                                 private_scalars_marker, firstprivate_marker,
+                                 NULL, &fork_seq, &join_seq, ctx);
+        }

       gimple_seq_add_seq (&new_body, fork_seq);
       gimple_seq_add_seq (&new_body, tgt_body);
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 4ba5758a9067..c96207d96250 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -176,8 +176,13 @@ adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
               compiler logic to analyze this, so can't parallelize it here, so
               we'd very likely be running into a performance problem if we
               were to execute this unparallelized, thus forward the whole loop
-              nest to 'parloops'.  */
-           *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+              nest to Graphite/"parloops".  */
+           if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+             *region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE;
+           else if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+             *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+           else
+             gcc_unreachable ();
            /* Terminate: final decision for this region.  */
            *handled_ops_p = true;
            return integer_zero_node;
@@ -197,8 +202,13 @@ adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
         the compiler logic to analyze this, so can't parallelize it here, so
         we'd very likely be running into a performance problem if we were to
         execute this unparallelized, thus forward the whole thing to
-        'parloops'.  */
-      *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+        Graphite/"parloops".  */
+      if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+       *region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE;
+      else if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+       *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+      else
+        gcc_unreachable ();
       /* Terminate: final decision for this region.  */
       *handled_ops_p = true;
       return integer_zero_node;
@@ -309,7 +319,9 @@ make_region_seq (location_t loc, gimple_seq stmts,
   /* Figure out the region code for this region.  */
   /* Optimistic default: assume "setup code", no looping; thus not
      performance-critical.  */
-  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
+  int region_code = param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                        ? GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE
+                        : GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
   adjust_region_code (stmts, &region_code);

   if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
@@ -330,6 +342,13 @@ make_region_seq (location_t loc, gimple_seq stmts,
         loops nested inside this sequentially executed statement.  */
       make_loops_gang_single (stmts);
     }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_NOTE, loc_stmts_first,
+                        "beginning %<Graphite%> part in OpenACC"
+                        " %<kernels%> region\n");
+    }
   else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
     {
       if (dump_enabled_p ())
@@ -437,21 +456,24 @@ adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
          tree *outer_clause_ptr = NULL;
          switch (OMP_CLAUSE_CODE (loop_clause))
            {
-           case OMP_CLAUSE_GANG:
-             outer_clause_ptr = wi_info->loop_gang_clause_ptr;
-             break;
-           case OMP_CLAUSE_WORKER:
-             outer_clause_ptr = wi_info->loop_worker_clause_ptr;
-             break;
-           case OMP_CLAUSE_VECTOR:
-             outer_clause_ptr = wi_info->loop_vector_clause_ptr;
-             break;
-           case OMP_CLAUSE_SEQ:
-           case OMP_CLAUSE_INDEPENDENT:
-           case OMP_CLAUSE_AUTO:
-             add_auto_clause = false;
-           default:
-             break;
+             case OMP_CLAUSE_GANG:
+               outer_clause_ptr = wi_info->loop_gang_clause_ptr;
+               add_auto_clause = false;
+               break;
+             case OMP_CLAUSE_WORKER:
+               outer_clause_ptr = wi_info->loop_worker_clause_ptr;
+               add_auto_clause = false;
+               break;
+             case OMP_CLAUSE_VECTOR:
+               outer_clause_ptr = wi_info->loop_vector_clause_ptr;
+               add_auto_clause = false;
+               break;
+             case OMP_CLAUSE_SEQ:
+             case OMP_CLAUSE_INDEPENDENT:
+             case OMP_CLAUSE_AUTO:
+               add_auto_clause = false;
+             default:
+               break;
            }
          if (outer_clause_ptr != NULL)
            {
@@ -525,30 +547,34 @@ transform_kernels_loop_clauses (gimple *omp_for,
        loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
     {
       bool found_num_clause = false;
-      tree *clause_ptr, clause_to_check;
+      tree *clause_ptr;
+      tree clause_to_check = NULL_TREE;
       switch (OMP_CLAUSE_CODE (loop_clause))
-       {
-       case OMP_CLAUSE_GANG:
-         found_num_clause = true;
-         clause_ptr = &loop_gang_clause;
-         clause_to_check = num_gangs_clause;
-         break;
-       case OMP_CLAUSE_WORKER:
-         found_num_clause = true;
-         clause_ptr = &loop_worker_clause;
-         clause_to_check = num_workers_clause;
-         break;
-       case OMP_CLAUSE_VECTOR:
-         found_num_clause = true;
-         clause_ptr = &loop_vector_clause;
-         clause_to_check = vector_length_clause;
-         break;
-       case OMP_CLAUSE_INDEPENDENT:
-       case OMP_CLAUSE_SEQ:
-       case OMP_CLAUSE_AUTO:
-         add_auto_clause = false;
-       default:
-         break;
+        {
+         case OMP_CLAUSE_GANG:
+           found_num_clause = true;
+           add_auto_clause = false;
+           clause_ptr = &loop_gang_clause;
+           clause_to_check = num_gangs_clause;
+           break;
+         case OMP_CLAUSE_WORKER:
+           found_num_clause = true;
+           add_auto_clause = false;
+           clause_ptr = &loop_worker_clause;
+           clause_to_check = num_workers_clause;
+           break;
+         case OMP_CLAUSE_VECTOR:
+           found_num_clause = true;
+           add_auto_clause = false;
+           clause_ptr = &loop_vector_clause;
+           clause_to_check = vector_length_clause;
+           break;
+         case OMP_CLAUSE_INDEPENDENT:
+         case OMP_CLAUSE_SEQ:
+         case OMP_CLAUSE_AUTO:
+           add_auto_clause = false;
+         default:
+           break;
        }
       if (found_num_clause && OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL)
        {
@@ -646,10 +672,13 @@ make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
   clauses = unshare_expr (clauses);

   /* Figure out the region code for this region.  */
-  /* Optimistic default: assume that the loop nest is parallelizable
-     (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) 'auto' clause,
-     and no un-annotated loops).  */
-  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+  /* For "parloops", use an optimistic default: assume that the loop nest is
+     parallelizable (essentially, no GIMPLE_OMP_FOR with (explicit or implicit)
+     'auto' clause, and no un-annotated loops).  */
+  int region_code = param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                       ? GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE
+                       : GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+
   adjust_region_code (stmts, &region_code);

   if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
@@ -661,6 +690,19 @@ make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
                         "parallelized loop nest"
                         " in OpenACC %<kernels%> region\n");

+      clauses = transform_kernels_loop_clauses (omp_for,
+                                               num_gangs_clause,
+                                               num_workers_clause,
+                                               vector_length_clause,
+                                               clauses);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_NOTE, omp_for,
+                        "forwarded loop nest in OpenACC %<kernels%> region"
+                        " to %<Graphite%> for analysis\n");
+
       clauses = transform_kernels_loop_clauses (omp_for,
                                                num_gangs_clause,
                                                num_workers_clause,
@@ -1526,8 +1568,13 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
   {
-    return (flag_openacc
-           && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+    if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+       || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+      return flag_openacc;
+    else if (param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)
+      return false;
+    else
+      gcc_unreachable ();
   }
   virtual unsigned int execute (function *)
   {
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index e99aaac0e515..2743e90f79a3 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -746,6 +746,198 @@ oacc_xform_loop (gcall *call)
   gsi_replace_with_seq (&gsi, seq, true);
 }

+/* This is used for expanding the loop calls to "fake" values that mimic the
+   values used for host execution during scalar evolution analysis in
+   Graphite. The function has been derived from oacc_xform_loop which could not
+   be used because it rewrites the code directly.
+
+   TODO This function can either be simplified significantly (cf. the fixed
+   values for number_of_threads, thread_index, chunking, striding) or unified
+   with oacc_xform_loop. */
+
+tree
+oacc_extract_loop_call (gcall *call)
+{
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
+  enum ifn_goacc_loop_kind code
+      = (enum ifn_goacc_loop_kind)TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+  tree dir = gimple_call_arg (call, 1);
+  tree range = gimple_call_arg (call, 2);
+  tree step = gimple_call_arg (call, 3);
+  tree chunk_size = NULL_TREE;
+  unsigned mask = (unsigned)TREE_INT_CST_LOW (gimple_call_arg (call, 5));
+  tree lhs = gimple_call_lhs (call);
+  tree type = NULL_TREE;
+  tree diff_type = TREE_TYPE (range);
+  tree r = NULL_TREE;
+  bool chunking = false, striding = true;
+  unsigned outer_mask = mask & (~mask + 1); // Outermost partitioning
+
+  gcc_checking_assert (lhs);
+
+  type = TREE_TYPE (lhs);
+
+  tree number_of_threads = integer_one_node;
+  tree thread_index = integer_zero_node;
+
+  /* striding=true, chunking=true
+       -> invalid.
+     striding=true, chunking=false
+       -> chunks=1
+     striding=false,chunking=true
+       -> chunks=ceil (range/(chunksize*threads*step))
+     striding=false,chunking=false
+       -> chunk_size=ceil(range/(threads*step)),chunks=1  */
+
+  switch (code)
+    {
+    default:
+      gcc_unreachable ();
+
+    case IFN_GOACC_LOOP_CHUNKS:
+      if (!chunking)
+        r = build_int_cst (type, 1);
+      else
+        {
+          /* chunk_max
+             = (range - dir) / (chunks * step * num_threads) + dir  */
+          tree per = number_of_threads;
+          per = fold_convert (type, per);
+          chunk_size = fold_convert (type, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, step);
+          r = fold_build2 (MINUS_EXPR, type, range, dir);
+          r = fold_build2 (PLUS_EXPR, type, r, per);
+          r = fold_build2 (TRUNC_DIV_EXPR, type, r, per);
+        }
+      break;
+
+    case IFN_GOACC_LOOP_STEP:
+      {
+        /* If striding, step by the entire compute volume, otherwise
+           step by the inner volume.  */
+        r = number_of_threads;
+        r = fold_build2 (MULT_EXPR, type, fold_convert (type, r), step);
+      }
+      break;
+
+    case IFN_GOACC_LOOP_OFFSET:
+      /* Enable vectorization on non-SIMT targets.  */
+      if (!targetm.simt.vf
+          && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+          /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+             the loop.  */
+          && (flag_tree_loop_vectorize
+              || !global_options_set.x_flag_tree_loop_vectorize))
+        {
+          basic_block bb = gsi_bb (gsi);
+          class loop *parent = bb->loop_father;
+          class loop *body = parent->inner;
+
+          parent->force_vectorize = true;
+          parent->safelen = INT_MAX;
+
+          /* "Chunking loops" may have inner loops.  */
+          if (parent->inner)
+            {
+              body->force_vectorize = true;
+              body->safelen = INT_MAX;
+            }
+
+          cfun->has_force_vectorize_loops = true;
+        }
+      if (striding)
+        {
+          r = thread_index;
+          r = fold_convert (diff_type, r);
+        }
+      else
+        {
+          tree inner_size = number_of_threads;
+          tree outer_size = number_of_threads;
+          tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                     inner_size, outer_size);
+
+          volume = fold_convert (diff_type, volume);
+          if (chunking)
+            chunk_size = fold_convert (diff_type, chunk_size);
+          else
+            {
+              tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+
+              chunk_size = fold_build2 (MINUS_EXPR, diff_type, range, dir);
+              chunk_size = fold_build2 (PLUS_EXPR, diff_type, chunk_size, per);
+              chunk_size
+                  = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+            }
+
+          tree span = fold_build2 (MULT_EXPR, diff_type, chunk_size,
+                                   fold_convert (diff_type, inner_size));
+          r = thread_index;
+          r = fold_convert (diff_type, r);
+          r = fold_build2 (MULT_EXPR, diff_type, r, span);
+
+          tree inner = thread_index;
+          inner = fold_convert (diff_type, inner);
+          r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
+
+          if (chunking)
+            {
+              tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6));
+              tree per
+                  = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size);
+              per = fold_build2 (MULT_EXPR, diff_type, per, chunk);
+
+              r = fold_build2 (PLUS_EXPR, diff_type, r, per);
+            }
+        }
+      r = fold_build2 (MULT_EXPR, diff_type, r, step);
+      if (type != diff_type)
+        r = fold_convert (type, r);
+      break;
+
+    case IFN_GOACC_LOOP_BOUND:
+      if (striding)
+        r = range;
+      else
+        {
+          tree inner_size = number_of_threads;
+          tree outer_size = number_of_threads;
+          tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                     inner_size, outer_size);
+
+          volume = fold_convert (diff_type, volume);
+          if (chunking)
+            chunk_size = fold_convert (diff_type, chunk_size);
+          else
+            {
+              tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+
+              chunk_size = fold_build2 (MINUS_EXPR, diff_type, range, dir);
+              chunk_size = fold_build2 (PLUS_EXPR, diff_type, chunk_size, per);
+              chunk_size
+                  = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+            }
+
+          tree span = fold_build2 (MULT_EXPR, diff_type, chunk_size,
+                                   fold_convert (diff_type, inner_size));
+
+          r = fold_build2 (MULT_EXPR, diff_type, span, step);
+
+          tree offset = gimple_call_arg (call, 6);
+          r = fold_build2 (PLUS_EXPR, diff_type, r,
+                           fold_convert (diff_type, offset));
+          r = fold_build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type,
+                           r, range);
+        }
+      if (diff_type != type)
+        r = fold_convert (type, r);
+      break;
+    }
+
+  return r;
+}
+
 /* Transform a GOACC_TILE call.  Determines the element loop span for
    the specified loop of the nest.  This is 1 if we're not tiling.

@@ -936,7 +1128,8 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #endif
   if (check
       && warn_openacc_parallelism
-      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
+      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
+      && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
     {
       static char const *const axes[] =
       /* Must be kept in sync with GOMP_DIM enumeration.  */
@@ -1435,7 +1628,219 @@ oacc_loop_process (oacc_loop *loop)
     oacc_loop_process (loop->sibling);
 }

-/* Walk the OpenACC loop heirarchy checking and assigning the
+/* Return the outermost CFG loop that is enclosed between the head and
+   tail mark calls for LOOP, or NULL if there is no such CFG loop.
+
+   The outermost CFG loop is a loop that is used for "chunking" the
+   original loop from the user's code.  The lower_omp_for function
+   in omp-low.c which creates the head and tail mark sequence and
+   the expand_oacc_for function in omp-expand.c are relevant for
+   understanding the structure that we expect to find here. But note
+   that the passes implemented in those files do not operate on CFG
+   loops and hence the correspondence to the CFG loop structure is
+   not directly visible there and has to be inferred. */
+
+static loop_p
+oacc_loop_get_cfg_loop (oacc_loop *loop)
+{
+  loop_p enclosed_cfg_loop = NULL;
+  for (unsigned dim = 0; dim < GOMP_DIM_MAX; ++dim)
+    {
+      gcall *tail_mark = loop->tails[dim];
+      gimple *head_mark = loop->heads[dim];
+      if (!tail_mark)
+        continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        dump_printf (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, "%G",
+                     tail_mark);
+
+      loop_p mark_cfg_loop = tail_mark->bb->loop_father;
+      loop_p current_cfg_loop = mark_cfg_loop;
+
+      /* Ascend from TAIL_MARK until a different CFG loop is reached.
+
+         From the way that OpenACC loops are treated in omp-low.c, we
+         could expect the tail marker to be immediately preceded by a
+         loop exit. But loop optimizations (e.g. store-motion in
+         pass_lim) can change this. */
+      basic_block bb = tail_mark->bb;
+      bool empty_loop = false;
+      while (current_cfg_loop == mark_cfg_loop)
+        {
+          /* If the OpenACC loop becomes empty due to optimizations,
+             there is no CFG loop at all enclosed between head and
+             tail mark.  */
+          if (bb == head_mark->bb)
+            {
+              empty_loop = true;
+              break;
+            }
+
+          bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+          current_cfg_loop = bb->loop_father;
+        }
+
+      if (empty_loop)
+        continue;
+
+      /* We expect to find the same CFG loop enclosed between all head
+         and tail mark pairs. Hence we actually need to look at only
+         the first available pair. But we consider all for
+         verification purposes. */
+      if (enclosed_cfg_loop)
+        {
+          gcc_assert (current_cfg_loop == enclosed_cfg_loop);
+          continue;
+        }
+
+      enclosed_cfg_loop = current_cfg_loop;
+
+      gcc_checking_assert (dominated_by_p (
+          CDI_DOMINATORS, enclosed_cfg_loop->header, head_mark->bb));
+    }
+
+  return enclosed_cfg_loop;
+}
+
+static const char*
+can_be_parallel_str (loop_p loop)
+{
+  if (!loop->can_be_parallel_valid_p)
+    return "not analyzed";
+
+  return loop->can_be_parallel ? "can be parallel" : "cannot be parallel";
+}
+
+/* Returns true if LOOP is known to be parallelizable and false
+   otherwise.  The decision is based on the dependence analysis
+   that must have been previously performed by Graphite on the CFG
+   loops contained in the OpenACC loop LOOP.  The value of ANALYZED is
+   set to true if all relevant CFG loops have been analyzed. */
+
+static bool
+oacc_loop_can_be_parallel_p (oacc_loop *loop, bool& analyzed)
+{
+  /* Graphite will not run without enabled optimizations, so we cannot
+     expect to find any parallelizability information on the CFG loops. */
+  if (!optimize)
+    return false;
+
+  const dump_user_location_t loc
+      = dump_user_location_t::from_location_t (loop->loc);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, loc,
+                     "Inspecting CFG-loops for OpenACC loop.\n");
+
+  /* Search for the CFG loops that are enclosed between the head and
+     tail mark calls for LOOP. The two outer CFG loops are considered
+     to belong to the OpenACC loop and hence the CAN_BE_PARALLEL flags
+     on those loops will be used to determine the return value. */
+  bool can_be_parallel = false;
+  loop_p enclosed_cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (enclosed_cfg_loop
+      /* The inner loop may have been removed in degenerate cases, e.g.
+         if an infinite "for (; ;)" gets optimized in an OpenACC loop nest. */
+      && enclosed_cfg_loop->inner)
+    {
+      gcc_assert (enclosed_cfg_loop->inner != NULL);
+      gcc_assert (enclosed_cfg_loop->inner->next == NULL);
+
+      can_be_parallel = enclosed_cfg_loop->can_be_parallel
+                        && enclosed_cfg_loop->inner->can_be_parallel;
+
+      analyzed = enclosed_cfg_loop->can_be_parallel_valid_p
+                 && enclosed_cfg_loop->inner->can_be_parallel_valid_p;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          dump_printf (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS,
+                       "\tOuter loop <%d> preceding tail mark %s.\n"
+                       "\tInner loop <%d> %s.\n",
+                       enclosed_cfg_loop->num,
+                       can_be_parallel_str (enclosed_cfg_loop),
+                       enclosed_cfg_loop->inner->num,
+                       can_be_parallel_str (enclosed_cfg_loop->inner));
+        }
+    }
+  else if (dump_file && (dump_flags & TDF_DETAILS))
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, loc,
+                     "Empty OpenACC loop.\n");
+
+  return can_be_parallel;
+}
+
+static bool
+oacc_parallel_kernels_graphite_fun_p ()
+{
+  return lookup_attribute ("oacc parallel_kernels_graphite",
+                           DECL_ATTRIBUTES (cfun->decl));
+}
+
+static bool
+oacc_parallel_fun_p ()
+{
+  return lookup_attribute ("oacc parallel",
+                           DECL_ATTRIBUTES (cfun->decl));
+}
+
+/* If LOOP is an "auto" loop for which dependence analysis has determined that
+   it can be parallelized, make it "independent" by adjusting its FLAGS field
+   and return true. Otherwise, return false. */
+
+static bool
+oacc_loop_transform_auto_into_independent (oacc_loop *loop)
+{
+  if (!optimize)
+    return false;
+
+  /* This function is only relevant on "kernels"
+     regions that have been explicitly designated
+     to be analyzed by Graphite and on "auto"
+     loops in "parallel" regions. */
+  if (!oacc_parallel_kernels_graphite_fun_p ()
+      && !oacc_parallel_fun_p ())
+    return false;
+
+  if (loop->routine)
+    return false;
+
+  if (!(loop->flags & OLF_AUTO))
+    return false;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  dump_user_location_t loc = dump_user_location_t::from_location_t (loop->loc);
+
+  if (dump_enabled_p ())
+    {
+      if (!analyzed)
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                         "'auto' loop has not been analyzed (cf. 'graphite' "
+                         "dumps for more information).\n");
+    }
+  if (!can_be_parallel)
+    return false;
+
+  loop->flags |= OLF_INDEPENDENT;
+
+  /* We need to keep the OLF_AUTO flag for now.
+     oacc_loop_fixed_partitions and oacc_loop_auto_partitions
+     interpret "independent auto" as "this loop can be parallel,
+     please determine the dimensions" which seems to correspond to the
+     meaning of those clauses in an old OpenACC version.  We rely on
+     this behaviour to assign the dimensions for this loop.
+
+     TODO Use a different flag to indicate that the dimensions must be assigned. */
+
+  // loop->flags &= ~OLF_AUTO;
+
+  return true;
+}
+
+/* Walk the OpenACC loop hierarchy checking and assigning the
    programmer-specified partitionings.  OUTER_MASK is the partitioning
    this loop is contained within.  Return mask of partitioning
    encountered.  If any auto loops are discovered, set GOMP_DIM_MAX
@@ -1491,6 +1896,9 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
          loop->flags |= OLF_AUTO;
          mask_all |= GOMP_DIM_MASK (GOMP_DIM_MAX);
        }
+
+      if (oacc_loop_transform_auto_into_independent (loop))
+         mask_all |= GOMP_DIM_MASK (GOMP_DIM_MAX);
     }

   if (this_mask & outer_mask)
@@ -1932,24 +2340,29 @@ execute_oacc_loop_designation ()
       flag_openacc_dims = (char *)&flag_openacc_dims;
     }

-  bool is_oacc_parallel
-    = (lookup_attribute ("oacc parallel",
-                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_kernels
     = (lookup_attribute ("oacc kernels",
                         DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  bool is_oacc_parallel
+    = (lookup_attribute ("oacc parallel",
+                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_serial
     = (lookup_attribute ("oacc serial",
                         DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_parallel_kernels_parallelized
-    = (lookup_attribute ("oacc parallel_kernels_parallelized",
-                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
+      = (lookup_attribute ("oacc parallel_kernels_parallelized",
+                           DECL_ATTRIBUTES (current_function_decl))
+         != NULL);
+  bool is_oacc_parallel_kernels_graphite
+    = (lookup_attribute ("oacc parallel_kernels_graphite",
+                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_parallel_kernels_gang_single
     = (lookup_attribute ("oacc parallel_kernels_gang_single",
                         DECL_ATTRIBUTES (current_function_decl)) != NULL);
   int fn_level = oacc_fn_attrib_level (attrs);
   bool is_oacc_routine = (fn_level >= 0);
   gcc_checking_assert (is_oacc_parallel
+                      + is_oacc_parallel_kernels_graphite
                       + is_oacc_kernels
                       + is_oacc_serial
                       + is_oacc_parallel_kernels_parallelized
@@ -1957,31 +2370,50 @@ execute_oacc_loop_designation ()
                       + is_oacc_routine
                       == 1);

-  bool is_oacc_kernels_parallelized
-    = (lookup_attribute ("oacc kernels parallelized",
-                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
-  if (is_oacc_kernels_parallelized)
-    gcc_checking_assert (is_oacc_kernels);
+  if (is_oacc_parallel_kernels_parallelized)
+    {
+      gcc_checking_assert (!is_oacc_kernels);
+      gcc_checking_assert (!is_oacc_parallel_kernels_gang_single);
+    }
+  if (is_oacc_parallel_kernels_parallelized)
+    {
+      gcc_checking_assert (!is_oacc_kernels);
+      gcc_checking_assert (!is_oacc_parallel_kernels_gang_single);
+    }
+  if (is_oacc_parallel_kernels_gang_single)
+    {
+      gcc_checking_assert (!is_oacc_kernels);
+      gcc_checking_assert (!is_oacc_parallel_kernels_parallelized);
+    }
+  if (is_oacc_parallel_kernels_graphite)
+    {
+      gcc_checking_assert (!is_oacc_kernels);
+      gcc_checking_assert (!is_oacc_parallel_kernels_gang_single);
+      gcc_checking_assert (!is_oacc_parallel_kernels_parallelized);
+    }

   if (dump_file)
     {
-      if (is_oacc_parallel)
-       fprintf (dump_file, "Function is OpenACC parallel offload\n");
+      if (fn_level >= 0)
+       fprintf (dump_file, "Function is OpenACC routine level %d\n",
+                fn_level);
       else if (is_oacc_kernels)
        fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
-                (is_oacc_kernels_parallelized
+                (is_oacc_parallel_kernels_parallelized
                  ? "parallelized" : "unparallelized"));
-      else if (is_oacc_serial)
-       fprintf (dump_file, "Function is OpenACC serial offload\n");
       else if (is_oacc_parallel_kernels_parallelized)
        fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
                 "parallel_kernels_parallelized");
       else if (is_oacc_parallel_kernels_gang_single)
        fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
                 "parallel_kernels_gang_single");
-      else if (is_oacc_routine)
-       fprintf (dump_file, "Function is OpenACC routine level %d\n",
-                fn_level);
+      else if (is_oacc_parallel_kernels_graphite)
+       fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+                "parallel_kernels_graphite");
+      else if (is_oacc_serial)
+       fprintf (dump_file, "Function is OpenACC serial offload\n");
+      else if (is_oacc_parallel)
+       fprintf (dump_file, "Function is OpenACC parallel offload\n");
       else
        gcc_unreachable ();
     }
@@ -2027,7 +2459,7 @@ execute_oacc_loop_designation ()
   /* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
      kernels, so remove the parallelism dimensions function attributes
      potentially set earlier on.  */
-  if (is_oacc_kernels && !is_oacc_kernels_parallelized)
+  if (is_oacc_kernels && !is_oacc_parallel_kernels_parallelized)
     {
       oacc_set_fn_attrib (current_function_decl, NULL, NULL);
       attrs = oacc_get_fn_attrib (current_function_decl);
@@ -2042,8 +2474,10 @@ execute_oacc_loop_designation ()
   unsigned used_mask = oacc_loop_partition (loops, outer_mask);
   /* OpenACC kernels constructs are special: they currently don't use the
      generic oacc_loop infrastructure and attribute/dimension processing.  */
-  if (is_oacc_kernels && is_oacc_kernels_parallelized)
+  if (is_oacc_kernels && is_oacc_parallel_kernels_parallelized)
     {
+      gcc_checking_assert (!is_oacc_parallel_kernels_graphite);
+
       /* Parallelized OpenACC kernels constructs use gang parallelism.  See
         also tree-parloops.c:create_parallel_loop.  */
       used_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG);
@@ -2192,6 +2626,11 @@ execute_oacc_device_lower ()
                  remove = true;
                  break;

+               case IFN_UNIQUE_OACC_PRIVATE_SCALAR:
+               case IFN_UNIQUE_OACC_FIRSTPRIVATE:
+                 remove = true;
+                 break;
+
                case IFN_UNIQUE_OACC_PRIVATE:
                  {
                    dump_flags_t l_dump_flags
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index b91d08cd2182..cacc8ea7614d 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -31,5 +31,7 @@ extern GTY(()) vec<tree, va_gc> *offload_vars;

 extern void omp_finish_file (void);
 extern void omp_discover_implicit_declare_target (void);
+extern tree oacc_extract_loop_call (gcall *call);
+

 #endif /* GCC_OMP_DEVICE_H */
diff --git a/gcc/params.opt b/gcc/params.opt
index 8c5948f7a84d..52de12617cbe 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -794,8 +794,8 @@ Common Joined UInteger Var(param_min_vect_loop_bound) Param Optimization
 If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization.

 -param=openacc-kernels=
-Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Param
---param=openacc-kernels=[decompose|parloops]   Specify mode of OpenACC 'kernels' constructs handling.
+Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_DECOMPOSE) Param
+--param=openacc-kernels=[decompose|decompose-parloops|parloops]        Specify mode of OpenACC 'kernels' constructs handling.

 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
@@ -803,6 +803,9 @@ Name(openacc_kernels) Type(enum openacc_kernels)
 EnumValue
 Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)

+EnumValue
+Enum(openacc_kernels) String(decompose-parloops) Value(OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)

diff --git a/gcc/sese.c b/gcc/sese.c
index ca88f9bbfdf1..50bdde6c537a 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -448,8 +448,29 @@ scalar_evolution_in_region (const sese_l &region, loop_p loop, tree t)
   if (!loop_in_sese_p (loop, region))
     loop = NULL;

-  return instantiate_scev (region.entry, loop,
-                          analyze_scalar_evolution (loop, t));
+  tree chrec = analyze_scalar_evolution (loop, t);
+
+  /* The IFN_GOACC_LOOP calls may evolve to an SSA name that is defined outside
+     of LOOP. To avoid failing the scev analysis, we need this special
+     handling. */
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (t);
+      basic_block def_bb = def_stmt->bb;
+      if (is_gimple_call (def_stmt)
+          && gimple_call_internal_p (def_stmt, IFN_GOACC_LOOP)
+          && TREE_CODE (chrec) == SSA_NAME && def_bb
+          && SSA_NAME_DEF_STMT (chrec)->bb)
+        {
+          loop_p outer_loop = SSA_NAME_DEF_STMT (chrec)->bb->loop_father;
+          loop_p inner_loop = def_bb->loop_father;
+
+          if (outer_loop != inner_loop)
+            return scalar_evolution_in_region (region, outer_loop, chrec);
+        }
+    }
+
+  return instantiate_scev (region.entry, loop, chrec);
 }

 /* Return true if BB is empty, contains only DEBUG_INSNs.  */
diff --git a/gcc/sese.h b/gcc/sese.h
index c51ea68bfb47..114bb9b0c0b4 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -280,6 +280,7 @@ typedef struct gimple_poly_bb
   vec<data_reference_p> data_refs;
   vec<scalar_use> read_scalar_refs;
   vec<tree> write_scalar_refs;
+  vec<tree> kill_scalar_refs;
 } *gimple_poly_bb_p;

 #define GBB_BB(GBB) (GBB)->bb
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index 37e2a57455d1..8430cb868157 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -20,7 +20,7 @@ extern unsigned int *__restrict c;
 void KERNELS ()
 {
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  for (unsigned int i = 0; i < N; i++)
+  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     c[i] = a[i] + b[i];
 }

diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
new file mode 100644
index 000000000000..bba67dcf7cbc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
@@ -0,0 +1,47 @@
+! Verify that Graphite's analysis of the CFG loops gets correctly
+! transferred to the OpenACC loop structure for loop-nests of depth 1
+
+! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details -fopt-info-optimized -fopt-info-missed" }
+! { dg-additional-options "--param max-isl-operations=0" }
+! { dg-additional-options "-O2" }
+! { dg-prune-output ".*not inlinable.*" }
+
+module test_module
+
+  real, allocatable :: array1(:)
+  real, allocatable :: array2(:)
+
+  contains
+
+subroutine test_loop_nest_depth_1 ()
+  implicit none
+
+  integer :: i,n
+
+  if (size (array1) /= size (array2)) return
+  n = size(array1)
+
+  !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC gang vector loop parallelism" }
+  ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 }
+  ! { dg-message ".auto. loop can be parallel" "" {target *-*-*} .-2 }
+  do i=1, n
+     array2(i) = array1(i) ! { dg-message "loop has no data-dependences" }
+  end do
+
+
+  !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC seq loop parallelism" }
+  ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 }
+  ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-2 }
+  do i=1, n-1
+     array1(i+1) = array1(i) + 10 ! { dg-message "loop has data-dependences" }
+     array2(i) = array1(i)
+  end do
+
+  return
+end subroutine test_loop_nest_depth_1
+
+
+
+end module test_module
+
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 2 "graphite" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
new file mode 100644
index 000000000000..d635cc5e4fe0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
@@ -0,0 +1,103 @@
+! Verify that Graphite's analysis of the CFG loops gets correctly
+! transferred to the OpenACC loop structure for loop-nests of depth 2
+
+! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details" }
+! { dg-additional-options "-fopt-info-optimized -fopt-info-missed" }
+! { dg-additional-options "-O2" }
+! { dg-prune-output ".*not inlinable.*" }
+
+module test_module
+  implicit none
+
+  integer, parameter :: n = 100
+  integer, parameter :: m = 100
+
+contains
+
+  subroutine test_loop_nest_depth_2 (array)
+    integer :: i, j
+    real :: array (2, n, m)
+
+    ! Perfect loop-nest, inner and outer loop can be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, m
+          array (1, i, j) = array(2, i, j) ! { dg-message "loop has no data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+
+    ! Imperfect loop-nest, inner and outer loop can be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n) = array(1, i, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, m
+          array (1, i, j) = array (2, i,j) ! { dg-message "loop has no data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+
+    ! Imperfect loop-nest, inner loop can be parallel, outer loop cannot be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n-1
+       array (1, i+1, 1) = array (2, i, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, m
+          array (1, i, j) = array (2, i, j) ! { dg-message "loop has no data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest, outer loop can be parallel, inner loop cannot be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n) = array (1, i, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, m-1
+          array (1, i, j+1) = array (1, i, j) ! { dg-message "loop has data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+    return
+  end subroutine test_loop_nest_depth_2
+
+end module test_module
+
+
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 4 "graphite"  } } One function per kernel, all should be analyzed
+! { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite" } } Original function should not be analyzed
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
new file mode 100644
index 000000000000..97acecd8807b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
@@ -0,0 +1,323 @@
+! Verify that Graphite's analysis of the CFG loops gets correctly
+! transferred to the OpenACC loop structure for loop-nests of depth 3
+
+! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details" }
+! { dg-additional-options "-fopt-info-optimized -fopt-info-missed" }
+! { dg-additional-options "-O2" }
+! { dg-prune-output ".*not inlinable.*" }
+
+module test_module
+  implicit none
+
+  integer, parameter :: n = 100
+
+contains
+
+  subroutine test_loop_nest_depth_3 (array)
+    integer :: i, j, k
+    real :: array (2, n, n, n)
+
+    ! Perfect loop-nest. Can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+    ! Perfect loop-nest. Innermost loop cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Perfect loop-nest. Cannot be parallel because it contains no
+    ! data-reference and is hence not analyzed by Graphite. This is
+    ! expected: empty loops should not be parallel either; cf. e.g.
+    ! "../../gfortran.dg/goacc/note-parallelism.f90".
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-2 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-2 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-bogus "loop has no data-dependences" "OpenACC internal chunking CFG loop not analyzed" {target *-*-*} .-2 }
+       ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(1, i, j, k) ! { dg-bogus "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. All levels can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level can be parallel, second level
+    ! can be parallel, third level cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level can be parallel, second level
+    ! cannot be parallel, third level can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level can be parallel, second and
+    ! third level cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level cannot be parallel, second and
+    ! third levels can be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n - 1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level cannot be parallel, second
+    ! level can be parallel, third level cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n - 1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n - 1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level cannot be parallel, second
+    ! level cannot be parallel, third level can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n - 1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n - 1
+          array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. No level can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n-1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+    return
+  end subroutine test_loop_nest_depth_3
+
+end module test_module
+
+
+!  Outlined functions for all kernels but the one without data-references should be analyzed.
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 10 "graphite"  } }
+! Original test function and one outlined kernel function should not be analyzed
+! { dg-final { scan-tree-dump-times "number of SCoPs: 0" 2 "graphite" } }
diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c
index eeb67ded3dcf..8170265a8d6e 100644
--- a/gcc/tree-chrec.c
+++ b/gcc/tree-chrec.c
@@ -249,6 +249,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
          return chrec_fold_plus_poly_poly (code, type, op0, op1);

        CASE_CONVERT:
+       case VIEW_CONVERT_EXPR:
          {
            /* We can strip sign-conversions to signed by performing the
               operation in unsigned.  */
@@ -282,6 +283,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
        }

     CASE_CONVERT:
+    case VIEW_CONVERT_EXPR:
       {
        /* We can strip sign-conversions to signed by performing the
           operation in unsigned.  */
@@ -323,6 +325,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
                                    : build_int_cst_type (type, -1)));

        CASE_CONVERT:
+       case VIEW_CONVERT_EXPR:
          if (tree_contains_chrecs (op1, NULL))
            return chrec_dont_know;
          /* FALLTHRU */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 6a3659dc490c..2b97c5043ac1 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -100,6 +100,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "vr-values.h"
 #include "range-op.h"
 #include "tree-ssa-loop-ivopts.h"
+#include "print-tree.h"
+#include "graphite-oacc.h"

 static struct datadep_stats
 {
@@ -225,7 +227,10 @@ dump_data_reference (FILE *outf,
   print_generic_stmt (outf, DR_REF (dr));
   fprintf (outf, "#  base_object: ");
   print_generic_stmt (outf, DR_BASE_OBJECT (dr));
-
+  fprintf (outf, "#  base_address: ");
+  print_generic_stmt (outf, DR_BASE_ADDRESS (dr));
+  fprintf (outf, "#  loop-invariant offset: ");
+  print_generic_stmt (outf, DR_OFFSET (dr));
   for (i = 0; i < DR_NUM_DIMENSIONS (dr); i++)
     {
       fprintf (outf, "#  Access function %d: ", i);
@@ -5865,9 +5870,13 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
       if (gimple_call_internal_p (stmt))
        switch (gimple_call_internal_fn (stmt))
          {
-         case IFN_GOMP_SIMD_LANE:
-           {
-             class loop *loop = gimple_bb (stmt)->loop_father;
+         case IFN_UNIQUE:
+         case IFN_GOACC_REDUCTION:
+          case IFN_GOACC_LOOP:
+              return false;
+          case IFN_GOMP_SIMD_LANE:
+            {
+              class loop *loop = gimple_bb (stmt)->loop_father;
              tree uid = gimple_call_arg (stmt, 0);
              gcc_assert (TREE_CODE (uid) == SSA_NAME);
              if (loop == NULL
@@ -6042,7 +6051,6 @@ graphite_find_data_references_in_stmt (edge nest, loop_p loop, gimple *stmt,
                                       vec<data_reference_p> *datarefs)
 {
   auto_vec<data_ref_loc, 2> references;
-  bool ret = true;
   data_reference_p dr;

   if (get_references_in_stmt (stmt, &references))
@@ -6056,7 +6064,7 @@ graphite_find_data_references_in_stmt (edge nest, loop_p loop, gimple *stmt,
       datarefs->safe_push (dr);
     }

-  return ret;
+  return true;
 }

 /* Search the data references in LOOP, and record the information into
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 5e64d5ed7a38..6c4bec69e7d0 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -4173,7 +4173,16 @@ public:
   virtual bool gate (function *)
   {
     if (oacc_kernels_p)
-      return flag_openacc;
+      {
+       if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+         return false;
+
+        gcc_checking_assert (
+            param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+            || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+        return flag_openacc;
+      }
     else
       return flag_tree_parallelize_loops > 1;
   }
@@ -4192,6 +4201,13 @@ public:
 unsigned
 pass_parallelize_loops::execute (function *fun)
 {
+  if (oacc_kernels_p)
+    {
+      gcc_checking_assert (
+          param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+    }
+
   tree nthreads = builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS);
   if (nthreads == NULL_TREE)
     return 0;
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index dbdfe8ffa721..00ad0bc6a4c5 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -264,6 +264,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "ssa.h"
 #include "gimple-pretty-print.h"
+#include "tree-pretty-print.h"
+#include "print-tree.h"
 #include "fold-const.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
@@ -276,6 +278,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-chrec.h"
+#include "internal-fn.h"
+#include "graphite-oacc.h"
 #include "tree-affine.h"
 #include "tree-scalar-evolution.h"
 #include "dumpfile.h"
@@ -284,6 +288,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "omp-offload.h"
+#include "internal-fn.h"

 static tree analyze_scalar_evolution_1 (class loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (class loop *loop,
@@ -311,7 +317,19 @@ struct scev_info_hasher : ggc_ptr_hash<scev_info_str>

 static GTY (()) hash_table<scev_info_hasher> *scalar_evolution_info;

-
+/* This flag indicates that internal OpenACC calls should be analyzed.
+   The analysis is not valid in general. It is used to allow Graphite
+   to analyze the partially lowered OpenACC loops as if it was seeing
+   the unlowered loops. */
+
+static bool analyze_openacc_calls = false;
+
+void set_scev_analyze_openacc_calls (bool analyze)
+{
+  analyze_openacc_calls = analyze;
+}
+
+
 /* Constructs a new SCEV_INFO_STR structure for VAR and INSTANTIATED_BELOW.  */

 static inline struct scev_info_str *
@@ -577,6 +595,51 @@ get_scalar_evolution (basic_block instantiated_below, tree scalar)
   return res;
 }

+bool
+oacc_call_analyzable_p (gimple *stmt)
+{
+  return analyze_openacc_calls
+         && gimple_call_internal_p (stmt, IFN_GOACC_LOOP);
+}
+
+bool
+oacc_call_analyzable_p (tree t)
+{
+  return TREE_CODE (t) == SSA_NAME
+         && oacc_call_analyzable_p (SSA_NAME_DEF_STMT (t));
+}
+
+/* Extract loop information from an OpenACC internal function call. */
+
+tree
+oacc_ifn_call_extract (gimple *stmt)
+{
+  if (oacc_call_analyzable_p (stmt))
+    {
+      gcc_assert (gimple_call_internal_p (stmt, IFN_GOACC_LOOP));
+      return oacc_extract_loop_call (as_a<gcall *> (stmt));
+    }
+
+  return chrec_dont_know;
+}
+
+/* If EXPR is an analyzable internal OpenACC function call,
+   return the result of its analysis; otherwise return EXPR. */
+
+tree
+oacc_simplify (tree expr)
+{
+  if (expr == NULL || TREE_CODE (expr) != SSA_NAME)
+    return expr;
+
+  gimple *def = SSA_NAME_DEF_STMT (expr);
+
+  if (oacc_call_analyzable_p (def))
+    return oacc_ifn_call_extract (def);
+
+  return expr;
+}
+
 /* Helper function for add_to_evolution.  Returns the evolution
    function for an assignment of the form "a = b + c", where "a" and
    "b" are on the strongly connected component.  CHREC_BEFORE is the
@@ -794,6 +857,8 @@ add_to_evolution (unsigned loop_nb, tree chrec_before, enum tree_code code,
   if (to_add == NULL_TREE)
     return chrec_before;

+  to_add = oacc_simplify (to_add);
+
   /* TO_ADD is either a scalar, or a parameter.  TO_ADD is not
      instantiated at this point.  */
   if (TREE_CODE (to_add) == POLYNOMIAL_CHREC)
@@ -966,6 +1031,7 @@ follow_ssa_edge_binary (class loop *loop, gimple *at_stmt,
       res = t_false;
     }

+  *evolution_of_loop = oacc_simplify (*evolution_of_loop);
   return res;
 }

@@ -1116,6 +1182,8 @@ follow_ssa_edge_inner_loop_phi (class loop *outer_loop,
                               evolution_of_loop, limit);
 }

+tree interpret_gimple_call (class loop *loop, gimple *call);
+
 /* Follow the ssa edge into the expression EXPR.
    Return true if the strongly connected component has been found.  */

@@ -1124,8 +1192,11 @@ follow_ssa_edge_expr (class loop *loop, gimple *at_stmt, tree expr,
                      gphi *halting_phi, tree *evolution_of_loop,
                      int limit)
 {
-  enum tree_code code;
-  tree type, rhs0, rhs1 = NULL_TREE;
+  enum tree_code code = LAST_AND_UNUSED_TREE_CODE;
+  tree type = NULL_TREE;
+  tree rhs0 = NULL_TREE;
+  tree rhs1 = NULL_TREE;
+

   /* The EXPR is one of the following cases:
      - an SSA_NAME,
@@ -1140,6 +1211,7 @@ follow_ssa_edge_expr (class loop *loop, gimple *at_stmt, tree expr,
      PHI nodes and otherwise expand appropriately for the expression
      handling below.  */
 tail_recurse:
+  expr = oacc_simplify (expr);
   if (TREE_CODE (expr) == SSA_NAME)
     {
       gimple *def = SSA_NAME_DEF_STMT (expr);
@@ -1187,28 +1259,37 @@ tail_recurse:
          return t_false;
        }

-      /* At this level of abstraction, the program is just a set
-        of GIMPLE_ASSIGNs and PHI_NODEs.  In principle there is no
-        other def to be handled.  */
-      if (!is_gimple_assign (def))
-       return t_false;
+      /* At this level of abstraction, the program is just a set of
+         GIMPLE_ASSIGNs and PHI_NODEs.  In principle there is no other def to
+         be handled except for OpenACC internal function calls. */
+      if (is_gimple_assign (def))
+        {
+          code = gimple_assign_rhs_code (def);
+
+          switch (get_gimple_rhs_class (code))
+            {
+            case GIMPLE_BINARY_RHS:
+              rhs0 = gimple_assign_rhs1 (def);
+              rhs1 = gimple_assign_rhs2 (def);
+              break;
+            case GIMPLE_UNARY_RHS:
+            case GIMPLE_SINGLE_RHS:
+              rhs0 = gimple_assign_rhs1 (def);
+              break;
+            default:
+              return t_false;
+            }
+          type = TREE_TYPE (gimple_assign_lhs (def));
+          at_stmt = def;
+        }
+      else if (oacc_call_analyzable_p (expr)) {
+       // TODO-kernels Is this still needed here?
+       rhs0 = interpret_gimple_call (loop, def);
+       type = TREE_TYPE (gimple_call_lhs (def));
+       at_stmt = def;
+      }
+      else return t_false;

-      code = gimple_assign_rhs_code (def);
-      switch (get_gimple_rhs_class (code))
-       {
-       case GIMPLE_BINARY_RHS:
-         rhs0 = gimple_assign_rhs1 (def);
-         rhs1 = gimple_assign_rhs2 (def);
-         break;
-       case GIMPLE_UNARY_RHS:
-       case GIMPLE_SINGLE_RHS:
-         rhs0 = gimple_assign_rhs1 (def);
-         break;
-       default:
-         return t_false;
-       }
-      type = TREE_TYPE (gimple_assign_lhs (def));
-      at_stmt = def;
     }
   else
     {
@@ -1473,6 +1554,7 @@ follow_copies_to_constant (tree var)
       else
        break;
     }
+  res = oacc_simplify (res);
   if (CONSTANT_CLASS_P (res))
     return res;
   return var;
@@ -1506,6 +1588,7 @@ analyze_initial_condition (gphi *loop_phi_node)
       tree branch = PHI_ARG_DEF (loop_phi_node, i);
       basic_block bb = gimple_phi_arg_edge (loop_phi_node, i)->src;

+      branch = oacc_simplify (branch);
       /* When the branch is oriented to the loop's body, it does
         not contribute to the initial condition.  */
       if (flow_bb_inside_loop_p (loop, bb))
@@ -1533,6 +1616,7 @@ analyze_initial_condition (gphi *loop_phi_node)
   /* We may not have fully constant propagated IL.  Handle degenerate PHIs here
      to not miss important early loop unrollings.  */
   init_cond = follow_copies_to_constant (init_cond);
+  init_cond = oacc_simplify (init_cond);

   if (dump_file && (dump_flags & TDF_SCEV))
     {
@@ -1558,6 +1642,7 @@ interpret_loop_phi (class loop *loop, gphi *loop_phi_node)
   /* Otherwise really interpret the loop phi.  */
   init_cond = analyze_initial_condition (loop_phi_node);
   res = analyze_evolution_in_loop (loop_phi_node, init_cond);
+  init_cond = analyze_initial_condition (loop_phi_node);

   /* Verify we maintained the correct initial condition throughout
      possible conversions in the SSA chain.  */
@@ -1630,8 +1715,11 @@ interpret_rhs_expr (class loop *loop, gimple *at_stmt,
        return chrec_convert (type, rhs1, at_stmt);

       if (code == SSA_NAME)
-       return chrec_convert (type, analyze_scalar_evolution (loop, rhs1),
-                             at_stmt);
+       {
+          rhs1 = oacc_simplify (rhs1);
+          return chrec_convert (type, analyze_scalar_evolution (loop, rhs1),
+                                at_stmt);
+        }

       if (code == ASSERT_EXPR)
        {
@@ -1920,7 +2008,25 @@ interpret_gimple_assign (class loop *loop, gimple *stmt)
                             gimple_assign_rhs2 (stmt));
 }

-
+/* Interpret a gimple call statement. */
+
+tree
+interpret_gimple_call (class loop *loop __attribute__ ((__unused__)), gimple *call)
+{
+
+  /* Information about OpenACC loops is encoded in internal function calls.
+     Extract loop information from those calls. Ignore other calls for now. */
+  if (!oacc_call_analyzable_p (call))
+    return chrec_dont_know;
+
+  tree expr = oacc_ifn_call_extract (call);
+  tree analyzed = expr;
+
+  tree lhs = gimple_call_lhs (call);
+  gcc_assert (lhs);
+
+  return chrec_convert (TREE_TYPE (lhs), analyzed, call);
+}

 /* This section contains all the entry points:
    - number_of_iterations_in_loop,
@@ -1943,6 +2049,8 @@ analyze_scalar_evolution_1 (class loop *loop, tree var)

   def = SSA_NAME_DEF_STMT (var);
   bb = gimple_bb (def);
+  if (!bb)
+    return chrec_dont_know;
   def_loop = bb->loop_father;

   if (!flow_bb_inside_loop_p (loop, bb))
@@ -1969,6 +2077,10 @@ analyze_scalar_evolution_1 (class loop *loop, tree var)
       res = interpret_gimple_assign (loop, def);
       break;

+    case GIMPLE_CALL:
+      res = interpret_gimple_call (loop, def);
+      break;
+
     case GIMPLE_PHI:
       if (loop_phi_node_p (def))
        res = interpret_loop_phi (loop, as_a <gphi *> (def));
@@ -2261,6 +2373,14 @@ instantiate_scev_name (edge instantiate_below,
   class loop *def_loop;
   basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (chrec));

+  if (oacc_call_analyzable_p (chrec))
+    {
+      tree res
+          = interpret_gimple_call (evolution_loop, SSA_NAME_DEF_STMT (chrec));
+
+      return res;
+    }
+
   /* A parameter, nothing to do.  */
   if (!def_bb
       || !dominated_by_p (CDI_DOMINATORS, def_bb, instantiate_below->dest))
@@ -3376,6 +3496,9 @@ expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
        return true;
     }

+  if (oacc_call_analyzable_p (expr))
+    return false;
+
   bool visited_p;
   uint64_t &local_cost = cache.get_or_insert (expr, &visited_p);
   if (visited_p)
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index d679f7285b30..f35bfcd80417 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -42,6 +42,9 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
                       bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void set_scev_analyze_openacc_calls (bool);
+extern bool oacc_call_analyzable_p (gimple *);
+extern bool oacc_call_analyzable_p (tree);

 /* Returns the basic block preceding LOOP, or the CFG entry block when
    the loop is function's body.  */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 1281e67489c0..132e17251de0 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -242,6 +242,26 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
            && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee))
          return;

+       /* Most, but not all function calls are required.  Function calls that
+          produce no result and have no side effects (i.e. const pure
+          functions) are unnecessary.  */
+       if (gimple_has_side_effects (stmt))
+         {
+           mark_stmt_necessary (stmt, true);
+
+            /* The lhs of the OpenACC loop and reduction calls is necessary,
+              cf. the lowering in omp-offload.c.  */
+            if (gimple_call_internal_p (stmt, IFN_UNIQUE)
+                || gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION))
+              {
+               tree lhs = gimple_call_lhs (stmt);
+               if (lhs)
+                  mark_operand_necessary (lhs);
+              }
+
+           return;
+         }
+
        /* IFN_GOACC_LOOP calls are necessary in that they are used to
           represent parameter (i.e. step, bound) of a lowered OpenACC
           partitioned loop.  But this kind of partitioned loop might not
@@ -251,6 +271,9 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
        if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP))
          {
            mark_stmt_necessary (stmt, true);
+           tree lhs = gimple_call_lhs (stmt);
+           gcc_assert (lhs);
+           mark_operand_necessary (lhs);
            return;
          }
        break;
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 75109407124f..b2689d348d64 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2039,6 +2039,9 @@ simplify_replace_tree (tree expr, tree old, tree new_tree,
   return (ret ? (do_fold ? fold (ret) : ret) : expr);
 }

+bool oacc_call_analyzable_p (gimple* stmt);
+tree interpret_gimple_call (class loop *loop, gimple *call);
+
 /* Expand definitions of ssa names in EXPR as long as they are simple
    enough, and return the new expression.  If STOP is specified, stop
    expanding if EXPR equals to it.  */
@@ -2054,6 +2057,9 @@ expand_simple_operations (tree expr, tree stop, hash_map<tree, tree> &cache)
   if (expr == NULL_TREE)
     return expr;

+  if (oacc_call_analyzable_p (expr))
+    expr = interpret_gimple_call (NULL, SSA_NAME_DEF_STMT (expr));
+
   if (is_gimple_min_invariant (expr))
     return expr;

diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 8d5572033f7b..168bd348a6f2 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -155,6 +155,13 @@ make_pass_tree_loop (gcc::context *ctxt)
 static bool
 gate_oacc_kernels (function *fn)
 {
+  if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+    return false;
+
+  gcc_checking_assert (param_openacc_kernels
+                           == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                       || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   if (!flag_openacc)
     return false;

@@ -323,6 +330,10 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
   {
+    if (param_openacc_kernels != OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+        && param_openacc_kernels != OPENACC_KERNELS_PARLOOPS)
+      return false;
+
     return (optimize
            && flag_openacc
            /* Don't bother doing anything if the program has errors.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 9392e1d88c58..086645b3ac3d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -3,6 +3,8 @@

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
+/* { dg-additional-options "-O2" } for Graphite/"kernels". */
+

 /* See also '../libgomp.oacc-fortran/parallel-dims.f90'.  */

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
index 5a47aca2dba2..f79d01ccc419 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
+! { dg-additional-options "-O2" } for Graphite

 #define N (1024 * 512)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90
index 37aa0ac4f632..5d35bdf9d6ff 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90
@@ -1,6 +1,7 @@
 ! Exercise the auto, independent, seq and tile loop clauses inside
 ! kernels regions.

+! { dg-additional-options "-O2" } for Graphite
 ! { dg-do run }

 program loops
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index cf1d0e569278..74ee6fde84f8 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -1,6 +1,7 @@
 ! { dg-do run }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
+! { dg-additional-options "-O2" } for Graphite

 ! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955


* [PATCH 21/40] openacc: Add "can_be_parallel" flag info to "graph" dumps
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (19 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 20/40] openacc: Use Graphite for dependence analysis in "kernels" regions Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions Frederik Harwath
                   ` (18 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, rguenther

gcc/ChangeLog:

        * graph.c (oacc_get_fn_attrib): New declaration.
        (find_loop_location): New declaration.
        (draw_cfg_nodes_for_loop): Print value of the
        can_be_parallel flag at the top of loops in OpenACC
        functions.
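
As a rough illustration of the resulting cluster labels, here is a
minimal standalone sketch.  The function name loop_label and its
parameters are hypothetical and only mirror the "loop %d %s" format
string used in the patch; this is not the graph.c code itself.

```cpp
#include <cassert>  // only needed for assert-based usage checks
#include <string>

// Build the text that follows "label=" for one loop cluster.
// IN_OACC_FUNCTION stands in for "oacc_get_fn_attrib (cfun->decl) != NULL".
std::string
loop_label (int loop_num, bool in_oacc_function, bool can_be_parallel)
{
  // The format string always emits the space after the loop number,
  // even when the suffix is empty.
  std::string label = "loop " + std::to_string (loop_num) + " ";
  if (in_oacc_function)
    label += can_be_parallel ? "(can_be_parallel = true)"
                             : "(can_be_parallel = false)";
  return label;
}
```

For loops outside of OpenACC functions the suffix stays empty, so
existing dumps only gain a trailing space in the label.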
---
 gcc/graph.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index 9acd1d5b95e4..a34356e8a7ec 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -192,6 +192,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun)
     }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
    order to get a good ranking of the nodes.  This function is recursive:
    It first prints inner loops, then the body of LOOP itself.  */
@@ -206,17 +210,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no,

   if (loop->header != NULL
       && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-    pp_printf (pp,
-              "\tsubgraph cluster_%d_%d {\n"
-              "\tstyle=\"filled\";\n"
-              "\tcolor=\"darkgreen\";\n"
-              "\tfillcolor=\"%s\";\n"
-              "\tlabel=\"loop %d\";\n"
-              "\tlabeljust=l;\n"
-              "\tpenwidth=2;\n",
-              funcdef_no, loop->num,
-              fillcolors[(loop_depth (loop) - 1) % 3],
-              loop->num);
+    {
+      pp_printf (pp,
+                 "\tsubgraph cluster_%d_%d {\n"
+                 "\tstyle=\"filled\";\n"
+                 "\tcolor=\"darkgreen\";\n"
+                 "\tfillcolor=\"%s\";\n"
+                 "\tlabel=\"loop %d %s\";\n"
+                 "\tlabeljust=l;\n"
+                 "\tpenwidth=2;\n",
+                 funcdef_no, loop->num,
+                 fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+                 /* This is only meaningful for loops that have been processed
+                    by Graphite.
+
+                    TODO Use can_be_parallel_valid_p? */
+                 !oacc_get_fn_attrib (cfun->decl)
+                     ? ""
+                     : loop->can_be_parallel ? "(can_be_parallel = true)"
+                                             : "(can_be_parallel = false)");
+    }

   for (class loop *inner = loop->inner; inner; inner = inner->next)
     draw_cfg_nodes_for_loop (pp, funcdef_no, inner);
--
2.33.0


* [PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (20 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 21/40] openacc: Add "can_be_parallel" flag info to "graph" dumps Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 23/40] Add function for printing a single OMP_CLAUSE Frederik Harwath
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas

With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
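
The effect on the dimensions array can be sketched in isolation as
follows.  This is a simplified model, not the omp-offload.c code:
kDimMax and dim_mask are hypothetical stand-ins for GOMP_DIM_MAX and
GOMP_DIM_MASK, and the warning is replaced by a returned string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the GOMP_DIM_* definitions
// (0 = gang, 1 = worker, 2 = vector).
constexpr int kDimMax = 3;
constexpr unsigned dim_mask (int ix) { return 1u << ix; }

// Reset every explicitly set dimension (dims[ix] >= 0), starting at LEVEL,
// that is not marked in USED.  Returns the names of the removed axes.
std::string
remove_unused_partitioning (int *dims, int level, unsigned used)
{
  assert (dims != nullptr);
  static const char *const axes[] = { "gang", "worker", "vector" };
  std::string removed;
  for (int ix = level >= 0 ? level : 0; ix != kDimMax; ix++)
    if (!(used & dim_mask (ix)) && dims[ix] >= 0)
      {
        removed += axes[ix];
        removed += " ";
        dims[ix] = -1;  // -1 means "unset"; a default is chosen later.
      }
  return removed;
}
```

With dims = { 30, 3, 5 } and only gang and vector marked as used, the
worker entry is reset to -1, which matches the adjusted expectation of
a single worker in acc_prof-kernels-1.c.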

gcc/ChangeLog:

        * omp-offload.c (oacc_remove_unused_partitioning): New function
        for removing partitioning that is not used by any loop.
        (oacc_validate_dims): Call oacc_remove_unused_partitioning and
        enable warnings about unused partitioning.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
        expectations.
---
 gcc/omp-offload.c                             | 51 +++++++++++++++++--
 .../acc_prof-kernels-1.c                      | 18 ++++---
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2743e90f79a3..392ca56b1f4f 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1097,6 +1097,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Remove parallelism dimensions below LEVEL which are not set in USED
+   from DIMS and emit a warning pointing to the location of FN. */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+      /* Must be kept in sync with GOMP_DIM enumeration.  */
+      { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+    if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+      {
+        if (host_compiler)
+          {
+            strcat (removed_partitions, axes[ix]);
+            strcat (removed_partitions, " ");
+          }
+        dims[ix] = -1;
+      }
+  if (removed_partitions[0] != '\0')
+    warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+                "removed %spartitioning from %<kernels%> region",
+                removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
    raw attribute.  DIMS is an array of dimensions, which is filled in.
    LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1117,6 +1150,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
     {
       purpose[ix] = TREE_PURPOSE (pos);
+
       tree val = TREE_VALUE (pos);
       dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
       pos = TREE_CHAIN (pos);
@@ -1126,14 +1160,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+      /* Must be kept in sync with GOMP_DIM enumeration.  */
+      { "gang", "worker", "vector" };
+
   if (check
       && warn_openacc_parallelism
-      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-      && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
     {
-      static char const *const axes[] =
-      /* Must be kept in sync with GOMP_DIM enumeration.  */
-       { "gang", "worker", "vector" };
       for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
        if (dims[ix] < 0)
          ; /* Defaulting axis.  */
@@ -1144,14 +1179,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
                      "region contains %s partitioned code but"
                      " is not %s partitioned", axes[ix], axes[ix]);
        else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+         {
          /* The dimension is explicitly partitioned to non-unity, but
             no use is made within the region.  */
          warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
                      "region is %s partitioned but"
                      " does not contain %s partitioned code",
                      axes[ix], axes[ix]);
+          }
     }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+                         DECL_ATTRIBUTES (fn)))
+    oacc_remove_unused_partitioning (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index ad33f72e2fb6..65c83dce01c9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include <acc_prof.h>

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } */
+/* { dg-additional-options "-Wopenacc-parallelism" } */

 /* Use explicit 'copyin' clauses, to work around "'firstprivate'
    optimizations", which will cause the value at the point of call to be used
@@ -95,12 +97,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
     assert (event_info->launch_event.num_workers >= 1);
   else
     {
-#ifdef __OPTIMIZE__
-      assert (event_info->launch_event.num_workers == num_workers);
-#else
-      /* See 'num_gangs' above.  */
-      assert (event_info->launch_event.num_workers == 1);
-#endif
+      /* Unused partitioning levels get removed from "kernels" region. */
+      assert (event_info->launch_event.num_workers == real_num_workers);
     }
   if (vector_length < 1)
     assert (event_info->launch_event.vector_length >= 1);
@@ -183,6 +181,7 @@ int main()
   STATE_OP (state, = 0);
   num_gangs = 30;
   num_workers = 3;
+  real_num_workers = 1;
   vector_length = 5;
   {
 #define N 100
@@ -192,6 +191,8 @@ int main()
     /* { dg-prune-output "using vector_length \\(32\\), ignoring 5" } */
     {
       for (int i = 0; i < N; ++i)
+      /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 } */
+      /* { dg-warning "removed worker partitioning from 'kernels' region" "" { target *-*-* } .-2 } */
        x[i] = i * i;
     }
     if (acc_device_type == acc_device_host)
@@ -208,6 +209,9 @@ int main()
   STATE_OP (state, = 0);
   num_gangs = 22;
   num_workers = 5;
+  /* No worker loop and hence, in a kernels region, worker partitioning
+     should be removed. */
+  real_num_workers = 1;
   vector_length = 7;
   {
 #define N 100
@@ -217,6 +221,8 @@ int main()
     /* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
     {
       for (int i = 0; i < N; ++i)
+      /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 } */
+      /* { dg-warning "removed worker partitioning from 'kernels' region" "" { target *-*-* } .-2 } */
        x[i] = i * i;
     }
     if (acc_device_type == acc_device_host)
--
2.33.0


* [PATCH 23/40] Add function for printing a single OMP_CLAUSE
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (21 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 24/40] openacc: Add data optimization pass Frederik Harwath
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, rguenther

Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
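
The difference between the two dumping behaviors can be illustrated
with a toy clause chain.  The struct and both functions below are
hypothetical and do not use GCC's tree or pretty-printer types; they
only model "print the head" versus "print the whole chain".

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an OMP_CLAUSE chain: a singly linked list of clauses.
struct clause
{
  std::string text;
  const clause *next;
};

// Print only the clause at the head of the chain (the behavior that the
// new function provides).
std::string
print_single_clause (const clause *c)
{
  assert (c != nullptr);
  return c->text;
}

// Print the head and everything chained after it (the behavior of the
// generic dumper since the commit mentioned above).
std::string
print_clause_chain (const clause *c)
{
  std::string out;
  for (; c; c = c->next)
    {
      if (!out.empty ())
        out += " ";
      out += c->text;
    }
  return out;
}
```

A pass that reports on one clause at a time wants the first form, since
the second would repeat all following clauses in every message.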

gcc/ChangeLog:

        * tree-pretty-print.c (print_omp_clause_to_str): Add new function.
        * tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++++++++++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 275dc7d8af73..e85370cfe722 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1360,6 +1360,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index dacd256302b2..f9ff0ee1ce0b 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
                              bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
                                          enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0


* [PATCH 24/40] openacc: Add data optimization pass
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (22 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 23/40] Add function for printing a single OMP_CLAUSE Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels Frederik Harwath
                   ` (15 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Stubbs, thomas, rguenther

From: Andrew Stubbs <ams@codesourcery.com>

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted
to "kernels" regions but could be extended to other types of regions.

gcc/ChangeLog:

        * Makefile.in: Add pass.
        * doc/gimple.texi: TODO.
        * gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
        * gimple-walk.h (struct walk_stmt_info): Add field.
        * passes.def: Add new pass.
        * tree-pass.h (make_pass_omp_data_optimize): New declaration.
        * omp-data-optimize.cc: New file.
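
The decision made by the new pass can be summarized by the following
sketch, which follows the rules stated in the header comment of
omp-data-optimize.cc.  The var_facts struct and all names are
hypothetical; the real pass derives these facts by walking GIMPLE.

```cpp
#include <cassert>  // only needed for assert-based usage checks

enum class mapping { tofrom, to, priv };

// Hypothetical summary of what the analysis learned about one scalar.
struct var_facts
{
  bool read_after_region;      // read between region end and function end
  bool persistent;             // global, static, external, ...
  bool addressable;            // address taken, so it may be aliased
  bool backward_jump_after;    // a loop might re-enter the region
  bool first_access_is_write;  // first occurrence inside the region
};

bool
dead_on_exit (const var_facts &v)
{
  return !v.read_after_region && !v.persistent && !v.addressable
         && !v.backward_jump_after;
}

bool
dead_on_entry (const var_facts &v)
{
  return v.first_access_is_write;
}

// Only "tofrom" mappings are candidates; anything else is left alone.
mapping
optimize_mapping (mapping m, const var_facts &v)
{
  if (m != mapping::tofrom || !dead_on_exit (v))
    return m;
  return dead_on_entry (v) ? mapping::priv : mapping::to;
}
```

Note that "private" is only reached through "to": a variable that is
dead on entry but still live on exit keeps its original mapping in this
model, which corresponds to the "copyout" case listed as future work.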

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
        Expect optimization messages.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/uninit-copy-clause.c: Likewise.
        * gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
        * c-c++-common/goacc/omp_data_optimize-1.c: New test.
        * g++.dg/goacc/omp_data_optimize-1.C: New test.
        * gfortran.dg/goacc/omp_data_optimize-1.f90: New test.
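
The backward walking mentioned in the gimple-walk.c entry can be
pictured with the following simplified model.  It walks a plain vector
instead of a gimple_seq and deliberately does not model statements
being removed during the walk, which is the case the TODO comments in
the patch are concerned about; walk_seq is a hypothetical name.

```cpp
#include <cassert>  // only needed for assert-based usage checks
#include <functional>
#include <vector>

// Visit every element of SEQ, front to back by default or back to front
// when BACKWARD is set.  The walk stops as soon as CALLBACK returns true.
// Returns the index of the element that stopped the walk, or -1.
int
walk_seq (const std::vector<int> &seq, bool backward,
          const std::function<bool (int)> &callback)
{
  const int n = static_cast<int> (seq.size ());
  for (int k = 0; k < n; k++)
    {
      int i = backward ? n - 1 - k : k;
      if (callback (seq[i]))
        return i;
    }
  return -1;
}
```

A backward walk is convenient for "is this variable used after point X"
style questions, because the walk can start at the end of the function
and stop once it reaches the region of interest.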

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/Makefile.in                               |   1 +
 gcc/doc/gimple.texi                           |   2 +
 gcc/gimple-walk.c                             |  15 +-
 gcc/gimple-walk.h                             |   6 +
 gcc/omp-data-optimize.cc                      | 951 ++++++++++++++++++
 gcc/passes.def                                |   1 +
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +++++++++++++
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C        | 169 ++++
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++++++++++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h                               |   1 +
 .../kernels-decompose-1.c                     |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   4 +
 14 files changed, 2422 insertions(+), 3 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index debd8047cc85..e876e6ec993c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1515,6 +1515,7 @@ OBJS = \
        omp-oacc-kernels-decompose.o \
        omp-oacc-neuter-broadcast.o \
        omp-simd-clone.o \
+       omp-data-optimize.o \
        opt-problem.o \
        optabs.o \
        optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 5d89dbcc68d5..c8f0b8b2a826 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2770,4 +2770,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index e15fd4697ba1..b6add4394ab2 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
    on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
    If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
    value is stored in WI->CALLBACK_RESULT.  Also, the statement that
    produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
                     walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
     {
       tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
       if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
        }

       if (!wi->removed_stmt)
-       gsi_next (&gsi);
+       {
+         if (forward)
+           gsi_next (&gsi);
+         else //TODO Correct?  <http://mid.mail-archive.com/CAFiYyc1rFrh1tnCBgKWwLrCpkpLQ4_pXCT8K+dai2UtC0XezKQ@mail.gmail.com>
+           gsi_prev (&gsi);
+         //TODO This could do with some unit testing (see other 'gcc/*-tests.c' files for inspiration), to make sure all the corner cases (removing first/last, for example) work correctly.
+       }
     }

   if (wi)
diff --git a/gcc/gimple-walk.h b/gcc/gimple-walk.h
index f471f10088df..4ebc71d73ddf 100644
--- a/gcc/gimple-walk.h
+++ b/gcc/gimple-walk.h
@@ -71,6 +71,12 @@ struct walk_stmt_info

   /* True if we've removed the statement that was processed.  */
   BOOL_BITFIELD removed_stmt : 1;
+
+  /*TODO True if we're walking backward instead of forward.  */
+  //TODO This flag is only applicable for 'walk_gimple_seq'.
+  //TODO Instead of this somewhat mis-placed (?) flag here, may be able to factor the walking logic out of 'walk_gimple_stmt', and do the backward walking in a separate function?
+  //TODO <http://mid.mail-archive.com/874kh863d6.fsf@euler.schwinge.homeip.net>
+  BOOL_BITFIELD backward : 1;
 };

 /* Callback for walk_gimple_stmt.  Called for every statement found
diff --git a/gcc/omp-data-optimize.cc b/gcc/omp-data-optimize.cc
new file mode 100644
index 000000000000..31f615c1d2bd
--- /dev/null
+++ b/gcc/omp-data-optimize.cc
@@ -0,0 +1,951 @@
+/* OMP data optimize
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This pass tries to optimize OMP data movement.
+
+   The purpose is two-fold: (1) simply avoid redundant data movement, and (2)
+   as an enabler for other compiler optimizations.
+
+   Currently, the focus is on OpenACC 'kernels' constructs, but this may be
+   done more generally later: other compute constructs, but also structured
+   'data' constructs, for example.
+
+   Currently, this implements:
+    - Convert "copy/map(tofrom)" to "copyin/map(to)", where the variable is
+      known to be dead on exit.
+    - Further optimize to "private" where the variable is also known to be
+      dead on entry.
+
+   Future improvements may include:
+    - Optimize mappings that do not start as "copy/map(tofrom)".
+    - Optimize mappings to "copyout/map(from)" where the variable is dead on
+      entry, but not exit.
+    - Improved data liveness checking.
+    - Etc.
+
+   As long as we make sure to not violate user-expected OpenACC semantics, we
+   may do "anything".
+
+   The pass runs too early to use the full data flow analysis tools, so this
+   uses some simplified rules.  The analysis could certainly be improved.
+
+   A variable is dead on exit if
+    1. Nothing reads it between the end of the target region and the end
+       of the function.
+    2. It is not global, static, external, or otherwise persistent.
+    3. It is not addressable (and therefore cannot be aliased).
+    4. There are no backward jumps following the target region (and therefore
+       there can be no loop around the target region).
+
+   A variable is dead on entry if the first occurrence of the variable within
+   the target region is a write.  The algorithm attempts to check all possible
+   code paths, but may give up where control flow is too complex.  No attempt
+   is made to evaluate conditionals, so it will likely miss cases where the
+   user could safely declare the variable private by hand.
+
+   Future improvements:
+    1. Allow backward jumps (loops) where the target is also after the end of
+       the target region.
+    2. Detect dead-on-exit variables when there is a write following the
+       target region (tricky, in the presence of conditionals).
+    3. Ignore reads in the "else" branch of conditionals where the target
+       region is in the "then" branch.
+    4. Optimize global/static/external variables that are provably dead on
+       entry or exit.
+   (Most of this can be achieved by unifying the two DF algorithms in this
+   file; the one for scanning inside the target regions had to be made more
+   capable, with propagation of live state across blocks, but reworking the
+   other one to match is more effort than I have time for right now.)
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree-pass.h"
+#include "options.h"
+#include "tree.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gomp-constants.h"
+#include "gimple-pretty-print.h"
+
+#define DUMP_LOC(STMT) \
+  dump_user_location_t::from_location_t (OMP_CLAUSE_LOCATION (STMT))
+
+/* These types track why we could *not* optimize a variable mapping.  The
+   reasons are distinguished mainly so that diagnostics can report them.  */
+
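+enum inhibit_kinds {
+  INHIBIT_NOT, /* Not inhibited: the mapping may be optimized.  */
+  INHIBIT_USE, /* Inhibited by a use of the variable.  */
+  INHIBIT_JMP, /* Inhibited by looping/control flow.  */
+  INHIBIT_BAD  /* Inhibited by properties of the variable itself.  */
+};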
+
+struct inhibit_descriptor
+{
+  enum inhibit_kinds kind;
+  gimple *stmt;
+};
+
+/* OMP Data Optimize walk state tables.  */
+struct ODO_State {
+  hash_map<tree, inhibit_descriptor> candidates;
+  hash_set<tree> visited_labels;
+  bool lhs_scanned;
+};
+
+/* These types track whether a variable can be fully private, or not.
+
+   These are ORDERED in ascending precedence; when combining two values
+   (at a conditional or switch), the higher value is used.  */
+
+enum access_kinds {
+  ACCESS_NONE,      /* Variable not accessed.  */
+  ACCESS_DEF_FIRST, /* Variable is defined before use.  */
+  ACCESS_UNKNOWN,   /* Status is yet to be determined.  */
+  ACCESS_UNSUPPORTED, /* Variable is array or reference.  */
+  ACCESS_USE_FIRST  /* Variable is used without definition (live on entry).  */
+};
+
+struct ODO_BB {
+  access_kinds access;
+  gimple *foot_stmt;
+};
+
+struct ODO_Target_state {
+  tree var;
+
+  const void *bb_id;  /* A unique id for the BB (use a convenient pointer).  */
+  ODO_BB bb;
+  bool lhs_scanned;
+  bool can_short_circuit;
+
+  hash_map<const void*,ODO_BB> scanned_bb;
+};
+
+/* Classify a newly discovered variable, and add it to the candidate list.  */
+
+static void
+omp_data_optimize_add_candidate (const dump_user_location_t &loc, tree var,
+                                ODO_State *state)
+{
+  inhibit_descriptor in;
+  in.stmt = NULL;
+
+  if (DECL_EXTERNAL (var))
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc,
+                        " -> unsuitable variable: %<%T%> is external\n", var);
+
+      in.kind = INHIBIT_BAD;
+    }
+  else if (TREE_STATIC (var))
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc,
+                        " -> unsuitable variable: %<%T%> is static\n", var);
+
+      in.kind = INHIBIT_BAD;
+    }
+  else if (TREE_ADDRESSABLE (var))
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc,
+                        " -> unsuitable variable: %<%T%> is addressable\n",
+                        var);
+
+      in.kind = INHIBIT_BAD;
+    }
+  else
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc, " -> candidate variable: %<%T%>\n",
+                        var);
+
+      in.kind = INHIBIT_NOT;
+    }
+
+  if (state->candidates.put (var, in))
+    gcc_unreachable ();
+}
+
+/* Add all the variables in a gimple bind statement to the list of
+   optimization candidates.  */
+
+static void
+omp_data_optimize_stmt_bind (const gbind *bind, ODO_State *state)
+{
+  if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+    dump_printf_loc (MSG_NOTE, bind, "considering scope\n");
+
+  tree vars = gimple_bind_vars (bind);
+  for (tree var = vars; var; var = TREE_CHAIN (var))
+    omp_data_optimize_add_candidate (bind, var, state);
+}
+
+/* Assess a control flow statement to see if it prevents us from optimizing
+   OMP variable mappings.  A conditional jump usually won't, but a loop
+   means a much more complicated liveness algorithm than this would be needed
+   to reason effectively.  */
+
+static void
+omp_data_optimize_stmt_jump (gimple *stmt, ODO_State *state)
+{
+  /* In the general case, in presence of looping/control flow, we cannot make
+     any promises about (non-)uses of 'var's -- so we have to inhibit
+     optimization.  */
+  if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+    dump_printf_loc (MSG_NOTE, stmt, "loop/control encountered: %G\n", stmt);
+
+  bool forward = false;
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_COND:
+      if (state->visited_labels.contains (gimple_cond_true_label
+                                         (as_a <gcond*> (stmt)))
+         && state->visited_labels.contains (gimple_cond_false_label
+                                            (as_a <gcond*> (stmt))))
+       forward = true;
+      break;
+    case GIMPLE_GOTO:
+      if (state->visited_labels.contains (gimple_goto_dest
+                                         (as_a <ggoto*> (stmt))))
+       forward = true;
+      break;
+    case GIMPLE_SWITCH:
+       {
+         gswitch *sw = as_a <gswitch*> (stmt);
+         forward = true;
+         for (unsigned i = 0; i < gimple_switch_num_labels (sw); i++)
+           if (!state->visited_labels.contains (CASE_LABEL
+                                                (gimple_switch_label (sw,
+                                                                      i))))
+             {
+               forward = false;
+               break;
+             }
+         break;
+       }
+    case GIMPLE_ASM:
+       {
+         gasm *asm_stmt = as_a <gasm*> (stmt);
+         forward = true;
+         for (unsigned i = 0; i < gimple_asm_nlabels (asm_stmt); i++)
+           if (!state->visited_labels.contains (TREE_VALUE
+                                                (gimple_asm_label_op
+                                                 (asm_stmt, i))))
+             {
+               forward = false;
+               break;
+             }
+         break;
+       }
+    default:
+      gcc_unreachable ();
+    }
+  if (forward)
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, stmt,
+                        " -> forward jump; candidates remain valid\n");
+
+      return;
+    }
+
+  /* If we get here then control flow has invalidated all current optimization
+     candidates.  */
+  for (hash_map<tree, inhibit_descriptor>::iterator it = state->candidates.begin ();
+       it != state->candidates.end ();
+       ++it)
+    {
+      if ((*it).second.kind == INHIBIT_BAD)
+       continue;
+
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, stmt, " -> discarding candidate: %T\n",
+                        (*it).first);
+
+      /* We're walking backward: this earlier instance ("earlier" in
+        'gimple_seq' forward order) overrides what we may have had before.  */
+      (*it).second.kind = INHIBIT_JMP;
+      (*it).second.stmt = stmt;
+    }
+}
+
+/* A helper callback for omp_data_optimize_can_be_private.
+   Check if an operand matches the specific one we're looking for, and
+   assess the context in which it appears.  */
+
+static tree
+omp_data_optimize_scan_target_op (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  ODO_Target_state *state = (ODO_Target_state *)wi->info;
+  tree op = *tp;
+
+  if (wi->is_lhs && !state->lhs_scanned
+      && state->bb.access != ACCESS_USE_FIRST)
+    {
+      /* We're at the top level of the LHS operand.  Anything we scan inside
+        (array indices etc.) should be treated as RHS.  */
+      state->lhs_scanned = true;
+
+      /* Writes to arrays and references are unhandled, as yet.  */
+      tree base = get_base_address (op);
+      if (base && base != op && base == state->var)
+       {
+         state->bb.access = ACCESS_UNSUPPORTED;
+         *walk_subtrees = 0;
+       }
+      /* Write to scalar variable.  */
+      else if (op == state->var)
+       {
+         state->bb.access = ACCESS_DEF_FIRST;
+         *walk_subtrees = 0;
+       }
+    }
+  else if (op == state->var)
+    {
+      state->bb.access = ACCESS_USE_FIRST;
+      *walk_subtrees = 0;
+    }
+  return NULL;
+}
+
+/* A helper callback for omp_data_optimize_can_be_private, this assesses a
+   statement inside a target region to see how it affects the data flow of the
+   operands.  A set of basic blocks is recorded, each with the observed access
+   details for the given variable.  */
+
+static tree
+omp_data_optimize_scan_target_stmt (gimple_stmt_iterator *gsi_p,
+                                   bool *handled_ops_p,
+                                   struct walk_stmt_info *wi)
+{
+  ODO_Target_state *state = (ODO_Target_state *) wi->info;
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  /* If an access was found in the previous statement then we're done.  */
+  if (state->bb.access != ACCESS_NONE && state->can_short_circuit)
+    {
+      *handled_ops_p = true;
+      return (tree)1;  /* Return non-NULL, otherwise ignored.  */
+    }
+
+  /* If the first def/use is already found then don't check more operands.  */
+  *handled_ops_p = state->bb.access != ACCESS_NONE;
+
+  switch (gimple_code (stmt))
+    {
+    /* These will be the last statement in a basic block, and will always
+       be followed by a label or the end of scope.  */
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+      if (state->bb.access == ACCESS_NONE)
+       state->bb.access = ACCESS_UNKNOWN;
+      state->bb.foot_stmt = stmt;
+      state->can_short_circuit = false;
+      break;
+
+    /* asm goto statements are not necessarily followed by a label.  */
+    case GIMPLE_ASM:
+      if (gimple_asm_nlabels (as_a <gasm*> (stmt)) > 0)
+       {
+         if (state->bb.access == ACCESS_NONE)
+           state->bb.access = ACCESS_UNKNOWN;
+         state->bb.foot_stmt = stmt;
+         state->scanned_bb.put (state->bb_id, state->bb);
+
+         /* Start a new fake BB using the asm string as a unique id.  */
+         state->bb_id = gimple_asm_string (as_a <gasm*> (stmt));
+         state->bb.access = ACCESS_NONE;
+         state->bb.foot_stmt = NULL;
+         state->can_short_circuit = false;
+       }
+      break;
+
+    /* A label is the beginning of a new basic block, and possibly the end
+       of the previous, in the case of a fall-through.  */
+    case GIMPLE_LABEL:
+      if (state->bb.foot_stmt == NULL)
+       state->bb.foot_stmt = stmt;
+      if (state->bb.access == ACCESS_NONE)
+       state->bb.access = ACCESS_UNKNOWN;
+      state->scanned_bb.put (state->bb_id, state->bb);
+
+      state->bb_id = gimple_label_label (as_a <glabel*> (stmt));
+      state->bb.access = ACCESS_NONE;
+      state->bb.foot_stmt = NULL;
+      break;
+
+    /* These should not occur inside target regions??  */
+    case GIMPLE_RETURN:
+      gcc_unreachable ();
+
+    default:
+      break;
+    }
+
+  /* Now walk the operands.  */
+  state->lhs_scanned = false;
+  return NULL;
+}
+
+/* Check every operand under a gimple statement to see if a specific variable
+   is dead on entry to an OMP TARGET statement.  If so, then we can make the
+   variable mapping PRIVATE.  */
+
+static bool
+omp_data_optimize_can_be_private (tree var, gimple *target_stmt)
+{
+  ODO_Target_state state;
+  state.var = var;
+  void *root_id = var;  /* Any non-null pointer will do for the unique ID.  */
+  state.bb_id = root_id;
+  state.bb.access = ACCESS_NONE;
+  state.bb.foot_stmt = NULL;
+  state.lhs_scanned = false;
+  state.can_short_circuit = true;
+
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = &state;
+
+  /* Walk the target region and build the BB list.  */
+  gimple_seq target_body = *gimple_omp_body_ptr (target_stmt);
+  walk_gimple_seq (target_body, omp_data_optimize_scan_target_stmt,
+                  omp_data_optimize_scan_target_op, &wi);
+
+  /* Calculate the liveness data for the whole region.  */
+  if (state.can_short_circuit)
+    ; /* state.bb.access has the answer already.  */
+  else
+    {
+      /* There's some control flow to navigate.  */
+
+      /* First enter the final BB into the table.  */
+      state.scanned_bb.put (state.bb_id, state.bb);
+
+      /* Propagate the known access findings to the parent BBs.
+
+        For each BB that does not have a known liveness value, combine
+        the liveness data from its successor BBs, if known.  Repeat until
+        there are no more changes to make.  */
+      bool changed;
+      do {
+       changed = false;
+       for (hash_map<const void*,ODO_BB>::iterator it = state.scanned_bb.begin ();
+            it != state.scanned_bb.end ();
+            ++it)
+         {
+           ODO_BB *bb = &(*it).second;
+           tree label;
+           const void *bb_id1, *bb_id2;
+           ODO_BB *chain_bb1, *chain_bb2;
+           unsigned num_labels;
+
+           /* The foot statement is NULL in the exit block.
+              Blocks that already have liveness data are done.  */
+           if (bb->foot_stmt == NULL
+               || bb->access != ACCESS_UNKNOWN)
+             continue;
+
+           /* If we get here then bb->access == ACCESS_UNKNOWN.  */
+           switch (gimple_code (bb->foot_stmt))
+             {
+             /* If the final statement of a block is the label statement
+                then we have a fall-through.  The liveness data can be simply
+                copied from the next block.  */
+             case GIMPLE_LABEL:
+               bb_id1 = gimple_label_label (as_a <glabel*> (bb->foot_stmt));
+               chain_bb1 = state.scanned_bb.get (bb_id1);
+               if (chain_bb1->access != ACCESS_UNKNOWN)
+                 {
+                   bb->access = chain_bb1->access;
+                   changed = true;
+                 }
+               break;
+
+             /* Combine the liveness data from both branches of a conditional
+                statement.  The access values are ordered such that the
+                higher value takes precedence.  */
+             case GIMPLE_COND:
+               bb_id1 = gimple_cond_true_label (as_a <gcond*>
+                                                (bb->foot_stmt));
+               bb_id2 = gimple_cond_false_label (as_a <gcond*>
+                                                 (bb->foot_stmt));
+               chain_bb1 = state.scanned_bb.get (bb_id1);
+               chain_bb2 = state.scanned_bb.get (bb_id2);
+               bb->access = (chain_bb1->access > chain_bb2->access
+                             ? chain_bb1->access
+                             : chain_bb2->access);
+               if (bb->access != ACCESS_UNKNOWN)
+                 changed = true;
+               break;
+
+             /* Copy the liveness data from the destination block.  */
+             case GIMPLE_GOTO:
+               bb_id1 = gimple_goto_dest (as_a <ggoto*> (bb->foot_stmt));
+               chain_bb1 = state.scanned_bb.get (bb_id1);
+               if (chain_bb1->access != ACCESS_UNKNOWN)
+                 {
+                   bb->access = chain_bb1->access;
+                   changed = true;
+                 }
+               break;
+
+             /* Combine the liveness data from all the branches of a switch
+                statement.  The access values are ordered such that the
+                highest value takes precedence.  */
+             case GIMPLE_SWITCH:
+               num_labels = gimple_switch_num_labels (as_a <gswitch*>
+                                                      (bb->foot_stmt));
+               bb->access = ACCESS_NONE;  /* Lowest precedence value.  */
+               for (unsigned i = 0; i < num_labels; i++)
+                 {
+                   label = gimple_switch_label (as_a <gswitch*>
+                                                (bb->foot_stmt), i);
+                   chain_bb1 = state.scanned_bb.get (CASE_LABEL (label));
+                   bb->access = (bb->access > chain_bb1->access
+                                 ? bb->access
+                                 : chain_bb1->access);
+                 }
+               if (bb->access != ACCESS_UNKNOWN)
+                 changed = true;
+               break;
+
+             /* Combine the liveness data from all the branches of an asm goto
+                statement.  The access values are ordered such that the
+                highest value takes precedence.  */
+             case GIMPLE_ASM:
+               num_labels = gimple_asm_nlabels (as_a <gasm*> (bb->foot_stmt));
+               bb->access = ACCESS_NONE;  /* Lowest precedence value.  */
+               /* Loop through all the labels and the fall-through block.  */
+               for (unsigned i = 0; i < num_labels + 1; i++)
+                 {
+                   if (i < num_labels)
+                     bb_id1 = TREE_VALUE (gimple_asm_label_op
+                                          (as_a <gasm*> (bb->foot_stmt), i));
+                   else
+                     /* The fall-through fake-BB uses the string for an ID. */
+                     bb_id1 = gimple_asm_string (as_a <gasm*>
+                                                 (bb->foot_stmt));
+                   chain_bb1 = state.scanned_bb.get (bb_id1);
+                   bb->access = (bb->access > chain_bb1->access
+                                 ? bb->access
+                                 : chain_bb1->access);
+                 }
+               if (bb->access != ACCESS_UNKNOWN)
+                 changed = true;
+               break;
+
+             /* No other statement kinds should appear as foot statements.  */
+             default:
+               gcc_unreachable ();
+             }
+         }
+      } while (changed);
+
+      /* The access status should now be readable from the initial BB,
+        if one could be determined.  */
+      state.bb = *state.scanned_bb.get (root_id);
+    }
+
+  if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+    {
+      for (hash_map<const void*,ODO_BB>::iterator it = state.scanned_bb.begin ();
+          it != state.scanned_bb.end ();
+          ++it)
+       {
+         ODO_BB *bb = &(*it).second;
+         dump_printf_loc (MSG_NOTE, bb->foot_stmt,
+                          "%<%T%> is %s on entry to block ending here\n", var,
+                          (bb->access == ACCESS_NONE
+                           || bb->access == ACCESS_DEF_FIRST ? "dead"
+                           : bb->access == ACCESS_USE_FIRST ? "live"
+                           : bb->access == ACCESS_UNSUPPORTED
+                           ? "unknown (unsupported op)"
+                           : "unknown (complex control flow)"));
+       }
+      /* If the answer was found early then the last BB to be scanned
+        will not have been entered into the table.  */
+      if (state.can_short_circuit)
+       dump_printf_loc (MSG_NOTE, target_stmt,
+                        "%<%T%> is %s on entry to target region\n", var,
+                        (state.bb.access == ACCESS_NONE
+                         || state.bb.access == ACCESS_DEF_FIRST ? "dead"
+                         : state.bb.access == ACCESS_USE_FIRST ? "live"
+                         : state.bb.access == ACCESS_UNSUPPORTED
+                         ? "unknown (unsupported op)"
+                         : "unknown (complex control flow)"));
+    }
+
+  if (state.bb.access != ACCESS_DEF_FIRST
+      && dump_enabled_p () && dump_flags & TDF_DETAILS)
+    dump_printf_loc (MSG_NOTE, target_stmt, "%<%T%> is not suitable"
+                    " for private optimization; %s\n", var,
+                    (state.bb.access == ACCESS_USE_FIRST
+                     ? "live on entry"
+                     : state.bb.access == ACCESS_UNKNOWN
+                     ? "complex control flow"
+                     : "unknown reason"));
+
+  return state.bb.access == ACCESS_DEF_FIRST;
+}
+
+/* Inspect a tree operand, from a gimple walk, and check to see if it is a
+   variable use that might mean the variable is not a suitable candidate for
+   optimization in a prior target region.
+
+   This algorithm is very basic and can be easily fooled by writes with
+   subsequent reads, but it should at least err on the safe side.  */
+
+static void
+omp_data_optimize_inspect_op (tree op, ODO_State *state, bool is_lhs,
+                             gimple *stmt)
+{
+  if (is_lhs && !state->lhs_scanned)
+    {
+      /* We're at the top level of the LHS operand.
+         Anything we scan inside should be treated as RHS.  */
+      state->lhs_scanned = true;
+
+      /* Writes to variables are not yet taken into account, beyond not
+        invalidating the optimization, but not everything on the
+        left-hand-side is a write (array indices, etc.), and if one element of
+        an array is written to then we should assume the rest is live.  */
+      tree base = get_base_address (op);
+      if (base && base == op)
+       return;  /* Writes to scalars are not a "use".  */
+    }
+
+  if (!DECL_P (op))
+    return;
+
+  /* If we get here then we have found a use of a variable.  */
+  tree var = op;
+
+  inhibit_descriptor *id = state->candidates.get (var);
+  if (id && id->kind != INHIBIT_BAD)
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       {
+         if (gimple_code (stmt) == GIMPLE_OMP_TARGET)
+           dump_printf_loc (MSG_NOTE, id->stmt,
+                            "encountered variable use in target stmt\n");
+         else
+           dump_printf_loc (MSG_NOTE, id->stmt,
+                            "encountered variable use: %G\n", stmt);
+         dump_printf_loc (MSG_NOTE, id->stmt,
+                          " -> discarding candidate: %T\n", op);
+       }
+
+      /* We're walking backward: this earlier instance ("earlier" in
+        'gimple_seq' forward order) overrides what we may have had before.  */
+      id->kind = INHIBIT_USE;
+      id->stmt = stmt;
+    }
+}
+
+/* Optimize the data mappings of a target region, where our backward gimple
+   walk has identified that the variable is definitely dead on exit.  */
+
+static void
+omp_data_optimize_stmt_target (gimple *stmt, ODO_State *state)
+{
+  for (tree *pc = gimple_omp_target_clauses_ptr (stmt); *pc;
+       pc = &OMP_CLAUSE_CHAIN (*pc))
+    {
+      if (OMP_CLAUSE_CODE (*pc) != OMP_CLAUSE_MAP)
+       continue;
+
+      tree var = OMP_CLAUSE_DECL (*pc);
+      if (OMP_CLAUSE_MAP_KIND (*pc) == GOMP_MAP_FORCE_TOFROM
+         || OMP_CLAUSE_MAP_KIND (*pc) == GOMP_MAP_TOFROM)
+       {
+         /* The dump_printf_loc format code %T does not print the head clause
+            of a clause chain but the whole chain.  Print the last considered
+            clause manually.  */
+         char *c_s_prev
+           = dump_enabled_p () ? print_omp_clause_to_str (*pc) : NULL;
+
+         inhibit_descriptor *id = state->candidates.get (var);
+         if (!id)
+           {
+             /* The variable was not a parameter or named in any bind, so it
+                must be in an external scope, and therefore live-on-exit.  */
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                "%qs not optimized: %<%T%> is unsuitable"
+                                " for privatization\n", c_s_prev, var);
+             free (c_s_prev);
+             continue;
+           }
+
+         switch (id->kind)
+           {
+           case INHIBIT_NOT:  /* Don't inhibit optimization.  */
+
+             /* Change map type from "tofrom" to "to".  */
+             OMP_CLAUSE_SET_MAP_KIND (*pc, GOMP_MAP_TO);
+
+             if (dump_enabled_p ())
+               {
+                 char *c_s_opt = print_omp_clause_to_str (*pc);
+                 dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, DUMP_LOC (*pc),
+                                  "%qs optimized to %qs\n", c_s_prev, c_s_opt);
+                 free (c_s_prev);
+                 c_s_prev = c_s_opt;
+               }
+
+             /* Variables that are dead-on-entry and dead-on-exit can be
+                further optimized to private.  */
+             if (omp_data_optimize_can_be_private (var, stmt))
+               {
+                 tree c_f = (build_omp_clause
+                             (OMP_CLAUSE_LOCATION (*pc),
+                              OMP_CLAUSE_PRIVATE));
+                 OMP_CLAUSE_DECL (c_f) = var;
+                 OMP_CLAUSE_CHAIN (c_f) = OMP_CLAUSE_CHAIN (*pc);
+                 //TODO Copy "implicit" flag from 'var'.
+                 *pc = c_f;
+
+                 if (dump_enabled_p ())
+                   {
+                     char *c_s_opt = print_omp_clause_to_str (*pc);
+                     dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, DUMP_LOC (*pc),
+                                      "%qs further optimized to %qs\n",
+                                      c_s_prev, c_s_opt);
+                     free (c_s_prev);
+                     c_s_prev = c_s_opt;
+                   }
+               }
+             break;
+
+           case INHIBIT_USE:  /* Optimization inhibited by a variable use.  */
+             if (dump_enabled_p ())
+               {
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                  "%qs not optimized: %<%T%> used...\n",
+                                  c_s_prev, var);
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, id->stmt,
+                                  "... here\n");
+               }
+             break;
+
+           case INHIBIT_JMP:  /* Optimization inhibited by control flow.  */
+             if (dump_enabled_p ())
+               {
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                  "%qs not optimized: %<%T%> disguised by"
+                                  " looping/control flow...\n", c_s_prev, var);
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, id->stmt,
+                                  "... here\n");
+               }
+             break;
+
+           case INHIBIT_BAD:  /* Optimization inhibited by properties.  */
+             if (dump_enabled_p ())
+               {
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                  "%qs not optimized: %<%T%> is unsuitable"
+                                  " for privatization\n", c_s_prev, var);
+               }
+             break;
+
+           default:
+             gcc_unreachable ();
+           }
+
+         if (dump_enabled_p ())
+           free (c_s_prev);
+       }
+    }
+
+  /* Variables used by target regions cannot be optimized from earlier
+     target regions.  */
+  for (tree c = *gimple_omp_target_clauses_ptr (stmt);
+       c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      /* This needs to include all the mapping clauses listed in
+        OMP_TARGET_CLAUSE_MASK in c-parser.c.  */
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+         && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_PRIVATE
+         && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE)
+       continue;
+
+      tree var = OMP_CLAUSE_DECL (c);
+      omp_data_optimize_inspect_op (var, state, false, stmt);
+    }
+}
+
+/* Call back for gimple walk.  Scan the statement for target regions and
+   variable uses or control flow that might prevent us optimizing offload
+   data copies.  */
+
+static tree
+omp_data_optimize_callback_stmt (gimple_stmt_iterator *gsi_p,
+                                bool *handled_ops_p,
+                                struct walk_stmt_info *wi)
+{
+  ODO_State *state = (ODO_State *) wi->info;
+
+  *handled_ops_p = false;
+  state->lhs_scanned = false;
+
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  switch (gimple_code (stmt))
+    {
+    /* A bind introduces a new variable scope that might include optimizable
+       variables.  */
+    case GIMPLE_BIND:
+      omp_data_optimize_stmt_bind (as_a <gbind *> (stmt), state);
+      break;
+
+    /* Tracking labels allows us to understand control flow better.  */
+    case GIMPLE_LABEL:
+      state->visited_labels.add (gimple_label_label (as_a <glabel *> (stmt)));
+      break;
+
+    /* Statements that might constitute some looping/control flow pattern
+       may inhibit optimization of target mappings.  */
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+    case GIMPLE_ASM:
+      omp_data_optimize_stmt_jump (stmt, state);
+      break;
+
+    /* A target statement that will have variables for us to optimize.  */
+    case GIMPLE_OMP_TARGET:
+      /* For now, only look at OpenACC 'kernels' constructs.  */
+      if (gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+       {
+         omp_data_optimize_stmt_target (stmt, state);
+
+         /* Don't walk inside the target region; use of private variables
+            inside the target region does not stop them being private!
+            NOTE: we *do* want to walk target statement types that are not
+            (yet) handled by omp_data_optimize_stmt_target as the uses there
+            must not be missed.  */
+         // TODO add tests for mixed kernels/parallels
+         *handled_ops_p = true;
+       }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Call back for gimple walk.  Scan the operand for variable uses.  */
+
+static tree
+omp_data_optimize_callback_op (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+
+  omp_data_optimize_inspect_op (*tp, (ODO_State *)wi->info, wi->is_lhs,
+                               wi->stmt);
+
+  *walk_subtrees = 1;
+  return NULL;
+}
+
+/* Main pass entry point.  See comments at head of file.  */
+
+static unsigned int
+omp_data_optimize (void)
+{
+  /* Capture the function arguments so that they can be optimized.  */
+  ODO_State state;
+  for (tree decl = DECL_ARGUMENTS (current_function_decl);
+       decl;
+       decl = DECL_CHAIN (decl))
+    {
+      const dump_user_location_t loc = dump_user_location_t::from_function_decl (decl);
+      omp_data_optimize_add_candidate (loc, decl, &state);
+    }
+
+  /* Scan and optimize the function body, from bottom to top.  */
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.backward = true;
+  wi.info = &state;
+  gimple_seq body = gimple_body (current_function_decl);
+  walk_gimple_seq (body, omp_data_optimize_callback_stmt,
+                  omp_data_optimize_callback_op, &wi);
+
+  return 0;
+}
+
+
+namespace {
+
+const pass_data pass_data_omp_data_optimize =
+{
+  GIMPLE_PASS, /* type */
+  "omp_data_optimize", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_omp_data_optimize : public gimple_opt_pass
+{
+public:
+  pass_omp_data_optimize (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_data_optimize, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openacc
+           && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+  }
+  virtual unsigned int execute (function *)
+  {
+    return omp_data_optimize ();
+  }
+
+}; // class pass_omp_data_optimize
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_data_optimize (gcc::context *ctxt)
+{
+  return new pass_omp_data_optimize (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 5b9bb422d281..681392f8f79f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_warn_unused_result);
   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
+  NEXT_PASS (pass_omp_data_optimize);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);
   NEXT_PASS (pass_lower_cf);
diff --git a/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c b/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
new file mode 100644
index 000000000000..c90031a40b71
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
@@ -0,0 +1,677 @@
+/* Test 'gcc/omp-data-optimize.c'.  */
+
+/* { dg-additional-options "-fdump-tree-gimple-raw" } */
+/* { dg-additional-options "-fopt-info-omp-all" } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_compute[variable c_compute 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid
+   "WARNING: dg-line var l_compute defined, but not used".
+   { dg-line l_use[variable c_use 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid
+   "WARNING: dg-line var l_use defined, but not used".
+   { dg-line l_lcf[variable c_lcf 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_lcf } to avoid
+   "WARNING: dg-line var l_lcf defined, but not used".  */
+
+extern int ef1(int);
+
+
+/* Optimization happens.  */
+
+long opt_1_gvar1;
+extern short opt_1_evar1;
+static long opt_1_svar1;
+
+static int opt_1(int opt_1_pvar1)
+{
+  int opt_1_lvar1;
+  extern short opt_1_evar2;
+  static long opt_1_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    int dummy1 = opt_1_pvar1;
+    int dummy2 = opt_1_lvar1;
+    int dummy3 = opt_1_evar2;
+    int dummy4 = opt_1_svar2;
+
+    int dummy5 = opt_1_gvar1;
+    int dummy6 = opt_1_evar1;
+    int dummy7 = opt_1_svar1;
+  }
+
+  return 0;
+
+/* { dg-optimized {'map\(force_tofrom:opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:opt_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+long opt_2_gvar1;
+extern short opt_2_evar1;
+static long opt_2_svar1;
+
+static int opt_2(int opt_2_pvar1)
+{
+  int opt_2_lvar1;
+  extern short opt_2_evar2;
+  static long opt_2_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    int dummy1 = opt_2_pvar1;
+    int dummy2 = opt_2_lvar1;
+    int dummy3 = opt_2_evar2;
+    int dummy4 = opt_2_svar2;
+
+    int dummy5 = opt_2_gvar1;
+    int dummy6 = opt_2_evar1;
+    int dummy7 = opt_2_svar1;
+  }
+
+  /* A write does not inhibit optimization.  */
+
+  opt_2_pvar1 = 0;
+  opt_2_lvar1 = 1;
+  opt_2_evar2 = 2;
+  opt_2_svar2 = 3;
+
+  opt_2_gvar1 = 10;
+  opt_2_evar1 = 11;
+  opt_2_svar1 = 12;
+
+  return 0;
+
+/* { dg-optimized {'map\(force_tofrom:opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_2_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+long opt_3_gvar1;
+extern short opt_3_evar1;
+static long opt_3_svar1;
+
+static int opt_3(int opt_3_pvar1)
+{
+  int opt_3_lvar1;
+  extern short opt_3_evar2;
+  static long opt_3_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    /* A write inside the kernel inhibits optimization to firstprivate.
+       TODO: optimize to private where the variable is dead-on-entry.  */
+
+    opt_3_pvar1 = 1;
+    opt_3_lvar1 = 2;
+    opt_3_evar2 = 3;
+    opt_3_svar2 = 4;
+
+    opt_3_gvar1 = 5;
+    opt_3_evar1 = 6;
+    opt_3_svar1 = 7;
+  }
+
+  return 0;
+
+/* { dg-optimized {'map\(force_tofrom:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_pvar1\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-optimized {'map\(force_tofrom:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:opt_3_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_3_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_3_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void opt_4()
+{
+  int opt_4_larray1[10];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      int dummy1 = opt_4_larray1[4];
+      int dummy2 = opt_4_larray1[8];
+    }
+
+/* { dg-optimized {'map\(tofrom:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-bogus {'map\(to:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'firstprivate\(opt_4_larray1\)'} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void opt_5 (int opt_5_pvar1)
+{
+  int opt_5_larray1[10];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      opt_5_larray1[opt_5_pvar1] = 1;
+      opt_5_pvar1[opt_5_larray1] = 2;
+    }
+
+/* { dg-optimized {'map\(force_tofrom:opt_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+
+/* TODO: this probably should be optimizable.  */
+/* { dg-missed {'map\(tofrom:opt_5_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_5_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+
+/* Similar, but with optimization inhibited because of variable use.  */
+
+static int use_1(int use_1_pvar1)
+{
+  float use_1_lvar1;
+  extern char use_1_evar2;
+  static double use_1_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    use_1_pvar1 = 0;
+    use_1_lvar1 = 1;
+    use_1_evar2 = 2;
+    use_1_svar2 = 3;
+  }
+
+  int s = 0;
+  s += use_1_pvar1; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_1_lvar1; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_1_evar2; /* { dg-bogus {note: \.\.\. here} "" { target *-*-* } }  */
+  s += use_1_svar2; /* { dg-bogus {note: \.\.\. here} "" { target *-*-* } }  */
+
+  return s;
+
+/* { dg-missed {'map\(force_tofrom:use_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_pvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+extern int use_2_a1[];
+
+static int use_2(int use_2_pvar1)
+{
+  int use_2_lvar1;
+  extern int use_2_evar2;
+  static int use_2_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    use_2_pvar1 = 0;
+    use_2_lvar1 = 1;
+    use_2_evar2 = 2;
+    use_2_svar2 = 3;
+  }
+
+  int s = 0;
+  s += use_2_a1[use_2_pvar1]; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_2_a1[use_2_lvar1]; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_2_a1[use_2_evar2];
+  s += use_2_a1[use_2_svar2];
+
+  return s;
+
+/*TODO The following GIMPLE dump scanning may be too fragile (across
+  different GCC configurations)?  The idea is to verify that we're indeed
+  doing the "deep scanning", as discussed in
+  <http://mid.mail-archive.com/877dm463sc.fsf@euler.schwinge.homeip.net>.  */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_pvar1\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-missed {'map\(force_tofrom:use_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_pvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_lvar1\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-missed {'map\(force_tofrom:use_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <var_decl, use_2_evar2\.[^,]+, use_2_evar2, NULL, NULL>$} 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_evar2\.[^\]]+\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <var_decl, use_2_svar2\.[^,]+, use_2_svar2, NULL, NULL>$} 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_svar2\.[^\]]+\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-missed {'map\(force_tofrom:use_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+}
+
+static void use_3 ()
+{
+  int use_5_lvar1;
+  int use_5_larray1[10];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      use_5_lvar1 = 5;
+    }
+
+  use_5_larray1[use_5_lvar1] = 1; /* { dg-line l_use[incr c_use] } */
+
+/* { dg-missed {'map\(force_tofrom:use_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_5_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+
+/* Similar, but with the optimization inhibited because of looping/control flow.  */
+
+static void lcf_1(int lcf_1_pvar1)
+{
+  float lcf_1_lvar1;
+  extern char lcf_1_evar2;
+  static double lcf_1_svar2;
+
+  for (int i = 0; i < ef1(i); ++i) /* { dg-line l_lcf[incr c_lcf] } */
+ {
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_1_pvar1 = 0;
+    lcf_1_lvar1 = 1;
+    lcf_1_evar2 = 2;
+    lcf_1_svar2 = 3;
+  }
+ }
+
+/* { dg-missed {'map\(force_tofrom:lcf_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_lcf$c_lcf } */
+}
+
+static void lcf_2(int lcf_2_pvar1)
+{
+  float lcf_2_lvar1;
+  extern char lcf_2_evar2;
+  static double lcf_2_svar2;
+
+  if (ef1 (0))
+    return;
+
+ repeat:
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_2_pvar1 = 0;
+    lcf_2_lvar1 = 1;
+    lcf_2_evar2 = 2;
+    lcf_2_svar2 = 3;
+  }
+
+  goto repeat; /* { dg-line l_lcf[incr c_lcf] } */
+
+/* { dg-missed {'map\(force_tofrom:lcf_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_lcf$c_lcf } */
+}
+
+static void lcf_3(int lcf_3_pvar1)
+{
+  float lcf_3_lvar1;
+  extern char lcf_3_evar2;
+  static double lcf_3_svar2;
+
+  if (ef1 (0))
+    return;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_3_pvar1 = 0;
+    lcf_3_lvar1 = 1;
+    lcf_3_evar2 = 2;
+    lcf_3_svar2 = 3;
+  }
+
+  // Backward jump after kernel
+ repeat:
+  goto repeat; /* { dg-line l_lcf[incr c_lcf] } */
+
+/* { dg-missed {'map\(force_tofrom:lcf_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_lcf$c_lcf } */
+}
+
+static void lcf_4(int lcf_4_pvar1)
+{
+  float lcf_4_lvar1;
+  extern char lcf_4_evar2;
+  static double lcf_4_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_4_pvar1 = 0;
+    lcf_4_lvar1 = 1;
+    lcf_4_evar2 = 2;
+    lcf_4_svar2 = 3;
+  }
+
+  // Forward jump after kernel
+  goto out;
+
+    out:
+  return;
+
+/* { dg-missed {'map\(force_tofrom:lcf_4_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_4_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+}
+
+static void lcf_5(int lcf_5_pvar1)
+{
+  float lcf_5_lvar1;
+  extern char lcf_5_evar2;
+  static double lcf_5_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_5_pvar1 = 0;
+    lcf_5_lvar1 = 1;
+    lcf_5_evar2 = 2;
+    lcf_5_svar2 = 3;
+  }
+
+  if (ef1 (-1))
+    ;
+
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_5_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_5_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void lcf_6(int lcf_6_pvar1)
+{
+  float lcf_6_lvar1;
+  extern char lcf_6_evar2;
+  static double lcf_6_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_6_pvar1 = 0;
+    lcf_6_lvar1 = 1;
+    lcf_6_evar2 = 2;
+    lcf_6_svar2 = 3;
+  }
+
+  int x = ef1 (-2) ? 1 : -1;
+
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_6_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_6_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void lcf_7(int lcf_7_pvar1)
+{
+  float lcf_7_lvar1;
+  extern char lcf_7_evar2;
+  static double lcf_7_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_7_pvar1 = 0;
+    lcf_7_lvar1 = 1;
+    lcf_7_evar2 = 2;
+    lcf_7_svar2 = 3;
+  }
+
+  switch (ef1 (-2))
+    {
+    case 0: ef1 (10); break;
+    case 2: ef1 (11); break;
+    default: ef1 (12); break;
+    }
+
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_7_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_7_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_7_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_7_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_7_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_7_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void lcf_8(int lcf_8_pvar1)
+{
+  float lcf_8_lvar1;
+  extern char lcf_8_evar2;
+  static double lcf_8_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_8_pvar1 = 0;
+    lcf_8_lvar1 = 1;
+    lcf_8_evar2 = 2;
+    lcf_8_svar2 = 3;
+  }
+
+  asm goto ("" :::: out);
+
+out:
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_8_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_8_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_8_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_8_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_8_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_8_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+/* Ensure that variables are promoted to private properly.  */
+
+static void priv_1 ()
+{
+  int priv_1_lvar1, priv_1_lvar2, priv_1_lvar3, priv_1_lvar4, priv_1_lvar5;
+  int priv_1_lvar6, priv_1_lvar7, priv_1_lvar8, priv_1_lvar9, priv_1_lvar10;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      priv_1_lvar1 = 1;
+      int dummy = priv_1_lvar2;
+
+      if (priv_1_lvar2)
+       {
+         priv_1_lvar3 = 1;
+       }
+      else
+       {
+         priv_1_lvar3 = 2;
+       }
+
+      priv_1_lvar5 = priv_1_lvar3;
+
+      if (priv_1_lvar2)
+       {
+         priv_1_lvar4 = 1;
+         int dummy = priv_1_lvar4;
+       }
+
+      switch (priv_1_lvar2)
+       {
+       case 0:
+         priv_1_lvar5 = 1;
+         dummy = priv_1_lvar6;
+         break;
+       case 1:
+         priv_1_lvar5 = 2;
+         priv_1_lvar6 = 3;
+         break;
+       default:
+         break;
+       }
+
+      asm goto ("" :: "r"(priv_1_lvar7) :: label1, label2);
+      if (0)
+       {
+label1:
+         priv_1_lvar8 = 1;
+         priv_1_lvar9 = 2;
+       }
+      if (0)
+       {
+label2:
+         dummy = priv_1_lvar9;
+         dummy = priv_1_lvar10;
+       }
+    }
+
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar2\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar3\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar4\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar5\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar6\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar7\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar8\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar9\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar10\)'} "" { target *-*-* } l_compute$c_compute } */
+}
+
+static void multiple_kernels_1 ()
+{
+#pragma acc kernels
+    {
+      int multiple_kernels_1_lvar1 = 1;
+    }
+
+    int multiple_kernels_2_lvar1;
+#pragma acc kernels
+    {
+      int multiple_kernels_2_lvar1 = 1;
+    }
+
+#pragma acc parallel
+    {
+      multiple_kernels_2_lvar1++;
+    }
+}
+
+static int ref_1 ()
+{
+  int *ref_1_ref1;
+  int ref_1_lvar1;
+
+  ref_1_ref1 = &ref_1_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      ref_1_lvar1 = 1;
+    }
+
+  return *ref_1_ref1;
+
+/* { dg-missed {'map\(force_tofrom:ref_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int ref_2 ()
+{
+  int *ref_2_ref1;
+  int ref_2_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      ref_2_lvar1 = 1;
+    }
+
+  ref_2_ref1 = &ref_2_lvar1;
+  return *ref_2_ref1;
+
+/* { dg-missed {'map\(force_tofrom:ref_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void ref_3 ()
+{
+  int ref_3_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  // FIXME: could be optimized
+    {
+      int *ref_3_ref1 = &ref_3_lvar1;
+      ref_3_lvar1 = 1;
+    }
+
+/* { dg-missed {'map\(force_tofrom:ref_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void ref_4 ()
+{
+  int ref_4_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  // FIXME: could be optimized
+    {
+      int *ref_4_ref1 = &ref_4_lvar1;
+      *ref_4_ref1 = 1;
+    }
+
+/* { dg-missed {'map\(force_tofrom:ref_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void conditional_1 (int conditional_1_pvar1)
+{
+  int conditional_1_lvar1 = 1;
+
+  if (conditional_1_pvar1)
+    {
+      // TODO: should be optimizable, but isn't due to later usage in the
+      // linear scan.
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+       {
+         int dummy = conditional_1_lvar1;
+       }
+    }
+  else
+    {
+      int dummy = conditional_1_lvar1; /* { dg-line l_use[incr c_use] } */
+    }
+
+/* { dg-missed {'map\(force_tofrom:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'conditional_1_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static void conditional_2 (int conditional_2_pvar1)
+{
+  int conditional_2_lvar1 = 1;
+
+  if (conditional_2_pvar1)
+    {
+      int dummy = conditional_2_lvar1;
+    }
+  else
+    {
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+       {
+         int dummy = conditional_2_lvar1;
+       }
+    }
+
+/* { dg-optimized {'map\(force_tofrom:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
index b3cc4459328f..628b84940a1c 100644
--- a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
@@ -7,6 +7,12 @@ foo (void)
   int i;

 #pragma acc kernels
+  /* { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 } */
+  /*TODO With the 'copy' -> 'firstprivate' optimization, the original implicit 'copy(i)' clause gets optimized into a 'firstprivate(i)' clause -- and the expected (?) warning diagnostic appears.
+    Have to read up the history behind these test cases.
+    Should this test remain here in this file even if now testing 'firstprivate'?
+    Or, should the optimization be disabled for such testing?
+    Or, should the testing be duplicated for both variants?  */
   {
     i = 1;
   }
diff --git a/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C b/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
new file mode 100644
index 000000000000..5483e5682410
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
@@ -0,0 +1,169 @@
+/* Test 'gcc/omp-data-optimize.c'.  */
+
+/* { dg-additional-options "-std=c++11" } */
+/* { dg-additional-options "-fdump-tree-gimple-raw" } */
+/* { dg-additional-options "-fopt-info-omp-all" } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_compute[variable c_compute 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid
+   "WARNING: dg-line var l_compute defined, but not used".
+   { dg-line l_use[variable c_use 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid
+   "WARNING: dg-line var l_use defined, but not used".  */
+
+static int closure_1 (int closure_1_pvar1)
+{
+  int closure_1_lvar1 = 1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_1_lvar1 = closure_1_pvar1;
+    }
+
+  auto lambda = [closure_1_lvar1]() {return closure_1_lvar1;}; /* { dg-line l_use[incr c_use] } */
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_1_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static int closure_2 (int closure_2_pvar1)
+{
+  int closure_2_lvar1 = 1;
+
+  auto lambda = [closure_2_lvar1]() {return closure_2_lvar1;};
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_2_lvar1 = closure_2_pvar1;
+    }
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_2_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-optimized {'map\(force_tofrom:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(closure_2_lvar1\)'} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_3 (int closure_3_pvar1)
+{
+  int closure_3_lvar1 = 1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_3_lvar1 = closure_3_pvar1;
+    }
+
+  auto lambda = [&]() {return closure_3_lvar1;};
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_3_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_4 (int closure_4_pvar1)
+{
+  int closure_4_lvar1 = 1;
+
+  auto lambda = [&]() {return closure_4_lvar1;};
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_4_lvar1 = closure_4_pvar1;
+    }
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_4_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_5 (int closure_5_pvar1)
+{
+  int closure_5_lvar1 = 1;
+
+  auto lambda = [=]() {return closure_5_lvar1;};
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_5_lvar1 = closure_5_pvar1;
+    }
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-optimized {'map\(force_tofrom:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(closure_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_6 (int closure_6_pvar1)
+{
+  int closure_6_lvar1 = 1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_6_lvar1 = closure_6_pvar1;
+    }
+
+  auto lambda = [=]() {return closure_6_lvar1;}; /* { dg-line l_use[incr c_use] } */
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_6_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_6_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static int try_1 ()
+{
+  int try_1_lvar1, try_1_lvar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      try_1_lvar1 = 1;
+    }
+
+  try {
+    try_1_lvar2 = try_1_lvar1; /* { dg-line l_use[incr c_use] } */
+  } catch (...) {}
+
+  return try_1_lvar2;
+
+/* { dg-missed {'map\(force_tofrom:try_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'try_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static int try_2 ()
+{
+  int try_2_lvar1, try_2_lvar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      try_2_lvar1 = 1;
+    }
+
+  try {
+    try_2_lvar2 = 1;
+  } catch (...) {
+    try_2_lvar2 = try_2_lvar1; /* { dg-line l_use[incr c_use] } */
+  }
+
+  return try_2_lvar2;
+
+/* { dg-missed {'map\(force_tofrom:try_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'try_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 b/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
new file mode 100644
index 000000000000..ce3e556faf26
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
@@ -0,0 +1,588 @@
+! { dg-additional-options "-fdump-tree-gimple-raw" }
+! { dg-additional-options "-fopt-info-omp-all" }
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_compute[variable c_compute 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid
+! "WARNING: dg-line var l_compute defined, but not used".
+! { dg-line l_use[variable c_use 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid
+! "WARNING: dg-line var l_use defined, but not used".
+
+module globals
+  use ISO_C_BINDING
+  implicit none
+  integer :: opt_1_gvar1 = 1
+  integer(C_INT), bind(C) :: opt_1_evar1
+  integer :: opt_2_gvar1 = 1
+  integer(C_INT), bind(C) :: opt_2_evar1
+  integer :: opt_3_gvar1 = 1
+  integer(C_INT), bind(C) :: opt_3_evar1
+  integer :: use_1_gvar1 = 1
+  integer(C_INT), bind(C) :: use_1_evar1
+  integer :: use_2_gvar1 = 1
+  integer(C_INT), bind(C) :: use_2_evar1
+  integer :: use_2_a1(100)
+  integer(C_INT), bind(C) :: lcf_1_evar2
+  integer(C_INT), bind(C) :: lcf_2_evar2
+  integer(C_INT), bind(C) :: lcf_3_evar2
+  integer(C_INT), bind(C) :: lcf_4_evar2
+  integer(C_INT), bind(C) :: lcf_5_evar2
+  integer(C_INT), bind(C) :: lcf_6_evar2
+  save
+end module globals
+
+subroutine opt_1 (opt_1_pvar1)
+  use globals
+  implicit none
+  integer :: opt_1_pvar1
+  integer :: opt_1_lvar1
+  integer, save :: opt_1_svar1 = 3
+  integer :: dummy1, dummy2, dummy3, dummy4, dummy5
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    dummy1 = opt_1_pvar1;
+    dummy2 = opt_1_lvar1;
+
+    dummy3 = opt_1_gvar1;
+    dummy4 = opt_1_evar1;
+    dummy5 = opt_1_svar1;
+  !$acc end kernels
+
+! Parameter is pass-by-reference
+! { dg-missed {'map\(force_tofrom:\*opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-missed {'map\(force_tofrom:opt_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_1_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy3\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy4\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy5\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_1
+
+subroutine opt_2 (opt_2_pvar1)
+  use globals
+  implicit none
+  integer :: opt_2_pvar1
+  integer :: opt_2_lvar1
+  integer, save :: opt_2_svar1 = 3
+  integer :: dummy1, dummy2, dummy3, dummy4, dummy5
+
+  !$acc kernels    ! { dg-line l_compute[incr c_compute] }
+    dummy1 = opt_2_pvar1;
+    dummy2 = opt_2_lvar1;
+
+    dummy3 = opt_2_gvar1;
+    dummy4 = opt_2_evar1;
+    dummy5 = opt_2_svar1;
+  !$acc end kernels
+
+  ! A write does not inhibit optimization.
+  opt_2_pvar1 = 0;
+  opt_2_lvar1 = 1;
+
+  opt_2_gvar1 = 10;
+  opt_2_evar1 = 11;
+  opt_2_svar1 = 12;
+
+! { dg-missed {'map\(force_tofrom:\*opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_2_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy3\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy4\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy5\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_2
+
+subroutine opt_3 (opt_3_pvar1)
+  use globals
+  implicit none
+  integer :: opt_3_pvar1
+  integer :: opt_3_lvar1
+  integer, save :: opt_3_svar1 = 3
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    opt_3_pvar1 = 0;
+    opt_3_lvar1 = 1;
+
+    opt_3_gvar1 = 10;
+    opt_3_evar1 = 11;
+    opt_3_svar1 = 12;
+  !$acc end kernels
+
+! Parameter is pass-by-reference
+! { dg-missed {'map\(force_tofrom:\*opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_3_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-missed {'map\(force_tofrom:opt_3_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_3_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_3_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_3_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_3
+
+subroutine opt_4 ()
+  implicit none
+  integer, dimension(10) :: opt_4_larray1
+  integer :: dummy1, dummy2
+
+  ! TODO Fortran local arrays are addressable (and may be visible to nested
+  ! functions, etc.) so they are not optimizable yet.
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    dummy1 = opt_4_larray1(4)
+    dummy2 = opt_4_larray1(8)
+  !$acc end kernels
+
+! { dg-missed {'map\(tofrom:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_4_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_4
+
+subroutine opt_5 (opt_5_pvar1)
+  implicit none
+  integer, dimension(10) :: opt_5_larray1
+  integer :: opt_5_lvar1, opt_5_pvar1
+
+  opt_5_lvar1 = opt_5_pvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    opt_5_larray1(opt_5_lvar1) = 1
+  !$acc end kernels
+
+! { dg-missed {'map\(tofrom:opt_5_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_5_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-optimized {'map\(force_tofrom:opt_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_5
+
+subroutine use_1 (use_1_pvar1)
+  use globals
+  implicit none
+  integer :: use_1_pvar1
+  integer :: use_1_lvar1
+  integer, save :: use_1_svar1 = 3
+  integer :: s
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    use_1_pvar1 = 0;
+    use_1_lvar1 = 1;
+
+    ! FIXME: svar is optimized: should not be
+    use_1_gvar1 = 10;
+    use_1_evar1 = 11;
+    use_1_svar1 = 12;
+  !$acc end kernels
+
+  s = 0
+  s = s + use_1_pvar1
+  s = s + use_1_lvar1 ! { dg-missed {\.\.\. here} "" { target *-*-* } }
+  s = s + use_1_gvar1
+  s = s + use_1_evar1
+  s = s + use_1_svar1
+
+! { dg-missed {'map\(force_tofrom:\*use_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*use_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine use_1
+
+subroutine use_2 (use_2_pvar1)
+  use globals
+  implicit none
+  integer :: use_2_pvar1
+  integer :: use_2_lvar1
+  integer, save :: use_2_svar1 = 3
+  integer :: s
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    use_2_pvar1 = 0;
+    use_2_lvar1 = 1;
+    use_2_gvar1 = 10;
+    use_2_evar1 = 11;
+    use_2_svar1 = 12;
+  !$acc end kernels
+
+  s = 0
+  s = s + use_2_a1(use_2_pvar1)
+  s = s + use_2_a1(use_2_lvar1) ! { dg-missed {\.\.\. here} "" { target *-*-* } }
+  s = s + use_2_a1(use_2_gvar1)
+  s = s + use_2_a1(use_2_evar1)
+  s = s + use_2_a1(use_2_svar1)
+
+! { dg-missed {'map\(force_tofrom:\*use_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*use_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine use_2
+
+! Optimization inhibited because of looping/control flow.
+
+subroutine lcf_1 (lcf_1_pvar1, iter)
+  use globals
+  implicit none
+  real :: lcf_1_pvar1
+  real :: lcf_1_lvar1
+  real, save :: lcf_1_svar2
+  integer :: i, iter
+
+  do i = 1, iter ! { dg-line l_use[incr c_use] }
+    !$acc kernels ! { dg-line l_compute[incr c_compute] }
+      lcf_1_pvar1 = 0
+      lcf_1_lvar1 = 1
+      lcf_1_evar2 = 2
+      lcf_1_svar2 = 3
+    !$acc end kernels
+  end do
+
+! { dg-missed {'map\(force_tofrom:\*lcf_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine lcf_1
+
+subroutine lcf_2 (lcf_2_pvar1)
+  use globals
+  implicit none
+  real :: lcf_2_pvar1
+  real :: lcf_2_lvar1
+  real, save :: lcf_2_svar2
+  integer :: dummy
+
+10 dummy = 1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_2_pvar1 = 0
+    lcf_2_lvar1 = 1
+    lcf_2_evar2 = 2
+    lcf_2_svar2 = 3
+  !$acc end kernels
+
+  go to 10 ! { dg-line l_use[incr c_use] }
+
+! { dg-missed {'map\(force_tofrom:\*lcf_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine lcf_2
+
+subroutine lcf_3 (lcf_3_pvar1)
+  use globals
+  implicit none
+  real :: lcf_3_pvar1
+  real :: lcf_3_lvar1
+  real, save :: lcf_3_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_3_pvar1 = 0
+    lcf_3_lvar1 = 1
+    lcf_3_evar2 = 2
+    lcf_3_svar2 = 3
+  !$acc end kernels
+
+  ! Backward jump after kernel
+10 dummy = 1
+  go to 10 ! { dg-line l_use[incr c_use] }
+
+! { dg-missed {'map\(force_tofrom:\*lcf_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_3_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine lcf_3
+
+subroutine lcf_4 (lcf_4_pvar1)
+  use globals
+  implicit none
+  real :: lcf_4_pvar1
+  real :: lcf_4_lvar1
+  real, save :: lcf_4_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_4_pvar1 = 0
+    lcf_4_lvar1 = 1
+    lcf_4_evar2 = 2
+    lcf_4_svar2 = 3
+  !$acc end kernels
+
+  ! Forward jump after kernel
+  go to 10
+10 dummy = 1
+
+! { dg-missed {'map\(force_tofrom:\*lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_4_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_4_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_4_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine lcf_4
+
+subroutine lcf_5 (lcf_5_pvar1, lcf_5_pvar2)
+  use globals
+  implicit none
+  real :: lcf_5_pvar1
+  real :: lcf_5_pvar2
+  real :: lcf_5_lvar1
+  real, save :: lcf_5_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_5_pvar1 = 0
+    lcf_5_lvar1 = 1
+    lcf_5_evar2 = 2
+    lcf_5_svar2 = 3
+  !$acc end kernels
+
+  if (lcf_5_pvar2 > 0) then
+    dummy = 1
+  end if
+
+! { dg-missed {'map\(force_tofrom:\*lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_5_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_5_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_5_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine lcf_5
+
+subroutine lcf_6 (lcf_6_pvar1, lcf_6_pvar2)
+  use globals
+  implicit none
+  real :: lcf_6_pvar1
+  real :: lcf_6_pvar2
+  real :: lcf_6_lvar1
+  real, save :: lcf_6_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_6_pvar1 = 0
+    lcf_6_lvar1 = 1
+    lcf_6_evar2 = 2
+    lcf_6_svar2 = 3
+  !$acc end kernels
+
+  dummy = merge(1,0, lcf_6_pvar2 > 0)
+
+! { dg-missed {'map\(force_tofrom:\*lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_6_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_6_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_6_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine lcf_6
+
+subroutine priv_1 ()
+  implicit none
+  integer :: priv_1_lvar1, priv_1_lvar2, priv_1_lvar3, priv_1_lvar4
+  integer :: priv_1_lvar5, priv_1_lvar6, dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ! { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
+    priv_1_lvar1 = 1
+    dummy = priv_1_lvar2
+
+    if (priv_1_lvar2 > 0) then
+        priv_1_lvar3 = 1
+    else
+        priv_1_lvar3 = 2
+    end if
+
+    priv_1_lvar5 = priv_1_lvar3
+
+    if (priv_1_lvar2 > 0) then
+        priv_1_lvar4 = 1
+        dummy = priv_1_lvar4
+    end if
+  !$acc end kernels
+
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-bogus {'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar2\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar3\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar4\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar5\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine priv_1
+
+subroutine multiple_kernels_1 ()
+  implicit none
+  integer :: multiple_kernels_1_lvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    multiple_kernels_1_lvar1 = 1
+  !$acc end kernels
+
+  !$acc kernels ! { dg-line l_use[incr c_use] }
+    multiple_kernels_1_lvar1 = multiple_kernels_1_lvar1 + 1
+  !$acc end kernels
+
+! { dg-missed {'map\(force_tofrom:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'multiple_kernels_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+
+! { dg-optimized {'map\(force_tofrom:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_use$c_use }
+end subroutine multiple_kernels_1
+
+subroutine multiple_kernels_2 ()
+  implicit none
+  integer :: multiple_kernels_2_lvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    multiple_kernels_2_lvar1 = 1
+  !$acc end kernels
+
+  !$acc parallel
+    multiple_kernels_2_lvar1 = multiple_kernels_2_lvar1 + 1 ! { dg-line l_use[incr c_use] }
+  !$acc end parallel
+
+! { dg-missed {'map\(force_tofrom:multiple_kernels_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'multiple_kernels_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine multiple_kernels_2
+
+integer function ref_1 ()
+  implicit none
+  integer, target :: ref_1_lvar1
+  integer, target :: ref_1_lvar2
+  integer, pointer :: ref_1_ref1
+
+  ref_1_ref1 => ref_1_lvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_1_lvar1 = 1
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_1_lvar2 = 2
+  !$acc end kernels
+
+  ref_1 = ref_1_ref1
+
+! { dg-missed {'map\(force_tofrom:ref_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end function ref_1
+
+integer function ref_2 ()
+  implicit none
+  integer, target :: ref_2_lvar1
+  integer, target :: ref_2_lvar2
+  integer, pointer :: ref_2_ref1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_2_lvar1 = 1
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_2_lvar2 = 2
+  !$acc end kernels
+
+  ref_2_ref1 => ref_2_lvar1
+  ref_2 = ref_2_ref1
+
+! { dg-missed {'map\(force_tofrom:ref_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_2_lvar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end function ref_2
+
+subroutine ref_3 ()
+  implicit none
+  integer, target :: ref_3_lvar1
+  integer, pointer :: ref_3_ref1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_3_ref1 => ref_3_lvar1
+
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_3_lvar1 = 1
+  !$acc end kernels
+
+! { dg-missed {'map\(force_tofrom:\*ref_3_ref1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*ref_3_ref1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine ref_3
+
+subroutine ref_4 ()
+  implicit none
+  integer, target :: ref_4_lvar1
+  integer, pointer :: ref_4_ref1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_4_ref1 => ref_4_lvar1
+
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_4_ref1 = 1
+  !$acc end kernels
+
+! { dg-missed {'map\(force_tofrom:\*ref_4_ref1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*ref_4_ref1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine ref_4
+
+subroutine conditional_1 (conditional_1_pvar1)
+  implicit none
+  integer :: conditional_1_pvar1
+  integer :: conditional_1_lvar1
+
+  conditional_1_lvar1 = 1
+
+  if (conditional_1_pvar1 > 0) then
+    !$acc kernels ! { dg-line l_compute[incr c_compute] }
+      conditional_1_lvar1 = 2
+    !$acc end kernels
+  else
+    conditional_1_lvar1 = 3
+  end if
+
+! { dg-optimized {'map\(force_tofrom:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(conditional_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine conditional_1
+
+subroutine conditional_2 (conditional_2_pvar1)
+  implicit none
+  integer :: conditional_2_pvar1
+  integer :: conditional_2_lvar1
+
+  conditional_2_lvar1 = 1
+
+  if (conditional_2_pvar1 > 0) then
+    conditional_2_lvar1 = 3
+  else
+    !$acc kernels ! { dg-line l_compute[incr c_compute] }
+      conditional_2_lvar1 = 2
+    !$acc end kernels
+  end if
+
+! { dg-optimized {'map\(force_tofrom:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(conditional_2_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine conditional_2
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
index b2aae1df5229..97fbe1268b73 100644
--- a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
@@ -5,6 +5,8 @@ subroutine foo
   integer :: i

   !$acc kernels
+  ! { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 }
+  !TODO See discussion in '../../c-c++-common/goacc/uninit-copy-clause.c'.
   i = 1
   !$acc end kernels

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index ebaa3c86694f..7a48091f4286 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -423,6 +423,7 @@ extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_data_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index e08cfa56e3c9..88742a3bfdf4 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -29,6 +29,8 @@ int main()
   int b[N] = { 0 };

 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-missed {'map\(tofrom:b [^)]+\)' not optimized: 'b' is unsuitable for privatization} "" { target *-*-* } .-1 }
+     { dg-missed {'map\(force_tofrom:a [^)]+\)' not optimized: 'a' is unsuitable for privatization} "" { target *-*-* } .-2 } */
   {
     int c = 234; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
     /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index 74ee6fde84f8..994a8a35110f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -17,6 +17,10 @@ subroutine kernel(lo, hi, a, b, c)
   real, dimension(lo:hi) :: a, b, c

   !$acc kernels copyin(lo, hi)
+  ! { dg-optimized {'map\(force_tofrom:offset.[0-9]+ [^)]+\)' optimized to 'map\(to:offset.[0-9]+ [^)]+\)'} "" { target *-*-* } .-1 }
+  ! { dg-missed {'map\(tofrom:\*c [^)]+\)' not optimized: '\*c' is unsuitable for privatization} "" { target *-*-* } .-2 }
+  ! { dg-missed {'map\(tofrom:\*b [^)]+\)' not optimized: '\*b' is unsuitable for privatization} "" { target *-*-* } .-3 }
+  ! { dg-missed {'map\(tofrom:\*a [^)]+\)' not optimized: '\*a' is unsuitable for privatization} "" { target *-*-* } .-4 }
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
   ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
--
2.33.0

* [PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (23 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 24/40] openacc: Add data optimization pass Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences Frederik Harwath
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Stubbs, thomas, sebpop, grosser, rguenther

From: Andrew Stubbs <ams@codesourcery.com>

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.

gcc/ChangeLog:

        * graphite-isl-ast-to-gimple.c: Include internal-fn.h.
        (graphite_oacc_analyze_scop): Implement runtime alias checks.
        * omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
        to GOACC_LOOP internal calls, and initialise it to integer_one_node.
        * omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
        into the GOACC_LOOP expansion.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c              | 122 ++++++++
 gcc/omp-expand.c                              |  37 +--
 gcc/omp-offload.c                             | 271 ++++++++++--------
 .../runtime-alias-check-1.c                   |  79 +++++
 .../runtime-alias-check-2.c                   |  90 ++++++
 5 files changed, 457 insertions(+), 142 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index e820e2c32202..010adaabb000 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1697,6 +1698,127 @@ graphite_oacc_analyze_scop (scop_p scop)
       print_isl_schedule (dump_file, scop->original_schedule);
     }

+  if (flag_graphite_runtime_alias_checks
+      && scop->unhandled_alias_ddrs.length () > 0)
+    {
+      sese_info_p region = scop->scop_info;
+
+      /* Usually there will be a chunking loop with the actual work loop
+        inside it.  In some corner cases there may only be one loop.  */
+      loop_p top_loop = region->region.entry->dest->loop_father;
+      loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+      tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop);
+
+      /* Walk back to GOACC_LOOP block.  */
+      basic_block goacc_loop_block = region->region.entry->src;
+
+      /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+        OpenACC kernels loop and will need different handling.  */
+      gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+      while (!gsi_end_p (gsitop)
+            && (!is_gimple_call (gsi_stmt (gsitop))
+                || !gimple_call_internal_p (gsi_stmt (gsitop))
+                || (gimple_call_internal_fn (gsi_stmt (gsitop))
+                    != IFN_GOACC_LOOP)))
+       gsi_next (&gsitop);
+
+      if (!gsi_end_p (gsitop))
+       {
+         /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted
+            statements.  There ought not be any problematic dependencies because
+            the chunk size and step are only computed for very specific purposes.
+            They may not be at the very top of the block, but they should be
+            found together (the asserts test this assumption). */
+         gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+         gsi_move_after (&gsitop, &gsibottom);
+         gimple_stmt_iterator gsiinsert = gsibottom;
+         gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+                              && gimple_call_internal_p (gsi_stmt (gsitop))
+                              && (gimple_call_internal_fn (gsi_stmt (gsitop))
+                                  == IFN_GOACC_LOOP));
+         gsi_move_after (&gsitop, &gsibottom);
+
+         /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+            Note that these likely depend on some of the hoisted statements.  */
+         tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, NULL,
+                                                   true, GSI_NEW_STMT);
+
+         /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+         for (int n = -1; n < (int)region->bbs.length (); n++)
+           {
+             /* Cover the region plus goacc_loop_block.  */
+             basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+             for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+                  !gsi_end_p (gsi);
+                  gsi_next (&gsi))
+               {
+                 gimple *stmt = gsi_stmt (gsi);
+                 if (!is_gimple_call (stmt)
+                     || !gimple_call_internal_p (stmt))
+                   continue;
+
+                 gcall *goacc_call = as_a <gcall*> (stmt);
+                 if (gimple_call_internal_fn (goacc_call) != IFN_GOACC_LOOP)
+                   continue;
+
+                 enum ifn_goacc_loop_kind code = (enum ifn_goacc_loop_kind)
+                   TREE_INT_CST_LOW (gimple_call_arg (goacc_call, 0));
+                 int argno = 0;
+                 switch (code)
+                   {
+                   case IFN_GOACC_LOOP_CHUNKS:
+                   case IFN_GOACC_LOOP_STEP:
+                     argno = 6;
+                     break;
+
+                   case IFN_GOACC_LOOP_OFFSET:
+                   case IFN_GOACC_LOOP_BOUND:
+                     argno = 7;
+                     break;
+
+                   default:
+                     gcc_unreachable ();
+                   }
+
+                 gimple_call_set_arg (goacc_call, argno, cond_val);
+                 update_stmt (goacc_call);
+
+                 if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+                   dump_printf (MSG_NOTE,
+                                "Runtime alias condition applied to: %G",
+                                goacc_call);
+               }
+           }
+       }
+      else
+       {
+         /* There weren't any GOACC_LOOP calls where we expected to find them,
+            therefore this isn't an OpenACC parallel loop.  If it runs
+            sequentially then there's no need to worry about aliasing, so
+            nothing much to do here.  */
+         if (dump_enabled_p ())
+           dump_printf (MSG_NOTE, "Runtime alias check *not* inserted for bb"
+                        " %d (GOACC_LOOP not found)", goacc_loop_block->index);
+
+         /* Unset can_be_parallel, in case something else might use it.  */
+         for (unsigned int i = 0; i < region->bbs.length (); i++)
+           if (region->bbs[i]->loop_father)
+             region->bbs[i]->loop_father->can_be_parallel = 0;
+       }
+
+      /* The loop-nest vec is shared by all DDRs. */
+      DDR_LOOP_NEST (scop->unhandled_alias_ddrs[0]).release ();
+
+      unsigned int i;
+      struct data_dependence_relation *ddr;
+
+      FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr)
+       if (ddr)
+         free_dependence_relation (ddr);
+      scop->unhandled_alias_ddrs.truncate (0);
+    }
+
   /* Analyze dependences in SCoP and mark loops as parallelizable accordingly. */
   isl_schedule_foreach_schedule_node_top_down (
       scop->original_schedule, visit_schedule_loop_node, scop->dependence);
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 365d167b6428..585ce798ee15 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7719,10 +7719,11 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
       ass = gimple_build_assign (chunk_no, expr);
       gsi_insert_before (&gsi, ass, GSI_SAME_STMT);

-      call = gimple_build_call_internal (IFN_GOACC_LOOP, 6,
+      call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
                                         build_int_cst (integer_type_node,
                                                        IFN_GOACC_LOOP_CHUNKS),
-                                        dir, range, s, chunk_size, gwv);
+                                        dir, range, s, chunk_size, gwv,
+                                        integer_one_node);
       gimple_call_set_lhs (call, chunk_max);
       gimple_set_location (call, loc);
       gsi_insert_before (&gsi, call, GSI_SAME_STMT);
@@ -7730,10 +7731,11 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
   else
     chunk_size = chunk_no;

-  call = gimple_build_call_internal (IFN_GOACC_LOOP, 6,
+  call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
                                     build_int_cst (integer_type_node,
                                                    IFN_GOACC_LOOP_STEP),
-                                    dir, range, s, chunk_size, gwv);
+                                    dir, range, s, chunk_size, gwv,
+                                    integer_one_node);
   gimple_call_set_lhs (call, step);
   gimple_set_location (call, loc);
   gsi_insert_before (&gsi, call, GSI_SAME_STMT);
@@ -7767,20 +7769,20 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
   /* Loop offset & bound go into head_bb.  */
   gsi = gsi_start_bb (head_bb);

-  call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
+  call = gimple_build_call_internal (IFN_GOACC_LOOP, 8,
                                     build_int_cst (integer_type_node,
                                                    IFN_GOACC_LOOP_OFFSET),
-                                    dir, range, s,
-                                    chunk_size, gwv, chunk_no);
+                                    dir, range, s, chunk_size, gwv, chunk_no,
+                                    integer_one_node);
   gimple_call_set_lhs (call, offset_init);
   gimple_set_location (call, loc);
   gsi_insert_after (&gsi, call, GSI_CONTINUE_LINKING);

-  call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
+  call = gimple_build_call_internal (IFN_GOACC_LOOP, 8,
                                     build_int_cst (integer_type_node,
                                                    IFN_GOACC_LOOP_BOUND),
-                                    dir, range, s,
-                                    chunk_size, gwv, offset_init);
+                                    dir, range, s, chunk_size, gwv,
+                                    offset_init, integer_one_node);
   gimple_call_set_lhs (call, bound);
   gimple_set_location (call, loc);
   gsi_insert_after (&gsi, call, GSI_CONTINUE_LINKING);
@@ -7830,22 +7832,25 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
          tree chunk = build_int_cst (diff_type, 0); /* Never chunked.  */

          t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_OFFSET);
-         call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range,
-                                            element_s, chunk, e_gwv, chunk);
+         call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, t, dir, e_range,
+                                            element_s, chunk, e_gwv, chunk,
+                                            integer_one_node);
          gimple_call_set_lhs (call, e_offset);
          gimple_set_location (call, loc);
          gsi_insert_before (&gsi, call, GSI_SAME_STMT);

          t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_BOUND);
-         call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range,
-                                            element_s, chunk, e_gwv, e_offset);
+         call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, t, dir, e_range,
+                                            element_s, chunk, e_gwv, e_offset,
+                                            integer_one_node);
          gimple_call_set_lhs (call, e_bound);
          gimple_set_location (call, loc);
          gsi_insert_before (&gsi, call, GSI_SAME_STMT);

          t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_STEP);
-         call = gimple_build_call_internal (IFN_GOACC_LOOP, 6, t, dir, e_range,
-                                            element_s, chunk, e_gwv);
+         call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range,
+                                            element_s, chunk, e_gwv,
+                                            integer_one_node);
          gimple_call_set_lhs (call, e_step);
          gimple_set_location (call, loc);
          gsi_insert_before (&gsi, call, GSI_SAME_STMT);
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 392ca56b1f4f..3458a1acbceb 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -555,6 +555,7 @@ oacc_xform_loop (gcall *call)
   bool chunking = false, striding = true;
   unsigned outer_mask = mask & (~mask + 1); // Outermost partitioning
   unsigned inner_mask = mask & ~outer_mask; // Inner partitioning (if any)
+  tree noalias = NULL_TREE;

   /* Skip lowering if return value of IFN_GOACC_LOOP call is not used.  */
   if (!lhs)
@@ -596,147 +597,165 @@ oacc_xform_loop (gcall *call)

   switch (code)
     {
-    default: gcc_unreachable ();
+    default:
+      gcc_unreachable ();

     case IFN_GOACC_LOOP_CHUNKS:
+      noalias = gimple_call_arg (call, 6);
       if (!chunking)
-       r = build_int_cst (type, 1);
+        r = build_int_cst (type, 1);
       else
-       {
-         /* chunk_max
-            = (range - dir) / (chunks * step * num_threads) + dir  */
-         tree per = oacc_thread_numbers (false, mask, &seq);
-         per = fold_convert (type, per);
-         chunk_size = fold_convert (type, chunk_size);
-         per = fold_build2 (MULT_EXPR, type, per, chunk_size);
-         per = fold_build2 (MULT_EXPR, type, per, step);
-         r = build2 (MINUS_EXPR, type, range, dir);
-         r = build2 (PLUS_EXPR, type, r, per);
-         r = build2 (TRUNC_DIV_EXPR, type, r, per);
-       }
+        {
+          /* chunk_max
+             = (range - dir) / (chunks * step * num_threads) + dir  */
+          tree per = oacc_thread_numbers (false, mask, &seq);
+          per = fold_convert (type, per);
+          noalias = fold_convert (type, noalias);
+          per = fold_build2 (MULT_EXPR, type, per, noalias);
+          per = fold_build2 (MAX_EXPR, type, per, build_int_cst (type, 1));
+          chunk_size = fold_convert (type, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, step);
+          r = fold_build2 (MINUS_EXPR, type, range, dir);
+          r = fold_build2 (PLUS_EXPR, type, r, per);
+          r = build2 (TRUNC_DIV_EXPR, type, r, per);
+        }
       break;

     case IFN_GOACC_LOOP_STEP:
+      noalias = gimple_call_arg (call, 6);
       {
        /* If striding, step by the entire compute volume, otherwise
-          step by the inner volume.  */
+           step by the inner volume.  */
        unsigned volume = striding ? mask : inner_mask;

+       noalias = fold_convert (type, noalias);
        r = oacc_thread_numbers (false, volume, &seq);
+       r = fold_convert (type, r);
+       r = build2 (MULT_EXPR, type, r, noalias);
+       r = build2 (MAX_EXPR, type, r, build_int_cst (type, 1));
        r = build2 (MULT_EXPR, type, fold_convert (type, r), step);
+        break;
       }
-      break;
-
-    case IFN_GOACC_LOOP_OFFSET:
-      /* Enable vectorization on non-SIMT targets.  */
-      if (!targetm.simt.vf
-         && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
-         /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
-            the loop.  */
-         && (flag_tree_loop_vectorize
-             || !OPTION_SET_P (flag_tree_loop_vectorize)))
-       {
-         basic_block bb = gsi_bb (gsi);
-         class loop *parent = bb->loop_father;
-         class loop *body = parent->inner;
-
-         parent->force_vectorize = true;
-         parent->safelen = INT_MAX;
-
-         /* "Chunking loops" may have inner loops.  */
-         if (parent->inner)
-           {
-             body->force_vectorize = true;
-             body->safelen = INT_MAX;
-           }
-
-         cfun->has_force_vectorize_loops = true;
-       }
-      if (striding)
-       {
-         r = oacc_thread_numbers (true, mask, &seq);
-         r = fold_convert (diff_type, r);
-       }
-      else
-       {
-         tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
-         tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
-         tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
-                                    inner_size, outer_size);
-
-         volume = fold_convert (diff_type, volume);
-         if (chunking)
-           chunk_size = fold_convert (diff_type, chunk_size);
-         else
-           {
-             tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);

-             chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
-             chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
-             chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
-           }
-
-         tree span = build2 (MULT_EXPR, diff_type, chunk_size,
-                             fold_convert (diff_type, inner_size));
-         r = oacc_thread_numbers (true, outer_mask, &seq);
-         r = fold_convert (diff_type, r);
-         r = build2 (MULT_EXPR, diff_type, r, span);
-
-         tree inner = oacc_thread_numbers (true, inner_mask, &seq);
-         inner = fold_convert (diff_type, inner);
-         r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
-
-         if (chunking)
-           {
-             tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6));
-             tree per
-               = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size);
-             per = build2 (MULT_EXPR, diff_type, per, chunk);
-
-             r = build2 (PLUS_EXPR, diff_type, r, per);
-           }
-       }
-      r = fold_build2 (MULT_EXPR, diff_type, r, step);
-      if (type != diff_type)
-       r = fold_convert (type, r);
-      break;
-
-    case IFN_GOACC_LOOP_BOUND:
-      if (striding)
-       r = range;
-      else
-       {
-         tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
-         tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
-         tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
-                                    inner_size, outer_size);
-
-         volume = fold_convert (diff_type, volume);
-         if (chunking)
-           chunk_size = fold_convert (diff_type, chunk_size);
-         else
-           {
-             tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
-
-             chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
-             chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
-             chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
-           }
-
-         tree span = build2 (MULT_EXPR, diff_type, chunk_size,
-                             fold_convert (diff_type, inner_size));
-
-         r = fold_build2 (MULT_EXPR, diff_type, span, step);
-
-         tree offset = gimple_call_arg (call, 6);
-         r = build2 (PLUS_EXPR, diff_type, r,
-                     fold_convert (diff_type, offset));
-         r = build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR,
-                     diff_type, r, range);
-       }
-      if (diff_type != type)
-       r = fold_convert (type, r);
-      break;
+      case IFN_GOACC_LOOP_OFFSET:
+       noalias = gimple_call_arg (call, 7);
+        if (striding)
+          {
+            r = oacc_thread_numbers (true, mask, &seq);
+            r = fold_convert (diff_type, r);
+            tree tmp1 = build2 (NE_EXPR, boolean_type_node, r,
+                                fold_convert (diff_type, integer_zero_node));
+            tree tmp2 = build2 (EQ_EXPR, boolean_type_node, noalias,
+                                boolean_false_node);
+            tree tmp3 = build2 (BIT_AND_EXPR, diff_type,
+                                fold_convert (diff_type, tmp1),
+                                fold_convert (diff_type, tmp2));
+            tree tmp4 = build2 (MULT_EXPR, diff_type, tmp3, range);
+            r = build2 (PLUS_EXPR, diff_type, r, tmp4);
+          }
+        else
+          {
+            tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
+            tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
+            tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                       inner_size, outer_size);
+
+            volume = fold_convert (diff_type, volume);
+            if (chunking)
+              chunk_size = fold_convert (diff_type, chunk_size);
+            else
+              {
+                tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+                /* chunk_size = (range + per - 1) / per.  */
+                chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
+                chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
+                chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+              }
+
+            /* Curtail the range in all but one thread when there may be
+               aliasing to prevent parallelization.  */
+            tree n = oacc_thread_numbers (true, mask, &seq);
+            n = fold_convert (diff_type, n);
+            tree tmp1 = build2 (NE_EXPR, boolean_type_node, n,
+                                fold_convert (diff_type, integer_zero_node));
+            tree tmp2 = build2 (EQ_EXPR, boolean_type_node, noalias,
+                                boolean_false_node);
+            tree tmp3 = build2 (BIT_AND_EXPR, diff_type,
+                                fold_convert (diff_type, tmp1),
+                                fold_convert (diff_type, tmp2));
+            range = build2 (MULT_EXPR, diff_type, tmp3, range);
+
+            tree span = build2 (MULT_EXPR, diff_type, chunk_size,
+                                fold_convert (diff_type, inner_size));
+            r = oacc_thread_numbers (true, outer_mask, &seq);
+            r = fold_convert (diff_type, r);
+            r = build2 (PLUS_EXPR, diff_type, r, range);
+            r = build2 (MULT_EXPR, diff_type, r, span);
+
+            tree inner = oacc_thread_numbers (true, inner_mask, &seq);
+
+            inner = fold_convert (diff_type, inner);
+            r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
+
+            if (chunking)
+              {
+                tree chunk
+                    = fold_convert (diff_type, gimple_call_arg (call, 6));
+                tree per
+                    = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size);
+                per = build2 (MULT_EXPR, diff_type, per, chunk);
+
+                r = build2 (PLUS_EXPR, diff_type, r, per);
+              }
+          }
+        r = fold_build2 (MULT_EXPR, diff_type, r, step);
+        if (type != diff_type)
+          r = fold_convert (type, r);
+        break;
+
+      case IFN_GOACC_LOOP_BOUND:
+        if (striding)
+          r = range;
+        else
+          {
+            noalias = fold_convert (diff_type, gimple_call_arg (call, 7));
+
+            tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
+            tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
+            tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                       inner_size, outer_size);
+
+            volume = fold_convert (diff_type, volume);
+            volume = fold_build2 (MULT_EXPR, diff_type, volume, noalias);
+            volume
+                = fold_build2 (MAX_EXPR, diff_type, volume, fold_convert (diff_type, integer_one_node));
+            if (chunking)
+              chunk_size = fold_convert (diff_type, chunk_size);
+            else
+              {
+                tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+                /* chunk_size = (range + per - 1) / per.  */
+                chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
+                chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
+                chunk_size
+                    = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+              }
+
+            tree span = build2 (MULT_EXPR, diff_type, chunk_size,
+                                fold_convert (diff_type, inner_size));
+
+            r = fold_build2 (MULT_EXPR, diff_type, span, step);
+
+            tree offset = gimple_call_arg (call, 6);
+            r = build2 (PLUS_EXPR, diff_type, r,
+                        fold_convert (diff_type, offset));
+            r = build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type, r,
+                        range);
+          }
+        if (diff_type != type)
+          r = fold_convert (type, r);
+        break;
     }

   gimplify_assign (lhs, r, &seq);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
new file mode 100644
index 000000000000..2fb1c712beb3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
@@ -0,0 +1,79 @@
+/* Test that a simple array copy does the right thing when the input and
+   output data overlap.  The GPU kernel should automatically switch to
+   a sequential operation mode in order to give the expected results.  */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+void f(int *data, int n, int to, int from, int count)
+{
+  /* We cannot use copyin for two overlapping arrays because we get an error
+     that the memory is already present.  We also cannot do the pointer
+     arithmetic inside the kernels region because it just ends up using
+     host pointers (bug?).  Using enter data with a single array, and
+     acc_deviceptr solves the problem.  */
+#pragma acc enter data copyin(data[0:n])
+
+  int *a = (int*)acc_deviceptr (data+to);
+  int *b = (int*)acc_deviceptr (data+from);
+
+#pragma acc kernels
+  for (int i = 0; i < count; i++)
+    a[i] = b[i];
+
+#pragma acc exit data copyout(data[0:n])
+}
+
+#define N 2000
+
+int data[N];
+
+int
+main ()
+{
+  for (int i=0; i < N; i++)
+    data[i] = i;
+
+  /* Baseline test; no aliasing. The high part of the data is copied to
+     the lower part.  */
+  int to = 0;
+  int from = N/2;
+  int count = N/2;
+  f (data, N, to, from, count);
+  for (int i=0; i < N; i++)
+    if (data[i] != (i%count)+count)
+      exit (1);
+
+  /* Check various amounts of data overlap.  */
+  int tests[] = {1, 10, N/4, N/2-10, N/2-1};
+  for (int t = 0; t < sizeof (tests)/sizeof(tests[0]); t++)
+    {
+      for (int i=0; i < N; i++)
+       data[i] = i;
+
+      /* Output overlaps the latter part of input; expect the initial no-aliased
+        part of the input to repeat throughout the aliased portion.  */
+      to = tests[t];
+      from = 0;
+      count = N-tests[t];
+      f (data, N, to, from, count);
+      for (int i=0; i < N; i++)
+       if (data[i] != i%tests[t])
+       exit (2);
+
+      for (int i=0; i < N; i++)
+       data[i] = i;
+
+      /* Input overlaps the latter part of the output; expect the copy to work
+        in the obvious manner.  */
+      to = 0;
+      from = tests[t];
+      count = N-tests[t];
+      f (data, N, to, from, count);
+      for (int i=0; i < count; i++)
+       if (data[i+to] != i+tests[t])
+       exit (3);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c
new file mode 100644
index 000000000000..96c03297d5b4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c
@@ -0,0 +1,90 @@
+/* Test that a simple array copy does the right thing when the input and
+   output data overlap.  The GPU kernel should automatically switch to
+   a sequential operation mode in order to give the expected results.
+
+   This test does not check the correctness of the output (there are other
+   tests for that), but checks that the code really does select the faster
+   path, when it can, by comparing the timing.  */
+
+/* No optimization means no issue with aliasing.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } }
+   { dg-skip-if "" { *-*-* } { "-foffload=disable" } { "" } } */
+
+#include <stdlib.h>
+#include <sys/time.h>
+#include <openacc.h>
+
+void f(int *data, int n, int to, int from, int count)
+{
+  int *a = (int*)acc_deviceptr (data+to);
+  int *b = (int*)acc_deviceptr (data+from);
+
+#pragma acc kernels
+  for (int i = 0; i < count; i++)
+    a[i] = b[i];
+}
+
+#define N 1000000
+int data[N];
+
+int
+main ()
+{
+  struct timeval start, stop, difference;
+  long basetime, aliastime;
+
+  for (int i=0; i < N; i++)
+    data[i] = i;
+
+  /* Ensure that the data copies are outside the timed zone.  */
+#pragma acc enter data copyin(data[0:N])
+
+  /* Baseline test; no aliasing. The high part of the data is copied to
+     the lower part.  */
+  int to = 0;
+  int from = N/2;
+  int count = N/2;
+  gettimeofday (&start, NULL);
+  f (data, N, to, from, count);
+  gettimeofday (&stop, NULL);
+  timersub (&stop, &start, &difference);
+  basetime = difference.tv_sec * 1000000 + difference.tv_usec;
+
+  /* Check various amounts of data overlap.  */
+  int tests[] = {1, 10, N/4, N/2-10, N/2-1};
+  for (int i = 0; i < sizeof (tests)/sizeof(tests[0]); i++)
+    {
+      to = 0;
+      from = N/2 - tests[i];
+      gettimeofday (&start, NULL);
+      f (data, N, to, from, count);
+      gettimeofday (&stop, NULL);
+      timersub (&stop, &start, &difference);
+      aliastime = difference.tv_sec * 1000000 + difference.tv_usec;
+
+      /* If the aliased runtime is less than 200% of the non-aliased runtime
+        then the runtime alias check probably selected the wrong path.
+        (Actually we expect the difference to be far greater than that.)  */
+      if (basetime*2 > aliastime)
+       exit (1);
+    }
+
+  /* Repeat the baseline check just to make sure it didn't also get slower
+     after the first run.  */
+  to = 0;
+  from = N/2;
+  gettimeofday (&start, NULL);
+  f (data, N, to, from, count);
+  gettimeofday (&stop, NULL);
+  timersub (&stop, &start, &difference);
+  int controltime = difference.tv_sec * 1000000 + difference.tv_usec;
+
+  /* The two times should be roughly the same, but we just check it wouldn't
+     pass the aliastime test above.  */
+  if (basetime*2 <= controltime)
+    exit (2);
+
+#pragma acc exit data copyout(data[0:N])
+
+  return 0;
+}
--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955


* [PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (24 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 27/40] openacc: Handle internal function calls in pass_lim Frederik Harwath
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas

This commit concerns loops in OpenACC "kernels" regions that have been
marked up with an explicit "independent" clause by the user, but for
which Graphite found data dependences.  A discussion on the private
internal OpenACC mailing list suggested that warning the user about the
dependences would be a more acceptable solution than reverting the
user's decision.  This behavior is implemented by the present commit.
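
For illustration, a loop of the following shape would presumably
trigger the new warning: the "independent" clause claims that the
iterations do not depend on each other, yet each iteration reads the
element written by the previous one.  This is only a minimal sketch and
is not taken from the patch or its testsuite; the function name
prefix_sum is made up for the example.

```c
/* Hypothetical example, not part of the patch: an in-place prefix sum
   whose loop is marked "independent" although iteration i reads a[i - 1],
   which iteration i - 1 has just written (a loop-carried dependence).  */
void
prefix_sum (int *a, int n)
{
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; i++)
      a[i] = a[i - 1] + a[i];
  }
}
```

Compiled without OpenACC support, the pragmas are ignored and the loop
runs sequentially, which yields the expected prefix sums.  A parallel
execution may produce different results, which is what the warning is
meant to point out.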

gcc/ChangeLog:

        * common.opt: Add flag Wopenacc-false-independent.
        * omp-offload.c (oacc_loop_warn_if_false_independent): New function.
        (oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt    |  5 +++++
 gcc/omp-offload.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index b6c46ab63e34..ec76a88f14e3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -850,6 +850,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 3458a1acbceb..36dde11f5955 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1900,6 +1900,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+    return;
+
+  if (loop->routine)
+    return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+    return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+    return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+    {
+      if (dump_enabled_p ())
+       {
+         const dump_user_location_t loc
+           = dump_user_location_t::from_location_t (loop->loc);
+         dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                          "'independent' loop in 'kernels' region has not been "
+                          "analyzed (cf. 'graphite' "
+                          "dumps for more information).\n");
+       }
+      return;
+    }
+
+  if (!can_be_parallel)
+    warning_at (loop->loc, 0,
+                "loop has \"independent\" clause but data dependences were "
+                "found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
    programmer-specified partitionings.  OUTER_MASK is the partitioning
    this loop is contained within.  Return mask of partitioning
@@ -1951,6 +1996,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
            }
        }

+      /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+      if (warn_openacc_false_independent)
+        oacc_loop_warn_if_false_independent (loop);
+
       if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
        {
          loop->flags |= OLF_AUTO;
--
2.33.0


* [PATCH 27/40] openacc: Handle internal function calls in pass_lim
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (25 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite Frederik Harwath
                   ` (12 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, rguenther

The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls;
this did not matter until recently, when the OpenACC device lowering
pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls IFN_GOACC_LOOP and IFN_UNIQUE, whose later expansion does not
introduce any memory references. The hoisting enabled by this change
can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
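
The effect of hoisting the invariant ".omp_data_i" accesses can be
sketched in plain C as follows.  This is only an illustration of the
idea, not code produced by the compiler; the struct layout and the
function names are assumptions made for the example.

```c
/* Hypothetical stand-in for the ".omp_data_i" struct that outlining
   creates; the field names are made up for this sketch.  */
struct omp_data
{
  int *a;
  int n;
  int scale;
};

/* Before hoisting: the former scalars n and scale are re-read from the
   struct in every iteration.  */
void
scale_unhoisted (struct omp_data *d)
{
  for (int i = 0; i < d->n; i++)
    d->a[i] = d->a[i] * d->scale;
}

/* After hoisting: the invariant loads happen once, ahead of the loop, so
   the loop bound and the factor are plain scalars again.  */
void
scale_hoisted (struct omp_data *d)
{
  int *a = d->a;
  int n = d->n;
  int scale = d->scale;

  for (int i = 0; i < n; i++)
    a[i] = a[i] * scale;
}
```

In the second form the loop bound no longer depends on a memory load
inside the loop, which is the shape that scalar evolution analysis
needs.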

gcc/ChangeLog:
        * passes.def: Set restrict_oacc_hoisting to true for the early
        pass_lim instance.
        * tree-ssa-loop-im.c (movement_possibility): Add
        restrict_oacc_hoisting flag to function; restrict movement if set.
        (compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
        (gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
        calls.
        (loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
        pass it on.
        (pass_lim::execute): Pass on new flags.
        * tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust
        declaration.
        * gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
        loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def                 |  2 +-
 gcc/tree-ssa-loop-im.c         | 57 ++++++++++++++++++++++++++++------
 gcc/tree-ssa-loop-manip.h      |  2 +-
 4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index ccd5083145f8..7c9b7b2345fa 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2107,7 +2107,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
     {
       unsigned todo = TODO_update_ssa_only_virtuals;
-      todo |= loop_invariant_motion_in_fun (cfun, false);
+      todo |= loop_invariant_motion_in_fun (cfun, false, false);
       scev_reset ();
       return todo;
     }
diff --git a/gcc/passes.def b/gcc/passes.def
index 681392f8f79f..1da9382bac53 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -250,7 +250,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_laddress);
-      NEXT_PASS (pass_lim);
+      NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
       NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 4b187c2cdafe..466dc494fb52 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -327,11 +329,23 @@ enum move_pos
    Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+      && gimple_code (stmt) == GIMPLE_ASSIGN)
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+
+      if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+       rhs = TREE_OPERAND (rhs, 0);
+
+      if (TREE_CODE (rhs) == ARRAY_REF)
+         return MOVE_IMPOSSIBLE;
+    }
+
   if (flag_unswitch_loops
       && gimple_code (stmt) == GIMPLE_COND)
     {
@@ -981,7 +995,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
    statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1009,7 +1023,7 @@ compute_invariantness (basic_block bb)
       {
        stmt = gsi_stmt (bsi);

-       pos = movement_possibility (stmt);
+       pos = movement_possibility (stmt, restrict_oacc_hoisting);
        if (pos == MOVE_IMPOSSIBLE)
          continue;

@@ -1040,7 +1054,7 @@ compute_invariantness (basic_block bb)
     {
       stmt = gsi_stmt (bsi);

-      pos = movement_possibility (stmt);
+      pos = movement_possibility (stmt, restrict_oacc_hoisting);
       if (pos == MOVE_IMPOSSIBLE)
        {
          if (nonpure_call_p (stmt))
@@ -1465,6 +1479,13 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
   if (!gimple_vuse (stmt))
     return;

+  /* The expansion of those OpenACC internal function calls which occurs in a
+   * later pass does not introduce any memory references. Hence it is safe to
+   * ignore them. */
+  if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP)
+      || gimple_call_internal_p (stmt, IFN_UNIQUE))
+    return;
+
   mem = simple_mem_ref_in_stmt (stmt, &is_stored);
   if (!mem && is_gimple_assign (stmt))
     {
@@ -1506,7 +1527,7 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
       ao_ref_alias_set (&aor);
       HOST_WIDE_INT offset, size, max_size;
       poly_int64 saved_maxsize = aor.max_size, mem_off;
-      tree mem_base;
+      tree mem_base = NULL;
       bool ref_decomposed;
       if (aor.max_size_known_p ()
          && aor.offset.is_constant (&offset)
@@ -3244,7 +3265,8 @@ tree_ssa_lim_finalize (void)
    Only perform store motion if STORE_MOTION is true.  */

 unsigned int
-loop_invariant_motion_in_fun (function *fun, bool store_motion)
+loop_invariant_motion_in_fun (function *fun, bool store_motion,
+                             bool restrict_oacc_hoisting)
 {
   unsigned int todo = 0;

@@ -3262,7 +3284,7 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion)
   /* For each statement determine the outermost loop in that it is
      invariant and cost for computing the invariant.  */
   for (int i = 0; i < n; ++i)
-    compute_invariantness (BASIC_BLOCK_FOR_FN (fun, rpo[i]));
+    compute_invariantness (BASIC_BLOCK_FOR_FN (fun, rpo[i]), restrict_oacc_hoisting);

   /* Execute store motion.  Force the necessary invariants to be moved
      out of the loops as well.  */
@@ -3309,13 +3331,21 @@ class pass_lim : public gimple_opt_pass
 {
 public:
   pass_lim (gcc::context *ctxt)
-    : gimple_opt_pass (pass_data_lim, ctxt)
+    : gimple_opt_pass (pass_data_lim, ctxt), restrict_oacc_hoisting (false)
   {}

+  void set_pass_param (unsigned int n, bool param)
+    {
+      gcc_assert (n == 0);
+      restrict_oacc_hoisting = param;
+    }
+
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_lim (m_ctxt); }
   virtual bool gate (function *) { return flag_tree_loop_im != 0; }
   virtual unsigned int execute (function *);
+private:
+  bool restrict_oacc_hoisting;

 }; // class pass_lim

@@ -3328,7 +3358,16 @@ pass_lim::execute (function *fun)

   if (number_of_loops (fun) <= 1)
     return 0;
-  unsigned int todo = loop_invariant_motion_in_fun (fun, flag_move_loop_stores);
+
+  bool store_motion = flag_move_loop_stores;
+  /* TODO Enabling store motion in OpenACC kernel functions requires further
+     handling of the OpenACC internal function calls.  It can also be harmful
+     to data-dependence analysis. Keep it disabled for now. */
+  if (oacc_function_p (cfun) && graphite_analyze_oacc_target_region_type_p (cfun))
+    store_motion = false;
+
+  unsigned int todo = loop_invariant_motion_in_fun (fun, store_motion,
+                                                   restrict_oacc_hoisting);

   if (!in_loop_pipeline)
     loop_optimizer_finalize ();
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 4f604e1bd24a..864fb9f1d355 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -53,7 +53,7 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned,
                                            transform_callback, void *);
 extern void tree_unroll_loop (class loop *, unsigned, tree_niter_desc *);
 extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
-extern unsigned int loop_invariant_motion_in_fun (function *, bool);
+extern unsigned int loop_invariant_motion_in_fun (function *, bool, bool);


 #endif /* GCC_TREE_SSA_LOOP_MANIP_H */
--
2.33.0


* [PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (26 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 27/40] openacc: Handle internal function calls in pass_lim Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 29/40] graphite: Tune parameters for OpenACC use Frederik Harwath
                   ` (11 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, rguenther

The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.
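
As a reminder of what the disabled insertion step does, here is a
textbook-style sketch of partial redundancy elimination written as
source-level C.  It is not GCC output, and the function names are made
up for the example.

```c
/* Hypothetical example, not taken from GCC: a + b is partially redundant
   at the second computation, because it is already available whenever
   flag is nonzero.  */
int
before_pre (int a, int b, int flag)
{
  int x = 0;
  if (flag)
    x = a + b;
  int y = a + b;
  return x + y;
}

/* What PRE conceptually produces: a computation is inserted on the path
   that lacked it, and the value then flows through the new temporary t.
   Temporaries like t are the kind of additional dependences the commit
   message refers to.  */
int
after_pre (int a, int b, int flag)
{
  int t;
  int x = 0;
  if (flag)
    {
      t = a + b;
      x = t;
    }
  else
    t = a + b;
  int y = t;
  return x + y;
}
```

Both functions compute the same result; only the second one contains
the inserted computation that this commit avoids on OpenACC functions
handled by Graphite.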

gcc/ChangeLog:

        * tree-ssa-pre.c (insert): Skip any insertions in OpenACC
        functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index dc55d868cc19..d61210fc2ee9 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "alias.h"
 #include "gimple-range.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
    implement a bit more than just PRE here.  All of them piggy-back
@@ -3742,6 +3743,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+    /* The additional dependences introduced by the code insertions
+     can cause Graphite's dependence analysis to fail.  Without
+     special handling of those dependences in Graphite, it seems
+     better to skip this step if OpenACC loops that need to be handled
+     by Graphite are found.  Note that the full redundancy elimination
+     step of this pass is useful for the purpose of dependence
+     analysis, for instance, because it can remove definitions from
+     SCoPs that would otherwise prevent the creation of runtime alias
+     checks since those may only use definitions that are available
+     before the SCoP. */
+
+  if (oacc_function_p (cfun)
+      && ::graphite_analyze_oacc_function_p (cfun))
+    return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0


* [PATCH 29/40] graphite: Tune parameters for OpenACC use
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (27 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 30/40] graphite: Adjust scop loop-nest choice Frederik Harwath
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.

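The selection of the isl operation limit can be summarized by the
following sketch.  The two numeric values are the ones used in
optimize_isl in this patch; the helper function and its name are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Values taken from the patch: the default of --param max-isl-operations
   and the raised limit used for OpenACC functions.  */
#define DEFAULT_MAX_ISL_OPERATIONS 350000
#define OACC_MAX_ISL_OPERATIONS 2000000

/* Hypothetical helper, not part of GCC: the limit is raised only if the
   user left the parameter at its default and the current function is an
   OpenACC function.  Any explicitly chosen value is respected.  */
int
effective_max_isl_operations (int param_value, bool is_oacc_function)
{
  if (param_value == DEFAULT_MAX_ISL_OPERATIONS && is_oacc_function)
    return OACC_MAX_ISL_OPERATIONS;
  return param_value;
}
```

A consequence of comparing against the default is that a user who
passes the default value explicitly on an OpenACC function also gets
the raised limit.
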
gcc/ChangeLog:

        * graphite-optimize-isl.c (optimize_isl): Adjust
        param_max_isl_operations value for OpenACC functions and add
        special warnings if value gets exceeded.

        * graphite-scop-detection.c (build_scops): Likewise for
        param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

        * gcc.dg/goacc/graphite-parameter-1.c: New test.
        * gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c                   | 35 ++++++++++++++++---
 gcc/graphite-scop-detection.c                 | 28 ++++++++++++++-
 .../gcc.dg/goacc/graphite-parameter-1.c       | 21 +++++++++++
 .../gcc.dg/goacc/graphite-parameter-2.c       | 23 ++++++++++++
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+     by "kernels" loops in existing OpenACC codes.  Raise the values
+     significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 350000 /* default value */
+      && oacc_function_p (cfun))
+    max_operations = 2000000;
+
   if (max_operations)
     isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
          dump_user_location_t loc = find_loop_location
            (scop->scop_info->region.entry->dest->loop_father);
          if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-           dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-                            "loop nest not optimized, optimization timed out "
-                            "after %d operations [--param max-isl-operations]\n",
-                            max_operations);
-         else
+           {
+              if (oacc_function_p (cfun))
+               {
+                 /* Special casing for OpenACC to unify diagnostic messages
+                    here and in graphite-scop-detection.c. */
+                  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                                   "data-dependence analysis of OpenACC loop "
+                                   "nest "
+                                   "failed; try increasing the value of "
+                                   "--param="
+                                   "max-isl-operations=%d.\n",
+                                   max_operations);
+                }
+              else
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                                 "loop nest not optimized, optimization timed "
+                                 "out after %d operations [--param "
+                                 "max-isl-operations]\n",
+                                 max_operations);
+            }
+          else
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
                             "loop nest not optimized, ISL signalled an error\n");
        }
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 234dbe0ec729..9a5e43a5bfc6 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2053,6 +2053,9 @@ determine_openacc_reductions (scop_p scop)
     }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
    them to SCOPS.  */

@@ -2106,6 +2109,11 @@ build_scops (vec<scop_p> *scops)
        }

       unsigned max_arrays = param_graphite_max_arrays_per_scop;
+
+      if (oacc_function_p (cfun)
+          && param_graphite_max_arrays_per_scop == 100 /* default value */)
+        max_arrays = 200;
+
       if (max_arrays > 0
          && scop->drs.length () >= max_arrays)
        {
@@ -2113,7 +2121,16 @@ build_scops (vec<scop_p> *scops)
                       << scop->drs.length ()
                       << " is larger than --param graphite-max-arrays-per-scop="
                       << max_arrays << ".\n");
-         free_scop (scop);
+
+          if (dump_enabled_p () && oacc_function_p (cfun))
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                             find_loop_location (s->entry->dest->loop_father),
+                             "data-dependence analysis of OpenACC loop nest "
+                             "failed; try increasing the value of --param="
+                             "graphite-max-arrays-per-scop=%d.\n",
+                             max_arrays);
+
+          free_scop (scop);
          continue;
        }

@@ -2126,6 +2143,15 @@ build_scops (vec<scop_p> *scops)
                          << scop_nb_params (scop)
                          << " larger than --param graphite-max-nb-scop-params="
                          << max_dim << ".\n");
+
+          if (dump_enabled_p () && oacc_function_p (cfun))
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                             find_loop_location (s->entry->dest->loop_father),
+                             "data-dependence analysis of OpenACC loop nest "
+                             "failed; try increasing the value of --param="
+                             "graphite-max-nb-scop-params=%d.\n",
+                             max_dim);
+
          free_scop (scop);
          continue;
        }
diff --git a/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
new file mode 100644
index 000000000000..45adbb3f0e85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
@@ -0,0 +1,21 @@
+/* Verify that a warning about an exceeded Graphite parameter gets
+   output as optimization information and not only as a dump message
+   for OpenACC functions. */
+
+/* { dg-additional-options "-O2 -fopt-info-missed --param=graphite-max-arrays-per-scop=1" } */
+
+extern int a[1000];
+extern int b[1000];
+
+void test ()
+{
+#pragma acc parallel loop auto
+/* { dg-missed {data-dependence analysis of OpenACC loop nest failed\; try increasing the value of --param=graphite-max-arrays-per-scop=1.} "" { target *-*-* } .-1  } */
+/* { dg-missed {'auto' loop has not been analyzed \(cf. 'graphite' dumps for more information\).} "" { target *-*-* } .-2 } */
+/* { dg-missed {.*not inlinable.*} "" { target *-*-* } .-3 } */
+  for (int i = 1; i < 995; i++)
+    a[i] = b[i + 5] + b[i - 1];
+}
+
+
+/* { dg-prune-output ".*not inlinable.*"} */
diff --git a/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
new file mode 100644
index 000000000000..f2830cd62db0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
@@ -0,0 +1,23 @@
+/* Verify that a warning about an exceeded Graphite parameter gets
+   output as optimization information and not only as a dump message
+   for OpenACC functions. */
+
+/* { dg-additional-options "-O2 -fopt-info-missed --param=max-isl-operations=1" } */
+
+void test (int* restrict a, int *restrict b)
+{
+  int i = 1;
+  int j = 1;
+  int m = 0;
+
+#pragma acc parallel loop auto copyin(b) copyout(a) reduction(max:m)
+/* { dg-missed {data-dependence analysis of OpenACC loop nest failed; try increasing the value of --param=max-isl-operations=1.} "" { target *-*-* } .-1  } */
+/* { dg-missed {'auto' loop has not been analyzed \(cf. 'graphite' dumps for more information\).} "" { target *-*-* } .-2 } */
+/* { dg-missed {.*not inlinable.*} "" { target *-*-* } .-3 } */
+  for (i = 1; i < 995; i++)
+    {
+      int x = b[i] * 2;
+      for (j = 1; j < 995; j++)
+        m = m + a[i] + x;
+    }
+}
--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 30/40] graphite: Adjust scop loop-nest choice
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (28 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 29/40] graphite: Tune parameters for OpenACC use Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 31/40] graphite: Accept loops without data references Frederik Harwath
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
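
The effect of skipping the empty exit block can be sketched with a small
standalone model.  The types and functions below (struct loop, struct
block, common_loop, context_loop) are simplified, hypothetical stand-ins
for the GCC data structures and for find_common_loop and
scop_context_loop; all loops are assumed to hang off one outermost fake
loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for GCC's loop and basic_block types.  */
struct loop
{
  struct loop *outer;  /* Enclosing loop; NULL for the outer fake loop.  */
  int depth;           /* Nesting depth; 0 for the outer fake loop.  */
};

struct block
{
  struct loop *loop_father;   /* Innermost loop containing the block.  */
  struct block *single_pred;  /* Sole predecessor, or NULL if the block
                                 does not have exactly one.  */
  bool empty;                 /* True if the block has no statements.  */
};

/* Return the innermost loop that contains both A and B.  Both must
   belong to the same loop tree.  */
struct loop *
common_loop (struct loop *a, struct loop *b)
{
  assert (a && b);
  while (a->depth > b->depth)
    a = a->outer;
  while (b->depth > a->depth)
    b = b->outer;
  while (a != b)
    {
      a = a->outer;
      b = b->outer;
    }
  return a;
}

/* Walk back from the exit source block over empty single-predecessor
   blocks before computing the common loop.  */
struct loop *
context_loop (struct block *entry_dest, struct block *exit_src)
{
  while (exit_src->empty && exit_src->single_pred)
    exit_src = exit_src->single_pred;
  return common_loop (entry_dest->loop_father, exit_src->loop_father);
}
```

If the exit block is empty and belongs to the fake loop while its
predecessor lies inside the loop nest, common_loop on the raw blocks
returns the fake loop, whereas context_loop returns the proper
super-loop of the nest.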

gcc/ChangeLog:

        * graphite-scop-detection.c (scop_context_loop): New function.
        (build_alias_set): Use scop_context_loop instead of find_common_loop.
        * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
        * graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c    | 21 ++++++++++++++++++---
 gcc/graphite.h                   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 010adaabb000..acadf544fadd 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
         conditional if aliasing can be ruled out at runtime and the original
         version of the SCoP, otherwise. */

-      loop_p loop
-          = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-                              scop->scop_info->region.exit->src->loop_father);
+      loop_p loop = scop_context_loop (scop);
       tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
       tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
       set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 9a5e43a5bfc6..f173e6c4f890 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+    exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1774,9 +1791,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-    = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-                       scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec<sese_l> &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 31/40] graphite: Accept loops without data references
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (29 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 30/40] graphite: Adjust scop loop-nest choice Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 32/40] Reference reduction localization Frederik Harwath
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, sebpop, grosser, rguenther

It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

        * graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
        Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index f173e6c4f890..2dcb85508a3d 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -849,19 +849,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
          return true;
        }

-      /* Check if all loop nests have at least one data reference.
-        ???  This check is expensive and loops premature at this point.
-        If important to retain we can pre-compute this for all innermost
-        loops and reject those when we build a SESE region for a loop
-        during SESE discovery.  */
-      if (! loop->inner
-         && ! loop_nest_has_data_refs (loop))
-       {
-         DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-                      << " does not have any data reference.\n");
-         return true;
-       }
-
       DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
     }

--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 32/40] Reference reduction localization
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (30 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 31/40] graphite: Accept loops without data references Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 33/40] Fix tree check failure with " Frederik Harwath
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, thomas

From: Julian Brown <julian@codesourcery.com>

        gcc/
        * gimplify.c (privatize_reduction): New struct.
        (localize_reductions_r, localize_reductions): New functions.
        (gimplify_omp_for): Call localize_reductions.
        (gimplify_omp_workshare): Likewise.
        * omp-low.c (lower_oacc_reductions): Handle localized reductions.
        Create fewer temp vars.
        * tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
        documentation.
        * tree.c (omp_clause_num_ops): Bump number of ops for
        OMP_CLAUSE_REDUCTION to 6.
        (walk_tree_1): Adjust accordingly.
        * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.
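
The replacement performed by localize_reductions_r can be illustrated on
a toy expression tree.  This is only a sketch under simplifying
assumptions: struct node, enum kind and localize are hypothetical and
much simpler than GCC's tree type and walk_tree, but the cases mirror
those of the walker in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression kinds: a variable, a dereference and an addition.  */
enum kind { VAR, DEREF, PLUS };

struct node
{
  enum kind kind;
  struct node *op[2];  /* Operands; unused entries are NULL.  */
};

/* Replace each use of REF_VAR in *TP, either direct or through a
   dereference, with LOCAL_VAR.  */
void
localize (struct node **tp, struct node *ref_var, struct node *local_var)
{
  struct node *t = *tp;
  assert (t);

  switch (t->kind)
    {
    case DEREF:
      /* "*ref_var" becomes "local_var".  As in the patch, other
         dereferences are not walked any further.  */
      if (t->op[0] == ref_var)
        *tp = local_var;
      break;

    case VAR:
      if (t == ref_var)
        *tp = local_var;
      break;

    case PLUS:
      localize (&t->op[0], ref_var, local_var);
      localize (&t->op[1], ref_var, local_var);
      break;
    }
}
```

Applied to a tree for "*ref + other", this rewrites the first operand so
that the expression reads "local + other" and leaves the second operand
untouched.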
---
 gcc/gimplify.c  | 102 +++++++++++++++++++++++++++++++++++
 gcc/omp-low.c   |  45 +++++-----------
 gcc/tree-core.h |   4 +-
 gcc/tree.c      | 137 +++++++++++++++++++++++++++++++++++++++++++++---
 gcc/tree.h      |   2 +
 5 files changed, 250 insertions(+), 40 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c2ab96e7e182..9a4331c70d6e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -240,6 +240,11 @@ struct gimplify_omp_ctx
   int defaultmap[5];
 };

+struct privatize_reduction
+{
+  tree ref_var, local_var;
+};
+
 static struct gimplify_ctx *gimplify_ctxp;
 static struct gimplify_omp_ctx *gimplify_omp_ctxp;
 static bool in_omp_construct;
@@ -11900,6 +11905,80 @@ gimplify_omp_taskloop_expr (tree type, tree *tp, gimple_seq *pre_p,
   OMP_FOR_CLAUSES (orig_for_stmt) = c;
 }

+/* Helper function for localize_reductions.  Replace all uses of REF_VAR with
+   LOCAL_VAR.  */
+
+static tree
+localize_reductions_r (tree *tp, int *walk_subtrees, void *data)
+{
+  enum tree_code tc = TREE_CODE (*tp);
+  struct privatize_reduction *pr = (struct privatize_reduction *) data;
+
+  if (TYPE_P (*tp))
+    *walk_subtrees = 0;
+
+  switch (tc)
+    {
+    case INDIRECT_REF:
+    case MEM_REF:
+      if (TREE_OPERAND (*tp, 0) == pr->ref_var)
+       *tp = pr->local_var;
+
+      *walk_subtrees = 0;
+      break;
+
+    case VAR_DECL:
+    case PARM_DECL:
+    case RESULT_DECL:
+      if (*tp == pr->ref_var)
+       *tp = pr->local_var;
+
+      *walk_subtrees = 0;
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL_TREE;
+}
+
+/* OpenACC worker and vector loop state propagation requires reductions
+   to be inside local variables.  This function replaces all reference-type
+   reductions variables associated with the loop with a local copy.  It is
+   also used to create private copies of reduction variables for those
+   which are not associated with acc loops.  */
+
+static void
+localize_reductions (tree clauses, tree body)
+{
+  tree c, var, type, new_var;
+  struct privatize_reduction pr;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+      {
+       var = OMP_CLAUSE_DECL (c);
+
+       if (!lang_hooks.decls.omp_privatize_by_reference (var))
+         {
+           OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = NULL;
+           continue;
+         }
+
+       type = TREE_TYPE (TREE_TYPE (var));
+       new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+       pr.ref_var = var;
+       pr.local_var = new_var;
+
+       walk_tree (&body, localize_reductions_r, &pr, NULL);
+
+       OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
+      }
+}
+
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */

 static enum gimplify_status
@@ -12126,6 +12205,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       gcc_unreachable ();
     }

+  if (ort == ORT_ACC)
+    {
+      gimplify_omp_ctx *outer = gimplify_omp_ctxp;
+
+      while (outer
+            && outer->region_type != ORT_ACC_PARALLEL
+            && outer->region_type != ORT_ACC_KERNELS)
+       outer = outer->outer_context;
+
+      /* FIXME: Reductions only work in parallel regions at present.  We avoid
+        doing the reduction localization transformation in kernels regions
+        here, because the code to remove reductions in kernels regions cannot
+        handle that.  */
+      if (outer && outer->region_type == ORT_ACC_PARALLEL)
+       localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p));
+    }
+
   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
      clause for the IV.  */
   if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1)
@@ -13654,6 +13750,12 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       || (ort & ORT_HOST_TEAMS) == ORT_HOST_TEAMS)
     {
       push_gimplify_context ();
+
+      /* FIXME: Reductions are not supported in kernels regions yet.  */
+      if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+        localize_reductions (OMP_TARGET_CLAUSES (*expr_p),
+                            OMP_TARGET_BODY (*expr_p));
+
       gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
       if (gimple_code (g) == GIMPLE_BIND)
        pop_gimplify_context (g);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index afd6061ae1e9..ae5cdfc5e260 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7530,9 +7530,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
             || is_oacc_kernels_decomposed_graphite_part (ctx));

        tree orig = OMP_CLAUSE_DECL (c);
-       tree var = maybe_lookup_decl (orig, ctx);
+       tree var;
        tree ref_to_res = NULL_TREE;
-       tree incoming, outgoing, v1, v2, v3;
+       tree incoming, outgoing;
        bool is_private = false;

        enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
@@ -7544,6 +7544,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
          rcode = BIT_IOR_EXPR;
        tree op = build_int_cst (unsigned_type_node, rcode);

+       var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c);
+       if (!var)
+         var = maybe_lookup_decl (orig, ctx);
        if (!var)
          var = orig;

@@ -7636,34 +7639,11 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,

        if (omp_privatize_by_reference (orig))
          {
-           tree type = TREE_TYPE (var);
-           const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
-
-           if (!inner)
-             {
-               tree x = create_tmp_var (TREE_TYPE (type), id);
-               gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
-             }
-
-           v1 = create_tmp_var (type, id);
-           v2 = create_tmp_var (type, id);
-           v3 = create_tmp_var (type, id);
-
-           gimplify_assign (v1, var, fork_seq);
-           gimplify_assign (v2, var, fork_seq);
-           gimplify_assign (v3, var, fork_seq);
-
-           var = build_simple_mem_ref (var);
-           v1 = build_simple_mem_ref (v1);
-           v2 = build_simple_mem_ref (v2);
-           v3 = build_simple_mem_ref (v3);
            outgoing = build_simple_mem_ref (outgoing);

            if (!TREE_CONSTANT (incoming))
              incoming = build_simple_mem_ref (incoming);
          }
-       else
-         v1 = v2 = v3 = var;

        /* Determine position in reduction buffer, which may be used
           by target.  The parser has ensured that this is not a
@@ -7696,20 +7676,21 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
          = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
                                          TREE_TYPE (var), 6, init_code,
                                          unshare_expr (ref_to_res),
-                                         v1, level, op, off);
+                                         var, level, op, off);
        tree fini_call
          = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
                                          TREE_TYPE (var), 6, fini_code,
                                          unshare_expr (ref_to_res),
-                                         v2, level, op, off);
+                                         var, level, op, off);
        tree teardown_call
          = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
-                                         TREE_TYPE (var), 6, teardown_code,
-                                         ref_to_res, v3, level, op, off);
+                                         TREE_TYPE (var), 6,
+                                         teardown_code, ref_to_res, var,
+                                         level, op, off);

-       gimplify_assign (v1, setup_call, &before_fork);
-       gimplify_assign (v2, init_call, &after_fork);
-       gimplify_assign (v3, fini_call, &before_join);
+       gimplify_assign (var, setup_call, &before_fork);
+       gimplify_assign (var, init_call, &after_fork);
+       gimplify_assign (var, fini_call, &before_join);
        gimplify_assign (outgoing, teardown_call, &after_join);
       }

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index f0c65a25f070..980bdee6c285 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -269,7 +269,9 @@ enum omp_clause_code {
                 placeholder used in OMP_CLAUSE_REDUCTION_{INIT,MERGE}.
      Operand 4: OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER: Another dummy
                VAR_DECL placeholder, used like the above for C/C++ array
-               reductions.  */
+               reductions.
+     Operand 5: OMP_CLAUSE_REDUCTION_PRIVATE_DECL: A private VAR_DECL of
+                the original DECL associated with the reduction clause.  */
   OMP_CLAUSE_REDUCTION,

   /* OpenMP clause: task_reduction (operator:variable_list).  */
diff --git a/gcc/tree.c b/gcc/tree.c
index 7bfd64160f4e..08f5a3e884bf 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -283,7 +283,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_SHARED  */
   1, /* OMP_CLAUSE_FIRSTPRIVATE  */
   2, /* OMP_CLAUSE_LASTPRIVATE  */
-  5, /* OMP_CLAUSE_REDUCTION  */
+  6, /* OMP_CLAUSE_REDUCTION  */
   5, /* OMP_CLAUSE_TASK_REDUCTION  */
   5, /* OMP_CLAUSE_IN_REDUCTION  */
   1, /* OMP_CLAUSE_COPYIN  */
@@ -11134,12 +11134,135 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
       break;

     case OMP_CLAUSE:
-      {
-       int len = omp_clause_num_ops[OMP_CLAUSE_CODE (*tp)];
-       for (int i = 0; i < len; i++)
-         WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i));
-       WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
-      }
+      switch (OMP_CLAUSE_CODE (*tp))
+       {
+       case OMP_CLAUSE_GANG:
+         WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
+         /* FALLTHRU */
+
+       case OMP_CLAUSE_ASYNC:
+       case OMP_CLAUSE_WAIT:
+       case OMP_CLAUSE_WORKER:
+       case OMP_CLAUSE_VECTOR:
+       case OMP_CLAUSE_NUM_GANGS:
+       case OMP_CLAUSE_NUM_WORKERS:
+       case OMP_CLAUSE_VECTOR_LENGTH:
+       case OMP_CLAUSE_PRIVATE:
+       case OMP_CLAUSE_SHARED:
+       case OMP_CLAUSE_FIRSTPRIVATE:
+       case OMP_CLAUSE_COPYIN:
+       case OMP_CLAUSE_COPYPRIVATE:
+       case OMP_CLAUSE_FILTER:
+       case OMP_CLAUSE_FINAL:
+       case OMP_CLAUSE_IF:
+       case OMP_CLAUSE_NUM_THREADS:
+       case OMP_CLAUSE_SCHEDULE:
+       case OMP_CLAUSE_UNIFORM:
+       case OMP_CLAUSE_DEPEND:
+       case OMP_CLAUSE_NONTEMPORAL:
+       case OMP_CLAUSE_NUM_TEAMS:
+       case OMP_CLAUSE_THREAD_LIMIT:
+       case OMP_CLAUSE_DEVICE:
+       case OMP_CLAUSE_DIST_SCHEDULE:
+       case OMP_CLAUSE_SAFELEN:
+       case OMP_CLAUSE_SIMDLEN:
+       case OMP_CLAUSE_ORDERED:
+       case OMP_CLAUSE_PRIORITY:
+       case OMP_CLAUSE_GRAINSIZE:
+       case OMP_CLAUSE_NUM_TASKS:
+       case OMP_CLAUSE_HINT:
+       case OMP_CLAUSE_TO_DECLARE:
+       case OMP_CLAUSE_LINK:
+       case OMP_CLAUSE_DETACH:
+       case OMP_CLAUSE_USE_DEVICE_PTR:
+       case OMP_CLAUSE_USE_DEVICE_ADDR:
+       case OMP_CLAUSE_IS_DEVICE_PTR:
+       case OMP_CLAUSE_INCLUSIVE:
+       case OMP_CLAUSE_EXCLUSIVE:
+       case OMP_CLAUSE__LOOPTEMP_:
+       case OMP_CLAUSE__REDUCTEMP_:
+       case OMP_CLAUSE__CONDTEMP_:
+       case OMP_CLAUSE__SCANTEMP_:
+       case OMP_CLAUSE__SIMDUID_:
+       case OMP_CLAUSE_AFFINITY:
+         WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 0));
+         /* FALLTHRU */
+
+       case OMP_CLAUSE_INDEPENDENT:
+       case OMP_CLAUSE_NOWAIT:
+       case OMP_CLAUSE_DEFAULT:
+       case OMP_CLAUSE_UNTIED:
+       case OMP_CLAUSE_MERGEABLE:
+       case OMP_CLAUSE_PROC_BIND:
+       case OMP_CLAUSE_DEVICE_TYPE:
+       case OMP_CLAUSE_INBRANCH:
+       case OMP_CLAUSE_NOTINBRANCH:
+       case OMP_CLAUSE_FOR:
+       case OMP_CLAUSE_PARALLEL:
+       case OMP_CLAUSE_SECTIONS:
+       case OMP_CLAUSE_TASKGROUP:
+       case OMP_CLAUSE_NOGROUP:
+       case OMP_CLAUSE_THREADS:
+       case OMP_CLAUSE_SIMD:
+       case OMP_CLAUSE_DEFAULTMAP:
+       case OMP_CLAUSE_ORDER:
+       case OMP_CLAUSE_BIND:
+       case OMP_CLAUSE_AUTO:
+       case OMP_CLAUSE_SEQ:
+       case OMP_CLAUSE_NOHOST:
+       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE__SIMT_:
+       case OMP_CLAUSE_IF_PRESENT:
+       case OMP_CLAUSE_FINALIZE:
+         WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+
+       case OMP_CLAUSE_LASTPRIVATE:
+         WALK_SUBTREE (OMP_CLAUSE_DECL (*tp));
+         WALK_SUBTREE (OMP_CLAUSE_LASTPRIVATE_STMT (*tp));
+         WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+
+       case OMP_CLAUSE_COLLAPSE:
+         {
+           int i;
+           for (i = 0; i < 3; i++)
+             WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i));
+           WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+         }
+
+       case OMP_CLAUSE_LINEAR:
+         WALK_SUBTREE (OMP_CLAUSE_DECL (*tp));
+         WALK_SUBTREE (OMP_CLAUSE_LINEAR_STEP (*tp));
+         WALK_SUBTREE (OMP_CLAUSE_LINEAR_STMT (*tp));
+         WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+
+       case OMP_CLAUSE_ALIGNED:
+       case OMP_CLAUSE_ALLOCATE:
+       case OMP_CLAUSE_FROM:
+       case OMP_CLAUSE_TO:
+       case OMP_CLAUSE_MAP:
+       case OMP_CLAUSE__CACHE_:
+         WALK_SUBTREE (OMP_CLAUSE_DECL (*tp));
+         WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
+         WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+
+       case OMP_CLAUSE_REDUCTION:
+         {
+           for (int i = 0; i < 6; i++)
+             WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i));
+           WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+         }
+
+       case OMP_CLAUSE_TASK_REDUCTION:
+       case OMP_CLAUSE_IN_REDUCTION:
+         {
+           for (int i = 0; i < 5; i++)
+             WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i));
+           WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
+         }
+
+       default:
+         gcc_unreachable ();
+       }
       break;

     case TARGET_EXPR:
diff --git a/gcc/tree.h b/gcc/tree.h
index 15e5147f40b0..5ee1c33f4e15 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1746,6 +1746,8 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (NODE, OMP_CLAUSE_REDUCTION, \
                                              OMP_CLAUSE_IN_REDUCTION), 4)
+#define OMP_CLAUSE_REDUCTION_PRIVATE_DECL(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_REDUCTION), 5)

 /* True if a REDUCTION clause may reference the original list item (omp_orig)
    in its OMP_CLAUSE_REDUCTION_{,GIMPLE_}INIT.  */
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 33/40] Fix tree check failure with reduction localization
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (31 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 32/40] Reference reduction localization Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 34/40] Use more appropriate var in localize_reductions call Frederik Harwath
                   ` (6 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, thomas

From: Julian Brown <julian@codesourcery.com>

        gcc/
        * gimplify.c (gimplify_omp_workshare): Use OMP_CLAUSES, OMP_BODY
        instead of OMP_TARGET_CLAUSES, OMP_TARGET_BODY.
---
 gcc/gimplify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9a4331c70d6e..04ffbc256442 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -13753,8 +13753,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)

       /* FIXME: Reductions are not supported in kernels regions yet.  */
       if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
-        localize_reductions (OMP_TARGET_CLAUSES (*expr_p),
-                            OMP_TARGET_BODY (*expr_p));
+        localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

       gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
       if (gimple_code (g) == GIMPLE_BIND)
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 34/40] Use more appropriate var in localize_reductions call
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (32 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 33/40] Fix tree check failure with " Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 35/40] Handle references in OpenACC "private" clauses Frederik Harwath
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, thomas

From: Julian Brown <julian@codesourcery.com>

        gcc/
        * gimplify.c (gimplify_omp_for): Use for_stmt in call to
        localize_reductions.
---
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 04ffbc256442..daa69ccf6202 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12219,7 +12219,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
         here, because the code to remove reductions in kernels regions cannot
         handle that.  */
       if (outer && outer->region_type == ORT_ACC_PARALLEL)
-       localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p));
+       localize_reductions (OMP_FOR_CLAUSES (for_stmt),
+                            OMP_FOR_BODY (for_stmt));
     }

   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 35/40] Handle references in OpenACC "private" clauses
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (33 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 34/40] Use more appropriate var in localize_reductions call Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 36/40] openacc: Enable reduction variable localization for "kernels" Frederik Harwath
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, thomas

From: Julian Brown <julian@codesourcery.com>

        gcc/
        * gimplify.c (localize_reductions): Rewrite references for
        OMP_CLAUSE_PRIVATE also.

        libgomp/
        * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
        * testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
        * testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
---
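Note for readers: the new OMP_CLAUSE_PRIVATE branch stores the reference
variable and a fresh temporary in 'pr' and then walks the loop body with
localize_reductions_r.  The body of that callback is not part of this patch,
so the C++ sketch below only illustrates the general idea, assuming that the
walk replaces each access through the reference by a use of the local
variable.  The Node type and the make and localize helpers are hypothetical
and are not GCC's tree API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR (not GCC's 'tree'): a node is a plain variable use,
// a dereference of a reference variable, or an operation with operands.
struct Node {
  enum Kind { VAR, DEREF, OP } kind;
  std::string name;                         // variable or operation name
  std::vector<std::unique_ptr<Node>> kids;  // operands, used by OP nodes
};

// Build a leaf or operation node with the given kind and name.
std::unique_ptr<Node> make (Node::Kind kind, std::string name)
{
  auto n = std::make_unique<Node> ();
  n->kind = kind;
  n->name = std::move (name);
  return n;
}

// Replace every dereference of REF_VAR at or below N by a direct use of
// LOCAL_VAR and return the number of replacements.  This plays the role
// that the patch gives to walk_tree with pr.ref_var and pr.local_var.
int localize (Node &n, const std::string &ref_var,
              const std::string &local_var)
{
  assert (ref_var != local_var);
  if (n.kind == Node::DEREF && n.name == ref_var)
    {
      n.kind = Node::VAR;
      n.name = local_var;
      return 1;
    }
  int count = 0;
  for (auto &kid : n.kids)
    count += localize (*kid, ref_var, local_var);
  return count;
}
```

Applied to a body that dereferences "tmpref" once, localize would report one
replacement and leave dereferences of other variables untouched.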
 gcc/gimplify.c                                | 15 ++++
 .../libgomp.oacc-c++/privatized-ref-2.C       | 64 +++++++++++++++++
 .../libgomp.oacc-c++/privatized-ref-3.C       | 64 +++++++++++++++++
 .../libgomp.oacc-fortran/privatized-ref-1.f95 | 71 +++++++++++++++++++
 4 files changed, 214 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index daa69ccf6202..bf37388f947c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11976,6 +11976,21 @@ localize_reductions (tree clauses, tree body)

        OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
       }
+    else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+       var = OMP_CLAUSE_DECL (c);
+
+       if (!lang_hooks.decls.omp_privatize_by_reference (var))
+         continue;
+
+       type = TREE_TYPE (TREE_TYPE (var));
+       new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+       pr.ref_var = var;
+       pr.local_var = new_var;
+
+       walk_tree (&body, localize_reductions_r, &pr, NULL);
+      }
 }


diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
new file mode 100644
index 000000000000..3884f163132c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+    int i, j;
+#pragma acc loop gang
+    for (i = 0; i < 256; i++)
+      {
+#pragma acc loop worker
+       for (j = 0; j < 256; j++)
+         {
+           int tmpvar;
+           int &tmpref = tmpvar;
+           tmpref = (i * 256 + j) * 99;
+           res[i * 256 + j] = tmpref;
+         }
+      }
+  }
+
+  for (i = 0; i < 65536; i++)
+    if (res[i] != i * 99)
+      abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+    int i, j;
+#pragma acc loop gang worker
+    for (i = 0; i < 256; i++)
+      {
+#pragma acc loop vector
+       for (j = 0; j < 256; j++)
+         {
+           int tmpvar;
+           int &tmpref = tmpvar;
+           tmpref = (i * 256 + j) * 101;
+           res[i * 256 + j] = tmpref;
+         }
+      }
+  }
+
+  for (i = 0; i < 65536; i++)
+    if (res[i] != i * 101)
+      abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
new file mode 100644
index 000000000000..c1a10cba31b3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+    int i, j;
+    int tmpvar;
+    int &tmpref = tmpvar;
+#pragma acc loop gang
+    for (i = 0; i < 256; i++)
+      {
+#pragma acc loop worker private(tmpref)
+       for (j = 0; j < 256; j++)
+         {
+           tmpref = (i * 256 + j) * 99;
+           res[i * 256 + j] = tmpref;
+         }
+      }
+  }
+
+  for (i = 0; i < 65536; i++)
+    if (res[i] != i * 99)
+      abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+    int i, j;
+    int tmpvar;
+    int &tmpref = tmpvar;
+#pragma acc loop gang worker
+    for (i = 0; i < 256; i++)
+      {
+#pragma acc loop vector private(tmpref)
+       for (j = 0; j < 256; j++)
+         {
+           tmpref = (i * 256 + j) * 101;
+           res[i * 256 + j] = tmpref;
+         }
+      }
+  }
+
+  for (i = 0; i < 65536; i++)
+    if (res[i] != i * 101)
+      abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
new file mode 100644
index 000000000000..fe1520a8078c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
@@ -0,0 +1,71 @@
+! { dg-do run }
+
+program main
+  implicit none
+  integer :: myint
+  integer :: i
+  real :: res(65536), tmp
+
+  res(:) = 0.0
+
+  myint = 5
+  call workers(myint, res)
+
+  do i=1,65536
+    tmp = i * 99
+    if (res(i) .ne. tmp) stop 1
+  end do
+
+  res(:) = 0.0
+
+  myint = 7
+  call vectors(myint, res)
+
+  do i=1,65536
+    tmp = i * 101
+    if (res(i) .ne. tmp) stop 2
+  end do
+
+contains
+
+  subroutine workers(t1, res)
+    implicit none
+    integer :: t1
+    integer :: i, j
+    real, intent(out) :: res(:)
+
+    !$acc parallel copyout(res) num_gangs(64) num_workers(64) ! { dg-warning "using num_workers \\(32\\), ignoring 64" "" { target openacc_nvidia_accel_selected } }
+
+    !$acc loop gang
+    do i=0,255
+      !$acc loop worker private(t1)
+      do j=1,256
+        t1 = (i * 256 + j) * 99
+        res(i * 256 + j) = t1
+      end do
+    end do
+
+    !$acc end parallel
+  end subroutine workers
+
+  subroutine vectors(t1, res)
+    implicit none
+    integer :: t1
+    integer :: i, j
+    real, intent(out) :: res(:)
+
+    !$acc parallel copyout(res) num_gangs(64) num_workers(64) ! { dg-warning "using num_workers \\(32\\), ignoring 64" "" { target openacc_nvidia_accel_selected } }
+
+    !$acc loop gang worker
+    do i=0,255
+      !$acc loop vector private(t1)
+      do j=1,256
+        t1 = (i * 256 + j) * 101
+        res(i * 256 + j) = t1
+      end do
+    end do
+
+    !$acc end parallel
+  end subroutine vectors
+
+end program main
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 36/40] openacc: Enable reduction variable localization for "kernels"
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (34 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 35/40] Handle references in OpenACC "private" clauses Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels' Frederik Harwath
                   ` (3 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas

gcc/ChangeLog:

        * gimplify.c (gimplify_omp_for): Enable localization on
        "kernels" regions.
        (gimplify_omp_workshare): Likewise.
---
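Note for readers: both hunks below install the same condition.  Reduction
localization stays enabled for "parallel" regions and is additionally
enabled for "kernels" regions, but only when the openacc-kernels param
selects the decompose handling.  A minimal C++ model of that predicate
follows; the enum and function names are hypothetical stand-ins for
ORT_ACC_PARALLEL, ORT_ACC_KERNELS and OPENACC_KERNELS_DECOMPOSE, and the
Parloops value is an assumption about the other setting of the param.

```cpp
// Hypothetical stand-ins for the region types tested in the patch.
enum class Region { AccParallel, AccKernels, Other };

// Hypothetical stand-ins for the values of param_openacc_kernels.
// Parloops is assumed here; only the decompose value appears in the patch.
enum class KernelsMode { Decompose, Parloops };

// True if reduction localization should run for a construct whose
// relevant region has type R under kernels mode M.
constexpr bool wants_localization (Region r, KernelsMode m)
{
  return r == Region::AccParallel
         || (r == Region::AccKernels && m == KernelsMode::Decompose);
}

static_assert (wants_localization (Region::AccParallel, KernelsMode::Parloops),
               "parallel regions are always localized");
static_assert (!wants_localization (Region::AccKernels, KernelsMode::Parloops),
               "kernels regions need the decompose mode");
```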
 gcc/gimplify.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index bf37388f947c..a0137089496b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12229,11 +12229,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
             && outer->region_type != ORT_ACC_KERNELS)
        outer = outer->outer_context;

-      /* FIXME: Reductions only work in parallel regions at present.  We avoid
-        doing the reduction localization transformation in kernels regions
-        here, because the code to remove reductions in kernels regions cannot
-        handle that.  */
-      if (outer && outer->region_type == ORT_ACC_PARALLEL)
+      if (outer && (outer->region_type == ORT_ACC_PARALLEL
+                   || (outer->region_type == ORT_ACC_KERNELS
+                       && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)))
        localize_reductions (OMP_FOR_CLAUSES (for_stmt),
                             OMP_FOR_BODY (for_stmt));
     }
@@ -13767,8 +13765,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     {
       push_gimplify_context ();

-      /* FIXME: Reductions are not supported in kernels regions yet.  */
-      if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+      if (ort == ORT_ACC_PARALLEL
+          || (ort == ORT_ACC_KERNELS
+              && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE))
         localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

       gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels'
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (35 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 36/40] openacc: Enable reduction variable localization for "kernels" Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 38/40] openacc: fix privatization of by-reference arrays Frederik Harwath
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tobias Burnus, thomas

From: Tobias Burnus <tobias@codesourcery.com>

Nearly all variable mappings are moved from 'kernels' to a surrounding
'data kernels' region and are then mapped as 'force_present' on the
'kernels' region.  However, as libgomp.oacc-c-c++-common/declare-vla.c shows,
moving 'int i, N' fails: the mapping code has a special case for
is_gimple_reg variables, and that special case fails badly outside a target
region (e.g. offloading = false).  As those variables are transferred by
value and not as a pointer, it makes more sense to map them only at
'kernels' and to ignore them for 'data kernels'.
Additionally, as e.g. libgomp.oacc-c-c++-common/kernels-decompose-1.c shows,
one still has to handle 'kernels'-declared variables, which are now
declared in 'data kernels' and can be handled as is_gimple_reg.

        gcc/
        * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
        is_gimple_reg vars are not yet mapped, fall through to map it as
        before the transformation.
        (omp_oacc_kernels_decompose_1): Don't map is_gimple_reg vars.
        (decompose_kernels_region_body): Use tofrom for is_gimple_reg vars.
        (omp_oacc_kernels_decompose_1): Handle is_gimple_reg vars as without
        data kernels.

        gcc/testsuite/
        * gfortran.dg/goacc/declare-3.f95: Update scan-tree-dump-times.
---
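Note for readers: the hunks below amount to a two-way split on
is_gimple_reg.  Such variables get no map clause on the data region and are
mapped 'tofrom' on the 'kernels' region, while all other variables keep the
data-region mapping and stay 'force_present' on 'kernels'.  The C++ sketch
below models only that split and ignores the other conditions in the
surrounding code (artificial variables, CONST_DECLs and so on); the enum and
function names are hypothetical.

```cpp
// Hypothetical stand-ins for GOMP_MAP_TOFROM and GOMP_MAP_FORCE_PRESENT.
enum class MapKind { ToFrom, ForcePresent };

// Map kind used on the decomposed 'kernels' region for a variable,
// depending on whether it is a GIMPLE register (passed by value).
constexpr MapKind kernels_map_kind (bool is_gimple_reg)
{
  return is_gimple_reg ? MapKind::ToFrom : MapKind::ForcePresent;
}

// Whether the enclosing data region receives a map clause for the
// variable.  Register variables are skipped there.
constexpr bool mapped_on_data_region (bool is_gimple_reg)
{
  return !is_gimple_reg;
}

static_assert (kernels_map_kind (true) == MapKind::ToFrom,
               "registers are mapped tofrom on kernels");
static_assert (mapped_on_data_region (false),
               "memory variables are mapped on the data region");
```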
 gcc/omp-oacc-kernels-decompose.cc             | 9 +++++++--
 gcc/testsuite/gfortran.dg/goacc/declare-3.f95 | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c96207d96250..a6be1f1ed238 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -873,7 +873,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
          else
            inner_bind_vars = next;
        }
-      else
+      else if (!is_gimple_reg (v))
        {
          /* Otherwise, build the map clause.  */
          tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
@@ -1222,7 +1222,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
        {
          tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
-         OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+         OMP_CLAUSE_SET_MAP_KIND (present_clause,
+                                  is_gimple_reg (var)
+                                  ? GOMP_MAP_TOFROM : GOMP_MAP_FORCE_PRESENT);
          OMP_CLAUSE_DECL (present_clause) = var;
          OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
          OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
@@ -1437,6 +1439,9 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
                   region causes runtime errors.  */
                break;

+             if (is_gimple_reg (decl))
+               break;
+
              /* For non-artificial variables, and for non-declaration
                 expressions like A[0:n], copy the clause to the data
                 region.  */
diff --git a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95 b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
index 9127cba6600d..2a1fe0a68465 100644
--- a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
@@ -39,7 +39,7 @@ program test
   use mod_d
   use mod_e

-  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(force_to:b\) map\(force_alloc:a\)$} original } }
+  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(to:b\) map\(alloc:a\)$} original } }
 end program test

 ! { dg-final { scan-tree-dump-times {#pragma acc data} 1 original } }
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 38/40] openacc: fix privatization of by-reference arrays
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (36 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels' Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-15 15:54 ` [PATCH 39/40] openacc: Check type for references in reduction lowering Frederik Harwath
  2021-12-16 12:00 ` [PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling Frederik Harwath
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tobias Burnus, thomas

From: Tobias Burnus <tobias@codesourcery.com>

Replacing a by-reference variable in a private clause with a local variable
makes sense; however, for arrays, the size is not directly known from the
type.  This causes an ICE via create_tmp_var, which indirectly invokes
force_constant_size in this case, but the latter only handles Ada.

gcc/ChangeLog:

        * gimplify.c (localize_reductions): Do not create local
        variable for privatized arrays.
---
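Note for readers: the added check inspects the type behind the reference and
skips the variable when that type is an array.  The sketch below expresses
the same decision with C++ type traits.  This is an analogy only: C++ types
stand in for GCC's type trees, and can_localize is a hypothetical name.

```cpp
#include <type_traits>

// T plays the role of TREE_TYPE (var); the referent plays the role of
// TREE_TYPE (TREE_TYPE (var)) in the patch.
template <typename T>
constexpr bool can_localize ()
{
  typedef typename std::remove_reference<T>::type referent;
  return std::is_reference<T>::value          // privatized by reference
         && !std::is_array<referent>::value;  // arrays are skipped
}

static_assert (can_localize<int &> (),
               "a reference to a scalar can get a local copy");
static_assert (!can_localize<int (&)[]> (),
               "a reference to an array of unknown size is skipped");
```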
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a0137089496b..952bc449a7db 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11982,8 +11982,9 @@ localize_reductions (tree clauses, tree body)

        if (!lang_hooks.decls.omp_privatize_by_reference (var))
          continue;
-
        type = TREE_TYPE (TREE_TYPE (var));
+       if (TREE_CODE (type) == ARRAY_TYPE)
+         continue;
        new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));

        pr.ref_var = var;
--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 39/40] openacc: Check type for references in reduction lowering
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (37 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 38/40] openacc: fix privatization of by-reference arrays Frederik Harwath
@ 2021-12-15 15:54 ` Frederik Harwath
  2021-12-16 12:00 ` [PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling Frederik Harwath
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-15 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas

gcc/ChangeLog:

        * omp-low.c (lower_oacc_reductions): Only create a reference
        if variable has pointer type.
---
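Note for readers: the hunk below makes each dereference conditional on the
operand actually having pointer type, instead of dereferencing 'outgoing'
unconditionally and 'incoming' whenever it is not constant.  A toy C++ model
of "dereference only what is a pointer" is sketched here; the Operand type
and the load function are hypothetical and do not correspond to GCC data
structures.

```cpp
#include <cassert>

// Hypothetical operand: either holds a value directly or points to one.
struct Operand {
  bool is_pointer;    // models POINTER_TYPE_P (TREE_TYPE (operand))
  int value;          // used when !is_pointer
  const int *target;  // used when is_pointer
};

// Read the operand, dereferencing only if it really is a pointer.
int load (const Operand &op)
{
  assert (!op.is_pointer || op.target != nullptr);
  return op.is_pointer ? *op.target : op.value;
}
```

An operand that has already been replaced by a local value is then read
directly rather than through a memory reference.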
 gcc/omp-low.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ae5cdfc5e260..2b8b848ec03a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7639,9 +7639,10 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,

        if (omp_privatize_by_reference (orig))
          {
-           outgoing = build_simple_mem_ref (outgoing);
+            if (POINTER_TYPE_P (TREE_TYPE (outgoing)))
+             outgoing = build_simple_mem_ref (outgoing);

-           if (!TREE_CONSTANT (incoming))
+            if (POINTER_TYPE_P (TREE_TYPE (incoming)))
              incoming = build_simple_mem_ref (incoming);
          }

--
2.33.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling
  2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
                   ` (38 preceding siblings ...)
  2021-12-15 15:54 ` [PATCH 39/40] openacc: Check type for references in reduction lowering Frederik Harwath
@ 2021-12-16 12:00 ` Frederik Harwath
  39 siblings, 0 replies; 49+ messages in thread
From: Frederik Harwath @ 2021-12-16 12:00 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: Catherine_Moore

[-- Attachment #1: Type: text/plain, Size: 19973 bytes --]


Adjust the testsuite to the changed expectations of the new
Graphite-based "kernels" handling.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Adjust.
        * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
        Adjust.
        * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust.
        * testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
        Adjust.
        * testsuite/libgomp.oacc-fortran/optional-private.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Adjust.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc-gomp/nesting-1.c: Adjust.
        * c-c++-common/goacc/cache-3-1.c: Adjust.
        * c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
        * c-c++-common/goacc/classify-kernels.c: Adjust.
        * c-c++-common/goacc/classify-routine-nohost.c: Adjust.
        * c-c++-common/goacc/classify-serial.c: Adjust.
        * c-c++-common/goacc/if-clause-2.c: Adjust.
        * c-c++-common/goacc/kernels-1.c: Adjust.
        * c-c++-common/goacc/kernels-counter-var-redundant-load.c: Adjust.
        * c-c++-common/goacc/kernels-counter-vars-function-scope.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-1.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-2.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-ice-1.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-ice-2.c: Adjust.
        * c-c++-common/goacc/kernels-double-reduction-n.c: Adjust.
        * c-c++-common/goacc/kernels-double-reduction.c: Adjust.
        * c-c++-common/goacc/kernels-loop-2.c: Adjust.
        * c-c++-common/goacc/kernels-loop-3.c: Adjust.
        * c-c++-common/goacc/kernels-loop-data-2.c: Adjust.
        * c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Adjust.
        * c-c++-common/goacc/kernels-loop-data-enter-exit.c: Adjust.
        * c-c++-common/goacc/kernels-loop-data-update.c: Adjust.
        * c-c++-common/goacc/kernels-loop-data.c: Adjust.
        * c-c++-common/goacc/kernels-loop-g.c: Adjust.
        * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Adjust.
        * c-c++-common/goacc/kernels-loop-n.c: Adjust.
        * c-c++-common/goacc/kernels-loop-nest.c: Adjust.
        * c-c++-common/goacc/kernels-loop.c: Adjust.
        * c-c++-common/goacc/kernels-one-counter-var.c: Adjust.
        * c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Adjust.
        * c-c++-common/goacc/kernels-reduction.c: Adjust.
        * c-c++-common/goacc/loop-2-kernels.c: Adjust.
        * c-c++-common/goacc/loop-auto-1.c: Adjust.
        * c-c++-common/goacc/loop-auto-2.c: Adjust.
        * c-c++-common/goacc/nested-reductions-2-parallel.c: Adjust.
        * c-c++-common/goacc/omp_data_optimize-1.c: Adjust.
        * c-c++-common/goacc/routine-nohost-1.c: Adjust.
        * c-c++-common/goacc/uninit-copy-clause.c: Adjust.
        * g++.dg/goacc/omp_data_optimize-1.C: Adjust.
        * g++.dg/goacc/template.C: Adjust.
        * gcc.dg/goacc/loop-processing-1.c: Adjust.
        * gcc.dg/goacc/nested-function-1.c: Adjust.
        * gfortran.dg/directive_unroll_1.f90: Adjust.
        * gfortran.dg/directive_unroll_4.f90: Adjust.
        * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust.
        * gfortran.dg/goacc/classify-kernels.f95: Adjust.
        * gfortran.dg/goacc/classify-parallel.f95: Adjust.
        * gfortran.dg/goacc/classify-routine-nohost.f95: Adjust.
        * gfortran.dg/goacc/classify-routine.f95: Adjust.
        * gfortran.dg/goacc/classify-serial.f95: Adjust.
        * gfortran.dg/goacc/common-block-3.f90: Adjust.
        * gfortran.dg/goacc/declare-3.f95: Adjust.
        * gfortran.dg/goacc/gang-static.f95: Adjust.
        * gfortran.dg/goacc/kernels-decompose-1.f95: Adjust.
        * gfortran.dg/goacc/kernels-decompose-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-data-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-data-update.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-data.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-n.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop.f95: Adjust.
        * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Adjust.
        * gfortran.dg/goacc/kernels-tree.f95: Adjust.
        * gfortran.dg/goacc/loop-2-kernels.f95: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-2.f90: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-3.f90: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-4.f90: Adjust.
        * gfortran.dg/goacc/nested-function-1.f90: Adjust.
        * gfortran.dg/goacc/nested-reductions-2-parallel.f90: Adjust.
        * gfortran.dg/goacc/omp_data_optimize-1.f90: Adjust.
        * gfortran.dg/goacc/private-explicit-kernels-1.f95: Adjust.
        * gfortran.dg/goacc/private-predetermined-kernels-1.f95: Adjust.
        * gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
        * gfortran.dg/goacc/routine-module-mod-1.f90: Adjust.
        * gfortran.dg/goacc/routine-multiple-directives-1.f90: Adjust.
        * gfortran.dg/goacc/uninit-copy-clause.f95: Adjust.
        * c-c++-common/goacc/loop-auto-reductions.c: New test.
        * c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
        * c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c: New test.
        * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: New
        test.
        * gfortran.dg/goacc/kernels-conversion.f95: New test.
        * gfortran.dg/goacc/kernels-reductions.f90: New test.
---
 .../c-c++-common/goacc-gomp/nesting-1.c       |  10 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |   2 +-
 .../goacc/classify-kernels-unparallelized.c   |  34 ++-
 .../c-c++-common/goacc/classify-kernels.c     |  21 +-
 .../goacc/classify-routine-nohost.c           |  20 +-
 .../c-c++-common/goacc/classify-serial.c      |   8 +-
 .../c-c++-common/goacc/if-clause-2.c          |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-1.c  |  17 +-
 .../kernels-counter-var-redundant-load.c      |  20 +-
 .../kernels-counter-vars-function-scope.c     |  11 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |  31 ++-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  58 +++--
 .../goacc/kernels-decompose-ice-1.c           |   7 +-
 .../goacc/kernels-decompose-ice-2.c           |   3 +-
 .../goacc/kernels-double-reduction-n.c        |   5 +-
 .../goacc/kernels-double-reduction.c          |   4 +-
 .../c-c++-common/goacc/kernels-loop-2.c       |  20 +-
 .../c-c++-common/goacc/kernels-loop-3.c       |   2 +
 .../c-c++-common/goacc/kernels-loop-data-2.c  |  18 +-
 .../goacc/kernels-loop-data-enter-exit-2.c    |  17 +-
 .../goacc/kernels-loop-data-enter-exit.c      |  18 +-
 .../goacc/kernels-loop-data-update.c          |  14 +-
 .../c-c++-common/goacc/kernels-loop-data.c    |  13 +-
 .../c-c++-common/goacc/kernels-loop-g.c       |  15 +-
 .../goacc/kernels-loop-mod-not-zero.c         |  11 +-
 .../c-c++-common/goacc/kernels-loop-n.c       |  11 +-
 .../c-c++-common/goacc/kernels-loop-nest.c    |  13 +-
 .../c-c++-common/goacc/kernels-loop.c         |  11 +-
 .../goacc/kernels-one-counter-var.c           |  13 +-
 .../kernels-parallel-loop-data-enter-exit.c   |  18 +-
 .../c-c++-common/goacc/kernels-reduction.c    |   9 +-
 .../c-c++-common/goacc/loop-2-kernels.c       |   6 +-
 .../c-c++-common/goacc/loop-auto-1.c          | 127 +++++------
 .../c-c++-common/goacc/loop-auto-2.c          |  37 +--
 .../c-c++-common/goacc/loop-auto-reductions.c |  22 ++
 .../goacc/nested-reductions-2-parallel.c      | 138 +++++++++++
 .../goacc/note-parallelism-kernels-loops-1.c  |  61 +++++
 .../note-parallelism-kernels-loops-parloops.c |  53 +++++
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 208 ++++++++---------
 .../c-c++-common/goacc/routine-nohost-1.c     |   2 +-
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 -
 .../g++.dg/goacc/omp_data_optimize-1.C        |  32 +--
 gcc/testsuite/g++.dg/goacc/template.C         |  18 +-
 .../gcc.dg/goacc/loop-processing-1.c          |   9 +-
 .../gcc.dg/goacc/nested-function-1.c          |   3 +-
 .../gfortran.dg/directive_unroll_1.f90        |   2 +-
 .../gfortran.dg/directive_unroll_4.f90        |   2 +-
 ...assify-kernels-unparallelized-parloops.f95 |  44 ++++
 .../goacc/classify-kernels-unparallelized.f95 |  27 +--
 .../gfortran.dg/goacc/classify-kernels.f95    |  21 +-
 .../gfortran.dg/goacc/classify-parallel.f95   |   6 +-
 .../goacc/classify-routine-nohost.f95         |  18 +-
 .../gfortran.dg/goacc/classify-routine.f95    |  20 +-
 .../gfortran.dg/goacc/classify-serial.f95     |   8 +-
 .../gfortran.dg/goacc/common-block-3.f90      |  16 +-
 gcc/testsuite/gfortran.dg/goacc/declare-3.f95 |   2 +-
 .../gfortran.dg/goacc/gang-static.f95         |  14 +-
 .../gfortran.dg/goacc/kernels-conversion.f95  |  52 +++++
 .../gfortran.dg/goacc/kernels-decompose-1.f95 | 186 ++++++++++-----
 .../gfortran.dg/goacc/kernels-decompose-2.f95 | 114 +++++++---
 .../gfortran.dg/goacc/kernels-loop-2.f95      |  11 +-
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |  11 +-
 .../goacc/kernels-loop-data-enter-exit-2.f95  |  13 +-
 .../goacc/kernels-loop-data-enter-exit.f95    |  13 +-
 .../goacc/kernels-loop-data-update.f95        |  13 +-
 .../gfortran.dg/goacc/kernels-loop-data.f95   |  15 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-n.f95      |  14 +-
 .../gfortran.dg/goacc/kernels-loop.f95        |  10 +-
 .../kernels-parallel-loop-data-enter-exit.f95 |  13 +-
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +++
 .../gfortran.dg/goacc/kernels-tree.f95        |   2 +-
 .../gfortran.dg/goacc/loop-2-kernels.f95      |   6 +-
 .../goacc/loop-auto-transfer-2.f90            |   2 -
 .../goacc/loop-auto-transfer-3.f90            |   8 -
 .../goacc/loop-auto-transfer-4.f90            |  30 ---
 .../gfortran.dg/goacc/nested-function-1.f90   |  12 +-
 .../goacc/nested-reductions-2-parallel.f90    | 177 +++++++++++++++
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 214 +++++++++---------
 .../goacc/private-explicit-kernels-1.f95      |  13 +-
 .../goacc/private-predetermined-kernels-1.f95 |  16 +-
 .../goacc/privatization-1-compute-loop.f90    |   3 -
 .../goacc/routine-module-mod-1.f90            |   4 +-
 .../goacc/routine-multiple-directives-1.f90   |  32 +--
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 -
 .../libgomp.oacc-c++/privatized-ref-2.C       |   4 +-
 .../libgomp.oacc-c++/privatized-ref-3.C       |   4 +-
 .../acc_prof-kernels-1.c                      |   4 +-
 .../declare-vla-kernels-decompose-ice-1.c     |   4 -
 .../kernels-decompose-1.c                     |   8 +-
 .../kernels-private-vars-local-worker-1.c     |   6 +-
 .../kernels-private-vars-local-worker-2.c     |   6 +-
 .../kernels-private-vars-local-worker-3.c     |   6 +-
 .../kernels-private-vars-local-worker-4.c     |   8 +-
 .../kernels-private-vars-local-worker-5.c     |   6 +-
 .../kernels-private-vars-loop-gang-1.c        |   4 +-
 .../kernels-private-vars-loop-gang-2.c        |   4 +-
 .../kernels-private-vars-loop-gang-3.c        |   4 +-
 .../kernels-private-vars-loop-gang-4.c        |  15 +-
 .../kernels-private-vars-loop-gang-5.c        |  10 +-
 .../kernels-private-vars-loop-gang-6.c        |   4 +-
 .../kernels-private-vars-loop-vector-1.c      |   6 +-
 .../kernels-private-vars-loop-vector-2.c      |   6 +-
 .../kernels-private-vars-loop-worker-1.c      |   8 +-
 .../kernels-private-vars-loop-worker-2.c      |   6 +-
 .../kernels-private-vars-loop-worker-3.c      |   6 +-
 .../kernels-private-vars-loop-worker-4.c      |   6 +-
 .../kernels-private-vars-loop-worker-5.c      |   9 +-
 .../kernels-private-vars-loop-worker-6.c      |   6 +-
 .../kernels-private-vars-loop-worker-7.c      |   6 +-
 .../libgomp.oacc-c-c++-common/loop-auto-1.c   |  30 ++-
 .../libgomp.oacc-c-c++-common/parallel-dims.c |  39 ++--
 .../libgomp.oacc-c-c++-common/pr84955-1.c     |   1 -
 .../libgomp.oacc-c-c++-common/pr85381-2.c     |   8 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c     |   8 +-
 .../libgomp.oacc-c-c++-common/pr85381-4.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486.c       |   4 +-
 .../routine-nohost-1.c                        |   6 +-
 .../vector-length-128-1.c                     |   5 +-
 .../vector-length-128-2.c                     |   6 +-
 .../vector-length-128-3.c                     |   5 +-
 .../vector-length-128-4.c                     |   5 +-
 .../vector-length-128-5.c                     |   5 +-
 .../vector-length-128-6.c                     |   5 +-
 .../vector-length-128-7.c                     |   5 +-
 .../testsuite/libgomp.oacc-fortran/if-1.f90   |  32 +--
 .../kernels-acc-loop-reduction-2.f90          |  12 +-
 .../kernels-private-vars-loop-gang-1.f90      |   4 +-
 .../kernels-private-vars-loop-gang-2.f90      |   4 +-
 .../kernels-private-vars-loop-gang-3.f90      |   4 +-
 .../kernels-private-vars-loop-gang-6.f90      |   5 +-
 .../kernels-private-vars-loop-vector-1.f90    |   4 +-
 .../kernels-private-vars-loop-vector-2.f90    |  11 +-
 .../kernels-private-vars-loop-worker-1.f90    |   6 +-
 .../kernels-private-vars-loop-worker-2.f90    |   4 +-
 .../kernels-private-vars-loop-worker-3.f90    |   4 +-
 .../kernels-private-vars-loop-worker-4.f90    |   4 +-
 .../kernels-private-vars-loop-worker-5.f90    |   7 +-
 .../kernels-private-vars-loop-worker-6.f90    |   4 +-
 .../kernels-private-vars-loop-worker-7.f90    |   6 +-
 .../libgomp.oacc-fortran/optional-private.f90 |   2 -
 .../libgomp.oacc-fortran/pr94358-1.f90        |   2 -
 .../libgomp.oacc-fortran/routine-nohost-1.f90 |   4 +-
 145 files changed, 1697 insertions(+), 1109 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: 0040-openacc-Adjust-testsuite-to-new-kernels-handling.patch.gz --]
[-- Type: application/gzip, Size: 41658 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/40] graphite: Extend SCoP detection dump output
  2021-12-15 15:54 ` [PATCH 15/40] graphite: Extend SCoP detection dump output Frederik Harwath
@ 2022-05-16 12:49   ` Tobias Burnus
  2022-05-17  8:21     ` Richard Biener
  0 siblings, 1 reply; 49+ messages in thread
From: Tobias Burnus @ 2022-05-16 12:49 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Pop, Sebastian
  Cc: sebpop, thomas, grosser, Frederik Harwath

[-- Attachment #1: Type: text/plain, Size: 879 bytes --]

As requested by Richard: Rediffed patch.

Changes: s/.c/.cc/ + some whitespace changes.
(At least in my email reader, some <tab> were lost. I also fixed too-long line issues.)

In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
(macro was removed late in GCC 12 development → r12-2605-ge41ba804ba5f5c)

Otherwise, it should be identical to Frederik's patch, earlier in this thread.

On 15.12.21 16:54, Frederik Harwath wrote:
> Extend dump output to make it easier (for GCC developers) to understand
> why Graphite refuses to include a loop in a SCoP.

OK for mainline?

Tobias

[-- Attachment #2: 015-graphite-Extend-SCoP-detection-dump-output.diff --]
[-- Type: text/x-patch, Size: 9817 bytes --]

graphite: Extend SCoP detection dump output

Extend dump output to make it easier (for GCC developers) to understand
why Graphite refuses to include a loop in a SCoP.

ChangeLog:

	* graphite-scop-detection.cc (scop_detection::can_represent_loop):
	Output reason for failure to dump file.
	(scop_detection::harmful_loop_in_region): Likewise.
	(scop_detection::graphite_can_represent_expr): Likewise.
	(scop_detection::stmt_has_simple_data_refs_p): Likewise.
	(scop_detection::stmt_simple_for_scop_p): Likewise.
	(print_sese_loop_numbers): New function.
	(scop_detection::build_scop_depth): Use it to print the loops in a
	rejected SCoP.
	(scop_detection::add_scop): Print the loops in the added SCoP.

 gcc/graphite-scop-detection.cc | 182 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 161 insertions(+), 21 deletions(-)

diff --git a/gcc/graphite-scop-detection.cc b/gcc/graphite-scop-detection.cc
index 8c0ee997557..075aa3010c8 100644
--- a/gcc/graphite-scop-detection.cc
+++ b/gcc/graphite-scop-detection.cc
@@ -69,12 +69,27 @@ public:
     fprintf (output.dump_file, "%d", i);
     return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
     fprintf (output.dump_file, "%s", s);
     return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+    print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+    return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+    print_generic_expr (output.dump_file, t, TDF_SLIM);
+    return output;
+  }
 } dp;
 
 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,23 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }
 
+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  bool printed = false;
+  for (auto loop : loops_list (cfun, 0))
+    {
+      if (loop_in_sese_p (loop, sese))
+	fprintf (file, "%d, ", loop->num);
+      printed = true;
+    }
+  if (printed)
+    fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */
 
 void
@@ -519,6 +551,10 @@ scop_detection::build_scop_depth (loop_p loop)
       if (! next
 	  || harmful_loop_in_region (next))
 	{
+	  if (next)
+	    DEBUG_PRINT (dp << "[scop-detection] Discarding SCoP on loops ";
+			 print_sese_loop_numbers (dump_file, next);
+			 dp << " because of harmful loops\n");
 	  if (s)
 	    add_scop (s);
 	  build_scop_depth (loop);
@@ -560,14 +596,63 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
       || !single_pred_p (loop->latch)
       || exit->src != single_pred (loop->latch)
       || !empty_block_p (loop->latch))
-    return false;
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+      return false;
+    }
+
+  bool edge_irreducible = (loop_preheader_edge (loop)->flags
+			   & EDGE_IRREDUCIBLE_LOOP);
+  if (edge_irreducible)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+			 "Loop is not a natural loop.\n");
+      return false;
+    }
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+							   single_exit (loop),
+							   &niter_desc, false);
+
+  if (!niter_is_unconditional)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+			 "Loop niter not unconditional.\n"
+			 "Condition: " << niter_desc.assumptions << "\n");
+      return false;
+    }
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+      return false;
+    }
+  if (!niter_desc.control.no_overflow)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+      return false;
+    }
 
-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-    && number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-    && niter_desc.control.no_overflow
-    && (niter = number_of_latch_executions (loop))
-    && !chrec_contains_undetermined (niter)
-    && graphite_can_represent_expr (scop, loop, niter);
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+			 "Loop niter chrec contains undetermined "
+			 "coefficients.\n");
+      return false;
+    }
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+		      << "Loop niter expression cannot be represented: "
+		      << niter << "\n");
+      return false;
+    }
+
+  return true;
 }
 
 /* Return true when BEGIN is the preheader edge of a loop with a single exit
@@ -640,6 +725,15 @@ scop_detection::add_scop (sese_l s)
 
   scops.safe_push (s);
   DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s));
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Loops in SCoP: ");
+      for (auto loop : loops_list (cfun, 0))
+	if (loop_in_sese_p (loop, s))
+	  fprintf (dump_file, "%d ", loop->num);
+      fprintf (dump_file, "\n");
+    }
 }
 
 /* Return true when a statement in SCOP cannot be represented by Graphite.  */
@@ -665,7 +759,12 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 
       /* The basic block should not be part of an irreducible loop.  */
       if (bb->flags & BB_IRREDUCIBLE_LOOP)
-	return true;
+	{
+	  DEBUG_PRINT (dp << "[scop-detection-fail] Found bb in irreducible "
+			     "loop.\n");
+
+	  return true;
+	}
 
       /* Check for unstructured control flow: CFG not generated by structured
 	 if-then-else.  */
@@ -676,7 +775,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    if (!dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)
 		&& !dominated_by_p (CDI_DOMINATORS, e->dest, bb))
-	      return true;
+	      {
+		DEBUG_PRINT (dp << "[scop-detection-fail] Found unstructured "
+				   "control flow.\n");
+		return true;
+	      }
 	}
 
       /* Collect all loops in the current region.  */
@@ -688,7 +791,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
       for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
 	   !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb))
-	  return true;
+	  {
+	    DEBUG_PRINT (dp << "[scop-detection-fail] "
+			       "Found harmful statement.\n");
+	    return true;
+	  }
 
       for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb);
 	   dom;
@@ -731,9 +838,10 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 	  && ! loop_nest_has_data_refs (loop))
 	{
 	  DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-		       << "does not have any data reference.\n");
+		       << " does not have any data reference.\n");
 	  return true;
 	}
+      DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
     }
 
   return false;
@@ -922,7 +1030,21 @@ scop_detection::graphite_can_represent_expr (sese_l scop, loop_p loop,
 					     tree expr)
 {
   tree scev = cached_scalar_evolution_in_region (scop, loop, expr);
-  return graphite_can_represent_scev (scop, scev);
+  bool can_represent = graphite_can_represent_scev (scop, scev);
+
+  if (!can_represent)
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file,
+		   "[graphite_can_represent_expr] Cannot represent scev \"");
+	  print_generic_expr (dump_file, scev, TDF_SLIM);
+	  fprintf (dump_file, "\" of expression ");
+	  print_generic_expr (dump_file, expr, TDF_SLIM);
+	  fprintf (dump_file, " in loop %d\n", loop->num);
+	}
+    }
+  return can_represent;
 }
 
 /* Return true if the data references of STMT can be represented by Graphite.
@@ -938,7 +1060,11 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
 
   auto_vec<data_reference_p> drs;
   if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs))
-    return false;
+    {
+      DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] "
+			 "Unanalyzable statement.\n");
+      return false;
+    }
 
   int j;
   data_reference_p dr;
@@ -946,7 +1072,12 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
     {
       for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i)
 	if (! graphite_can_represent_scev (scop, DR_ACCESS_FN (dr, i)))
-	  return false;
+	  {
+	    DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] "
+			       "Cannot represent access function SCEV: "
+			    << DR_ACCESS_FN (dr, i) << "\n");
+	    return false;
+	  }
     }
 
   return true;
@@ -1027,14 +1158,23 @@ scop_detection::stmt_simple_for_scop_p (sese_l scop, gimple *stmt,
 	for (unsigned i = 0; i < 2; ++i)
 	  {
 	    tree op = gimple_op (stmt, i);
-	    if (!graphite_can_represent_expr (scop, loop, op)
-		/* We can only constrain on integer type.  */
-		|| ! INTEGRAL_TYPE_P (TREE_TYPE (op)))
+	    if (!graphite_can_represent_expr (scop, loop, op))
+	      {
+	        DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+					      "[scop-detection-fail] "
+					      "Graphite cannot represent cond "
+					      "stmt operator expression.\n"));
+	        DEBUG_PRINT (dp << op << "\n");
+	        return false;
+	      }
+
+	    if (! INTEGRAL_TYPE_P (TREE_TYPE (op)))
 	      {
-		DEBUG_PRINT (dp << "[scop-detection-fail] "
-				<< "Graphite cannot represent stmt:\n";
-			     print_gimple_stmt (dump_file, stmt, 0,
-						TDF_VOPS | TDF_MEMSYMS));
+		DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+					      "[scop-detection-fail] "
+					      "Graphite cannot represent cond "
+					      "statement operator. "
+					      "Type must be integral.\n"));
 		return false;
 	      }
 	  }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/40] graphite: Rename isl_id_for_ssa_name
  2021-12-15 15:54 ` [PATCH 16/40] graphite: Rename isl_id_for_ssa_name Frederik Harwath
@ 2022-05-16 12:49   ` Tobias Burnus
  2022-05-17  8:22     ` Richard Biener
  0 siblings, 1 reply; 49+ messages in thread
From: Tobias Burnus @ 2022-05-16 12:49 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Pop, Sebastian
  Cc: sebpop, thomas, grosser, Frederik Harwath

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

Rediffed Frederik's patch. Actual change is just s/.c/.cc/ but also a
missing space → tab.

On 15.12.21 16:54, Frederik Harwath wrote:
> The SSA names for which this function gets used are always SCoP
> parameters and hence "isl_id_for_parameter" is a better name.  It also
> explains the prefix "P_" for those names in the ISL representation.

OK for mainline?

Tobias

[-- Attachment #2: 016-graphite-Rename-isl_id_for_ssa_name.diff --]
[-- Type: text/x-patch, Size: 2168 bytes --]

graphite: Rename isl_id_for_ssa_name

The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.

gcc/ChangeLog:

	* graphite-sese-to-poly.cc (isl_id_for_ssa_name): Rename to ...
	(isl_id_for_parameter): ... this new function name.
	(build_scop_context): Adjust function use.

 gcc/graphite-sese-to-poly.cc | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.cc b/gcc/graphite-sese-to-poly.cc
index 5a6d779052c..ea67b267e1c 100644
--- a/gcc/graphite-sese-to-poly.cc
+++ b/gcc/graphite-sese-to-poly.cc
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }
 
-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */
 
 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }
 
 /* Return an isl identifier for the data reference DR.  Data references and
@@ -898,15 +899,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);
 
   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
     space = isl_space_set_dim_id (space, isl_dim_param, i,
-                                  isl_id_for_ssa_name (scop, e));
+				  isl_id_for_parameter (scop, p));
 
   scop->param_context = isl_set_universe (space);
 
-  FOR_EACH_VEC_ELT (region->params, i, e)
-    add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+    add_param_constraints (scop, i, p);
 }
 
 /* Return true when loop A is nested in loop B.  */

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 17/40] graphite: Fix minor mistakes in comments
  2021-12-15 15:54 ` [PATCH 17/40] graphite: Fix minor mistakes in comments Frederik Harwath
@ 2022-05-16 12:49   ` Tobias Burnus
  2022-05-17  8:22     ` Richard Biener
  0 siblings, 1 reply; 49+ messages in thread
From: Tobias Burnus @ 2022-05-16 12:49 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Pop, Sebastian
  Cc: sebpop, thomas, grosser, Frederik Harwath

[-- Attachment #1: Type: text/plain, Size: 709 bytes --]

Another comment-only change.

Otherwise, just re-diffed Frederik's patch. Mostly s/.c/.cc/, but I
added one '. ' that got lost.

On 15.12.21 16:54, Frederik Harwath wrote:
>          * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
>          a reference to a variable which does not exist.
>          * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
>          in comment.

OK for mainline?

Tobias

[-- Attachment #2: 017-graphite-Fix-minor-mistakes-in-comments.diff --]
[-- Type: text/x-patch, Size: 2061 bytes --]

graphite: Fix minor mistakes in comments

gcc/ChangeLog:

	* graphite-sese-to-poly.cc (build_poly_sr_1): Fix a typo and
	a reference to a variable which does not exist.
	* graphite-isl-ast-to-gimple.cc (gsi_insert_earliest): Fix typo
	in comment.

 gcc/graphite-isl-ast-to-gimple.cc | 2 +-
 gcc/graphite-sese-to-poly.cc      | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.cc b/gcc/graphite-isl-ast-to-gimple.cc
index 45ed7704807..844b6d4e2b5 100644
--- a/gcc/graphite-isl-ast-to-gimple.cc
+++ b/gcc/graphite-isl-ast-to-gimple.cc
@@ -1014,7 +1014,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);
 
   /* Inserting the gimple statements in a vector because gimple_seq behave
-     in strage ways when inserting the stmts from it into different basic
+     in strange ways when inserting the stmts from it into different basic
      blocks one at a time.  */
   auto_vec<gimple *, 3> stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.cc b/gcc/graphite-sese-to-poly.cc
index ea67b267e1c..51ba3af204f 100644
--- a/gcc/graphite-sese-to-poly.cc
+++ b/gcc/graphite-sese-to-poly.cc
@@ -649,14 +649,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
 		 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
      the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
 				    alias_set);
 
   /* Add a constrain to the ACCESSES polyhedron for the alias set of
-     data reference DR.  */
+     the reference.  */
   isl_constraint *c
     = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/40] graphite: Extend SCoP detection dump output
  2022-05-16 12:49   ` Tobias Burnus
@ 2022-05-17  8:21     ` Richard Biener
  2022-05-18 12:19       ` Harwath, Frederik
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Biener @ 2022-05-17  8:21 UTC (permalink / raw)
  To: Tobias Burnus
  Cc: gcc-patches, Pop, Sebastian, sebpop, thomas, grosser, Frederik Harwath

On Mon, 16 May 2022, Tobias Burnus wrote:

> As requested by Richard: Rediffed patch.
> 
> Changes: s/.c/.cc/ + some whitespace changes.
> (At least in my email reader, some <tab> were lost. I also fixed too-long line
> issues.)
> 
> In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> (macro was removed late in GCC 12 development → r12-2605-ge41ba804ba5f5c)
> 
> Otherwise, it should be identical to Frederik's patch, earlier in this thread.
> 
> On 15.12.21 16:54, Frederik Harwath wrote:
> > Extend dump output to make it easier (for GCC developers) to understand
> > why Graphite refuses to include a loop in a SCoP.
> 
> OK for mainline?

+  if (printed)
+    fprintf (file, "\b\b");

please find other means of omitting ", ", like by printing it
_before_ the number but only for the second and following loop number.
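
A minimal, self-contained sketch of that separator-first idea, written
outside of GCC.  It assumes a plain vector of loop numbers instead of
GCC's loop structures, and it returns a string rather than writing to a
dump file; the name format_loop_numbers is hypothetical and not part of
the patch:

```cpp
#include <string>
#include <vector>

/* Hypothetical stand-in for the printing loop in
   print_sese_loop_numbers: format NUMS as "1, 2, 3".  The separator is
   emitted before every element except the first, so nothing has to be
   erased afterwards and an empty list yields an empty string.  */
static std::string
format_loop_numbers (const std::vector<int> &nums)
{
  std::string out;
  bool first = true;
  for (int num : nums)
    {
      if (!first)
        out += ", ";
      out += std::to_string (num);
      first = false;
    }
  return out;
}
```

Because the separator is only written once a previous element exists,
no "\b\b" is needed to clean up a trailing ", ".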

I'll also note that

+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{ 
+  bool printed = false;
+  for (auto loop : loops_list (cfun, 0))
+    { 
+      if (loop_in_sese_p (loop, sese))
+       fprintf (file, "%d, ", loop->num);
+      printed = true;
+    }

is hardly optimal.  Please instead iterate over 
sese.entry->dest->loop_father and children instead which you can do
by passing that as extra argument to loops_list.
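
The difference between walking every loop of the function and walking
only one loop and its children can be illustrated with a small model.
This is only a sketch under assumptions: loop_node and collect_loop_nums
are hypothetical simplifications, not GCC's 'class loop' or loops_list:

```cpp
#include <vector>

/* Hypothetical, simplified model of a node in a loop tree.  */
struct loop_node
{
  int num;
  std::vector<loop_node *> children;
};

/* Append the number of ROOT and of all loops nested inside it to OUT,
   outermost first.  Loops outside the subtree rooted at ROOT are never
   visited, which is the point of starting from a root instead of
   scanning the whole function.  */
static void
collect_loop_nums (const loop_node *root, std::vector<int> &out)
{
  out.push_back (root->num);
  for (const loop_node *child : root->children)
    collect_loop_nums (child, out);
}
```

In this model the caller would pass the node that corresponds to the
loop containing the region entry, so sibling loops elsewhere in the
function are skipped without any per-loop membership test.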

+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Loops in SCoP: ");
+      for (auto loop : loops_list (cfun, 0))
+       if (loop_in_sese_p (loop, s))
+         fprintf (dump_file, "%d ", loop->num);
+      fprintf (dump_file, "\n");
+    }

you are duplicating functionality of the function you just added ...

Otherwise looks OK to me.

Thanks,
Richard.


> Tobias
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/40] graphite: Rename isl_id_for_ssa_name
  2022-05-16 12:49   ` Tobias Burnus
@ 2022-05-17  8:22     ` Richard Biener
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Biener @ 2022-05-17  8:22 UTC (permalink / raw)
  To: Tobias Burnus
  Cc: gcc-patches, Pop, Sebastian, sebpop, thomas, grosser, Frederik Harwath

On Mon, 16 May 2022, Tobias Burnus wrote:

> Rediffed Frederik's patch. Actual change is just s/.c/.cc/ but also a
> missing space → tab.
> 
> On 15.12.21 16:54, Frederik Harwath wrote:
> > The SSA names for which this function gets used are always SCoP
> > parameters and hence "isl_id_for_parameter" is a better name.  It also
> > explains the prefix "P_" for those names in the ISL representation.
> 
> OK for mainline?

OK.

Richard.

> Tobias
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 17/40] graphite: Fix minor mistakes in comments
  2022-05-16 12:49   ` Tobias Burnus
@ 2022-05-17  8:22     ` Richard Biener
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Biener @ 2022-05-17  8:22 UTC (permalink / raw)
  To: Tobias Burnus
  Cc: gcc-patches, Pop, Sebastian, sebpop, thomas, grosser, Frederik Harwath

On Mon, 16 May 2022, Tobias Burnus wrote:

> Another comment-only change.
> 
> Otherwise, just re-diffed Frederik's patch. Mostly s/.c/.cc/, but I
> added one '. ' that got lost.
> 
> On 15.12.21 16:54, Frederik Harwath wrote:
> >          * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
> >          a reference to a variable which does not exist.
> >          * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
> >          in comment.
> 
> OK for mainline?

OK.

Richard.

> Tobias
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
> München, HRB 106955
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH 15/40] graphite: Extend SCoP detection dump output
  2022-05-17  8:21     ` Richard Biener
@ 2022-05-18 12:19       ` Harwath, Frederik
  2022-05-18 12:21         ` Richard Biener
  0 siblings, 1 reply; 49+ messages in thread
From: Harwath, Frederik @ 2022-05-18 12:19 UTC (permalink / raw)
  To: Burnus, Tobias, rguenther
  Cc: gcc-patches, sebpop, Schwinge, Thomas, Harwath, Frederik, spop, grosser

[-- Attachment #1: Type: text/plain, Size: 2576 bytes --]

Hi Richard,

On Tue, 2022-05-17 at 08:21 +0000, Richard Biener wrote:
> On Mon, 16 May 2022, Tobias Burnus wrote:
>
> > As requested by Richard: Rediffed patch.
> >
> > Changes: s/.c/.cc/ + some whitespace changes.
> > (At least in my email reader, some <tab> were lost. I also fixed
> > too-long line issues.)
> >
> > In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> > (macro was removed late in GCC 12 development → r12-2605-ge41ba804ba5f5c)
> >
> > Otherwise, it should be identical to Frederik's patch, earlier in
> > this thread.
> >
> > On 15.12.21 16:54, Frederik Harwath wrote:
> > > Extend the dump output to make it easier (for GCC developers) to
> > > understand why Graphite refuses to include a loop in a SCoP.
> >
> > OK for mainline?
>
> +  if (printed)
> +    fprintf (file, "\b\b");
>
> please find other means of omitting ", ", like by printing it
> _before_ the number but only for the second and following loop
> number.

Done.
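
As a standalone illustration of the separator handling requested above
(a minimal sketch, not the GCC code; the helper name join_loop_numbers
and the use of std::string are assumptions made only for this example),
the ", " is emitted before every element except the first, so there is
no trailing separator to erase afterwards:

```cpp
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: joins loop numbers with
// ", " by emitting the separator *before* each element but the first.
std::string join_loop_numbers(const std::vector<int> &nums)
{
  std::string out;
  bool first = true;
  for (int n : nums)
    {
      if (!first)
        out += ", ";            // separator precedes 2nd and later items
      out += std::to_string(n);
      first = false;
    }
  return out;
}
```

With this shape an empty list produces an empty string, and there is
no need for "\b\b", which would also leave backspace characters in a
dump file.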

>
> I'll also note that
>
> +static void
> +print_sese_loop_numbers (FILE *file, sese_l sese)
> +{
> +  bool printed = false;
> +  for (auto loop : loops_list (cfun, 0))
> +    {
> +      if (loop_in_sese_p (loop, sese))
> +       fprintf (file, "%d, ", loop->num);
> +      printed = true;
> +    }
>
> is hardly optimal.  Please instead iterate over
> sese.entry->dest->loop_father and children instead which you can do
> by passing that as extra argument to loops_list.

Done.

This had to be extended a little bit, because a SCoP
can consist of consecutive loop-nests, and iterating
only over "loops_list (cfun, LI_INCLUDE_ROOT,
sese.entry->dest->loop_father)" would output only the loops from the
first loop-nest in the SCoP (cf. the test file scop-22a.c that I added).
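
The difference between walking one nest and walking consecutive nests
can be sketched on a toy loop tree. This is only a model under stated
assumptions: the struct Loop, its in_region flag (standing in for
loop_in_sese_p) and the helper names are hypothetical and are not GCC's
data structures, and the pre-order traversal is chosen here just for
the example.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a loop tree node; NOT GCC's loop class.
struct Loop
{
  int num;
  bool in_region;  // models loop_in_sese_p (loop, sese)
  Loop *inner;     // first child loop
  Loop *next;      // next sibling loop
};

// Collect ROOT and all loops nested inside it, in pre-order.
static void collect_nest(const Loop *root, std::vector<int> &out)
{
  assert(root->in_region);  // every loop of a nest in the region is in it
  out.push_back(root->num);
  for (const Loop *child = root->inner; child; child = child->next)
    collect_nest(child, out);
}

// Walk consecutive sibling nests starting at FIRST_NEST and stop at the
// first nest that lies outside the region.
std::vector<int> loops_in_region(const Loop *first_nest)
{
  std::vector<int> out;
  for (const Loop *nest = first_nest; nest; nest = nest->next)
    {
      if (!nest->in_region)
        break;
      collect_nest(nest, out);
    }
  return out;
}
```

For a region made of loop 6 followed by the nest 7 containing 8,
calling collect_nest on the first nest alone yields only loop 6, while
loops_in_region also visits the following sibling nest.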

>
> +
> +  if (dump_file && dump_flags & TDF_DETAILS)
> +    {
> +      fprintf (dump_file, "Loops in SCoP: ");
> +      for (auto loop : loops_list (cfun, 0))
> +       if (loop_in_sese_p (loop, s))
> +         fprintf (dump_file, "%d ", loop->num);
> +      fprintf (dump_file, "\n");
> +    }
>
> you are duplicating functionality of the function you just added ...
>

Fixed.

> Otherwise looks OK to me.

Can I commit the revised patch?

Thanks for your review,
Frederik

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-graphite-Extend-SCoP-detection-dump-output.patch --]
[-- Type: text/x-patch; name="0001-graphite-Extend-SCoP-detection-dump-output.patch", Size: 11601 bytes --]

From fb268a37704b1598a84051c735514ff38adad038 Mon Sep 17 00:00:00 2001
From: Frederik Harwath <frederik@codesourcery.com>
Date: Wed, 18 May 2022 07:59:42 +0200
Subject: [PATCH] graphite: Extend SCoP detection dump output

Extend the dump output to make it easier (for GCC developers) to
understand why Graphite refuses to include a loop in a SCoP.

gcc/ChangeLog:

	* graphite-scop-detection.cc (scop_detection::can_represent_loop):
	Output reason for failure to dump file.
	(scop_detection::harmful_loop_in_region): Likewise.
	(scop_detection::graphite_can_represent_expr): Likewise.
	(scop_detection::stmt_has_simple_data_refs_p): Likewise.
	(scop_detection::stmt_simple_for_scop_p): Likewise.
	(print_sese_loop_numbers): New function.
	(scop_detection::add_scop): Use from here.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-22a.c: New test.
---
 gcc/graphite-scop-detection.cc           | 184 ++++++++++++++++++++---
 gcc/testsuite/gcc.dg/graphite/scop-22a.c |  56 +++++++
 2 files changed, 219 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/scop-22a.c

diff --git a/gcc/graphite-scop-detection.cc b/gcc/graphite-scop-detection.cc
index 8c0ee9975579..9792d87ee0ae 100644
--- a/gcc/graphite-scop-detection.cc
+++ b/gcc/graphite-scop-detection.cc
@@ -69,12 +69,27 @@ public:
     fprintf (output.dump_file, "%d", i);
     return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
     fprintf (output.dump_file, "%s", s);
     return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+    print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+    return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+    print_generic_expr (output.dump_file, t, TDF_SLIM);
+    return output;
+  }
 } dp;
 
 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,27 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }
 
+/* Print the loop numbers of the loops contained in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  bool first_loop = true;
+  for (loop_p nest = sese.entry->dest->loop_father; nest; nest = nest->next)
+    {
+      if (!loop_in_sese_p (nest, sese))
+        break;
+
+      for (auto loop : loops_list (cfun, LI_INCLUDE_ROOT, nest))
+        {
+          gcc_assert (loop_in_sese_p (loop, sese));
+
+          fprintf (file, "%s%d", first_loop ? "" : ", ", loop->num);
+          first_loop = false;
+        }
+    }
+}
+
 /* Build scop outer->inner if possible.  */
 
 void
@@ -519,6 +555,10 @@ scop_detection::build_scop_depth (loop_p loop)
       if (! next
 	  || harmful_loop_in_region (next))
 	{
+	  if (next)
+	    DEBUG_PRINT (dp << "[scop-detection] Discarding SCoP on loops ";
+			 print_sese_loop_numbers (dump_file, next);
+			 dp << " because of harmful loops\n");
 	  if (s)
 	    add_scop (s);
 	  build_scop_depth (loop);
@@ -560,14 +600,63 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
       || !single_pred_p (loop->latch)
       || exit->src != single_pred (loop->latch)
       || !empty_block_p (loop->latch))
-    return false;
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+      return false;
+    }
+
+  bool edge_irreducible = (loop_preheader_edge (loop)->flags
+			   & EDGE_IRREDUCIBLE_LOOP);
+  if (edge_irreducible)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+			 "Loop is not a natural loop.\n");
+      return false;
+    }
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+							   single_exit (loop),
+							   &niter_desc, false);
+
+  if (!niter_is_unconditional)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+			 "Loop niter not unconditional.\n"
+			 "Condition: " << niter_desc.assumptions << "\n");
+      return false;
+    }
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+      return false;
+    }
+  if (!niter_desc.control.no_overflow)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+      return false;
+    }
 
-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-    && number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-    && niter_desc.control.no_overflow
-    && (niter = number_of_latch_executions (loop))
-    && !chrec_contains_undetermined (niter)
-    && graphite_can_represent_expr (scop, loop, niter);
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+			 "Loop niter chrec contains undetermined "
+			 "coefficients.\n");
+      return false;
+    }
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+		      << "Loop niter expression cannot be represented: "
+		      << niter << "\n");
+      return false;
+    }
+
+  return true;
 }
 
 /* Return true when BEGIN is the preheader edge of a loop with a single exit
@@ -640,6 +729,13 @@ scop_detection::add_scop (sese_l s)
 
   scops.safe_push (s);
   DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s));
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Loops in SCoP: ");
+      print_sese_loop_numbers (dump_file, s);
+      fprintf (dump_file, "\n");
+    }
 }
 
 /* Return true when a statement in SCOP cannot be represented by Graphite.  */
@@ -665,7 +761,12 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 
       /* The basic block should not be part of an irreducible loop.  */
       if (bb->flags & BB_IRREDUCIBLE_LOOP)
-	return true;
+	{
+	  DEBUG_PRINT (dp << "[scop-detection-fail] Found bb in irreducible "
+			     "loop.\n");
+
+	  return true;
+	}
 
       /* Check for unstructured control flow: CFG not generated by structured
 	 if-then-else.  */
@@ -676,7 +777,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    if (!dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)
 		&& !dominated_by_p (CDI_DOMINATORS, e->dest, bb))
-	      return true;
+	      {
+		DEBUG_PRINT (dp << "[scop-detection-fail] Found unstructured "
+				   "control flow.\n");
+		return true;
+	      }
 	}
 
       /* Collect all loops in the current region.  */
@@ -688,7 +793,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
       for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
 	   !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb))
-	  return true;
+	  {
+	    DEBUG_PRINT (dp << "[scop-detection-fail] "
+			       "Found harmful statement.\n");
+	    return true;
+	  }
 
       for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb);
 	   dom;
@@ -731,9 +840,10 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 	  && ! loop_nest_has_data_refs (loop))
 	{
 	  DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-		       << "does not have any data reference.\n");
+		       << " does not have any data reference.\n");
 	  return true;
 	}
+      DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
     }
 
   return false;
@@ -922,7 +1032,21 @@ scop_detection::graphite_can_represent_expr (sese_l scop, loop_p loop,
 					     tree expr)
 {
   tree scev = cached_scalar_evolution_in_region (scop, loop, expr);
-  return graphite_can_represent_scev (scop, scev);
+  bool can_represent = graphite_can_represent_scev (scop, scev);
+
+  if (!can_represent)
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file,
+		   "[graphite_can_represent_expr] Cannot represent scev \"");
+	  print_generic_expr (dump_file, scev, TDF_SLIM);
+	  fprintf (dump_file, "\" of expression ");
+	  print_generic_expr (dump_file, expr, TDF_SLIM);
+	  fprintf (dump_file, " in loop %d\n", loop->num);
+	}
+    }
+  return can_represent;
 }
 
 /* Return true if the data references of STMT can be represented by Graphite.
@@ -938,7 +1062,11 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
 
   auto_vec<data_reference_p> drs;
   if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs))
-    return false;
+    {
+      DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] "
+			 "Unanalyzable statement.\n");
+      return false;
+    }
 
   int j;
   data_reference_p dr;
@@ -946,7 +1074,12 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
     {
       for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i)
 	if (! graphite_can_represent_scev (scop, DR_ACCESS_FN (dr, i)))
-	  return false;
+	  {
+	    DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] "
+			       "Cannot represent access function SCEV: "
+			    << DR_ACCESS_FN (dr, i) << "\n");
+	    return false;
+	  }
     }
 
   return true;
@@ -1027,14 +1160,23 @@ scop_detection::stmt_simple_for_scop_p (sese_l scop, gimple *stmt,
 	for (unsigned i = 0; i < 2; ++i)
 	  {
 	    tree op = gimple_op (stmt, i);
-	    if (!graphite_can_represent_expr (scop, loop, op)
-		/* We can only constrain on integer type.  */
-		|| ! INTEGRAL_TYPE_P (TREE_TYPE (op)))
+	    if (!graphite_can_represent_expr (scop, loop, op))
+	      {
+		DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+					      "[scop-detection-fail] "
+					      "Graphite cannot represent cond "
+					      "stmt operator expression.\n"));
+		DEBUG_PRINT (dp << op << "\n");
+		return false;
+	      }
+
+	    if (! INTEGRAL_TYPE_P (TREE_TYPE (op)))
 	      {
-		DEBUG_PRINT (dp << "[scop-detection-fail] "
-				<< "Graphite cannot represent stmt:\n";
-			     print_gimple_stmt (dump_file, stmt, 0,
-						TDF_VOPS | TDF_MEMSYMS));
+		DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+					      "[scop-detection-fail] "
+					      "Graphite cannot represent cond "
+					      "statement operator. "
+					      "Type must be integral.\n"));
 		return false;
 	      }
 	  }
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-22a.c b/gcc/testsuite/gcc.dg/graphite/scop-22a.c
new file mode 100644
index 000000000000..00d4b5315aeb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/scop-22a.c
@@ -0,0 +1,56 @@
+/* { dg-require-effective-target size32plus } */
+double u[1782225];
+
+void foo(int N, int *res)
+{
+  int i, j;
+  double a, b;
+  double sum = 0.0;
+
+  for (j = 3; j < N; j = j * j)
+    {
+      sum += a + b;
+    }
+
+  /* Next two loops form first SCoP */
+  for (i = 0; i < N; i++)
+    sum += u[i];
+
+  for (i = 0; i < N; i++)
+    {
+      a = u[i];
+      u[i] = i * i;
+      b = u[i];
+      sum += a + b;
+    }
+
+  for (j = 3; j < N; j = j * j)
+    {
+      sum += a + b;
+    }
+
+  for (j = 3; j < N; j = j * j)
+    {
+      sum += a + b;
+    }
+
+  /* Next two loop-nests form second SCoP */
+  for (i = 0; i < N; i++)
+    sum += u[i];
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	a = u[i];
+	u[i] = i * i;
+	b = u[j];
+	sum += a + b;
+      }
+
+  *res = sum + N;
+}
+
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "Loops in SCoP" 2 "graphite"} } */
+/* { dg-final { scan-tree-dump "Loops in SCoP: 2, 3" "graphite"} } */
+/* { dg-final { scan-tree-dump "Loops in SCoP: 6, 7, 8" "graphite"} } */
-- 
2.36.0



* Re: [PATCH 15/40] graphite: Extend SCoP detection dump output
  2022-05-18 12:19       ` Harwath, Frederik
@ 2022-05-18 12:21         ` Richard Biener
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Biener @ 2022-05-18 12:21 UTC (permalink / raw)
  To: Harwath, Frederik
  Cc: Burnus, Tobias, gcc-patches, sebpop, Schwinge, Thomas, spop, grosser

On Wed, 18 May 2022, Harwath, Frederik wrote:

> Hi Richard,
> 
> On Tue, 2022-05-17 at 08:21 +0000, Richard Biener wrote:
> > On Mon, 16 May 2022, Tobias Burnus wrote:
> >
> > > As requested by Richard: Rediffed patch.
> > >
> > > Changes: s/.c/.cc/ + some whitespace changes.
> > > > (At least in my email reader, some <tab> were lost. I also fixed
> > > > too-long line issues.)
> > > >
> > > > In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> > > > (macro was removed late in GCC 12 development → r12-2605-ge41ba804ba5f5c)
> > >
> > > Otherwise, it should be identical to Frederik's patch, earlier in
> > > this thread.
> > >
> > > On 15.12.21 16:54, Frederik Harwath wrote:
> > > > > Extend the dump output to make it easier (for GCC developers) to
> > > > > understand why Graphite refuses to include a loop in a SCoP.
> > >
> > > OK for mainline?
> >
> > +  if (printed)
> > +    fprintf (file, "\b\b");
> >
> > please find other means of omitting ", ", like by printing it
> > _before_ the number but only for the second and following loop
> > number.
> 
> Done.
> 
> >
> > I'll also note that
> >
> > +static void
> > +print_sese_loop_numbers (FILE *file, sese_l sese)
> > +{
> > +  bool printed = false;
> > +  for (auto loop : loops_list (cfun, 0))
> > +    {
> > +      if (loop_in_sese_p (loop, sese))
> > +       fprintf (file, "%d, ", loop->num);
> > +      printed = true;
> > +    }
> >
> > is hardly optimal.  Please instead iterate over
> > sese.entry->dest->loop_father and children instead which you can do
> > by passing that as extra argument to loops_list.
> 
> Done.
> 
> This had to be extended a little bit, because a SCoP
> can consist of consecutive loop-nests, and iterating
> only over "loops_list (cfun, LI_INCLUDE_ROOT,
> sese.entry->dest->loop_father)" would output only the loops from the
> first loop-nest in the SCoP (cf. the test file scop-22a.c that I added).
> 
> >
> > +
> > +  if (dump_file && dump_flags & TDF_DETAILS)
> > +    {
> > +      fprintf (dump_file, "Loops in SCoP: ");
> > +      for (auto loop : loops_list (cfun, 0))
> > +       if (loop_in_sese_p (loop, s))
> > +         fprintf (dump_file, "%d ", loop->num);
> > +      fprintf (dump_file, "\n");
> > +    }
> >
> > you are duplicating functionality of the function you just added ...
> >
> 
> Fixed.
> 
> > Otherwise looks OK to me.
> 
> Can I commit the revised patch?

Yes.

Thanks,
Richard.


end of thread, other threads:[~2022-05-18 12:21 UTC | newest]

Thread overview: 49+ messages
2021-12-15 15:54 [PATCH 00/40] OpenACC "kernels" Improvements Frederik Harwath
2021-12-15 15:54 ` [PATCH 01/40] Kernels loops annotation: C and C++ Frederik Harwath
2021-12-15 15:54 ` [PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases Frederik Harwath
2021-12-15 15:54 ` [PATCH 03/40] Kernels loops annotation: Fortran Frederik Harwath
2021-12-15 15:54 ` [PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass Frederik Harwath
2021-12-15 15:54 ` [PATCH 05/40] Fix bug in processing of array dimensions in data clauses Frederik Harwath
2021-12-15 15:54 ` [PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives Frederik Harwath
2021-12-15 15:54 ` [PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++) Frederik Harwath
2021-12-15 15:54 ` [PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran) Frederik Harwath
2021-12-15 15:54 ` [PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops Frederik Harwath
2021-12-15 15:54 ` [PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation Frederik Harwath
2021-12-15 15:54 ` [PATCH 11/40] Clean up loop variable extraction in OpenACC " Frederik Harwath
2021-12-15 15:54 ` [PATCH 12/40] Relax some restrictions on the loop bound in " Frederik Harwath
2021-12-15 15:54 ` [PATCH 13/40] Fortran: Delinearize array accesses Frederik Harwath
2021-12-15 15:54 ` [PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite Frederik Harwath
2021-12-15 15:54 ` [PATCH 15/40] graphite: Extend SCoP detection dump output Frederik Harwath
2022-05-16 12:49   ` Tobias Burnus
2022-05-17  8:21     ` Richard Biener
2022-05-18 12:19       ` Harwath, Frederik
2022-05-18 12:21         ` Richard Biener
2021-12-15 15:54 ` [PATCH 16/40] graphite: Rename isl_id_for_ssa_name Frederik Harwath
2022-05-16 12:49   ` Tobias Burnus
2022-05-17  8:22     ` Richard Biener
2021-12-15 15:54 ` [PATCH 17/40] graphite: Fix minor mistakes in comments Frederik Harwath
2022-05-16 12:49   ` Tobias Burnus
2022-05-17  8:22     ` Richard Biener
2021-12-15 15:54 ` [PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c Frederik Harwath
2021-12-15 15:54 ` [PATCH 19/40] graphite: Add runtime alias checking Frederik Harwath
2021-12-15 15:54 ` [PATCH 20/40] openacc: Use Graphite for dependence analysis in "kernels" regions Frederik Harwath
2021-12-15 15:54 ` [PATCH 21/40] openacc: Add "can_be_parallel" flag info to "graph" dumps Frederik Harwath
2021-12-15 15:54 ` [PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions Frederik Harwath
2021-12-15 15:54 ` [PATCH 23/40] Add function for printing a single OMP_CLAUSE Frederik Harwath
2021-12-15 15:54 ` [PATCH 24/40] openacc: Add data optimization pass Frederik Harwath
2021-12-15 15:54 ` [PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels Frederik Harwath
2021-12-15 15:54 ` [PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences Frederik Harwath
2021-12-15 15:54 ` [PATCH 27/40] openacc: Handle internal function calls in pass_lim Frederik Harwath
2021-12-15 15:54 ` [PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite Frederik Harwath
2021-12-15 15:54 ` [PATCH 29/40] graphite: Tune parameters for OpenACC use Frederik Harwath
2021-12-15 15:54 ` [PATCH 30/40] graphite: Adjust scop loop-nest choice Frederik Harwath
2021-12-15 15:54 ` [PATCH 31/40] graphite: Accept loops without data references Frederik Harwath
2021-12-15 15:54 ` [PATCH 32/40] Reference reduction localization Frederik Harwath
2021-12-15 15:54 ` [PATCH 33/40] Fix tree check failure with " Frederik Harwath
2021-12-15 15:54 ` [PATCH 34/40] Use more appropriate var in localize_reductions call Frederik Harwath
2021-12-15 15:54 ` [PATCH 35/40] Handle references in OpenACC "private" clauses Frederik Harwath
2021-12-15 15:54 ` [PATCH 36/40] openacc: Enable reduction variable localization for "kernels" Frederik Harwath
2021-12-15 15:54 ` [PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels' Frederik Harwath
2021-12-15 15:54 ` [PATCH 38/40] openacc: fix privatization of by-reference arrays Frederik Harwath
2021-12-15 15:54 ` [PATCH 39/40] openacc: Check type for references in reduction lowering Frederik Harwath
2021-12-16 12:00 ` [PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling Frederik Harwath
