public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
@ 2023-03-24 15:30 Frederik Harwath
  2023-03-24 15:30 ` [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive Frederik Harwath
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC
  To: gcc-patches, fortran, tobias, jakub, joseph, jason

Hi,
this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C, C++, and Fortran front ends
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source-language-agnostic way.  The "unroll" and "tile" directives are
internally implemented as clauses.  This fits the representation of
collapsed loop nests by a single internal gomp_for construct: loop
transformations can be applied to loops at different levels of such a
loop nest, and clauses represent this well.  The transformations can
also be applied to loops which are not going to be associated with any
OpenMP directive after the transformation.  This is represented by a
new gomp_for kind.  Loops of this kind are lowered in the
transformation pass since they are not subject to any further
OpenMP-specific processing.
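
For readers unfamiliar with the two constructs, here is a minimal
sketch of how they appear in user code, using the OpenMP 5.1 directive
syntax.  The function names, the array size N and the chosen factors
are illustrative assumptions and are not taken from the patch or its
test suite.  A compiler that does not know the directives (or that is
run without -fopenmp) is expected to ignore the pragmas, possibly with
a warning, so the functions compute the same results either way.

```c
#include <assert.h>

#define N 8  /* Illustrative size, a multiple of the tile size below.  */

/* Sum N_ELEMS elements; the directive asks for partial unrolling by 4.
   The loop is not associated with any other OpenMP construct.  */
long
sum_partial_unroll (const int *a, int n_elems)
{
  assert (n_elems >= 0);
  long sum = 0;
  #pragma omp unroll partial(4)
  for (int i = 0; i < n_elems; i++)
    sum += a[i];
  return sum;
}

/* Transpose SRC into DST; the directive asks for 4x4 tiling of the
   perfectly nested loop pair.  */
void
transpose_tiled (int dst[N][N], int src[N][N])
{
  #pragma omp tile sizes(4, 4)
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      dst[j][i] = src[i][j];
}
```

Both directives only change how the iterations are scheduled, not what
is computed, which is why the results can be checked against a plain
loop.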

The patches are presented roughly in the order of their development:
each construct is implemented in the Fortran front end first, including
the middle-end additions/changes, followed by a patch that adds the C
and C++ front end changes.  This initial implementation supports the
loop transformation constructs on the outermost loop of a loop nest
only.  Support for applying the transformations to inner loops is
then added in two further patches.

The patches have been bootstrapped and tested on x86_64-linux-gnu with
both nvptx-none and amdgcn-amdhsa offloading.

Best regards,
Frederik

Frederik Harwath (7):
  openmp: Add Fortran support for "omp unroll" directive
  openmp: Add C/C++ support for "omp unroll" directive
  openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
  openmp: Add Fortran support for "omp tile"
  openmp: Add C/C++ support for "omp tile"
  openmp: Add Fortran support for loop transformations on inner loops
  openmp: Add C/C++ support for loop transformations on inner loops

 gcc/Makefile.in                               |    1 +
 gcc/c-family/c-gimplify.cc                    |    1 +
 gcc/c-family/c-omp.cc                         |   12 +-
 gcc/c-family/c-pragma.cc                      |    2 +
 gcc/c-family/c-pragma.h                       |    7 +-
 gcc/c/c-parser.cc                             |  403 +++-
 gcc/c/c-typeck.cc                             |   10 +-
 gcc/cp/cp-gimplify.cc                         |    3 +
 gcc/cp/parser.cc                              |  453 ++++-
 gcc/cp/pt.cc                                  |   15 +-
 gcc/cp/semantics.cc                           |  104 +-
 gcc/fortran/dump-parse-tree.cc                |   30 +
 gcc/fortran/gfortran.h                        |   12 +-
 gcc/fortran/match.h                           |    2 +
 gcc/fortran/openmp.cc                         |  460 ++++-
 gcc/fortran/parse.cc                          |   52 +-
 gcc/fortran/resolve.cc                        |    6 +
 gcc/fortran/st.cc                             |    2 +
 gcc/fortran/trans-openmp.cc                   |  187 +-
 gcc/fortran/trans.cc                          |    2 +
 gcc/gimple-pretty-print.cc                    |    6 +
 gcc/gimple.h                                  |    1 +
 gcc/gimplify.cc                               |   79 +-
 gcc/omp-general.cc                            |   22 +-
 gcc/omp-general.h                             |    1 +
 gcc/omp-low.cc                                |    6 +-
 gcc/omp-transform-loops.cc                    | 1773 +++++++++++++++++
 gcc/params.opt                                |    9 +
 gcc/passes.def                                |    1 +
 .../loop-transforms/imperfect-loop-nest.c     |   12 +
 .../gomp/loop-transforms/tile-1.c             |  164 ++
 .../gomp/loop-transforms/tile-2.c             |  183 ++
 .../gomp/loop-transforms/tile-3.c             |  117 ++
 .../gomp/loop-transforms/tile-4.c             |  322 +++
 .../gomp/loop-transforms/tile-5.c             |  150 ++
 .../gomp/loop-transforms/tile-6.c             |   34 +
 .../gomp/loop-transforms/tile-7.c             |   31 +
 .../gomp/loop-transforms/tile-8.c             |   40 +
 .../gomp/loop-transforms/unroll-1.c           |  133 ++
 .../gomp/loop-transforms/unroll-2.c           |   95 +
 .../gomp/loop-transforms/unroll-3.c           |   18 +
 .../gomp/loop-transforms/unroll-4.c           |   19 +
 .../gomp/loop-transforms/unroll-5.c           |   19 +
 .../gomp/loop-transforms/unroll-6.c           |   20 +
 .../gomp/loop-transforms/unroll-7.c           |  144 ++
 .../gomp/loop-transforms/unroll-inner-1.c     |   15 +
 .../gomp/loop-transforms/unroll-inner-2.c     |   31 +
 .../gomp/loop-transforms/unroll-non-rect-1.c  |   37 +
 .../gomp/loop-transforms/unroll-non-rect-2.c  |   22 +
 .../gomp/loop-transforms/unroll-simd-1.c      |   84 +
 .../g++.dg/gomp/loop-transforms/tile-1.h      |   27 +
 .../g++.dg/gomp/loop-transforms/tile-1a.C     |   27 +
 .../g++.dg/gomp/loop-transforms/tile-1b.C     |   27 +
 .../g++.dg/gomp/loop-transforms/unroll-1.C    |   42 +
 .../g++.dg/gomp/loop-transforms/unroll-2.C    |   47 +
 .../g++.dg/gomp/loop-transforms/unroll-3.C    |   37 +
 .../gomp/loop-transforms/inner-loops.f90      |  124 ++
 .../gomp/loop-transforms/tile-1.f90           |  163 ++
 .../gomp/loop-transforms/tile-1a.f90          |   10 +
 .../gomp/loop-transforms/tile-2.f90           |   80 +
 .../gomp/loop-transforms/tile-3.f90           |   18 +
 .../gomp/loop-transforms/tile-4.f90           |   95 +
 .../loop-transforms/tile-imperfect-nest.f90   |   93 +
 .../loop-transforms/tile-inner-loops-1.f90    |   16 +
 .../loop-transforms/tile-inner-loops-2.f90    |   23 +
 .../loop-transforms/tile-inner-loops-3.f90    |   22 +
 .../loop-transforms/tile-inner-loops-3a.f90   |   31 +
 .../loop-transforms/tile-inner-loops-4.f90    |   30 +
 .../loop-transforms/tile-inner-loops-4a.f90   |   26 +
 .../loop-transforms/tile-inner-loops-5.f90    |  123 ++
 .../tile-non-rectangular-1.f90                |   71 +
 .../tile-non-rectangular-2.f90                |   12 +
 .../gomp/loop-transforms/tile-unroll-1.f90    |   57 +
 .../gomp/loop-transforms/unroll-1.f90         |  277 +++
 .../gomp/loop-transforms/unroll-10.f90        |    7 +
 .../gomp/loop-transforms/unroll-11.f90        |   75 +
 .../gomp/loop-transforms/unroll-12.f90        |   29 +
 .../gomp/loop-transforms/unroll-2.f90         |   22 +
 .../gomp/loop-transforms/unroll-3.f90         |   17 +
 .../gomp/loop-transforms/unroll-4.f90         |   18 +
 .../gomp/loop-transforms/unroll-5.f90         |   18 +
 .../gomp/loop-transforms/unroll-6.f90         |   19 +
 .../gomp/loop-transforms/unroll-7.f90         |   62 +
 .../gomp/loop-transforms/unroll-8.f90         |   22 +
 .../gomp/loop-transforms/unroll-9.f90         |   18 +
 .../loop-transforms/unroll-inner-loop.f90     |   57 +
 .../loop-transforms/unroll-no-clause-1.f90    |   20 +
 .../loop-transforms/unroll-no-clause-2.f90    |   21 +
 .../loop-transforms/unroll-no-clause-3.f90    |   23 +
 .../loop-transforms/unroll-non-rect-1.f90     |   31 +
 .../gomp/loop-transforms/unroll-simd-1.f90    |  244 +++
 .../gomp/loop-transforms/unroll-simd-2.f90    |   57 +
 .../gomp/loop-transforms/unroll-tile-1.f90    |   37 +
 .../gomp/loop-transforms/unroll-tile-2.f90    |   41 +
 .../loop-transforms/unroll-tile-inner-1.f90   |   25 +
 gcc/tree-core.h                               |   14 +-
 gcc/tree-nested.cc                            |    4 +-
 gcc/tree-pass.h                               |    1 +
 gcc/tree-pretty-print.cc                      |   56 +-
 gcc/tree.cc                                   |   11 +-
 gcc/tree.def                                  |    6 +
 gcc/tree.h                                    |   23 +-
 .../libgomp.c++/loop-transforms/tile-2.C      |   69 +
 .../libgomp.c++/loop-transforms/tile-3.C      |   28 +
 .../libgomp.c++/loop-transforms/unroll-1.C    |   73 +
 .../libgomp.c++/loop-transforms/unroll-2.C    |   34 +
 .../loop-transforms/unroll-full-tile.C        |   84 +
 .../loop-transforms/matrix-1.h                |   70 +
 .../loop-transforms/matrix-constant-iter.h    |   71 +
 .../loop-transforms/matrix-helper.h           |   19 +
 .../loop-transforms/matrix-no-directive-1.c   |   11 +
 .../matrix-no-directive-unroll-full-1.c       |   13 +
 .../matrix-omp-distribute-parallel-for-1.c    |    6 +
 .../loop-transforms/matrix-omp-for-1.c        |   13 +
 .../matrix-omp-parallel-for-1.c               |   13 +
 .../matrix-omp-parallel-masked-taskloop-1.c   |    6 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |    6 +
 .../matrix-omp-target-parallel-for-1.c        |   13 +
 ...p-target-teams-distribute-parallel-for-1.c |    6 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |    6 +
 ...trix-omp-teams-distribute-parallel-for-1.c |    6 +
 .../loop-transforms/matrix-simd-1.c           |    6 +
 .../matrix-transform-variants-1.h             |  191 ++
 .../loop-transforms/unroll-1.c                |   76 +
 .../loop-transforms/unroll-non-rect-1.c       |  129 ++
 .../loop-transforms/inner-1.f90               |   77 +
 .../loop-transforms/tile-1.f90                |   71 +
 .../loop-transforms/tile-2.f90                |  117 ++
 .../loop-transforms/tile-unroll-1.f90         |  112 ++
 .../loop-transforms/tile-unroll-2.f90         |   71 +
 .../loop-transforms/tile-unroll-3.f90         |   77 +
 .../loop-transforms/tile-unroll-4.f90         |   75 +
 .../loop-transforms/unroll-1.f90              |   52 +
 .../loop-transforms/unroll-2.f90              |   88 +
 .../loop-transforms/unroll-3.f90              |   59 +
 .../loop-transforms/unroll-4.f90              |   72 +
 .../loop-transforms/unroll-5.f90              |   55 +
 .../loop-transforms/unroll-6.f90              |  105 +
 .../loop-transforms/unroll-7.f90              |  198 ++
 .../loop-transforms/unroll-7a.f90             |    7 +
 .../loop-transforms/unroll-7b.f90             |    7 +
 .../loop-transforms/unroll-7c.f90             |    7 +
 .../loop-transforms/unroll-8.f90              |   38 +
 .../loop-transforms/unroll-simd-1.f90         |   33 +
 .../loop-transforms/unroll-tile-1.f90         |  112 ++
 .../loop-transforms/unroll-tile-2.f90         |   71 +
 146 files changed, 10154 insertions(+), 107 deletions(-)
 create mode 100644 gcc/omp-transform-loops.cc
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-simd-1.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1b.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/inner-loops.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-10.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-11.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-12.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-non-rect-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-3.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/inner-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-5.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-8.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90

--
2.36.1



* [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-04-01  8:42   ` Thomas Schwinge
  2023-03-24 15:30 ` [PATCH 2/7] openmp: Add C/C++ " Frederik Harwath
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC
  To: gcc-patches, tobias, fortran, jakub

This commit implements the OpenMP 5.1 "omp unroll" directive for
Fortran. The Fortran front end changes encompass the parsing and the
verification of nesting restrictions etc. The actual loop
transformation is implemented in a new language-independent
"omp_transform_loops" pass which runs before omp lowering.  No attempt
is made to re-use existing unrolling optimizations because a separate
implementation allows for better control of the unrolling. The new
pass will also serve as a foundation for the implementation of further
OpenMP loop transformations. This commit only implements the support
for "omp unroll" on the outermost loop of a loop nest.  The support
for inner loops will be added later.
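
To illustrate what partial unrolling means, the following is a
hand-written conceptual sketch in C of a loop unrolled by a factor of
2, next to the plain loop it corresponds to.  It is only meant to show
the semantics; it is an assumption-laden illustration and does not
claim to match the code that the "omp_transform_loops" pass generates.
The function names are hypothetical.

```c
#include <assert.h>

/* Reference: the loop as the user wrote it.  */
long
sum_reference (const int *a, int n)
{
  assert (n >= 0);
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Conceptual equivalent of "unroll partial(2)": the main loop executes
   two copies of the body per iteration, and a remainder loop handles
   the last element when N is odd.  */
long
sum_unrolled_by_2 (const int *a, int n)
{
  assert (n >= 0);
  long sum = 0;
  int i = 0;
  for (; i + 1 < n; i += 2)
    {
      sum += a[i];
      sum += a[i + 1];
    }
  for (; i < n; i++)
    sum += a[i];
  return sum;
}
```

A full unroll would instead replace the loop by one copy of the body
per iteration, which requires a trip count that is known at compile
time.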

gcc/ChangeLog:

        * Makefile.in: Add omp-transform-loops.o.
        * gimple-pretty-print.cc (dump_gimple_omp_for): Handle "full"
        and "partial" clauses.
        * gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_TRANSFORM_LOOP.
        * gimplify.cc (is_gimple_stmt): Handle OMP_UNROLL.
        (gimplify_scan_omp_clauses): Handle OMP_UNROLL_FULL,
        OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
        (gimplify_adjust_omp_clauses): Handle OMP_UNROLL_FULL,
        OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
        (gimplify_omp_for): Handle OMP_UNROLL.
        (gimplify_expr): Likewise.
        * params.opt: Add omp-unroll-full-max-iteration and
        omp-unroll-default-factor.
        * passes.def: Add pass_omp_transform_loops before
        pass_lower_omp.
        * tree-core.h (enum omp_clause_code): Add
        OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
        OMP_CLAUSE_UNROLL_PARTIAL.
        * tree-pass.h (make_pass_omp_transform_loops): Declare.
        * tree-pretty-print.cc (dump_omp_clause): Handle
        OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
        OMP_CLAUSE_UNROLL_PARTIAL.
        (dump_generic_node): Handle OMP_UNROLL.
        * tree.cc (omp_clause_num_ops): Add number of operands
        for OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
        OMP_CLAUSE_UNROLL_PARTIAL.
        (omp_clause_code_names): Add name strings for
        OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
        OMP_CLAUSE_UNROLL_PARTIAL.
        * tree.def (OMP_UNROLL): Define.
        * tree.h (OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Define.
        * omp-transform-loops.cc: New file.
        * omp-general.cc (omp_loop_transform_clause_p): New function.
        * omp-general.h (omp_loop_transform_clause_p): New declaration.

gcc/fortran/ChangeLog:

        * dump-parse-tree.cc (show_omp_clauses): Handle "unroll full"
        and "unroll partial".
        (show_omp_node): Handle OMP_UNROLL.
        (show_code_node): Handle EXEC_OMP_UNROLL.
        * gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL, ST_OMP_END_UNROLL.
        (enum gfc_exec_op): Add EXEC_OMP_UNROLL.
        * match.h (gfc_match_omp_unroll): Declare.
        * openmp.cc (enum omp_mask2): Add OMP_CLAUSE_UNROLL_FULL,
        OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_PARTIAL.
        (gfc_match_omp_clauses): Handle "omp unroll partial".
        (OMP_UNROLL_CLAUSES): New macro definition.
        (gfc_match_omp_unroll): Match "full" clause.
        (omp_unroll_removes_loop_nest): New function.
        (resolve_omp_unroll): New function.
        (resolve_omp_do): Accept and verify "omp unroll"
        directives between directive and loop.
        (omp_code_to_statement): Handle EXEC_OMP_UNROLL.
        (gfc_resolve_omp_directive): Likewise.
        * parse.cc (decode_omp_directive): Handle "unroll" and "end unroll".
        (next_statement): Handle ST_OMP_UNROLL.
        (gfc_ascii_statement): Handle ST_OMP_UNROLL and ST_OMP_END_UNROLL.
        (parse_omp_do): Accept ST_OMP_UNROLL and ST_OMP_END_UNROLL
        before/after loop.
        (parse_executable): Handle ST_OMP_UNROLL.
        * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_UNROLL.
        (gfc_resolve_code): Likewise.
        * st.cc (gfc_free_statement): Likewise.
        * trans-openmp.cc (gfc_trans_omp_clauses): Handle unroll clauses.
        (gfc_trans_omp_do): Handle OMP_CLAUSE_UNROLL_FULL,
        OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_NONE creation.
        (gfc_trans_omp_directive): Handle EXEC_OMP_UNROLL.
        * trans.cc (trans_code): Likewise.

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-2.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-3.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-4.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-5.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-7.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-8.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: New test.

gcc/testsuite/ChangeLog:

        * gfortran.dg/gomp/loop-transforms/unroll-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-2.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-3.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-4.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-5.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-6.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-7.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-9.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90: New test.
---
 gcc/Makefile.in                               |    1 +
 gcc/fortran/dump-parse-tree.cc                |   15 +
 gcc/fortran/gfortran.h                        |    9 +-
 gcc/fortran/match.h                           |    1 +
 gcc/fortran/openmp.cc                         |  174 +-
 gcc/fortran/parse.cc                          |   37 +-
 gcc/fortran/resolve.cc                        |    3 +
 gcc/fortran/st.cc                             |    1 +
 gcc/fortran/trans-openmp.cc                   |   71 +-
 gcc/fortran/trans.cc                          |    1 +
 gcc/gimple-pretty-print.cc                    |    6 +
 gcc/gimple.h                                  |    1 +
 gcc/gimplify.cc                               |   40 +-
 gcc/omp-general.cc                            |   14 +
 gcc/omp-general.h                             |    1 +
 gcc/omp-transform-loops.cc                    | 1401 +++++++++++++++++
 gcc/params.opt                                |    9 +
 gcc/passes.def                                |    1 +
 .../gomp/loop-transforms/unroll-1.f90         |  277 ++++
 .../gomp/loop-transforms/unroll-10.f90        |    7 +
 .../gomp/loop-transforms/unroll-11.f90        |   75 +
 .../gomp/loop-transforms/unroll-12.f90        |   29 +
 .../gomp/loop-transforms/unroll-2.f90         |   22 +
 .../gomp/loop-transforms/unroll-3.f90         |   17 +
 .../gomp/loop-transforms/unroll-4.f90         |   18 +
 .../gomp/loop-transforms/unroll-5.f90         |   18 +
 .../gomp/loop-transforms/unroll-6.f90         |   19 +
 .../gomp/loop-transforms/unroll-7.f90         |   62 +
 .../gomp/loop-transforms/unroll-8.f90         |   22 +
 .../gomp/loop-transforms/unroll-9.f90         |   18 +
 .../loop-transforms/unroll-no-clause-1.f90    |   20 +
 .../loop-transforms/unroll-no-clause-2.f90    |   21 +
 .../loop-transforms/unroll-no-clause-3.f90    |   23 +
 .../gomp/loop-transforms/unroll-simd-1.f90    |  244 +++
 .../gomp/loop-transforms/unroll-simd-2.f90    |   57 +
 gcc/tree-core.h                               |    9 +
 gcc/tree-pass.h                               |    1 +
 gcc/tree-pretty-print.cc                      |   20 +
 gcc/tree.cc                                   |    6 +
 gcc/tree.def                                  |    6 +
 gcc/tree.h                                    |    3 +
 .../loop-transforms/unroll-1.f90              |   52 +
 .../loop-transforms/unroll-2.f90              |   88 ++
 .../loop-transforms/unroll-3.f90              |   59 +
 .../loop-transforms/unroll-4.f90              |   72 +
 .../loop-transforms/unroll-5.f90              |   55 +
 .../loop-transforms/unroll-6.f90              |  105 ++
 .../loop-transforms/unroll-7.f90              |  198 +++
 .../loop-transforms/unroll-7a.f90             |    7 +
 .../loop-transforms/unroll-7b.f90             |    7 +
 .../loop-transforms/unroll-7c.f90             |    7 +
 .../loop-transforms/unroll-8.f90              |   38 +
 .../loop-transforms/unroll-simd-1.f90         |   33 +
 53 files changed, 3484 insertions(+), 17 deletions(-)
 create mode 100644 gcc/omp-transform-loops.cc
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-10.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-11.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-12.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-5.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-8.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
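
For orientation: the Fortran front-end changes below step over stacked
"unroll" directives in several places to reach the associated DO loop
(is_outer_iteration_variable, expr_is_invariant, resolve_omp_do and
gfc_trans_omp_do).  The following is only a minimal sketch of that
traversal, not code from the patch.  It assumes simplified stand-in
types: Op, Code, is_loop_transform and skip_loop_transforms are
hypothetical names used for illustration, and the real gfc_code
structure in gfortran.h carries many more fields.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for gfc_exec_op and gfc_code.
enum class Op { Do, OmpUnroll, Other };

struct Code
{
  Op op;
  Code *block;  // body of a block construct, or nullptr
  Code *next;   // following statement, or nullptr

  explicit Code (Op o, Code *b = nullptr, Code *n = nullptr)
    : op (o), block (b), next (n) {}
};

// Models loop_transform_p from the patch: only "unroll" counts so far.
static bool
is_loop_transform (Op op)
{
  return op == Op::OmpUnroll;
}

// Step over stacked loop transformation directives and return the first
// statement that is not one.  A directive that owns a block continues in
// block->next; a directive without a block continues in next.
// Unlike the patch, this model tolerates a null result.
static Code *
skip_loop_transforms (Code *code)
{
  while (code && is_loop_transform (code->op))
    code = code->block ? code->block->next : code->next;
  return code;
}
```

In this model, calling skip_loop_transforms on the outermost directive
of two stacked "unroll" nodes yields the DO node, and calling it on a
DO node returns that node unchanged.
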

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d8b76d83d68..8e203f68bd7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1540,6 +1540,7 @@ OBJS = \
        omp-expand.o \
        omp-general.o \
        omp-low.o \
+       omp-transform-loops.o \
        omp-oacc-kernels-decompose.o \
        omp-oacc-neuter-broadcast.o \
        omp-simd-clone.o \
diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 3b24bdc1a6c..e069aca1f1d 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2052,6 +2052,16 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
     }
   if (omp_clauses->assume)
     show_omp_assumes (omp_clauses->assume);
+  if (omp_clauses->unroll_full)
+    {
+      fputs (" FULL", dumpfile);
+    }
+  if (omp_clauses->unroll_partial)
+    {
+      fputs (" PARTIAL", dumpfile);
+      if (omp_clauses->unroll_partial_factor > 0)
+       fprintf (dumpfile, "(%u)", omp_clauses->unroll_partial_factor);
+    }
 }

 /* Show a single OpenMP or OpenACC directive node and everything underneath it
@@ -2162,6 +2172,7 @@ show_omp_node (int level, gfc_code *c)
       name = "TEAMS DISTRIBUTE PARALLEL DO SIMD"; break;
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: name = "TEAMS DISTRIBUTE SIMD"; break;
     case EXEC_OMP_TEAMS_LOOP: name = "TEAMS LOOP"; break;
+    case EXEC_OMP_UNROLL: name = "UNROLL"; break;
     case EXEC_OMP_WORKSHARE: name = "WORKSHARE"; break;
     default:
       gcc_unreachable ();
@@ -2238,6 +2249,7 @@ show_omp_node (int level, gfc_code *c)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_LOOP:
+    case EXEC_OMP_UNROLL:
     case EXEC_OMP_WORKSHARE:
       omp_clauses = c->ext.omp_clauses;
       break;
@@ -2299,6 +2311,8 @@ show_omp_node (int level, gfc_code *c)
          d = d->block;
        }
     }
+  else if (c->op == EXEC_OMP_UNROLL)
+    show_code (level + 1, c->block != NULL ? c->block->next : c->next);
   else
     show_code (level + 1, c->block->next);
   if (c->op == EXEC_OMP_ATOMIC)
@@ -3477,6 +3491,7 @@ show_code_node (int level, gfc_code *c)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_LOOP:
+    case EXEC_OMP_UNROLL:
     case EXEC_OMP_WORKSHARE:
       show_omp_node (level, c);
       break;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 9bab2c40ead..5ef4a8907b0 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -319,7 +319,8 @@ enum gfc_statement
   ST_OMP_END_MASKED_TASKLOOP_SIMD, ST_OMP_SCOPE, ST_OMP_END_SCOPE,
   ST_OMP_ERROR, ST_OMP_ASSUME, ST_OMP_END_ASSUME, ST_OMP_ASSUMES,
   /* Note: gfc_match_omp_nothing returns ST_NONE. */
-  ST_OMP_NOTHING, ST_NONE
+  ST_OMP_NOTHING, ST_NONE,
+  ST_OMP_UNROLL, ST_OMP_END_UNROLL
 };

 /* Types of interfaces that we can have.  Assignment interfaces are
@@ -1561,6 +1562,8 @@ typedef struct gfc_omp_clauses
   unsigned order_unconstrained:1, order_reproducible:1, capture:1;
   unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
   unsigned non_rectangular:1, order_concurrent:1;
+  unsigned unroll_full:1, unroll_none:1, unroll_partial:1;
+  unsigned unroll_partial_factor;
   ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
   ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
   ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
@@ -2974,6 +2977,7 @@ enum gfc_exec_op
   EXEC_OMP_TARGET_TEAMS_LOOP, EXEC_OMP_MASKED, EXEC_OMP_PARALLEL_MASKED,
   EXEC_OMP_PARALLEL_MASKED_TASKLOOP, EXEC_OMP_PARALLEL_MASKED_TASKLOOP_SIMD,
   EXEC_OMP_MASKED_TASKLOOP, EXEC_OMP_MASKED_TASKLOOP_SIMD, EXEC_OMP_SCOPE,
+  EXEC_OMP_UNROLL,
   EXEC_OMP_ERROR
 };

@@ -3868,6 +3872,9 @@ void gfc_generate_module_code (gfc_namespace *);
 /* trans-intrinsic.cc */
 bool gfc_inline_intrinsic_function_p (gfc_expr *);

+/* trans-openmp.cc */
+bool loop_transform_p (gfc_exec_op op);
+
 /* bbt.cc */
 typedef int (*compare_fn) (void *, void *);
 void gfc_insert_bbt (void *, void *, compare_fn);
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 4430aff001c..5640c725f09 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -226,6 +226,7 @@ match gfc_match_omp_teams_distribute_parallel_do_simd (void);
 match gfc_match_omp_teams_distribute_simd (void);
 match gfc_match_omp_teams_loop (void);
 match gfc_match_omp_threadprivate (void);
+match gfc_match_omp_unroll (void);
 match gfc_match_omp_workshare (void);
 match gfc_match_omp_end_critical (void);
 match gfc_match_omp_end_nowait (void);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index abca146d78e..e54f016b170 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1051,6 +1051,9 @@ enum omp_mask1
 /* More OpenMP clauses and OpenACC 2.0+ specific clauses. */
 enum omp_mask2
 {
+  OMP_CLAUSE_UNROLL_FULL,  /* OpenMP 5.1.  */
+  OMP_CLAUSE_UNROLL_NONE,  /* OpenMP 5.1.  */
+  OMP_CLAUSE_UNROLL_PARTIAL,  /* OpenMP 5.1.  */
   OMP_CLAUSE_ASYNC,
   OMP_CLAUSE_NUM_GANGS,
   OMP_CLAUSE_NUM_WORKERS,
@@ -2523,6 +2526,15 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
                                              NULL, &head, true, true)
                  == MATCH_YES))
            continue;
+         if ((mask & OMP_CLAUSE_UNROLL_FULL)
+             && (m = gfc_match_dupl_check (!c->unroll_full, "full"))
+                    != MATCH_NO)
+           {
+             if (m == MATCH_ERROR)
+               goto error;
+             c->unroll_full = needs_space = true;
+             continue;
+           }
          break;
        case 'g':
          if ((mask & OMP_CLAUSE_GANG)
@@ -3156,10 +3168,36 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
            }
          break;
        case 'p':
-         if ((mask & OMP_CLAUSE_COPY)
-             && gfc_match ("pcopy ( ") == MATCH_YES
+         if (mask & OMP_CLAUSE_UNROLL_PARTIAL)
+           {
+             if ((m = gfc_match_dupl_check (!c->unroll_partial, "partial"))
+                 != MATCH_NO)
+               {
+                 int unroll_factor;
+                 if (m == MATCH_ERROR)
+                   goto error;
+
+                 c->unroll_partial = true;
+
+                 gfc_expr *cexpr = NULL;
+                 m = gfc_match (" ( %e )", &cexpr);
+                 if (m == MATCH_NO)
+                   ;
+                 else if (m == MATCH_YES
+                          && !gfc_extract_int (cexpr, &unroll_factor, -1)
+                          && unroll_factor > 0)
+                   c->unroll_partial_factor = unroll_factor;
+                 else
+                   gfc_error_now ("PARTIAL clause argument not constant "
+                                  "positive integer at %C");
+                 gfc_free_expr (cexpr);
+                 continue;
+               }
+           }
+         if ((mask & OMP_CLAUSE_COPY) && gfc_match ("pcopy ( ") == MATCH_YES
              && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-                                          OMP_MAP_TOFROM, true, allow_derived))
+                                          OMP_MAP_TOFROM, true,
+                                          allow_derived))
            continue;
          if ((mask & OMP_CLAUSE_COPYIN)
              && gfc_match ("pcopyin ( ") == MATCH_YES
@@ -4270,6 +4308,8 @@ cleanup:
   (omp_mask (OMP_CLAUSE_AT) | OMP_CLAUSE_MESSAGE | OMP_CLAUSE_SEVERITY)
 #define OMP_WORKSHARE_CLAUSES \
   omp_mask (OMP_CLAUSE_NOWAIT)
+#define OMP_UNROLL_CLAUSES \
+  (omp_mask (OMP_CLAUSE_UNROLL_FULL) | OMP_CLAUSE_UNROLL_PARTIAL)


 static match
@@ -6369,6 +6409,20 @@ gfc_match_omp_teams_distribute_simd (void)
                    | OMP_SIMD_CLAUSES);
 }

+match
+gfc_match_omp_unroll (void)
+{
+  match m = match_omp (EXEC_OMP_UNROLL, OMP_UNROLL_CLAUSES);
+
+  /* Add an internal clause as a marker to indicate that this "unroll"
+     directive had no clause. */
+  if (new_st.ext.omp_clauses
+      && !new_st.ext.omp_clauses->unroll_full
+      && !new_st.ext.omp_clauses->unroll_partial)
+    new_st.ext.omp_clauses->unroll_none = true;
+
+  return m;
+}

 match
 gfc_match_omp_workshare (void)
@@ -9235,6 +9289,75 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
     }
 }

+
+static bool
+omp_unroll_removes_loop_nest (gfc_code *code)
+{
+  gcc_assert (code->op == EXEC_OMP_UNROLL);
+  if (!code->ext.omp_clauses)
+    return true;
+
+  if (code->ext.omp_clauses->unroll_none)
+    {
+      gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loop "
+                  "into a non-loop",
+                  &code->loc);
+      return true;
+    }
+  if (code->ext.omp_clauses->unroll_full)
+    {
+      gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into a "
+                  "non-loop",
+                  &code->loc);
+      return true;
+    }
+  return false;
+}
+
+static void
+resolve_loop_transform_generic (gfc_code *code, const char *descr)
+{
+  gcc_assert (code->block);
+
+  if (code->block->op == EXEC_OMP_UNROLL
+       && !omp_unroll_removes_loop_nest (code->block))
+    return;
+
+  if (code->block->next->op == EXEC_OMP_UNROLL
+      && !omp_unroll_removes_loop_nest (code->block->next))
+    return;
+
+  if (code->block->next->op == EXEC_DO_WHILE)
+    {
+      gfc_error ("%s invalid around DO WHILE or DO without loop "
+                "control at %L", descr, &code->loc);
+      return;
+    }
+  if (code->block->next->op == EXEC_DO_CONCURRENT)
+    {
+      gfc_error ("%s invalid around DO CONCURRENT loop at %L",
+                descr, &code->loc);
+      return;
+    }
+
+  gfc_error ("missing canonical loop nest after %s at %L",
+            descr, &code->loc);
+
+}
+
+static void
+resolve_omp_unroll (gfc_code *code)
+{
+  if (!code->block || code->block->op == EXEC_DO)
+    return;
+
+  if (code->block->next->op == EXEC_DO)
+    return;
+
+  resolve_loop_transform_generic (code, "!$OMP UNROLL");
+}
+
+
 static void
 handle_local_var (gfc_symbol *sym)
 {
@@ -9259,6 +9382,13 @@ is_outer_iteration_variable (gfc_code *code, int depth, gfc_symbol *var)
 {
   int i;
   gfc_code *do_code = code->block->next;
+  while (loop_transform_p (do_code->op)) {
+    if (do_code->block)
+      do_code = do_code->block->next;
+    else
+      do_code = do_code->next;
+  }
+  gcc_assert (!loop_transform_p (do_code->op));

   for (i = 1; i < depth; i++)
     {
@@ -9277,6 +9407,13 @@ expr_is_invariant (gfc_code *code, int depth, gfc_expr *expr)
 {
   int i;
   gfc_code *do_code = code->block->next;
+  while (loop_transform_p (do_code->op)) {
+    if (do_code->block)
+      do_code = do_code->block->next;
+    else
+      do_code = do_code->next;
+  }
+  gcc_assert (!loop_transform_p (do_code->op));

   for (i = 1; i < depth; i++)
     {
@@ -9454,6 +9591,7 @@ resolve_omp_do (gfc_code *code)
       is_simd = true;
       break;
     case EXEC_OMP_TEAMS_LOOP: name = "!$OMP TEAMS LOOP"; break;
+    case EXEC_OMP_UNROLL: name = "!$OMP UNROLL"; break;
     default: gcc_unreachable ();
     }

@@ -9461,6 +9599,23 @@ resolve_omp_do (gfc_code *code)
     resolve_omp_clauses (code, code->ext.omp_clauses, NULL);

   do_code = code->block->next;
+  /* Move forward over any loop transformation directives to find the loop. */
+  bool error = false;
+  while (do_code->op == EXEC_OMP_UNROLL)
+    {
+      if (!error && omp_unroll_removes_loop_nest (do_code))
+       {
+         gfc_error ("missing canonical loop nest after %s at %L", name,
+                    &code->loc);
+         error = true;
+       }
+      if (do_code->block)
+       do_code = do_code->block->next;
+      else
+       do_code = do_code->next;
+    }
+  gcc_assert (do_code->op != EXEC_OMP_UNROLL);
+
   if (code->ext.omp_clauses->orderedc)
     collapse = code->ext.omp_clauses->orderedc;
   else
@@ -9490,6 +9645,14 @@ resolve_omp_do (gfc_code *code)
                     &do_code->loc);
          break;
        }
+      if (do_code->op != EXEC_DO)
+       {
+         gfc_error ("%s must be DO loop at %L", name,
+                    &do_code->loc);
+         break;
+       }
+
+      gcc_assert (do_code->op != EXEC_OMP_UNROLL);
       gcc_assert (do_code->op == EXEC_DO);
       if (do_code->ext.iterator->var->ts.type != BT_INTEGER)
        gfc_error ("%s iteration variable must be of type integer at %L",
@@ -9726,6 +9889,8 @@ omp_code_to_statement (gfc_code *code)
       return ST_OMP_PARALLEL_LOOP;
     case EXEC_OMP_DEPOBJ:
       return ST_OMP_DEPOBJ;
+    case EXEC_OMP_UNROLL:
+      return ST_OMP_UNROLL;
     default:
       gcc_unreachable ();
     }
@@ -10155,6 +10320,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_namespace *ns)
     case EXEC_OMP_TEAMS_LOOP:
       resolve_omp_do (code);
       break;
+    case EXEC_OMP_UNROLL:
+      resolve_omp_unroll (code);
+      break;
     case EXEC_OMP_ASSUME:
     case EXEC_OMP_CANCEL:
     case EXEC_OMP_ERROR:
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index f1e55316e5b..094678436b4 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1008,6 +1008,7 @@ decode_omp_directive (void)
              ST_OMP_END_TEAMS_DISTRIBUTE);
       matcho ("end teams loop", gfc_match_omp_eos_error, ST_OMP_END_TEAMS_LOOP);
       matcho ("end teams", gfc_match_omp_eos_error, ST_OMP_END_TEAMS);
+      matchs ("end unroll", gfc_match_omp_eos_error, ST_OMP_END_UNROLL);
       matcho ("end workshare", gfc_match_omp_end_nowait,
              ST_OMP_END_WORKSHARE);
       break;
@@ -1137,6 +1138,9 @@ decode_omp_directive (void)
       matchdo ("threadprivate", gfc_match_omp_threadprivate,
               ST_OMP_THREADPRIVATE);
       break;
+    case 'u':
+      matchs ("unroll", gfc_match_omp_unroll, ST_OMP_UNROLL);
+      break;
     case 'w':
       matcho ("workshare", gfc_match_omp_workshare, ST_OMP_WORKSHARE);
       break;
@@ -1724,6 +1728,7 @@ next_statement (void)
   case ST_OMP_LOOP: case ST_OMP_PARALLEL_LOOP: case ST_OMP_TEAMS_LOOP: \
   case ST_OMP_TARGET_PARALLEL_LOOP: case ST_OMP_TARGET_TEAMS_LOOP: \
   case ST_OMP_ASSUME: \
+  case ST_OMP_UNROLL: \
   case ST_CRITICAL: \
   case ST_OACC_PARALLEL_LOOP: case ST_OACC_PARALLEL: case ST_OACC_KERNELS: \
   case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: \
@@ -2096,6 +2101,9 @@ gfc_ascii_statement (gfc_statement st, bool strip_sentinel)
     case ST_END_UNION:
       p = "END UNION";
       break;
+    case ST_OMP_END_UNROLL:
+      p = "!$OMP END UNROLL";
+      break;
     case ST_END_MAP:
       p = "END MAP";
       break;
@@ -2766,6 +2774,9 @@ gfc_ascii_statement (gfc_statement st, bool strip_sentinel)
     case ST_OMP_THREADPRIVATE:
       p = "!$OMP THREADPRIVATE";
       break;
+    case ST_OMP_UNROLL:
+      p = "!$OMP UNROLL";
+      break;
     case ST_OMP_WORKSHARE:
       p = "!$OMP WORKSHARE";
       break;
@@ -5180,6 +5191,7 @@ parse_omp_do (gfc_statement omp_st)
   gfc_statement st;
   gfc_code *cp, *np;
   gfc_state_data s;
+  int num_unroll = 0;

   accept_statement (omp_st);

@@ -5196,6 +5208,12 @@ parse_omp_do (gfc_statement omp_st)
        unexpected_eof ();
       else if (st == ST_DO)
        break;
+      else if (st == ST_OMP_UNROLL)
+       {
+         accept_statement (st);
+         num_unroll++;
+         continue;
+       }
       else
        unexpected_statement (st);
     }
@@ -5221,6 +5239,17 @@ parse_omp_do (gfc_statement omp_st)
   pop_state ();

   st = next_statement ();
+  for (; num_unroll > 0; num_unroll--)
+    {
+      if (st == ST_OMP_END_UNROLL)
+       {
+         gfc_clear_new_st ();
+         gfc_commit_symbols ();
+         gfc_warning_check ();
+         st = next_statement ();
+       }
+    }
+
   gfc_statement omp_end_st = ST_OMP_END_DO;
   switch (omp_st)
     {
@@ -5234,7 +5263,9 @@ parse_omp_do (gfc_statement omp_st)
     case ST_OMP_DISTRIBUTE_SIMD:
       omp_end_st = ST_OMP_END_DISTRIBUTE_SIMD;
       break;
-    case ST_OMP_DO: omp_end_st = ST_OMP_END_DO; break;
+    case ST_OMP_DO:
+      omp_end_st = ST_OMP_END_DO;
+      break;
     case ST_OMP_DO_SIMD: omp_end_st = ST_OMP_END_DO_SIMD; break;
     case ST_OMP_LOOP: omp_end_st = ST_OMP_END_LOOP; break;
     case ST_OMP_PARALLEL_DO: omp_end_st = ST_OMP_END_PARALLEL_DO; break;
@@ -5307,6 +5338,9 @@ parse_omp_do (gfc_statement omp_st)
     case ST_OMP_TEAMS_LOOP:
       omp_end_st = ST_OMP_END_TEAMS_LOOP;
       break;
+    case ST_OMP_UNROLL:
+      omp_end_st = ST_OMP_END_UNROLL;
+      break;
     default: gcc_unreachable ();
     }
   if (st == omp_end_st)
@@ -5991,6 +6025,7 @@ parse_executable (gfc_statement st)
        case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case ST_OMP_TEAMS_DISTRIBUTE_SIMD:
        case ST_OMP_TEAMS_LOOP:
+       case ST_OMP_UNROLL:
          st = parse_omp_do (st);
          if (st == ST_IMPLIED_ENDDO)
            return st;
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index f6ec76acb0b..46988ff281d 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11041,6 +11041,7 @@ gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns)
        case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case EXEC_OMP_TEAMS_LOOP:
        case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
+       case EXEC_OMP_UNROLL:
        case EXEC_OMP_WORKSHARE:
          break;

@@ -12197,6 +12198,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
            case EXEC_OMP_LOOP:
            case EXEC_OMP_SIMD:
            case EXEC_OMP_TARGET_SIMD:
+           case EXEC_OMP_UNROLL:
              gfc_resolve_omp_do_blocks (code, ns);
              break;
            case EXEC_SELECT_TYPE:
@@ -12693,6 +12695,7 @@ start:
        case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
        case EXEC_OMP_TEAMS_LOOP:
+       case EXEC_OMP_UNROLL:
        case EXEC_OMP_WORKSHARE:
          gfc_resolve_omp_directive (code, ns);
          break;
diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc
index 657bc9deebf..6112831e621 100644
--- a/gcc/fortran/st.cc
+++ b/gcc/fortran/st.cc
@@ -277,6 +277,7 @@ gfc_free_statement (gfc_code *p)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_LOOP:
+    case EXEC_OMP_UNROLL:
     case EXEC_OMP_WORKSHARE:
       gfc_free_omp_clauses (p->ext.omp_clauses);
       break;
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 84c0184f48e..c4a23f6e247 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3890,6 +3890,29 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }

+  if (clauses->unroll_full)
+    {
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_UNROLL_FULL);
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
+  if (clauses->unroll_none)
+    {
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_UNROLL_NONE);
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
+  if (clauses->unroll_partial)
+    {
+      c = build_omp_clause (gfc_get_location (&where),
+                           OMP_CLAUSE_UNROLL_PARTIAL);
+      OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c)
+         = clauses->unroll_partial_factor ? build_int_cst (
+               integer_type_node, clauses->unroll_partial_factor)
+                                          : NULL_TREE;
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
   if (clauses->ordered)
     {
       c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_ORDERED);
@@ -5080,6 +5103,12 @@ gfc_trans_omp_cancel (gfc_code *code)
   return gfc_finish_block (&block);
 }

+bool
+loop_transform_p (gfc_exec_op op)
+{
+  return op == EXEC_OMP_UNROLL;
+}
+
 static tree
 gfc_trans_omp_cancellation_point (gfc_code *code)
 {
@@ -5257,7 +5286,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
-  tree local_dovar = NULL_TREE, cycle_label, tmp, omp_clauses;
+  tree local_dovar = NULL_TREE, cycle_label, tmp, omp_clauses, loop_transform_clauses;
   stmtblock_t block;
   stmtblock_t body;
   gfc_omp_clauses *clauses = code->ext.omp_clauses;
@@ -5268,6 +5297,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   vec<tree, va_heap, vl_embed> *saved_doacross_steps = doacross_steps;
   gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list;
   gfc_code *orig_code = code;
+  locus top_loc = code->loc;

   /* Both collapsed and tiled loops are lowered the same way.  In
      OpenACC, those clauses are not compatible, so prioritize the tile
@@ -5285,7 +5315,25 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   if (collapse <= 0)
     collapse = 1;

+  if (pblock == NULL)
+    {
+      gfc_start_block (&block);
+      pblock = &block;
+    }
   code = code->block->next;
+  gcc_assert (code->op == EXEC_DO || code->op == EXEC_OMP_UNROLL);
+  /* Loop transformation directives surrounding the associated loop of an "omp
+     do" (or similar directive) are represented as clauses on the "omp do". */
+  loop_transform_clauses = NULL;
+  while (code->op == EXEC_OMP_UNROLL)
+    {
+      tree clauses = gfc_trans_omp_clauses (pblock, code->ext.omp_clauses,
+                                           code->loc);
+      loop_transform_clauses = chainon (loop_transform_clauses, clauses);
+
+      code = code->block ? code->block->next : code->next;
+    }
+  gcc_assert (code->op != EXEC_OMP_UNROLL);
   gcc_assert (code->op == EXEC_DO);

   init = make_tree_vec (collapse);
@@ -5293,18 +5341,21 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   incr = make_tree_vec (collapse);
   orig_decls = clauses->ordered ? make_tree_vec (collapse) : NULL_TREE;

-  if (pblock == NULL)
-    {
-      gfc_start_block (&block);
-      pblock = &block;
-    }
-
   /* simd schedule modifier is only useful for composite do simd and other
      constructs including that, where gfc_trans_omp_do is only called
      on the simd construct and DO's clauses are translated elsewhere.  */
   do_clauses->sched_simd = false;

-  omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, code->loc);
+  if (op == EXEC_OMP_UNROLL)
+    {
+      /* This is a loop transformation on a loop which is not associated with
+        any other directive. Use the directive location instead of the loop
+        location for the clauses. */
+      omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, top_loc);
+    }
+  else
+    omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, code->loc);
+  omp_clauses = chainon (omp_clauses, loop_transform_clauses);

   for (i = 0; i < collapse; i++)
     {
@@ -5558,7 +5609,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
            }
          gcc_assert (local_dovar == dovar || c != NULL);
        }
-      if (local_dovar != dovar)
+      if (local_dovar != dovar && op != EXEC_OMP_UNROLL)
        {
          if (op != EXEC_OMP_SIMD || dovar_found == 1)
            tmp = build_omp_clause (input_location, OMP_CLAUSE_PRIVATE);
@@ -5644,6 +5695,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
     case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
     case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
     case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+    case EXEC_OMP_UNROLL: stmt = make_node (OMP_LOOP_TRANS); break;
     default: gcc_unreachable ();
     }

@@ -7741,6 +7793,7 @@ gfc_trans_omp_directive (gfc_code *code)
     case EXEC_OMP_LOOP:
     case EXEC_OMP_SIMD:
     case EXEC_OMP_TASKLOOP:
+    case EXEC_OMP_UNROLL:
       return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses,
                               NULL);
     case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index f7745add045..56ec59fe80e 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -2520,6 +2520,7 @@ trans_code (gfc_code * code, tree cond)
        case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
        case EXEC_OMP_TEAMS_LOOP:
+       case EXEC_OMP_UNROLL:
        case EXEC_OMP_WORKSHARE:
          res = gfc_trans_omp_directive (code);
          break;
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index 300e9d7ed1e..24ef60059fe 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -1478,6 +1478,9 @@ dump_gimple_omp_for (pretty_printer *buffer, const gomp_for *gs, int spc,
        case GF_OMP_FOR_KIND_SIMD:
          kind = " simd";
          break;
+       case GF_OMP_FOR_KIND_TRANSFORM_LOOP:
+         kind = " unroll";
+         break;
        default:
          gcc_unreachable ();
        }
@@ -1515,6 +1518,9 @@ dump_gimple_omp_for (pretty_printer *buffer, const gomp_for *gs, int spc,
        case GF_OMP_FOR_KIND_SIMD:
          pp_string (buffer, "#pragma omp simd");
          break;
+       case GF_OMP_FOR_KIND_TRANSFORM_LOOP:
+         pp_string (buffer, "#pragma omp loop_transform");
+         break;
        default:
          gcc_unreachable ();
        }
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 081d18e425a..213cfc58abb 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -159,6 +159,7 @@ enum gf_mask {
     GF_OMP_FOR_KIND_TASKLOOP   = 2,
     GF_OMP_FOR_KIND_OACC_LOOP  = 4,
     GF_OMP_FOR_KIND_SIMD       = 5,
+    GF_OMP_FOR_KIND_TRANSFORM_LOOP = 6,
     GF_OMP_FOR_COMBINED                = 1 << 3,
     GF_OMP_FOR_COMBINED_INTO   = 1 << 4,
     GF_OMP_TARGET_KIND_MASK    = (1 << 5) - 1,
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index ade6e335da7..2c160686533 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -5949,6 +5949,7 @@ is_gimple_stmt (tree t)
     case OACC_CACHE:
     case OMP_PARALLEL:
     case OMP_FOR:
+    case OMP_LOOP_TRANS:
     case OMP_SIMD:
     case OMP_DISTRIBUTE:
     case OMP_LOOP:
@@ -12101,6 +12102,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
          }
          break;

+       case OMP_CLAUSE_UNROLL_FULL:
+       case OMP_CLAUSE_UNROLL_NONE:
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         break;
        case OMP_CLAUSE_NOHOST:
        default:
          gcc_unreachable ();
@@ -13071,6 +13076,9 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
        case OMP_CLAUSE_FINALIZE:
        case OMP_CLAUSE_INCLUSIVE:
        case OMP_CLAUSE_EXCLUSIVE:
+       case OMP_CLAUSE_UNROLL_FULL:
+       case OMP_CLAUSE_UNROLL_NONE:
+       case OMP_CLAUSE_UNROLL_PARTIAL:
          break;

        case OMP_CLAUSE_NOHOST:
@@ -13797,6 +13805,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_SIMD:
       ort = ORT_SIMD;
       break;
+    case OMP_LOOP_TRANS:
+      break;
     default:
       gcc_unreachable ();
     }
@@ -14158,8 +14168,19 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
                  n->value &= ~GOVD_LASTPRIVATE_CONDITIONAL;
                }
        }
-      else
-       omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN);
+      else {
+         if (TREE_CODE (orig_for_stmt) == OMP_LOOP_TRANS)
+           {
+             /* This loop is not going to be associated with any
+                directive after its transformation in
+                pass-omp_transform_loops. It will be lowered there
+                and the loop iteration variable will be used in the
+                context. */
+             omp_notice_variable (gimplify_omp_ctxp, decl, true);
+           }
+         else
+           omp_add_variable(gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN);
+       }

       /* If DECL is not a gimple register, create a temporary variable to act
         as an iteration counter.  This is valid, since DECL cannot be
@@ -14200,7 +14221,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
                  c2 = NULL_TREE;
                }
            }
-         else
+         else if (TREE_CODE (orig_for_stmt) != OMP_LOOP_TRANS)
            omp_add_variable (gimplify_omp_ctxp, var,
                              GOVD_PRIVATE | GOVD_SEEN);
        }
@@ -14481,6 +14502,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     case OMP_TASKLOOP: kind = GF_OMP_FOR_KIND_TASKLOOP; break;
     case OACC_LOOP: kind = GF_OMP_FOR_KIND_OACC_LOOP; break;
+    case OMP_LOOP_TRANS: kind = GF_OMP_FOR_KIND_TRANSFORM_LOOP; break;
     default:
       gcc_unreachable ();
     }
@@ -14665,6 +14687,13 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
                gtask_clauses_ptr = &OMP_CLAUSE_CHAIN (c);
              }
            break;
+         /* Move loop transformations to inner loop.  */
+         case OMP_CLAUSE_UNROLL_FULL:
+         case OMP_CLAUSE_UNROLL_NONE:
+         case OMP_CLAUSE_UNROLL_PARTIAL:
+           *gfor_clauses_ptr = c;
+           gfor_clauses_ptr = &OMP_CLAUSE_CHAIN (c);
+           break;
          default:
            gcc_unreachable ();
          }
@@ -15105,6 +15134,10 @@ gimplify_omp_loop (tree *expr_p, gimple_seq *pre_p)
              }
            pc = &OMP_CLAUSE_CHAIN (*pc);
            break;
+         case OMP_CLAUSE_UNROLL_PARTIAL:
+         case OMP_CLAUSE_UNROLL_FULL:
+         case OMP_CLAUSE_UNROLL_NONE:
+           break;
          default:
            gcc_unreachable ();
          }
@@ -16886,6 +16919,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
        case OMP_FOR:
        case OMP_DISTRIBUTE:
        case OMP_TASKLOOP:
+       case OMP_LOOP_TRANS:
        case OACC_LOOP:
          ret = gimplify_omp_for (expr_p, pre_p);
          break;
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index eefdcb54590..e29d695dcba 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -2253,6 +2253,20 @@ omp_declare_variant_remove_hook (struct cgraph_node *node, void *)
     }
 }

+/* Return true if C is a clause that represents an OpenMP loop transformation
+   directive, false otherwise. */
+
+bool
+omp_loop_transform_clause_p (tree c)
+{
+  if (c == NULL)
+    return false;
+
+  enum omp_clause_code code = OMP_CLAUSE_CODE (c);
+  return (code == OMP_CLAUSE_UNROLL_FULL || code == OMP_CLAUSE_UNROLL_PARTIAL
+         || code == OMP_CLAUSE_UNROLL_NONE);
+}
+
 /* Try to resolve declare variant, return the variant decl if it should
    be used instead of base, or base otherwise.  */

diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 92717db1628..8d6390ad6f6 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -113,6 +113,7 @@ extern int omp_context_selector_matches (tree);
 extern int omp_context_selector_set_compare (const char *, tree, tree);
 extern tree omp_get_context_selector (tree, const char *, const char *);
 extern tree omp_resolve_declare_variant (tree);
+extern bool omp_loop_transform_clause_p (tree);
 extern tree oacc_launch_pack (unsigned code, tree device, unsigned op);
 extern tree oacc_replace_fn_attrib_attr (tree attribs, tree dims);
 extern void oacc_replace_fn_attrib (tree fn, tree dims);
diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
new file mode 100644
index 00000000000..d845d0e4798
--- /dev/null
+++ b/gcc/omp-transform-loops.cc
@@ -0,0 +1,1401 @@
+/* OMP loop transformation pass. Transforms loops according to
+   loop transformation directives such as "omp unroll".
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "pretty-print.h"
+#include "diagnostic-core.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree-pass.h"
+#include "gimple-walk.h"
+#include "gimple-pretty-print.h"
+#include "gimplify.h"
+#include "ssa.h"
+#include "tree-into-ssa.h"
+#include "fold-const.h"
+#include "print-tree.h"
+#include "omp-general.h"
+
+/* Context information for walk_omp_for_loops. */
+struct walk_ctx
+{
+  /* The most recently visited gomp_for that has been transformed and
+     for which gimple_omp_for_combined_into_p returned true. */
+  gomp_for *inner_combined_loop;
+
+  /* The innermost bind enclosing the currently visited node. */
+  gbind *bind;
+};
+
+static unsigned int walk_omp_for_loops (gimple_seq *, walk_ctx *);
+static enum tree_code omp_adjust_neq_condition (tree v, tree step);
+
+static bool
+non_rectangular_p (const gomp_for *omp_for)
+{
+  size_t collapse = gimple_omp_for_collapse (omp_for);
+  for (size_t i = 0; i < collapse; i++)
+    {
+      if (TREE_CODE (gimple_omp_for_final (omp_for, i)) == TREE_VEC
+         || TREE_CODE (gimple_omp_for_initial (omp_for, i)) == TREE_VEC)
+       return true;
+    }
+
+  return false;
+}
+
+/* Callback for subst_var. */
+
+static tree
+subst_var_in_op (tree *t, int *subtrees ATTRIBUTE_UNUSED, void *data)
+{
+
+  auto *wi = (struct walk_stmt_info *)data;
+  auto from_to = (std::pair<tree, tree> *)wi->info;
+
+  if (*t == from_to->first)
+    {
+      *t = from_to->second;
+      wi->changed = true;
+    }
+
+  return NULL_TREE;
+}
+
+/* Substitute all occurrences of FROM in the operands of the GIMPLE statements
+   in SEQ by TO. */
+
+static void
+subst_var (gimple_seq *seq, tree from, tree to)
+{
+  gcc_assert (VAR_P (from));
+  gcc_assert (VAR_P (to));
+
+  std::pair<tree, tree> from_to (from, to);
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = (void *)&from_to;
+
+  walk_gimple_seq_mod (seq, NULL, subst_var_in_op, &wi);
+}
+
+/* Return the type that should be used for computing the iteration count of a
+   loop with the given index VAR and upper/lower bound FINAL according to
+   OpenMP 5.1. */
+
+tree
+gomp_for_iter_count_type (tree var, tree final)
+{
+  tree var_type = TREE_TYPE (var);
+
+  if (POINTER_TYPE_P (var_type))
+    return ptrdiff_type_node;
+
+  tree operand_type = TREE_TYPE (final);
+  if (TYPE_UNSIGNED (var_type) && !TYPE_UNSIGNED (operand_type))
+    return signed_type_for (operand_type);
+
+  return var_type;
+}
+
+extern tree
+gimple_assign_rhs_to_tree (gimple *stmt);
+
+/* Substitute all definitions from SEQ bottom-up into EXPR. This is used to
+   reconstruct a tree for a gimplified expression in order to determine
+   whether or not the number of iterations of a loop is constant. */
+
+tree
+subst_defs (tree expr, gimple_seq seq)
+{
+  gimple_seq_node last = gimple_seq_last (seq);
+  gimple_seq_node first = gimple_seq_first (seq);
+  for (auto n = last; n != NULL; n = n != first ? n->prev : NULL)
+    {
+      if (!is_gimple_assign (n))
+       continue;
+
+      tree lhs = gimple_assign_lhs (n);
+      tree rhs = gimple_assign_rhs_to_tree (n);
+      std::pair<tree, tree> from_to (lhs, rhs);
+      struct walk_stmt_info wi;
+      memset (&wi, 0, sizeof (wi));
+      wi.info = (void *)&from_to;
+      walk_tree (&expr, subst_var_in_op, &wi, NULL);
+      expr = fold (expr);
+    }
+
+  return expr;
+}
+
+/* Return an expression for the number of iterations of the loop at depth
+   LEVEL of OMP_FOR. */
+
+tree
+gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level)
+{
+  gcc_assert (!non_rectangular_p (omp_for));
+
+  tree init = gimple_omp_for_initial (omp_for, level);
+  tree final = gimple_omp_for_final (omp_for, level);
+  tree_code cond = gimple_omp_for_cond (omp_for, level);
+  tree index = gimple_omp_for_index (omp_for, level);
+  tree type = gomp_for_iter_count_type (index, final);
+  tree step = TREE_OPERAND (gimple_omp_for_incr (omp_for, level), 1);
+
+  init = subst_defs (init, gimple_omp_for_pre_body (omp_for));
+  init = fold (init);
+  final = subst_defs (final, gimple_omp_for_pre_body (omp_for));
+  final = fold (final);
+
+  tree_code minus_code = MINUS_EXPR;
+  tree diff_type = type;
+  if (POINTER_TYPE_P (TREE_TYPE (final)))
+    {
+      minus_code = POINTER_DIFF_EXPR;
+      diff_type = ptrdiff_type_node;
+    }
+
+  tree diff;
+  if (cond == GT_EXPR)
+    diff = fold_build2 (minus_code, diff_type, init, final);
+  else if (cond == LT_EXPR)
+    diff = fold_build2 (minus_code, diff_type, final, init);
+  else
+    gcc_unreachable ();
+
+  diff = fold_build2 (CEIL_DIV_EXPR, type, diff, step);
+  diff = fold_build1 (ABS_EXPR, type, diff);
+
+  return diff;
+}
+
+/* Return true if the expression representing the number of iterations for
+   OMP_FOR is a constant expression, false otherwise. */
+
+bool
+gomp_for_constant_iterations_p (gomp_for *omp_for,
+                               unsigned HOST_WIDE_INT *iterations)
+{
+  tree t = gomp_for_number_of_iterations (omp_for, 0);
+  if (!TREE_CONSTANT (t)
+      || !tree_fits_uhwi_p (t))
+    return false;
+
+  *iterations = tree_to_uhwi (t);
+  return true;
+}
+
+/* Split a gomp_for that represents a collapsed loop-nest into single
+   loops. The result is a gomp_for of the same kind which is not collapsed
+   (i.e. gimple_omp_for_collapse (OMP_FOR) == 1) and which contains nested,
+   non-collapsed gomp_for loops whose kind is GF_OMP_FOR_KIND_TRANSFORM_LOOP
+   (i.e. they will be lowered into plain, non-omp loops by this pass) for each
+   of the loops of OMP_FOR.  All loops whose depth is strictly less than
+   FROM_DEPTH are left collapsed. */
+
+static gomp_for*
+gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0)
+{
+  int collapse = gimple_omp_for_collapse (omp_for);
+  gcc_assert (from_depth < collapse);
+
+  if (collapse <= 1)
+    return omp_for;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, omp_for,
+                    "Uncollapsing loop:\n %G\n",
+                    static_cast <gimple *> (omp_for));
+
+  gimple_seq body = gimple_omp_body (omp_for);
+  gomp_for *level_omp_for = omp_for;
+  for (int level = collapse - 1; level >= from_depth; level--)
+    {
+      level_omp_for = gimple_build_omp_for (body,
+                                           GF_OMP_FOR_KIND_TRANSFORM_LOOP,
+                                           NULL, 1, NULL);
+      gimple_omp_for_set_cond (level_omp_for, 0,
+                              gimple_omp_for_cond (omp_for, level));
+      gimple_omp_for_set_initial (level_omp_for, 0,
+                                 gimple_omp_for_initial (omp_for, level));
+      gimple_omp_for_set_final (level_omp_for, 0,
+                               gimple_omp_for_final (omp_for, level));
+      gimple_omp_for_set_incr (level_omp_for, 0,
+                              gimple_omp_for_incr (omp_for, level));
+      gimple_omp_for_set_index (level_omp_for, 0,
+                               gimple_omp_for_index (omp_for, level));
+
+      body = level_omp_for;
+    }
+
+  omp_for->collapse = from_depth;
+
+  if (from_depth > 0)
+    {
+      gimple_omp_set_body (omp_for, body);
+      return omp_for;
+    }
+
+  gimple_omp_for_set_clauses (level_omp_for, gimple_omp_for_clauses (omp_for));
+  gimple_omp_for_set_pre_body (level_omp_for, gimple_omp_for_pre_body (omp_for));
+  gimple_omp_for_set_combined_into_p (level_omp_for,
+                                     gimple_omp_for_combined_into_p (omp_for));
+  gimple_omp_for_set_combined_p (level_omp_for,
+                                gimple_omp_for_combined_p (omp_for));
+
+  return level_omp_for;
+}
+
+static tree
+build_loop_exit_cond (tree index, tree_code cond, tree final, gimple_seq *seq)
+{
+  tree exit_cond
+      = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node,
+                    fold_build2 (cond, boolean_type_node, index, final));
+  tree res = create_tmp_var (boolean_type_node);
+  gimplify_assign (res, exit_cond, seq);
+
+  return res;
+}
+
+/* Returns a register that contains the final value of a loop as described by
+   FINAL. This is necessary for non-rectangular loops. */
+
+static tree
+build_loop_final (tree final, gimple_seq *seq)
+{
+  if (TREE_CODE (final) != TREE_VEC) /* rectangular loop-nest */
+    return final;
+
+  tree coeff = TREE_VEC_ELT (final, 0);
+  tree outer_var = TREE_VEC_ELT (final, 1);
+  tree constt = TREE_VEC_ELT (final, 2);
+
+  tree type = TREE_TYPE (outer_var);
+  tree val = fold_build2 (MULT_EXPR, type, coeff, outer_var);
+  val = fold_build2 (PLUS_EXPR, type, val, constt);
+
+  tree res = create_tmp_var (type);
+  gimplify_assign (res, val, seq);
+
+  return res;
+}
+
+/* Unroll the loop BODY UNROLL_FACTOR times, replacing the INDEX
+   variable by a local copy in each copy of the body that will be
+   incremented as specified by INCR.  If BUILD_EXIT_CONDS is true,
+   insert a test of the loop exit condition given COND and FINAL
+   before each copy of the body that will exit the loop if the value
+   of the local index variable satisfies the loop exit condition.
+
+   For example, unrolling by 3 with BUILD_EXIT_CONDS == true turns
+
+    for (i = 0; i < n; i = i + 1)
+    {
+       BODY
+    }
+
+    into
+
+    for (i = 0; i < n; i = i + 1)
+    {
+       i.0 = i
+       if (!(i.0 < n))
+         goto exit
+       BODY_COPY_1[i/i.0]              i.e. index var i replaced by i.0
+       i.1 = i.0 + 1
+       if (!(i.1 < n))
+         goto exit
+       BODY_COPY_2[i/i.1]
+       i.2 = i.1 + 1
+       if (!(i.2 < n))
+         goto exit
+       BODY_COPY_3[i/i.2]
+       exit:
+    }
+ */
+static gimple_seq
+build_unroll_body (gimple_seq body, tree unroll_factor, tree index, tree incr,
+                  bool build_exit_conds = false, tree final = NULL_TREE,
+                  tree_code *cond = NULL)
+{
+  gcc_assert ((!build_exit_conds && !final && !cond)
+             || (build_exit_conds && final && cond));
+
+  gimple_seq new_body = NULL;
+
+  push_gimplify_context ();
+
+  if (build_exit_conds)
+    final = build_loop_final (final, &new_body);
+
+  tree local_index = create_tmp_var (TREE_TYPE (index));
+  subst_var (&body, index, local_index);
+  tree local_incr = unshare_expr (incr);
+  TREE_OPERAND (local_incr, 0) = local_index;
+
+  tree exit_label = create_artificial_label (gimple_location (body));
+
+  unsigned HOST_WIDE_INT n = tree_to_uhwi (unroll_factor);
+  for (unsigned HOST_WIDE_INT i = 0; i < n; i++)
+    {
+      if (i == 0)
+       gimplify_assign (local_index, index, &new_body);
+      else
+       gimplify_assign (local_index, local_incr, &new_body);
+
+      tree body_copy_label = create_artificial_label (gimple_location (body));
+
+      if (build_exit_conds)
+       {
+         tree exit_cond
+             = build_loop_exit_cond (local_index, *cond, final, &new_body);
+         gimple_seq_add_stmt (
+             &new_body,
+             gimple_build_cond (EQ_EXPR, exit_cond, boolean_true_node,
+                                exit_label, body_copy_label));
+       }
+
+      gimple_seq body_copy = copy_gimple_seq_and_replace_locals (body);
+      gimple_seq_add_stmt (&new_body, gimple_build_label (body_copy_label));
+      gimple_seq_add_seq (&new_body, body_copy);
+    }
+
+
+  gbind *bind = gimple_build_bind (NULL, new_body, NULL);
+  pop_gimplify_context (bind);
+
+  gimple_seq result = NULL;
+  gimple_seq_add_stmt (&result, bind);
+  gimple_seq_add_stmt (&result, gimple_build_label (exit_label));
+  return result;
+}
+
+static gimple_seq transform_gomp_for (gomp_for *, tree, walk_ctx *ctx);
+
+/* Execute the partial unrolling transformation for OMP_FOR with the given
+   UNROLL_FACTOR and return the resulting gimple bind. LOC is the location for
+   diagnostic messages.
+
+   Example
+   -------
+
+    Original loop
+    -------------
+
+    #pragma omp for unroll_partial(3)
+    for (i = 0; i < 100; i = i + 1)
+    {
+       BODY
+    }
+
+    gets, roughly, translated to
+
+    {
+    #pragma omp for
+    for (i = 0; i < 100; i = i + 3)
+    {
+       i.0 = i
+       if i.0 >= 100:
+           goto exit_label
+       BODY_COPY_1[i/i.0]              i.e. index var replaced
+       i.1 = i + 1
+       if i.1 >= 100:
+           goto exit_label
+       BODY_COPY_2[i/i.1]
+       i.2 = i + 2
+       if i.2 >= 100:
+           goto exit_label
+       BODY_COPY_3[i/i.2]
+
+       exit_label:
+    }
+    }
+ */
+
+/* FIXME The value of the loop counter of the transformed loop is
+currently unspecified. OpenMP 5.2 does not define what the value
+should be. There is an open OpenMP spec issue ("Loop counter value
+after transform: Misc 6.0: Loop transformations #3440") in the
+non-public OpenMP spec repository. */
+
+static gimple_seq
+partial_unroll (gomp_for *omp_for, tree unroll_factor,
+               location_t loc, tree transformation_clauses, walk_ctx *ctx)
+{
+  gcc_assert (unroll_factor);
+  gcc_assert (
+      OMP_CLAUSE_CODE (transformation_clauses) == OMP_CLAUSE_UNROLL_PARTIAL
+      || OMP_CLAUSE_CODE (transformation_clauses) == OMP_CLAUSE_UNROLL_NONE);
+
+  /* Partial unrolling reduces the depth of a canonical loop nest to 1,
+     hence outer directives cannot require a greater collapse. */
+  gcc_assert (gimple_omp_for_collapse (omp_for) <= 1);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS,
+                    dump_user_location_t::from_location_t (loc),
+                    "Partially unrolling loop:\n %G\n",
+                    static_cast<gimple *> (omp_for));
+
+  gomp_for *unrolled_for = as_a<gomp_for *> (copy_gimple_seq_and_replace_locals (omp_for));
+
+  tree final = gimple_omp_for_final (unrolled_for, 0);
+  tree incr = gimple_omp_for_incr (unrolled_for, 0);
+  tree index = gimple_omp_for_index (unrolled_for, 0);
+  gimple_seq body = gimple_omp_body (unrolled_for);
+
+  tree_code cond = gimple_omp_for_cond (unrolled_for, 0);
+  tree step = TREE_OPERAND (incr, 1);
+  gimple_omp_set_body (unrolled_for,
+                      build_unroll_body (body, unroll_factor, index, incr,
+                                         true, final, &cond));
+
+  gbind *result_bind = gimple_build_bind (NULL, NULL, NULL);
+
+  push_gimplify_context ();
+
+  tree scaled_step
+      = fold_build2 (MULT_EXPR, TREE_TYPE (step),
+                    fold_convert (TREE_TYPE (step), unroll_factor), step);
+
+  /* For combined constructs, step will be gimplified on the outer
+     gomp_for. */
+  if (!gimple_omp_for_combined_into_p (omp_for)
+      && !TREE_CONSTANT (scaled_step))
+    {
+      tree var = create_tmp_var (TREE_TYPE (step), ".omp_unroll_step");
+      gimplify_assign (var, scaled_step,
+                      gimple_omp_for_pre_body_ptr (unrolled_for));
+      scaled_step = var;
+    }
+  TREE_OPERAND (incr, 1) = scaled_step;
+  gimple_omp_for_set_incr (unrolled_for, 0, incr);
+
+  pop_gimplify_context (result_bind);
+
+  if (gimple_omp_for_combined_into_p (omp_for))
+    ctx->inner_combined_loop = unrolled_for;
+
+  tree remaining_clauses = OMP_CLAUSE_CHAIN (transformation_clauses);
+  gimple_seq_add_stmt (
+      gimple_bind_body_ptr (result_bind),
+      transform_gomp_for (unrolled_for, remaining_clauses, ctx));
+
+  return result_bind;
+}
+
+static gimple_seq
+full_unroll (gomp_for *omp_for, location_t loc, walk_ctx *ctx ATTRIBUTE_UNUSED)
+{
+  tree init = gimple_omp_for_initial (omp_for, 0);
+  unsigned HOST_WIDE_INT niter = 0;
+  if (!gomp_for_constant_iterations_p (omp_for, &niter))
+    {
+      error_at (loc, "Cannot apply full unrolling to loop with "
+                    "non-constant number of iterations");
+      return omp_for;
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS,
+                    dump_user_location_t::from_location_t (loc),
+                    "Fully unrolling loop with "
+                    HOST_WIDE_INT_PRINT_UNSIGNED
+                    " iterations :\n %G\n", niter,
+                    static_cast <gimple *>(omp_for));
+
+  tree incr = gimple_omp_for_incr (omp_for, 0);
+  tree index = gimple_omp_for_index (omp_for, 0);
+  gimple_seq body = gimple_omp_body (omp_for);
+
+  tree unroll_factor = build_int_cst (TREE_TYPE (init), niter);
+
+  gimple_seq unrolled = NULL;
+  gimple_seq_add_seq (&unrolled, gimple_omp_for_pre_body (omp_for));
+  push_gimplify_context ();
+  gimple_seq_add_seq (&unrolled,
+                     build_unroll_body (body, unroll_factor, index, incr));
+
+  gbind *result_bind = gimple_build_bind (NULL, unrolled, NULL);
+  pop_gimplify_context (result_bind);
+  return result_bind;
+}
+
+/* Decides if the OMP_FOR for which the user did not specify the type of
+   unrolling to apply in the 'unroll' directive represented by the TRANSFORM
+   clause should be fully unrolled. */
+
+static bool
+assign_unroll_full_clause_p (gomp_for *omp_for, tree transform)
+{
+  gcc_assert (OMP_CLAUSE_CODE (transform) == OMP_CLAUSE_UNROLL_NONE);
+  gcc_assert (OMP_CLAUSE_CHAIN (transform) == NULL);
+
+  /* Full unrolling turns the loop into a non-loop and hence
+     the following transformations would fail. */
+  if (TREE_CHAIN (transform) != NULL_TREE)
+    return false;
+
+  unsigned HOST_WIDE_INT num_iters;
+  if (!gomp_for_constant_iterations_p (omp_for, &num_iters)
+      || num_iters
+            > (unsigned HOST_WIDE_INT)param_omp_unroll_full_max_iterations)
+    return false;
+
+  if (dump_enabled_p ())
+    {
+      auto loc = dump_user_location_t::from_location_t (
+         OMP_CLAUSE_LOCATION (transform));
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+                      "assigned %<full%> clause to %<omp unroll%> with small "
+                      "constant number of iterations\n");
+    }
+
+  return true;
+}
+
+/* If the OMP_FOR for which the user did not specify the type of unrolling in
+   the 'unroll' directive in the TRANSFORM clause should be partially unrolled,
+   return the unroll factor, otherwise return null. */
+
+static tree
+assign_unroll_partial_clause_p (gomp_for *omp_for ATTRIBUTE_UNUSED,
+                               tree transform)
+{
+  gcc_assert (OMP_CLAUSE_CODE (transform) == OMP_CLAUSE_UNROLL_NONE);
+
+  if (param_omp_unroll_default_factor == 0)
+    return NULL;
+
+  tree unroll_factor
+      = build_int_cst (integer_type_node, param_omp_unroll_default_factor);
+
+  if (dump_enabled_p ())
+    {
+      auto loc = dump_user_location_t::from_location_t (
+         OMP_CLAUSE_LOCATION (transform));
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+         "added %<partial(%u)%> clause to %<omp unroll%> directive\n",
+         param_omp_unroll_default_factor);
+    }
+
+  return unroll_factor;
+}
+
+/* Generate the code for an OMP_FOR that represents the result of a
+   loop transformation which is not associated with any directive and
+   which will hence not be lowered in the omp-expansion. */
+
+static gimple_seq
+expand_transformed_loop (gomp_for *omp_for)
+{
+  gcc_assert (gimple_omp_for_kind (omp_for)
+                      == GF_OMP_FOR_KIND_TRANSFORM_LOOP);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, omp_for,
+                    "Expanding loop:\n %G\n",
+                    static_cast <gimple *> (omp_for));
+
+  push_gimplify_context ();
+
+  omp_for = gomp_for_uncollapse (omp_for);
+
+  tree incr = gimple_omp_for_incr (omp_for, 0);
+  tree index = gimple_omp_for_index (omp_for, 0);
+  tree init = gimple_omp_for_initial (omp_for, 0);
+  tree final = gimple_omp_for_final (omp_for, 0);
+  tree_code cond = gimple_omp_for_cond (omp_for, 0);
+  gimple_seq body = gimple_omp_body (omp_for);
+  gimple_seq pre_body = gimple_omp_for_pre_body (omp_for);
+
+  gimple_seq loop = NULL;
+
+  tree exit_label = create_artificial_label (UNKNOWN_LOCATION);
+  tree cycle_label = create_artificial_label (UNKNOWN_LOCATION);
+  tree body_label = create_artificial_label (UNKNOWN_LOCATION);
+
+  gimple_seq_add_seq (&loop, pre_body);
+  gimplify_assign (index, init, &loop);
+  tree final_var = final;
+  if (TREE_CODE (final) != VAR_DECL)
+    {
+      final_var = create_tmp_var (TREE_TYPE (final));
+      gimplify_assign (final_var, final, &loop);
+    }
+
+  gimple_seq_add_stmt (&loop, gimple_build_label (cycle_label));
+  gimple_seq_add_stmt (&loop, gimple_build_cond (cond, index, final_var,
+                                                body_label, exit_label));
+  gimple_seq_add_stmt (&loop, gimple_build_label (body_label));
+  gimple_seq_add_seq (&loop, body);
+  gimplify_assign (index, incr, &loop);
+  gimple_seq_add_stmt (&loop, gimple_build_goto (cycle_label));
+  gimple_seq_add_stmt (&loop, gimple_build_label (exit_label));
+
+  gbind *bind = gimple_build_bind (NULL, loop, NULL);
+  pop_gimplify_context (bind);
+
+  return bind;
+}
+
+static enum tree_code
+omp_adjust_neq_condition (tree v, tree step)
+{
+  gcc_assert (TREE_CODE (step) == INTEGER_CST);
+  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE)
+    {
+      if (integer_onep (step))
+       return LT_EXPR;
+      else
+       {
+         gcc_assert (integer_minus_onep (step));
+         return GT_EXPR;
+       }
+    }
+  else
+    {
+      tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (v)));
+      gcc_assert (TREE_CODE (unit) == INTEGER_CST);
+      if (tree_int_cst_equal (unit, step))
+       return LT_EXPR;
+      else
+       {
+         gcc_assert (wi::neg (wi::to_widest (unit))
+                     == wi::to_widest (step));
+         return GT_EXPR;
+       }
+    }
+}
+
+/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or GT_EXPR,
+   given that V is the loop index variable and STEP is the loop step.
+
+   This function has been derived from omp_adjust_for_condition.
+   In contrast to the original function it does not add 1 or
+   -1 to the final value when converting <=,>= to <,>
+   for a pointer-type index variable. Instead, this function
+   adds or subtracts the type size in bytes. This is necessary
+   to determine the number of iterations correctly. */
+
+void
+omp_adjust_for_condition2 (location_t loc, enum tree_code *cond_code, tree *n2,
+                         tree v, tree step)
+{
+  switch (*cond_code)
+    {
+    case LT_EXPR:
+    case GT_EXPR:
+      break;
+
+    case NE_EXPR:
+      *cond_code = omp_adjust_neq_condition (v, step);
+      break;
+
+    case LE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+       {
+         tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (v)));
+         HOST_WIDE_INT type_unit = tree_to_shwi (unit);
+
+         *n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, type_unit);
+       }
+      else
+       *n2 = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (*n2), *n2,
+                              build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = LT_EXPR;
+      break;
+    case GE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+       {
+         tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (v)));
+         HOST_WIDE_INT type_unit = tree_to_shwi (unit);
+         *n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, -1 * type_unit);
+       }
+      else
+       *n2 = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (*n2), *n2,
+                              build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = GT_EXPR;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Transform the condition of OMP_FOR to either LT_EXPR or GT_EXPR and adjust
+   the final value as necessary. */
+
+static bool
+canonicalize_conditions (gomp_for *omp_for)
+{
+  size_t collapse = gimple_omp_for_collapse (omp_for);
+  location_t loc = gimple_location (omp_for);
+  bool new_decls = false;
+
+  gimple_seq *pre_body = gimple_omp_for_pre_body_ptr (omp_for);
+  for (size_t l = 0; l < collapse; l++)
+    {
+      enum tree_code cond = gimple_omp_for_cond (omp_for, l);
+
+      if (cond == LT_EXPR || cond == GT_EXPR)
+       continue;
+
+      tree incr = gimple_omp_for_incr (omp_for, l);
+      tree step = omp_get_for_step_from_incr (loc, incr);
+      tree index = gimple_omp_for_index (omp_for, l);
+      tree final = gimple_omp_for_final (omp_for, l);
+      tree orig_final = final;
+      /* If final refers to the index variable of an outer level, i.e.
+        the loop nest is non-rectangular, only convert NE_EXPR. This
+        is necessary for unrolling.  Unrolling needs to multiply the
+        step by the unrolling factor, but non-constant step values
+        are impossible with NE_EXPR. */
+      if (TREE_CODE (final) == TREE_VEC)
+       {
+         cond = omp_adjust_neq_condition (TREE_VEC_ELT (final, 1),
+                                          TREE_OPERAND (incr, 1));
+         gimple_omp_for_set_cond (omp_for, l, cond);
+         continue;
+       }
+
+      omp_adjust_for_condition2 (loc, &cond, &final, index, step);
+
+      gimple_omp_for_set_cond (omp_for, l, cond);
+      if (final == orig_final)
+       continue;
+
+      /* If this is a combined construct, gimplify the final on the
+        outer construct. */
+      if (TREE_CODE (final) != INTEGER_CST
+         && !gimple_omp_for_combined_into_p (omp_for))
+       {
+         tree new_final = create_tmp_var (TREE_TYPE (final));
+         gimplify_assign (new_final, final, pre_body);
+         final = new_final;
+         new_decls = true;
+       }
+
+      gimple_omp_for_set_final (omp_for, l, final);
+    }
+
+  return new_decls;
+}
+
+/* Combined distribute or taskloop constructs are represented by two
+   or more nested gomp_for constructs which are created during
+   gimplification. Loop transformations on the combined construct are
+   executed on the innermost gomp_for. This function adjusts the loop
+   header of an outer OMP_FOR loop to the changes made by the
+   transformations on the inner loop which is provided by the CTX. */
+
+static gimple_seq
+adjust_combined_loop (gomp_for *omp_for, walk_ctx *ctx)
+{
+  gcc_assert (gimple_omp_for_combined_p (omp_for));
+  gcc_assert (ctx->inner_combined_loop);
+
+  gomp_for *inner_omp_for = ctx->inner_combined_loop;
+  size_t collapse = gimple_omp_for_collapse (inner_omp_for);
+
+  int kind = gimple_omp_for_kind (omp_for);
+  if (kind == GF_OMP_FOR_KIND_DISTRIBUTE || kind == GF_OMP_FOR_KIND_TASKLOOP)
+    {
+      for (size_t level = 0; level < collapse; ++level)
+       {
+         tree outer_incr = gimple_omp_for_incr (omp_for, level);
+         tree inner_incr = gimple_omp_for_incr (inner_omp_for, level);
+         gcc_assert (TREE_TYPE (inner_incr) == TREE_TYPE (outer_incr));
+
+         tree inner_final = gimple_omp_for_final (inner_omp_for, level);
+         enum tree_code inner_cond
+             = gimple_omp_for_cond (inner_omp_for, level);
+         gimple_omp_for_set_cond (omp_for, level, inner_cond);
+
+         tree inner_step = TREE_OPERAND (inner_incr, 1);
+         /* If this omp_for is the outermost loop belonging to a
+            combined construct, gimplify the step into its
+            prebody. Otherwise, just gimplify the step on the inner
+            gomp_for and move the ungimplified step expression
+            here. */
+         if (!gimple_omp_for_combined_into_p (omp_for)
+             && !TREE_CONSTANT (inner_step))
+           {
+             push_gimplify_context ();
+             tree step = create_tmp_var (TREE_TYPE (inner_incr),
+                                         ".omp_combined_step");
+             gimplify_assign (step, inner_step,
+                              gimple_omp_for_pre_body_ptr (omp_for));
+             pop_gimplify_context (ctx->bind);
+             TREE_OPERAND (outer_incr, 1) = step;
+           }
+         else
+           TREE_OPERAND (outer_incr, 1) = inner_step;
+
+         if (!gimple_omp_for_combined_into_p (omp_for)
+             && !TREE_CONSTANT (inner_final))
+           {
+             push_gimplify_context ();
+             tree final = create_tmp_var (TREE_TYPE (inner_final),
+                                          ".omp_combined_final");
+             gimplify_assign (final, inner_final,
+                              gimple_omp_for_pre_body_ptr (omp_for));
+             pop_gimplify_context (ctx->bind);
+             gimple_omp_for_set_final (omp_for, level, final);
+           }
+         else
+           gimple_omp_for_set_final (omp_for, level, inner_final);
+
+         /* Gimplify the step on the inner loop of the combined construct. */
+         if (!TREE_CONSTANT (inner_step))
+           {
+             push_gimplify_context ();
+             tree step = create_tmp_var (TREE_TYPE (inner_incr),
+                                         ".omp_combined_step");
+             gimplify_assign (step, inner_step,
+                              gimple_omp_for_pre_body_ptr (inner_omp_for));
+             TREE_OPERAND (inner_incr, 1) = step;
+             pop_gimplify_context (ctx->bind);
+
+             tree private_clause = build_omp_clause (
+                 gimple_location (omp_for), OMP_CLAUSE_PRIVATE);
+             OMP_CLAUSE_DECL (private_clause) = step;
+             tree *clauses = gimple_omp_for_clauses_ptr (inner_omp_for);
+             *clauses = chainon (*clauses, private_clause);
+           }
+
+         /* Gimplify the final on the inner loop of the combined construct. */
+         if (!TREE_CONSTANT (inner_final))
+           {
+             push_gimplify_context ();
+             tree final = create_tmp_var (TREE_TYPE (inner_final),
+                                          ".omp_combined_final");
+             gimplify_assign (final, inner_final,
+                              gimple_omp_for_pre_body_ptr (inner_omp_for));
+             gimple_omp_for_set_final (inner_omp_for, level, final);
+             pop_gimplify_context (ctx->bind);
+
+             tree private_clause = build_omp_clause (
+                 gimple_location (omp_for), OMP_CLAUSE_PRIVATE);
+             OMP_CLAUSE_DECL (private_clause) = final;
+             tree *clauses = gimple_omp_for_clauses_ptr (inner_omp_for);
+             *clauses = chainon (*clauses, private_clause);
+           }
+       }
+    }
+
+  if (gimple_omp_for_combined_into_p (omp_for))
+    ctx->inner_combined_loop = omp_for;
+  else
+    ctx->inner_combined_loop = NULL;
+
+  return omp_for;
+}
+
+/* Transform OMP_FOR recursively according to the clause chain
+   TRANSFORMATION. Return the resulting sequence of gimple statements.
+
+   This function dispatches OMP_FOR to the handler function for the
+   TRANSFORMATION clause. The handler function is responsible for invoking this
+   function recursively for executing the remaining transformations. */
+
+static gimple_seq
+transform_gomp_for (gomp_for *omp_for, tree transformation, walk_ctx *ctx)
+{
+  if (!transformation)
+    {
+      if (gimple_omp_for_kind (omp_for) == GF_OMP_FOR_KIND_TRANSFORM_LOOP)
+       return expand_transformed_loop (omp_for);
+
+      return omp_for;
+    }
+
+  push_gimplify_context ();
+
+  bool added_decls = canonicalize_conditions (omp_for);
+
+  gimple_seq result = NULL;
+  location_t loc = OMP_CLAUSE_LOCATION (transformation);
+  auto dump_loc = dump_user_location_t::from_location_t (loc);
+  switch (OMP_CLAUSE_CODE (transformation))
+    {
+    case OMP_CLAUSE_UNROLL_FULL:
+      gcc_assert (TREE_CHAIN (transformation) == NULL);
+      result = full_unroll (omp_for, loc, ctx);
+      break;
+    case OMP_CLAUSE_UNROLL_NONE:
+      gcc_assert (TREE_CHAIN (transformation) == NULL);
+      if (assign_unroll_full_clause_p (omp_for, transformation))
+       {
+         result = full_unroll (omp_for, loc, ctx);
+       }
+      else if (tree unroll_factor
+              = assign_unroll_partial_clause_p (omp_for, transformation))
+       {
+         result = partial_unroll (omp_for, unroll_factor, loc,
+                                  transformation, ctx);
+       }
+      else
+       {
+         if (dump_enabled_p ())
+           {
+             /* TODO Try to inform the unrolling pass that the user
+                wants to unroll this loop. This could relax some
+                restrictions there, e.g. on the code size? */
+             dump_printf_loc (
+                 MSG_MISSED_OPTIMIZATION, dump_loc,
+                 "not unrolling loop with %<omp unroll%> directive. Add "
+                 "clause to specify unrolling type or invoke the "
+                 "compiler with --param=omp-unroll-default-factor=n for some "
+                 "constant integer n\n");
+           }
+         result = transform_gomp_for (omp_for, NULL, ctx);
+       }
+
+      break;
+    case OMP_CLAUSE_UNROLL_PARTIAL:
+      {
+       tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (transformation);
+       if (!unroll_factor)
+         {
+           // TODO Use target architecture dependent constants?
+           unsigned factor = param_omp_unroll_default_factor > 0
+                                 ? param_omp_unroll_default_factor
+                                 : 5;
+           unroll_factor = build_int_cst (integer_type_node, factor);
+
+           if (dump_enabled_p ())
+             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc,
+                              "%<partial%> clause without unrolling "
+                              "factor turned into %<partial(%u)%> clause\n",
+                              factor);
+         }
+       result = partial_unroll (omp_for, unroll_factor, loc, transformation,
+                                ctx);
+      }
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (added_decls && gimple_code (result) != GIMPLE_BIND)
+    result = gimple_build_bind (NULL, result, NULL);
+  /* Bind the decls created by canonicalize_conditions, if any. */
+  pop_gimplify_context (added_decls ? result : NULL);
+
+  return result;
+}
+
+/* Remove all loop transformation clauses from the clauses of OMP_FOR and
+   return a new tree chain containing just those clauses.
+
+   The clauses correspond to transformation *directives* associated with the
+   OMP_FOR's loop. The returned clauses are ordered from the innermost
+   directive to the outermost, i.e. in the order in which the transformations
+   should execute.
+
+   Example:
+   --------
+
+   The loop
+
+   #pragma omp for nowait
+   #pragma omp unroll partial(5)
+   #pragma omp tile sizes(2,2)
+   LOOP
+
+   is represented as
+
+   #pragma omp for nowait unroll_partial(5) tile_sizes(2,2)
+   LOOP
+
+   Gimplification may add clauses after the transformation clauses added
+   by the front ends. This function will leave only the "nowait" clause on
+   OMP_FOR and return the clauses "tile_sizes(2,2) unroll_partial(5)". */
+
+static tree
+gomp_for_remove_transformation_clauses (gomp_for *omp_for)
+{
+  tree *clauses = gimple_omp_for_clauses_ptr (omp_for);
+  tree trans_clauses = NULL;
+  tree last_other_clause = NULL;
+
+  for (tree c = gimple_omp_for_clauses (omp_for); c != NULL_TREE;)
+    {
+      tree chain_tail = OMP_CLAUSE_CHAIN (c);
+      if (omp_loop_transform_clause_p (c))
+       {
+         if (last_other_clause)
+           OMP_CLAUSE_CHAIN (last_other_clause) = chain_tail;
+         else
+           *clauses = OMP_CLAUSE_CHAIN (c);
+
+         OMP_CLAUSE_CHAIN (c) = NULL;
+         trans_clauses = chainon (trans_clauses, c);
+       }
+      else
+       {
+         /* There should be no other clauses between loop transformations ... */
+         gcc_assert (!trans_clauses || !last_other_clause
+                     || TREE_CHAIN (last_other_clause) == c);
+         /* ... and hence stop if transformations were found before the
+            non-transformation clause C. */
+         if (trans_clauses)
+           break;
+         last_other_clause = c;
+       }
+
+      c = chain_tail;
+    }
+
+  return nreverse (trans_clauses);
+}
+
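+/* Emit an optimization note saying that consecutive unroll directives
+   were replaced by the merged 'unroll partial' clause C. */
+
+static void
+print_optimized_unroll_partial_msg (tree c)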
+{
+  gcc_assert (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_UNROLL_PARTIAL);
+  location_t loc = OMP_CLAUSE_LOCATION (c);
+  dump_user_location_t dump_loc;
+  dump_loc = dump_user_location_t::from_location_t (loc);
+
+  tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
+  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc,
+                  "replaced consecutive %<omp unroll%> directives by "
+                  "%<omp unroll auto(" HOST_WIDE_INT_PRINT_UNSIGNED
+                  ")%>\n", tree_to_uhwi (unroll_factor));
+}
+
+/* Optimize CLAUSES by removing and merging redundant clauses.  Return the
+   optimized clause chain. */
+
+static tree
+optimize_transformation_clauses (tree clauses)
+{
+  /* The last unroll_partial clause seen in clauses, if any,
+     or the last merged unroll partial clause. */
+  tree unroll_partial = NULL;
+  /* The last clause that was not an unroll_partial clause, if any.
+     unroll_full and unroll_none are not relevant because
+     they appear only at the end of a chain. */
+  tree last_non_unroll = NULL;
+  /* Indicates that at least two unroll_partial clauses have been merged
+     since last_non_unroll was seen. */
+  bool merged_unroll_partial = false;
+
+  for (tree c = clauses; c != NULL_TREE; c = OMP_CLAUSE_CHAIN (c))
+    {
+      enum omp_clause_code code = OMP_CLAUSE_CODE (c);
+
+      switch (code)
+       {
+       case OMP_CLAUSE_UNROLL_NONE:
+         /* 'unroll' without a clause cannot be followed by any
+            transformations because its result does not have canonical loop
+            nest form. */
+         gcc_assert (OMP_CLAUSE_CHAIN (c) == NULL);
+         unroll_partial = NULL;
+         merged_unroll_partial = false;
+         break;
+       case OMP_CLAUSE_UNROLL_FULL:
+         /* 'unroll full' cannot be followed by any transformations because
+            its result does not have canonical loop nest form. */
+         gcc_assert (OMP_CLAUSE_CHAIN (c) == NULL);
+
+         /* Previous 'unroll partial' directives are useless. */
+         if (unroll_partial)
+           {
+             if (last_non_unroll)
+               OMP_CLAUSE_CHAIN (last_non_unroll) = c;
+             else
+               clauses = c;
+
+             if (dump_enabled_p ())
+               {
+                 location_t loc = OMP_CLAUSE_LOCATION (c);
+                 dump_user_location_t dump_loc;
+                 dump_loc = dump_user_location_t::from_location_t (loc);
+
+                 dump_printf_loc (
+                     MSG_OPTIMIZED_LOCATIONS, dump_loc,
+                     "removed useless %<omp unroll auto%> directives "
+                     "preceding 'omp unroll full'\n");
+               }
+           }
+         unroll_partial = NULL;
+         merged_unroll_partial = false;
+         break;
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         {
+           /* Merge a sequence of consecutive 'unroll partial' directives.
+              Note that it is impossible for 'unroll full' or 'unroll' to
+              appear in between the 'unroll partial' clauses because they
+              remove the loop nest. */
+           if (unroll_partial)
+             {
+               tree factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (unroll_partial);
+               tree c_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
+               if (factor && c_factor)
+                 factor = fold_build2 (MULT_EXPR, TREE_TYPE (factor), factor,
+                                       c_factor);
+               else if (!factor && c_factor)
+                 factor = c_factor;
+
+               gcc_assert (!factor || TREE_CODE (factor) == INTEGER_CST);
+
+               OMP_CLAUSE_UNROLL_PARTIAL_EXPR (unroll_partial) = factor;
+               OMP_CLAUSE_CHAIN (unroll_partial) = OMP_CLAUSE_CHAIN (c);
+               OMP_CLAUSE_LOCATION (unroll_partial) = OMP_CLAUSE_LOCATION (c);
+               merged_unroll_partial = true;
+             }
+           else
+             unroll_partial = c;
+         }
+         break;
+       default:
+         gcc_unreachable ();
+       }
+    }
+
+  if (merged_unroll_partial && dump_enabled_p ())
+    print_optimized_unroll_partial_msg (unroll_partial);
+
+  return clauses;
+}
+
+/* Execute all loop transformations found on OMP_FOR, a statement of
+   CONTAINING_SEQ that is visited during the walk_omp_for_loops walk. */
+
+void
+process_omp_for (gomp_for *omp_for, gimple_seq *containing_seq, walk_ctx *ctx)
+{
+  auto gsi_p = gsi_for_stmt (omp_for, containing_seq);
+  tree transform_clauses = gomp_for_remove_transformation_clauses (omp_for);
+
+  /* Do not attempt to transform broken code which might violate the
+     assumptions of the loop transformation implementations.
+
+     Transformation clauses must be dropped first because following
+     passes do not handle them. */
+  if (seen_error ())
+    return;
+
+  transform_clauses = optimize_transformation_clauses (transform_clauses);
+
+  gimple *transformed = omp_for;
+  if (gimple_omp_for_combined_p (omp_for)
+      && ctx->inner_combined_loop)
+    transformed = adjust_combined_loop (omp_for, ctx);
+  else
+    transformed = transform_gomp_for (omp_for, transform_clauses, ctx);
+
+  if (transformed == omp_for)
+    return;
+
+  gsi_replace_with_seq (&gsi_p, transformed, true);
+
+  if (!dump_enabled_p () || !(dump_flags & TDF_DETAILS))
+    return;
+
+  dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, transformed,
+                  "Transformed loop: %G\n\n", transformed);
+}
+
+/* Traverse SEQ in depth-first order and apply the loop transformations
+   found on gomp_for statements. */
+
+static unsigned int
+walk_omp_for_loops (gimple_seq *seq, walk_ctx *ctx)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (*seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      switch (gimple_code (stmt))
+       {
+       case GIMPLE_OMP_CRITICAL:
+       case GIMPLE_OMP_MASTER:
+       case GIMPLE_OMP_MASKED:
+       case GIMPLE_OMP_TASKGROUP:
+       case GIMPLE_OMP_ORDERED:
+       case GIMPLE_OMP_SCAN:
+       case GIMPLE_OMP_SECTION:
+       case GIMPLE_OMP_PARALLEL:
+       case GIMPLE_OMP_TASK:
+       case GIMPLE_OMP_SCOPE:
+       case GIMPLE_OMP_SECTIONS:
+       case GIMPLE_OMP_SINGLE:
+       case GIMPLE_OMP_TARGET:
+       case GIMPLE_OMP_TEAMS:
+         {
+           gbind *bind = ctx->bind;
+           walk_omp_for_loops (gimple_omp_body_ptr (stmt), ctx);
+           ctx->bind = bind;
+           break;
+         }
+       case GIMPLE_OMP_FOR:
+         {
+           gbind *bind = ctx->bind;
+           walk_omp_for_loops (gimple_omp_for_pre_body_ptr (stmt), ctx);
+           walk_omp_for_loops (gimple_omp_body_ptr (stmt), ctx);
+           ctx->bind = bind;
+           process_omp_for (as_a<gomp_for *> (stmt), seq, ctx);
+           break;
+         }
+       case GIMPLE_BIND:
+         {
+           gbind *bind = as_a<gbind *> (stmt);
+           ctx->bind = bind;
+           walk_omp_for_loops (gimple_bind_body_ptr (bind), ctx);
+           ctx->bind = bind;
+           break;
+         }
+       case GIMPLE_TRY:
+         {
+           gbind *bind = ctx->bind;
+           walk_omp_for_loops (gimple_try_eval_ptr (as_a<gtry *> (stmt)),
+                               ctx);
+           walk_omp_for_loops (gimple_try_cleanup_ptr (as_a<gtry *> (stmt)),
+                               ctx);
+           ctx->bind = bind;
+           break;
+         }
+
+       case GIMPLE_CATCH:
+         {
+           gbind *bind = ctx->bind;
+           walk_omp_for_loops (
+               gimple_catch_handler_ptr (as_a<gcatch *> (stmt)), ctx);
+           ctx->bind = bind;
+           break;
+         }
+
+       case GIMPLE_EH_FILTER:
+         {
+           gbind *bind = ctx->bind;
+           walk_omp_for_loops (gimple_eh_filter_failure_ptr (stmt), ctx);
+           ctx->bind = bind;
+           break;
+         }
+
+       case GIMPLE_EH_ELSE:
+         {
+           gbind *bind = ctx->bind;
+           geh_else *eh_else_stmt = as_a<geh_else *> (stmt);
+           walk_omp_for_loops (gimple_eh_else_n_body_ptr (eh_else_stmt), ctx);
+           walk_omp_for_loops (gimple_eh_else_e_body_ptr (eh_else_stmt), ctx);
+           ctx->bind = bind;
+           break;
+         }
+
+       case GIMPLE_WITH_CLEANUP_EXPR:
+         {
+           gbind *bind = ctx->bind;
+           walk_omp_for_loops (gimple_wce_cleanup_ptr (stmt), ctx);
+           ctx->bind = bind;
+           break;
+         }
+
+       case GIMPLE_TRANSACTION:
+         {
+           gbind *bind = ctx->bind;
+           auto trans = as_a<gtransaction *> (stmt);
+           walk_omp_for_loops (gimple_transaction_body_ptr (trans), ctx);
+           ctx->bind = bind;
+           break;
+         }
+
+       case GIMPLE_ASSUME:
+         break;
+
+       default:
+         gcc_assert (!gimple_has_substatements (stmt));
+         continue;
+       }
+    }
+
+  return true;
+}
+
+static unsigned int
+execute_omp_transform_loops ()
+{
+  gimple_seq body = gimple_body (current_function_decl);
+  walk_ctx ctx;
+  ctx.inner_combined_loop = NULL;
+  ctx.bind = NULL;
+  walk_omp_for_loops (&body, &ctx);
+
+  return 0;
+}
+
+namespace
+{
+
+const pass_data pass_data_omp_transform_loops = {
+  GIMPLE_PASS,           /* type */
+  "omp_transform_loops", /* name */
+  OPTGROUP_OMP,          /* optinfo_flags */
+  TV_NONE,               /* tv_id */
+  PROP_gimple_any,       /* properties_required */
+  0,                     /* properties_provided */
+  0,                     /* properties_destroyed */
+  0,                     /* todo_flags_start */
+  0,                     /* todo_flags_finish */
+};
+
+class pass_omp_transform_loops : public gimple_opt_pass
+{
+public:
+  pass_omp_transform_loops (gcc::context *ctxt)
+      : gimple_opt_pass (pass_data_omp_transform_loops, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  virtual unsigned int
+  execute (function *)
+  {
+    return execute_omp_transform_loops ();
+  }
+  virtual bool
+  gate (function *)
+  {
+    return flag_openmp || flag_openmp_simd;
+  }
+
+}; // class pass_omp_transform_loops
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_transform_loops (gcc::context *ctxt)
+{
+  return new pass_omp_transform_loops (ctxt);
+}
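
The clause handling in gomp_for_remove_transformation_clauses and
optimize_transformation_clauses above can be illustrated with a small
standalone sketch. It is not GCC code: Clause, Kind, split_transform_clauses
and merge_unrolls are hypothetical names, a std::vector stands in for the
tree clause chain, and (unlike the patch) the split does not stop at the
first non-transformation clause that follows the transformation clauses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an OpenMP clause; this is not a GCC type.
enum class Kind { Other, UnrollPartial, UnrollFull };

struct Clause {
  Kind kind;
  unsigned factor;  // UnrollPartial only; 0 means "no factor given".
  std::string name;
};

// Move the transformation clauses out of CLAUSES and return them ordered
// from the innermost directive to the outermost.
std::vector<Clause> split_transform_clauses(std::vector<Clause> &clauses) {
  std::vector<Clause> kept, transforms;
  for (const Clause &c : clauses)
    (c.kind == Kind::Other ? kept : transforms).push_back(c);
  clauses = kept;
  std::reverse(transforms.begin(), transforms.end());
  return transforms;
}

// Merge runs of partial unrolls by multiplying their factors, and drop the
// partial unroll that directly precedes a full unroll.
std::vector<Clause> merge_unrolls(const std::vector<Clause> &in) {
  std::vector<Clause> out;
  for (const Clause &c : in) {
    bool prev_partial = !out.empty() && out.back().kind == Kind::UnrollPartial;
    if (c.kind == Kind::UnrollPartial && prev_partial) {
      Clause &prev = out.back();
      if (prev.factor && c.factor)
        prev.factor *= c.factor;
      else if (!prev.factor)
        prev.factor = c.factor;
      continue;
    }
    if (c.kind == Kind::UnrollFull) {
      // A full unroll must end the chain, as the patch asserts.
      assert(&c == &in.back());
      if (prev_partial)
        out.pop_back();
    }
    out.push_back(c);
  }
  return out;
}
```

For a chain "nowait unroll_partial(2) unroll_partial(3)" the split leaves
"nowait" on the loop and returns "unroll_partial(3) unroll_partial(2)", which
the merge step folds into a single partial unroll with factor 6.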
diff --git a/gcc/params.opt b/gcc/params.opt
index 41d8bef245e..cf5e09bf9e0 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -820,6 +820,15 @@ Enum(openacc_privatization) String(quiet) Value(OPENACC_PRIVATIZATION_QUIET)
 EnumValue
 Enum(openacc_privatization) String(noisy) Value(OPENACC_PRIVATIZATION_NOISY)

+-param=omp-unroll-full-max-iterations=
+Common Joined UInteger Var(param_omp_unroll_full_max_iterations) Init(5) Param Optimization
+The maximum number of iterations of a loop for which an 'omp unroll' directive
+without a clause is turned into 'omp unroll full'.
+
+-param=omp-unroll-default-factor=
+Common Joined UInteger Var(param_omp_unroll_default_factor) Init(0) Param Optimization
+The unroll factor that will be used for loops that have an 'omp unroll partial' directive without an explicit unroll factor.
+
 -param=parloops-chunk-size=
 Common Joined UInteger Var(param_parloops_chunk_size) Param Optimization
 Chunk size of omp schedule for loops parallelized by parloops.
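
How the two new parameters interact with the unroll clauses can be sketched
as follows. This is a standalone illustration, not the patch's code: the
function and enum names are hypothetical, and because the bodies of
assign_unroll_full_clause_p and assign_unroll_partial_clause_p are not shown
in this part of the patch, the conditions for a bare 'omp unroll' are
assumptions inferred from the parameter descriptions and the diagnostic text
(in particular, the iteration limit is assumed to be inclusive).

```cpp
#include <cassert>

// Fallback used when neither the clause nor the parameter gives a factor,
// matching the constant 5 in transform_gomp_for.
constexpr unsigned kFallbackFactor = 5;

// Factor for 'unroll partial'.  CLAUSE_FACTOR is 0 when the clause has no
// argument; PARAM_DEFAULT models --param=omp-unroll-default-factor.
unsigned partial_unroll_factor(unsigned clause_factor, unsigned param_default) {
  if (clause_factor > 0)
    return clause_factor;
  return param_default > 0 ? param_default : kFallbackFactor;
}

enum class BareUnroll { Full, Partial, None };

// Decision for 'unroll' without a clause.  TRIP_COUNT is -1 when unknown.
// ASSUMPTION: the limit from --param=omp-unroll-full-max-iterations is
// inclusive, and a partial unroll is chosen only if a default factor is set.
BareUnroll classify_bare_unroll(long trip_count, unsigned max_full_iterations,
                                unsigned param_default) {
  assert(trip_count >= -1);
  if (trip_count >= 0
      && static_cast<unsigned long>(trip_count) <= max_full_iterations)
    return BareUnroll::Full;
  return param_default > 0 ? BareUnroll::Partial : BareUnroll::None;
}
```

With the default parameter values, a bare 'omp unroll' on a 100-iteration
loop is left alone, which is what unroll-4.f90 checks, and 'omp unroll
partial' without an argument uses the factor 5, as in unroll-5.f90.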
diff --git a/gcc/passes.def b/gcc/passes.def
index c9a8f19747b..5a5f3616cf8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
+  NEXT_PASS (pass_omp_transform_loops);
   NEXT_PASS (pass_lower_omp);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-1.f90
new file mode 100644
index 00000000000..4cfac4c5e26
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-1.f90
@@ -0,0 +1,277 @@
+subroutine test1
+  implicit none
+  integer :: i
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+subroutine test2
+  implicit none
+  integer :: i
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test2
+
+subroutine test3
+  implicit none
+  integer :: i
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end do
+end subroutine test3
+
+subroutine test4
+  implicit none
+  integer :: i
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end do
+end subroutine test4
+
+subroutine test5
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test5
+
+subroutine test6
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test6
+
+subroutine test7
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test7
+
+subroutine test8
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end unroll
+end subroutine test8
+
+subroutine test9
+  implicit none
+  integer :: i
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test9
+
+subroutine test10
+  implicit none
+  integer :: i
+
+  !$omp unroll full ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test10
+
+subroutine test11
+  implicit none
+  integer :: i,j
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+     do j = 1,100
+        call dummy2(i,j)
+     end do
+  end do
+end subroutine test11
+
+subroutine test12
+  implicit none
+  integer :: i,j
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+     call dummy(i) ! { dg-error {Unexpected CALL statement at \(1\)} }
+  !$omp unroll
+     do j = 1,100
+        call dummy2(i,j)
+     end do
+  end do
+end subroutine test12
+
+subroutine test13
+  implicit none
+  integer :: i,j
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+     !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+     !$omp unroll
+     do j = 1,100
+        call dummy2(i,j)
+     end do
+     call dummy(i)
+  end do
+end subroutine test13
+
+subroutine test14
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end unroll
+  !$omp end unroll ! { dg-error {Unexpected \!\$OMP END UNROLL statement at \(1\)} }
+end subroutine test14
+
+subroutine test15
+  implicit none
+  integer :: i
+
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end unroll
+  !$omp end unroll ! { dg-error {Unexpected \!\$OMP END UNROLL statement at \(1\)} }
+end subroutine test15
+
+subroutine test16
+  implicit none
+  integer :: i
+
+  !$omp do
+  !$omp unroll partial(1)
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test16
+
+subroutine test17
+  implicit none
+  integer :: i
+
+  !$omp do
+  !$omp unroll partial(2)
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test17
+
+subroutine test18
+  implicit none
+  integer :: i
+
+  !$omp do
+  !$omp unroll partial(0) ! { dg-error {PARTIAL clause argument not constant positive integer at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test18
+
+subroutine test19
+  implicit none
+  integer :: i
+
+  !$omp do
+  !$omp unroll partial(-10) ! { dg-error {PARTIAL clause argument not constant positive integer at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test19
+
+subroutine test20
+  implicit none
+  integer :: i
+
+  !$omp do
+  !$omp unroll partial
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test20
+
+subroutine test21
+  implicit none
+  integer :: i
+
+  !$omp unroll partial ! { dg-error {\!\$OMP UNROLL invalid around DO CONCURRENT loop at \(1\)} }
+  do concurrent  (i = 1:100)
+     call dummy(i) ! { dg-error {Subroutine call to 'dummy' in DO CONCURRENT block at \(1\) is not PURE} }
+  end do
+  !$omp end unroll
+end subroutine test21
+
+subroutine test22
+  implicit none
+  integer :: i
+
+  !$omp do
+  !$omp unroll partial
+  do concurrent  (i = 1:100)  ! { dg-error {\!\$OMP DO cannot be a DO CONCURRENT loop at \(1\)} }
+     call dummy(i) ! { dg-error {Subroutine call to 'dummy' in DO CONCURRENT block at \(1\) is not PURE} }
+  end do
+  !$omp end unroll
+end subroutine test22
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-10.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-10.f90
new file mode 100644
index 00000000000..2c4a45d3054
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-10.f90
@@ -0,0 +1,7 @@
+subroutine test(i)
+  ! TODO The checking that produces this message comes too late. Not important, but would be nice to have.
+  !$omp unroll full ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} "" { xfail *-*-* } }
+  call dummy0 ! { dg-error {Unexpected CALL statement at \(1\)} }
+end subroutine test ! { dg-error {Unexpected END statement at \(1\)} }
+
+! { dg-error "Unexpected end of file" "" { target "*-*-*" } 0 }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-11.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-11.f90
new file mode 100644
index 00000000000..3f0d5981e9b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-11.f90
@@ -0,0 +1,75 @@
+subroutine test1(i)
+  implicit none
+  integer :: i
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test1
+
+subroutine test2(i)
+  implicit none
+  integer :: i
+  !$omp unroll full ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test2
+
+subroutine test3(i)
+  implicit none
+  integer :: i
+  !$omp unroll full ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll full
+  !$omp unroll
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test3
+
+subroutine test4(i)
+  implicit none
+  integer :: i
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test4
+
+subroutine test5(i)
+  implicit none
+  integer :: i
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test5
+
+subroutine test6(i)
+  implicit none
+  integer :: i
+  !$omp do ! { dg-error {missing canonical loop nest after \!\$OMP DO at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test6
+
+subroutine test7(i)
+  implicit none
+  integer :: i
+  !$omp loop ! { dg-error {missing canonical loop nest after \!\$OMP LOOP at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test7
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-12.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-12.f90
new file mode 100644
index 00000000000..0d8f3f5a2c0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-12.f90
@@ -0,0 +1,29 @@
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll ! { dg-error {\!\$OMP UNROLL invalid around DO WHILE or DO without loop control at \(1\)} }
+  do while (i < 10)
+     call dummy(i)
+     i = i + 1
+  end do
+end subroutine test1
+
+subroutine test2
+  implicit none
+  integer :: i
+  !$omp unroll ! { dg-error {\!\$OMP UNROLL invalid around DO WHILE or DO without loop control at \(1\)} }
+  do
+     call dummy(i)
+     i = i + 1
+     if (i >= 10) exit
+  end do
+end subroutine test2
+
+subroutine test3
+  implicit none
+  integer :: i
+  !$omp unroll ! { dg-error {\!\$OMP UNROLL invalid around DO CONCURRENT loop at \(1\)} }
+  do concurrent (i=1:10)
+     call dummy(i) ! { dg-error {Subroutine call to 'dummy' in DO CONCURRENT block at \(1\) is not PURE} }
+  end do
+end subroutine test3
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-2.f90
new file mode 100644
index 00000000000..8496f9eefe0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-2.f90
@@ -0,0 +1,22 @@
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test1
+
+subroutine test2
+  implicit none
+  integer :: i
+  !$omp unroll full
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test2
+
+! { dg-final { scan-tree-dump-times "#pragma omp loop_transform unroll_none" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp loop_transform unroll_full" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-3.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-3.f90
new file mode 100644
index 00000000000..0d233c9ab6f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-3.f90
@@ -0,0 +1,17 @@
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll full
+  do i = 1,10
+     call dummy(i)
+  end do
+end subroutine test1
+
+! Loop should be removed with 10 copies of the body remaining
+
+! { dg-final { scan-tree-dump-times "dummy" 10 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump "#pragma omp loop_transform" "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-4.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-4.f90
new file mode 100644
index 00000000000..fcccdb0bcf8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-4.f90
@@ -0,0 +1,18 @@
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+! Loop should not be unrolled, but the internal representation should be lowered
+
+! { dg-final { scan-tree-dump "#pragma omp loop_transform" "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times "dummy" 1 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {if \(i\.[0-9]+ < .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-5.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-5.f90
new file mode 100644
index 00000000000..ee82b4d150c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-5.f90
@@ -0,0 +1,18 @@
+! { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll partial ! { dg-optimized {'partial' clause without unrolling factor turned into 'partial\(5\)' clause} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+! Loop should be unrolled 5 times and the internal representation should be lowered.
+
+! { dg-final { scan-tree-dump {#pragma omp loop_transform unroll_partial} "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times "dummy" 5 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {if \(i\.[0-9]+ < .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-6.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-6.f90
new file mode 100644
index 00000000000..237e6b83087
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-6.f90
@@ -0,0 +1,19 @@
+! { dg-additional-options "--param=omp-unroll-default-factor=10" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll partial ! { dg-optimized {'partial' clause without unrolling factor turned into 'partial\(10\)' clause} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+! Loop should be unrolled 10 times and the internal representation should be lowered.
+
+! { dg-final { scan-tree-dump {#pragma omp loop_transform unroll_partial} "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times "dummy" 10 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {if \(i\.[0-9]+ < .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-7.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-7.f90
new file mode 100644
index 00000000000..8feaf7dc4d3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-7.f90
@@ -0,0 +1,62 @@
+! { dg-additional-options "--param=omp-unroll-default-factor=10" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i,j
+  !$omp parallel do
+  !$omp unroll partial(10)
+  do i = 1,100
+     !$omp parallel do
+     do j = 1,100
+     call dummy(i,j)
+     end do
+  end do
+
+  !$omp taskloop
+  !$omp unroll partial(10)
+  do i = 1,100
+     !$omp parallel do
+     do j = 1,100
+     call dummy(i,j)
+     end do
+  end do
+
+end subroutine test1
+
+! For the "parallel do", there should be 11 "omp for" loops: 10 for the inner loop and 1 for the outer loop.
+! For the "taskloop", there should be 10 "omp for" loops for the unrolled loop.
+! { dg-final { scan-tree-dump-times {#pragma omp for} 21 "omp_transform_loops" } }
+! ... and two outer taskloops plus the one taskloop
+! { dg-final { scan-tree-dump-times {#pragma omp taskloop} 3 "omp_transform_loops" } }
+
+
+subroutine test2
+  implicit none
+  integer :: i,j
+  do i = 1,100
+  !$omp teams distribute
+  !$omp unroll partial(10)
+     do j = 1,100
+     call dummy(i,j)
+     end do
+  end do
+
+  do i = 1,100
+  !$omp target teams distribute
+  !$omp unroll partial(10)
+     do j = 1,100
+     call dummy(i,j)
+     end do
+  end do
+end subroutine test2
+
+! { dg-final { scan-tree-dump-times {#pragma omp distribute} 2 "omp_transform_loops" } }
+
+! After unrolling there should be 10 copies of each loop body for each loop-nest
+! { dg-final { scan-tree-dump-times "dummy" 40 "omp_transform_loops" } }
+
+! { dg-final { scan-tree-dump-not {#pragma omp loop_transform} "original" } }
+! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(10\)} 1 "original" } }
+! { dg-final { scan-tree-dump-times {#pragma omp distribute private\(j\) unroll_partial\(10\)} 2 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
new file mode 100644
index 00000000000..9b91e5c5f98
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
@@ -0,0 +1,22 @@
+! { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp parallel do collapse(1)
+  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll auto\(24\)'} }
+  !$omp unroll partial(3)
+  !$omp unroll partial(2)
+  !$omp unroll partial(1)
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+! Loop should be unrolled 1 * 2 * 3 * 4 = 24 times
+
+! { dg-final { scan-tree-dump {#pragma omp for nowait collapse\(1\) unroll_partial\(4\) unroll_partial\(3\) unroll_partial\(2\) unroll_partial\(1\)} "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp loop_transform" "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times "dummy" 24 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {#pragma omp for} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
new file mode 100644
index 00000000000..849d4e77984
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
@@ -0,0 +1,18 @@
+! { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test1
+  implicit none
+  integer :: i
+  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' directives preceding 'omp unroll full'} }
+  !$omp unroll partial(3)
+  !$omp unroll partial(2)
+  !$omp unroll partial(1)
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+! { dg-final { scan-tree-dump {#pragma omp loop_transform unroll_full unroll_partial\(3\) unroll_partial\(2\) unroll_partial\(1\)} "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp unroll" "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times "dummy" 100 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90
new file mode 100644
index 00000000000..079c0fdd75b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90
@@ -0,0 +1,20 @@
+! { dg-additional-options "-fopt-info-optimized -fdump-tree-omp_transform_loops-details" }
+
+subroutine test
+  !$omp unroll ! { dg-optimized {assigned 'full' clause to 'omp unroll' with small constant number of iterations} }
+  do i = 1,5
+     do j = 1,10
+        call dummy3(i,j)
+     end do
+  end do
+  !$omp end unroll
+
+  !$omp unroll
+  do i = 1,6
+     do j = 1,6
+        call dummy3(i,j)
+     end do
+  end do
+  !$omp end unroll
+end subroutine test
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90
new file mode 100644
index 00000000000..4893ba46e4e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90
@@ -0,0 +1,21 @@
+! { dg-additional-options "--param=omp-unroll-full-max-iterations=20" }
+! { dg-additional-options "-fopt-info-optimized -fdump-tree-omp_transform_loops-details" }
+
+subroutine test
+  !$omp unroll ! { dg-optimized {assigned 'full' clause to 'omp unroll' with small constant number of iterations} }
+  do i = 1,20
+     do j = 1,10
+        call dummy3(i,j)
+     end do
+  end do
+  !$omp end unroll
+
+  !$omp unroll
+  do i = 1,21
+     do j = 1,6
+        call dummy3(i,j)
+     end do
+  end do
+  !$omp end unroll
+end subroutine test
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90
new file mode 100644
index 00000000000..60f25d3abe6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90
@@ -0,0 +1,23 @@
+! { dg-additional-options "--param=omp-unroll-full-max-iterations=10" }
+! { dg-additional-options "--param=omp-unroll-default-factor=10" }
+! { dg-additional-options "-fopt-info-optimized -fdump-tree-omp_transform_loops-details" }
+
+subroutine test
+  !$omp unroll ! { dg-optimized {added 'partial\(10\)' clause to 'omp unroll' directive} }
+  do i = 1,20
+     do j = 1,10
+        call dummy3(i,j)
+     end do
+  end do
+  !$omp end unroll
+
+  !$omp unroll ! { dg-optimized {added 'partial\(10\)' clause to 'omp unroll' directive} }
+  do i = 1,21
+  !$omp unroll ! { dg-optimized {assigned 'full' clause to 'omp unroll' with small constant number of iterations} }
+     do j = 1,6
+        call dummy3(i,j)
+     end do
+  end do
+  !$omp end unroll
+end subroutine test
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90
new file mode 100644
index 00000000000..f22debbb78f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90
@@ -0,0 +1,244 @@
+! { dg-options "-fno-openmp -fopenmp-simd" }
+
+subroutine test1
+  implicit none
+  integer :: i
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test1
+
+subroutine test2
+  implicit none
+  integer :: i
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test2
+
+subroutine test3
+  implicit none
+  integer :: i
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end do
+end subroutine test3
+
+subroutine test4
+  implicit none
+  integer :: i
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end do
+end subroutine test4
+
+subroutine test5
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test5
+
+subroutine test6
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test6
+
+subroutine test7
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end unroll
+end subroutine test7
+
+subroutine test8
+  implicit none
+  integer :: i
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test8
+
+subroutine test9
+  implicit none
+  integer :: i
+
+  !$omp unroll full ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+end subroutine test9
+
+subroutine test10
+  implicit none
+  integer :: i,j
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+     do j = 1,100
+        call dummy2(i,j)
+     end do
+  end do
+end subroutine test10
+
+subroutine test11
+  implicit none
+  integer :: i,j
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+     call dummy(i) ! { dg-error {Unexpected CALL statement at \(1\)} }
+  !$omp unroll
+     do j = 1,100
+        call dummy2(i,j)
+     end do
+  end do
+end subroutine test11
+
+subroutine test12
+  implicit none
+  integer :: i,j
+
+  !$omp simd ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+     !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+     !$omp unroll
+     do j = 1,100
+        call dummy2(i,j)
+     end do
+     call dummy(i)
+  end do
+end subroutine test12
+
+subroutine test13
+  implicit none
+  integer :: i
+
+  !$omp unroll ! { dg-error {missing canonical loop nest after \!\$OMP UNROLL at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end unroll
+  !$omp end unroll ! { dg-error {Unexpected \!\$OMP END UNROLL statement at \(1\)} }
+end subroutine test13
+
+subroutine test14
+  implicit none
+  integer :: i
+
+  !$omp simd  ! { dg-error {missing canonical loop nest after \!\$OMP SIMD at \(1\)} }
+  !$omp unroll ! { dg-warning {\!\$OMP UNROLL without PARTIAL clause at \(1\) turns loop into a non-loop} }
+  !$omp unroll
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+  !$omp end unroll
+  !$omp end unroll ! { dg-error {Unexpected \!\$OMP END UNROLL statement at \(1\)} }
+end subroutine test14
+
+subroutine test15
+  implicit none
+  integer :: i
+
+  !$omp simd
+  !$omp unroll partial(1)
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test15
+
+subroutine test16
+  implicit none
+  integer :: i
+
+  !$omp simd
+  !$omp unroll partial(2)
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test16
+
+subroutine test17
+  implicit none
+  integer :: i
+
+  !$omp simd
+  !$omp unroll partial(0) ! { dg-error {PARTIAL clause argument not constant positive integer at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test17
+
+subroutine test18
+  implicit none
+  integer :: i
+
+  !$omp simd
+  !$omp unroll partial(-10) ! { dg-error {PARTIAL clause argument not constant positive integer at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test18
+
+subroutine test19
+  implicit none
+  integer :: i
+
+  !$omp simd
+  !$omp unroll partial
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$omp end unroll
+end subroutine test19
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-2.f90
new file mode 100644
index 00000000000..faaa37c5d7e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-2.f90
@@ -0,0 +1,57 @@
+! { dg-do run }
+! { dg-options "-O2 -fopenmp-simd" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+module test_functions
+  contains
+  integer function compute_sum() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    !$omp simd
+    do i = 1,10,3
+       !$omp unroll full
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+
+  integer function compute_sum2() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    !$omp simd
+    !$omp unroll partial(2)
+    do i = 1,10,3
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+
+  result = compute_sum2 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+end program
+
+! { dg-final { scan-tree-dump {omp loop_transform} "original" } }
+! { dg-final { scan-tree-dump-not {omp loop_transform} "omp_transform_loops" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index fd2be57b78c..e563408877e 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -525,6 +525,15 @@ enum omp_clause_code {

   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* Internal representation for an "omp unroll full" directive.  */
+  OMP_CLAUSE_UNROLL_FULL,
+
+  /* Internal representation for an "omp unroll" directive without a clause.  */
+  OMP_CLAUSE_UNROLL_NONE,
+
+  /* Internal representation for an "omp unroll partial" directive.  */
+  OMP_CLAUSE_UNROLL_PARTIAL,
 };

 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 6cdaed7d4b2..813176a912f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -425,6 +425,7 @@ extern gimple_opt_pass *make_pass_lower_switch_O0 (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_transform_loops (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 7947f9647a1..588a992bcf3 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -505,6 +505,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     case OMP_CLAUSE_EXCLUSIVE:
       name = "exclusive";
       goto print_remap;
+    case OMP_CLAUSE_UNROLL_FULL:
+      pp_string (pp, "unroll_full");
+      break;
+    case OMP_CLAUSE_UNROLL_NONE:
+      pp_string (pp, "unroll_none");
+      break;
+    case OMP_CLAUSE_UNROLL_PARTIAL:
+      pp_string (pp, "unroll_partial");
+      if (OMP_CLAUSE_UNROLL_PARTIAL_EXPR (clause))
+       {
+         pp_left_paren (pp);
+         dump_generic_node (pp, OMP_CLAUSE_UNROLL_PARTIAL_EXPR (clause), spc, flags,
+                            false);
+         pp_right_paren (pp);
+       }
+      break;
     case OMP_CLAUSE__LOOPTEMP_:
       name = "_looptemp_";
       goto print_remap;
@@ -3581,6 +3597,10 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       pp_string (pp, "#pragma omp distribute");
       goto dump_omp_loop;

+    case OMP_LOOP_TRANS:
+      pp_string (pp, "#pragma omp loop_transform");
+      goto dump_omp_loop;
+
     case OMP_TASKLOOP:
       pp_string (pp, "#pragma omp taskloop");
       goto dump_omp_loop;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 207293c48cb..53e44367977 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -326,6 +326,9 @@ unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
+  0, /* OMP_CLAUSE_UNROLL_FULL */
+  0, /* OMP_CLAUSE_UNROLL_NONE */
+  1 /* OMP_CLAUSE_UNROLL_PARTIAL */
 };

 const char * const omp_clause_code_name[] =
@@ -417,6 +420,9 @@ const char * const omp_clause_code_name[] =
   "if_present",
   "finalize",
   "nohost",
+  "unroll_full",
+  "unroll_none",
+  "unroll_partial"
 };

 /* Unless specific to OpenACC, we tend to internally maintain OpenMP-centric
diff --git a/gcc/tree.def b/gcc/tree.def
index e639a039db9..a47e4b8dbda 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1166,6 +1166,12 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_statement, 2)
    unspecified by the standards.  */
 DEFTREECODE (OMP_FOR, "omp_for", tcc_statement, 7)

+/* OpenMP - A loop nest to which a loop transformation such as #pragma omp
+   unroll should be applied, but which is not associated with another directive
+   such as #pragma omp for.  The kinds of loop transformations to be applied
+   are internally represented by clauses.  Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_LOOP_TRANS, "omp_loop_trans", tcc_statement, 7)
+
 /* OpenMP - #pragma omp simd [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 7)
diff --git a/gcc/tree.h b/gcc/tree.h
index abcdb5638d4..f33f815b712 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1787,6 +1787,9 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_USE_DEVICE_PTR)->base.public_flag)

+#define OMP_CLAUSE_UNROLL_PARTIAL_EXPR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_UNROLL_PARTIAL), 0)
+
 #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
new file mode 100644
index 00000000000..f07aab898fa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -0,0 +1,52 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do run }
+
+module test_functions
+  contains
+  integer function compute_sum() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    !$omp do
+    do i = 1,10,3
+       !$omp unroll full
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+
+  integer function compute_sum2() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    !$omp parallel do reduction(+:sum)
+    !$omp unroll partial(2)
+    do i = 1,10,3
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+
+  result = compute_sum2 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-2.f90
new file mode 100644
index 00000000000..2ce44d4d044
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-2.f90
@@ -0,0 +1,88 @@
+! { dg-additional-options "-fdump-tree-original -g" }
+! { dg-do run }
+
+module test_functions
+contains
+  integer function compute_sum1 () result(sum)
+    implicit none
+
+    integer :: i
+
+    sum = 0
+    !$omp unroll full
+    do i = 1,10,3
+       sum = sum + 1
+    end do
+  end function compute_sum1
+
+  integer function compute_sum2() result(sum)
+    implicit none
+
+    integer :: i
+
+    sum = 0
+    !$omp unroll full
+    do i = -20,1,3
+       sum = sum + 1
+    end do
+  end function compute_sum2
+
+
+  integer function compute_sum3() result(sum)
+    implicit none
+
+    integer :: i
+
+    sum = 0
+    !$omp unroll full
+    do i = 30,1,-3
+       sum = sum + 1
+    end do
+  end function compute_sum3
+
+
+  integer function compute_sum4() result(sum)
+    implicit none
+
+    integer :: i
+
+    sum = 0
+    !$omp unroll full
+    do i = 50,-60,-10
+       sum = sum + 1
+    end do
+  end function compute_sum4
+
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum1 ()
+  write (*,*) result
+  if (result .ne. 4) then
+     call abort
+  end if
+
+  result = compute_sum2 ()
+  write (*,*) result
+  if (result .ne. 8) then
+     call abort
+  end if
+
+  result = compute_sum3 ()
+  write (*,*) result
+  if (result .ne. 10) then
+     call abort
+  end if
+
+  result = compute_sum4 ()
+  write (*,*) result
+  if (result .ne. 12) then
+     call abort
+  end if
+
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-3.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-3.f90
new file mode 100644
index 00000000000..55e5cc568a5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-3.f90
@@ -0,0 +1,59 @@
+! Test lowering of the internal representation of "omp unroll" loops
+! which are not unrolled.
+
+! { dg-additional-options "-O0" }
+! { dg-additional-options "--param=omp-unroll-full-max-iterations=0" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops-details -fopt-info-optimized" }
+! { dg-do run }
+
+module test_functions
+contains
+  integer function compute_sum1 () result(sum)
+    implicit none
+
+    integer :: i
+
+    sum = 0
+    !$omp unroll
+    do i = 0,50
+       sum = sum + 1
+    end do
+  end function compute_sum1
+
+  integer function compute_sum3 (step,n) result(sum)
+    implicit none
+    integer :: i, step, n
+
+    sum = 0
+    do i = 0,n,step
+       sum = sum + 1
+    end do
+  end function compute_sum3
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum1 ()
+  if (result .ne. 51) then
+     call abort
+  end if
+
+  result = compute_sum3 (1, 100)
+  if (result .ne. 101) then
+     call abort
+  end if
+
+  result = compute_sum3 (2, 100)
+  if (result .ne. 51) then
+     call abort
+  end if
+
+  result = compute_sum3 (-2, -100)
+  if (result .ne. 51) then
+     call abort
+  end if
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-4.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-4.f90
new file mode 100644
index 00000000000..52a214f1049
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-4.f90
@@ -0,0 +1,72 @@
+! { dg-additional-options "-O0 -g" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops-details -fopt-info-optimized" }
+! { dg-do run }
+
+module test_functions
+contains
+  integer function compute_sum1 () result(sum)
+    implicit none
+
+    integer :: i
+
+    sum = 0
+    !$omp unroll partial(2)
+    do i = 1,50
+       sum = sum + 1
+    end do
+  end function compute_sum1
+
+  integer function compute_sum3 (step,n) result(sum)
+    implicit none
+    integer :: i, step, n
+
+    sum = 0
+    !$omp unroll partial(5)
+    do i = 1,n,step
+       sum = sum + 1
+    end do
+  end function compute_sum3
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum1 ()
+  write (*,*) result
+  if (result .ne. 50) then
+     call abort
+  end if
+
+  result = compute_sum3 (1, 100)
+  write (*,*) result
+  if (result .ne. 100) then
+     call abort
+  end if
+
+  result = compute_sum3 (1, 9)
+  write (*,*) result
+  if (result .ne. 9) then
+     call abort
+  end if
+
+  result = compute_sum3 (2, 96)
+  write (*,*) result
+  if (result .ne. 48) then
+     call abort
+  end if
+
+  result = compute_sum3 (-2, -98)
+  write (*,*) result
+  if (result .ne. 50) then
+     call abort
+  end if
+
+  result = compute_sum3 (-2, -100)
+  write (*,*) result
+  if (result .ne. 51) then
+     call abort
+  end if
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-5.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-5.f90
new file mode 100644
index 00000000000..d6a4e739675
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-5.f90
@@ -0,0 +1,55 @@
+! { dg-additional-options "-O0 -g" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops-details -fopt-info-optimized" }
+! { dg-do run }
+
+module test_functions
+contains
+  integer function compute_sum4 (step,n) result(sum)
+    implicit none
+    integer :: i, step, n
+
+    sum = 0
+    !$omp do
+    !$omp unroll partial(5)
+    do i = 1,n,step
+       sum = sum + 1
+    end do
+  end function compute_sum4
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum4 (1, 100)
+  write (*,*) result
+  if (result .ne. 100) then
+     call abort
+  end if
+
+  result = compute_sum4 (1, 9)
+  write (*,*) result
+  if (result .ne. 9) then
+     call abort
+  end if
+
+  result = compute_sum4 (2, 96)
+  write (*,*) result
+  if (result .ne. 48) then
+     call abort
+  end if
+
+  result = compute_sum4 (-2, -98)
+  write (*,*) result
+  if (result .ne. 50) then
+     call abort
+  end if
+
+  result = compute_sum4 (-2, -100)
+  write (*,*) result
+  if (result .ne. 51) then
+     call abort
+  end if
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
new file mode 100644
index 00000000000..1df8ce8d5bb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
@@ -0,0 +1,105 @@
+! { dg-additional-options "-O0 -g" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops-details -fopt-info-optimized" }
+! { dg-do run }
+
+module test_functions
+contains
+  integer function compute_sum4 (step,n) result(sum)
+    implicit none
+    integer :: i, step, n
+
+    sum = 0
+    !$omp parallel do reduction(+:sum) lastprivate(i)
+    !$omp unroll partial(5)
+    do i = 1,n,step
+       sum = sum + 1
+    end do
+  end function compute_sum4
+
+  integer function compute_sum5 (step,n) result(sum)
+    implicit none
+    integer :: i, step, n
+
+    sum = 0
+    !$omp parallel do reduction(+:sum) lastprivate(i)
+    !$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll auto\(50\)'} }
+    !$omp unroll partial(10)
+    do i = 1,n,step
+       sum = sum + 1
+    end do
+  end function compute_sum5
+
+  integer function compute_sum6 (step,n) result(sum)
+    implicit none
+    integer :: i, j, step, n
+
+    sum = 0
+    !$omp parallel do reduction(+:sum) lastprivate(i)
+    do i = 1,n,step
+       !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' directives preceding 'omp unroll full'} }
+       !$omp unroll partial(10)
+       do j = 1, 1000
+          sum = sum + 1
+       end do
+    end do
+  end function compute_sum6
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum4 (1, 100)
+  if (result .ne. 100) then
+     call abort
+  end if
+
+  result = compute_sum4 (1, 9)
+  if (result .ne. 9) then
+     call abort
+  end if
+
+  result = compute_sum4 (2, 96)
+  if (result .ne. 48) then
+     call abort
+  end if
+
+  result = compute_sum4 (-2, -98)
+  if (result .ne. 50) then
+     call abort
+  end if
+
+  result = compute_sum4 (-2, -100)
+  if (result .ne. 51) then
+     call abort
+  end if
+
+  result = compute_sum5 (1, 100)
+  if (result .ne. 100) then
+     call abort
+  end if
+
+  result = compute_sum5 (1, 9)
+  if (result .ne. 9) then
+     call abort
+  end if
+
+  result = compute_sum5 (2, 96)
+  if (result .ne. 48) then
+     call abort
+  end if
+
+  result = compute_sum5 (-2, -98)
+  if (result .ne. 50) then
+     call abort
+  end if
+
+  result = compute_sum5 (-2, -100)
+  if (result .ne. 51) then
+     call abort
+  end if
+
+
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7.f90
new file mode 100644
index 00000000000..d25f18002ae
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7.f90
@@ -0,0 +1,198 @@
+! { dg-additional-options "-O0 -cpp" }
+! { dg-do run }
+
+#ifndef UNROLL_FACTOR
+#define UNROLL_FACTOR 1
+#endif
+module test_functions
+contains
+   subroutine copy (array1, array2)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: i
+
+    !$omp parallel do
+    !$omp unroll partial(UNROLL_FACTOR)
+    do i = 1, 100
+       array1(i) = array2(i)
+    end do
+  end subroutine
+
+  subroutine copy2 (array1, array2)
+    implicit none
+
+    integer :: array1(100)
+    integer :: array2(100)
+    integer :: i
+
+    !$omp parallel do
+    !$omp unroll partial(UNROLL_FACTOR)
+    do i = 0,99
+       array1(i+1) = array2(i+1)
+    end do
+  end subroutine copy2
+
+  subroutine copy3 (array1, array2)
+    implicit none
+
+    integer :: array1(100)
+    integer :: array2(100)
+    integer :: i
+
+    !$omp parallel do lastprivate(i)
+    !$omp unroll partial(UNROLL_FACTOR)
+    do i = -49,50
+       if (i < 0) then
+          array1((-1)*i) = array2((-1)*i)
+       else
+          array1(50+i) = array2(50+i)
+       endif
+    end do
+  end subroutine copy3
+
+  subroutine copy4 (array1, array2)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: i
+
+    !$omp do
+    !$omp unroll partial(UNROLL_FACTOR)
+    do i = 2, 200, 2
+       array1(i/2) = array2(i/2)
+    end do
+  end subroutine copy4
+
+  subroutine copy5 (array1, array2)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: i
+
+    !$omp do
+    !$omp unroll partial(UNROLL_FACTOR)
+    do i = 200, 2, -2
+       array1(i/2) = array2(i/2)
+    end do
+  end subroutine
+
+  subroutine copy6 (array1, array2, lower, upper, step)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: lower, upper, step
+    integer :: i
+
+    !$omp do
+    !$omp unroll partial(UNROLL_FACTOR)
+    do i = lower, upper, step
+       array1 (i) = array2(i)
+    end do
+  end subroutine
+
+  subroutine prepare (array1, array2)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+
+    array1 = 2
+    array2 = 0
+  end subroutine
+
+  subroutine check_equal (array1, array2)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: i
+
+    do i=1,100
+       if (array1(i) /= array2(i)) then
+          write (*,*) i
+          call abort
+       end if
+    end do
+  end subroutine
+
+  subroutine check_equal_at_steps (array1, array2, lower, upper, step)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: lower, upper, step
+    integer :: i
+
+    do i=lower, upper, step
+       if (array1(i) /= array2(i)) then
+          write (*,*) i
+          call abort
+       end if
+    end do
+  end subroutine
+
+  subroutine check_unchanged_at_non_steps (array1, array2, lower, upper, step)
+    implicit none
+
+    integer :: array1(:)
+    integer :: array2(:)
+    integer :: lower, upper, step
+    integer :: i, j
+
+    do i=lower, upper,step
+       do j=i,i+step-1
+          if (array2(j) /= 0) then
+             write (*,*) i
+             call abort
+          end if
+       end do
+    end do
+  end subroutine
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: array1(100), array2(100)
+
+  call prepare (array1, array2)
+  call copy (array1, array2)
+  call check_equal (array1, array2)
+
+  call prepare (array1, array2)
+  call copy2 (array1, array2)
+  call check_equal (array1, array2)
+
+  call prepare (array1, array2)
+  call copy3 (array1, array2)
+  call check_equal (array1, array2)
+
+  call prepare (array1, array2)
+  call copy4 (array1, array2)
+  call check_equal (array1, array2)
+
+  call prepare (array1, array2)
+  call copy5 (array1, array2)
+  call check_equal (array1, array2)
+
+  call prepare (array1, array2)
+  call copy6 (array1, array2, 1, 100, 5)
+  call check_equal_at_steps (array1, array2, 1, 100, 5)
+  call check_unchanged_at_non_steps (array1, array2, 1, 100, 5)
+
+  call prepare (array1, array2)
+  call copy6 (array1, array2, 1, 50, 5)
+  call check_equal_at_steps (array1, array2, 1, 50, 5)
+  call check_unchanged_at_non_steps (array1, array2, 1, 50, 5)
+
+  call prepare (array1, array2)
+  call copy6 (array1, array2, 3, 18, 7)
+  call check_equal_at_steps (array1, array2, 3 , 18, 7)
+  call check_unchanged_at_non_steps (array1, array2, 3, 18, 7)
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90
new file mode 100644
index 00000000000..02328464c0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90
@@ -0,0 +1,7 @@
+! { dg-additional-options "-O0 -g -cpp" }
+! { dg-do run }
+
+! Check an unroll factor that divides the number of iterations
+! of the loops in the test implementation.
+#define UNROLL_FACTOR 5
+#include "unroll-7.f90"
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90
new file mode 100644
index 00000000000..60866ef33fd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90
@@ -0,0 +1,7 @@
+! { dg-additional-options "-O0 -g -cpp" }
+! { dg-do run }
+
+! Check an unroll factor that does not divide the number of iterations
+! of the loops in the test implementation.
+#define UNROLL_FACTOR 3
+#include "unroll-7.f90"
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90
new file mode 100644
index 00000000000..6d8a2ef7bc0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90
@@ -0,0 +1,7 @@
+! { dg-additional-options "-O0 -g -cpp" }
+! { dg-do run }
+
+! Check an unroll factor that is larger than the number of iterations
+! of the loops in the test implementation.
+#define UNROLL_FACTOR 113
+#include "unroll-7.f90"
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-8.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-8.f90
new file mode 100644
index 00000000000..40506025aa3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-8.f90
@@ -0,0 +1,38 @@
+! { dg-additional-options "-O0 -g" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops-details -fopt-info-optimized" }
+! { dg-do run }
+
+module test_functions
+contains
+   subroutine copy (array1, array2, step, n)
+    implicit none
+
+    integer :: array1(n)
+    integer :: array2(n)
+    integer :: i, step, n
+
+    call omp_set_num_threads (4)
+    !$omp parallel do shared(array1) shared(array2) schedule(static, 4)
+    !$omp unroll partial(2)
+    do i = 1,n
+       array1(i) = array2(i)
+    end do
+  end subroutine
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: array1(100), array2(100)
+  integer :: i
+
+  array1 = 2
+  call copy(array1, array2, 1, 100)
+  do i=1,100
+     if (array1(i) /= array2(i)) then
+        write (*,*) i
+        call abort
+     end if
+  end do
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
new file mode 100644
index 00000000000..5fb64ddd6fd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -0,0 +1,33 @@
+! { dg-options "-fno-openmp -fopenmp-simd" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do run }
+
+module test_functions
+  contains
+  integer function compute_sum() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    !$omp simd
+    do i = 1,10,3
+       !$omp unroll full
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function compute_sum
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+end program
--
2.36.1

* [PATCH 2/7] openmp: Add C/C++ support for "omp unroll" directive
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
  2023-03-24 15:30 ` [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-03-24 15:30 ` [PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE Frederik Harwath
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub, joseph, jason

This commit implements the C and C++ front end changes to support
the "omp unroll" directive.  The loop transformation itself is
executed by the "omp_transform_loops" pass that was added as part of
the earlier Fortran patch.
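
As a minimal sketch of what the front ends are meant to accept after
this change, the C function below mirrors compute_sum4 from the
Fortran test unroll-6.f90 for positive steps.  The name
count_strided_iterations is made up for this illustration and is not
part of the patch.  The sketch assumes a compiler that implements the
OpenMP 5.1 "unroll" directive when invoked with -fopenmp; without
-fopenmp the pragmas are ignored and the result is the same.

```c
/* Hypothetical example, not part of the patch: count the iterations of
   a strided loop.  STEP is assumed to be positive.  */
int
count_strided_iterations (int step, int n)
{
  int sum = 0;

  /* The "unroll" directive sits between the worksharing directive and
     the loop; only the "partial" form leaves a loop behind for the
     enclosing "parallel for" to work on.  */
#pragma omp parallel for reduction(+:sum)
#pragma omp unroll partial(5)
  for (int i = 1; i <= n; i += step)
    sum = sum + 1;

  return sum;
}
```

Unrolling must not change the iteration count, so
count_strided_iterations (2, 96) still visits i = 1, 3, ..., 95 and
returns 48, which matches the expectation in the Fortran test.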

gcc/c-family/ChangeLog:

        * c-gimplify.cc (c_genericize_control_stmt): Handle OMP_LOOP_TRANS.
        * c-omp.cc: Add "unroll" to c_omp_directives[].
        * c-pragma.cc: Add "unroll" to omp_pragmas_simd[].
        * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_UNROLL to
        pragma_kind and adjust PRAGMA_OMP__LAST_.
        (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
        PRAGMA_OMP_CLAUSE_PARTIAL.

gcc/c/ChangeLog:

        * c-parser.cc (c_parser_omp_clause_name): Handle "full" and
        "partial" clauses.
        (check_no_duplicate_clause): Change return type to bool and
        return check result.
        (c_parser_omp_clause_unroll_full): New function for parsing
        the "full" clause.
        (c_parser_omp_clause_unroll_partial): New function for
        parsing the "partial" clause.
        (c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
        and PRAGMA_OMP_CLAUSE_PARTIAL.
        (c_parser_nested_omp_unroll_clauses): New function for parsing
        "omp unroll" directives following another directive.
        (c_parser_omp_for_loop): Handle "omp unroll" directives
        between directive and loop.
        (OMP_UNROLL_CLAUSE_MASK): New definition.
        (c_parser_omp_unroll): New function for parsing "omp unroll"
        loops that are not associated with another directive.
        (c_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
        * c-typeck.cc (c_finish_omp_clauses): Handle
        OMP_CLAUSE_UNROLL_FULL and OMP_CLAUSE_UNROLL_PARTIAL.

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_gimplify_expr): Handle OMP_LOOP_TRANS.
        (cp_fold_r): Likewise.
        (cp_genericize_r): Likewise.
        * parser.cc (cp_parser_omp_clause_name): Handle "full" and
        "partial" clauses.
        (check_no_duplicate_clause): Change return type to bool and
        return check result.
        (cp_parser_omp_clause_unroll_full): New function for parsing
        the "full" clause.
        (cp_parser_omp_clause_unroll_partial): New function for
        parsing the "partial" clause.
        (cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
        and PRAGMA_OMP_CLAUSE_PARTIAL.
        (cp_parser_nested_omp_unroll_clauses): New function for parsing
        "omp unroll" directives following another directive.
        (cp_parser_omp_for_loop): Handle "omp unroll" directives
        between directive and loop.
        (OMP_UNROLL_CLAUSE_MASK): New definition.
        (cp_parser_omp_unroll): New function for parsing "omp unroll"
        loops that are not associated with another directive.
        (cp_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
        (cp_parser_pragma): Handle PRAGMA_OMP_UNROLL.
        * pt.cc (tsubst_omp_clauses): Handle
        OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_FULL, and
        OMP_CLAUSE_UNROLL_NONE.
        (tsubst_expr): Handle OMP_UNROLL.
        * semantics.cc (finish_omp_clauses): Handle
        OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
        and OMP_CLAUSE_UNROLL_NONE.

libgomp/ChangeLog:

        * testsuite/libgomp.c++/loop-transforms/unroll-1.C: New test.
        * testsuite/libgomp.c++/loop-transforms/unroll-2.C: New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: New test.

gcc/testsuite/ChangeLog:

        * c-c++-common/gomp/loop-transforms/unroll-1.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-2.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-3.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-4.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-5.c: New test.
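        * c-c++-common/gomp/loop-transforms/unroll-6.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-7.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-simd-1.c: New test.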
        * g++.dg/gomp/loop-transforms/unroll-1.C: New test.
        * g++.dg/gomp/loop-transforms/unroll-2.C: New test.
        * g++.dg/gomp/loop-transforms/unroll-3.C: New test.
---
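The unroll-7a/b/c tests from the Fortran patch cover unroll factors
that divide the trip count, do not divide it, and exceed it.  One way
to picture why all three must give the same result is a hand-written
partial unroll with a remainder loop.  This is only an illustration
under stated assumptions: copy_unrolled and copy_is_exact are
hypothetical helpers, and the code does not claim to reproduce what
the omp_transform_loops pass generates.

```c
#include <stddef.h>

#define N_ELEMS 100

/* Hypothetical helper: copy N elements the way a loop that was
   partially unrolled by FACTOR could execute.  */
void
copy_unrolled (int *dst, const int *src, size_t n, size_t factor)
{
  size_t i = 0;

  /* A factor of zero makes no sense; treat it as "no unrolling".  */
  if (factor == 0)
    factor = 1;

  /* Main part: every trip performs FACTOR copies of the body.  */
  for (; n - i >= factor; i += factor)
    for (size_t k = 0; k < factor; k++)
      dst[i + k] = src[i + k];

  /* Remainder: the iterations that do not fill a whole group.  */
  for (; i < n; i++)
    dst[i] = src[i];
}

/* Hypothetical check: return 1 if copying N_ELEMS elements with the
   given FACTOR reproduces the source array exactly, 0 otherwise.  */
int
copy_is_exact (size_t factor)
{
  int dst[N_ELEMS], src[N_ELEMS];

  for (size_t i = 0; i < N_ELEMS; i++)
    {
      dst[i] = 2;
      src[i] = (int) i;
    }

  copy_unrolled (dst, src, N_ELEMS, factor);

  for (size_t i = 0; i < N_ELEMS; i++)
    if (dst[i] != src[i])
      return 0;
  return 1;
}
```

With a factor of 113 the main part never runs and the remainder loop
does all of the work; with a factor of 3 the last element is handled
by the remainder loop.
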
 gcc/c-family/c-gimplify.cc                    |   1 +
 gcc/c-family/c-omp.cc                         |   6 +-
 gcc/c-family/c-pragma.cc                      |   1 +
 gcc/c-family/c-pragma.h                       |   5 +-
 gcc/c/c-parser.cc                             | 161 ++++++++++++++++-
 gcc/c/c-typeck.cc                             |   8 +
 gcc/cp/cp-gimplify.cc                         |   3 +
 gcc/cp/parser.cc                              | 164 +++++++++++++++++-
 gcc/cp/pt.cc                                  |   4 +
 gcc/cp/semantics.cc                           |  56 ++++++
 .../gomp/loop-transforms/unroll-1.c           | 133 ++++++++++++++
 .../gomp/loop-transforms/unroll-2.c           |  99 +++++++++++
 .../gomp/loop-transforms/unroll-3.c           |  18 ++
 .../gomp/loop-transforms/unroll-4.c           |  19 ++
 .../gomp/loop-transforms/unroll-5.c           |  19 ++
 .../gomp/loop-transforms/unroll-6.c           |  20 +++
 .../gomp/loop-transforms/unroll-7.c           | 144 +++++++++++++++
 .../gomp/loop-transforms/unroll-simd-1.c      |  84 +++++++++
 .../g++.dg/gomp/loop-transforms/unroll-1.C    |  42 +++++
 .../g++.dg/gomp/loop-transforms/unroll-2.C    |  47 +++++
 .../g++.dg/gomp/loop-transforms/unroll-3.C    |  37 ++++
 .../libgomp.c++/loop-transforms/unroll-1.C    |  73 ++++++++
 .../libgomp.c++/loop-transforms/unroll-2.C    |  34 ++++
 .../loop-transforms/unroll-1.c                |  76 ++++++++
 24 files changed, 1246 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-simd-1.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-3.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-2.C
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
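
The C parser in this patch rejects a "partial" argument unless it is
a constant integer that is positive and fits in an int.  Stripped of
the tree API, the range part of that check amounts to the following
sketch; partial_factor_ok is a hypothetical stand-alone helper and
not a function from the patch.

```c
#include <limits.h>

/* Hypothetical helper: return 1 if N would be accepted as an unroll
   factor by the range check described above, 0 otherwise.  */
int
partial_factor_ok (long long n)
{
  /* The factor must be strictly positive and representable as int.  */
  return n > 0 && n <= INT_MAX;
}
```

Under these assumptions a factor of 0 or a negative factor is
rejected, as is any value above INT_MAX.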

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index ef5c7d919fc..82c88bd70e1 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -506,6 +506,7 @@ c_genericize_control_stmt (tree *stmt_p, int *walk_subtrees, void *data,
     case OMP_DISTRIBUTE:
     case OMP_LOOP:
     case OMP_TASKLOOP:
+    case OMP_LOOP_TRANS:
     case OACC_LOOP:
       genericize_omp_for_stmt (stmt_p, walk_subtrees, data, func, lh);
       break;
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index f72ca4c6acd..85ba9c528c8 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -3212,9 +3212,9 @@ const struct c_omp_directive c_omp_directives[] = {
   { "teams", nullptr, nullptr, PRAGMA_OMP_TEAMS,
     C_OMP_DIR_CONSTRUCT, true },
   { "threadprivate", nullptr, nullptr, PRAGMA_OMP_THREADPRIVATE,
-    C_OMP_DIR_DECLARATIVE, false }
-  /* { "unroll", nullptr, nullptr, PRAGMA_OMP_UNROLL,
-    C_OMP_DIR_CONSTRUCT, false },  */
+    C_OMP_DIR_DECLARATIVE, false },
+  { "unroll", nullptr, nullptr, PRAGMA_OMP_UNROLL,
+    C_OMP_DIR_CONSTRUCT, false },
 };

 /* Find (non-combined/composite) OpenMP directive (if any) which starts
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebb..96a28ac1b0c 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1593,6 +1593,7 @@ static const struct omp_pragma_def omp_pragmas_simd[] = {
   { "target", PRAGMA_OMP_TARGET },
   { "taskloop", PRAGMA_OMP_TASKLOOP },
   { "teams", PRAGMA_OMP_TEAMS },
+  { "unroll", PRAGMA_OMP_UNROLL },
 };

 void
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee3..6686abdc94d 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -81,8 +81,9 @@ enum pragma_kind {
   PRAGMA_OMP_TASKYIELD,
   PRAGMA_OMP_THREADPRIVATE,
   PRAGMA_OMP_TEAMS,
+  PRAGMA_OMP_UNROLL,
   /* PRAGMA_OMP__LAST_ should be equal to the last PRAGMA_OMP_* code.  */
-  PRAGMA_OMP__LAST_ = PRAGMA_OMP_TEAMS,
+  PRAGMA_OMP__LAST_ = PRAGMA_OMP_UNROLL,

   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
@@ -118,6 +119,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OMP_CLAUSE_FOR,
   PRAGMA_OMP_CLAUSE_FROM,
+  PRAGMA_OMP_CLAUSE_FULL,
   PRAGMA_OMP_CLAUSE_GRAINSIZE,
   PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR,
   PRAGMA_OMP_CLAUSE_HINT,
@@ -140,6 +142,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_ORDER,
   PRAGMA_OMP_CLAUSE_ORDERED,
   PRAGMA_OMP_CLAUSE_PARALLEL,
+  PRAGMA_OMP_CLAUSE_PARTIAL,
   PRAGMA_OMP_CLAUSE_PRIORITY,
   PRAGMA_OMP_CLAUSE_PRIVATE,
   PRAGMA_OMP_CLAUSE_PROC_BIND,
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 21bc3167ce2..9d875befccc 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13471,6 +13471,8 @@ c_parser_omp_clause_name (c_parser *parser)
            result = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE;
          else if (!strcmp ("from", p))
            result = PRAGMA_OMP_CLAUSE_FROM;
+         else if (!strcmp ("full", p))
+           result = PRAGMA_OMP_CLAUSE_FULL;
          break;
        case 'g':
          if (!strcmp ("gang", p))
@@ -13545,6 +13547,8 @@ c_parser_omp_clause_name (c_parser *parser)
        case 'p':
          if (!strcmp ("parallel", p))
            result = PRAGMA_OMP_CLAUSE_PARALLEL;
+         else if (!strcmp ("partial", p))
+           result = PRAGMA_OMP_CLAUSE_PARTIAL;
          else if (!strcmp ("present", p))
            result = PRAGMA_OACC_CLAUSE_PRESENT;
          /* As of OpenACC 2.5, these are now aliases of the non-present_or
@@ -13639,12 +13643,15 @@ c_parser_omp_clause_name (c_parser *parser)

 /* Validate that a clause of the given type does not already exist.  */

-static void
+static bool
 check_no_duplicate_clause (tree clauses, enum omp_clause_code code,
                           const char *name)
 {
-  if (tree c = omp_find_clause (clauses, code))
+  tree c = omp_find_clause (clauses, code);
+  if (c)
     error_at (OMP_CLAUSE_LOCATION (c), "too many %qs clauses", name);
+
+  return c == NULL_TREE;
 }

 /* OpenACC 2.0
@@ -17448,6 +17455,65 @@ c_parser_omp_clause_uniform (c_parser *parser, tree list)
   return list;
 }

+/* OpenMP 5.1
+   full */
+
+static tree
+c_parser_omp_clause_unroll_full (c_parser *parser, tree list)
+{
+  if (!check_no_duplicate_clause (list, OMP_CLAUSE_UNROLL_FULL, "full"))
+    return list;
+
+  location_t loc = c_parser_peek_token (parser)->location;
+  tree c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_FULL);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenMP 5.1
+   partial ( constant-expression ) */
+
+static tree
+c_parser_omp_clause_unroll_partial (c_parser *parser, tree list)
+{
+  if (!check_no_duplicate_clause (list, OMP_CLAUSE_UNROLL_PARTIAL, "partial"))
+    return list;
+
+  tree c, num = error_mark_node;
+  HOST_WIDE_INT n;
+  location_t loc;
+
+  loc = c_parser_peek_token (parser)->location;
+  c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_PARTIAL);
+  OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = NULL_TREE;
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  if (!c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+    return c;
+
+  matching_parens parens;
+  parens.consume_open (parser);
+  num = c_parser_expr_no_commas (parser, NULL).value;
+  parens.skip_until_found_close (parser);
+
+  if (num == error_mark_node)
+    return list;
+
+  mark_exp_read (num);
+  num = c_fully_fold (num, false, NULL);
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (num)) || !tree_fits_shwi_p (num)
+      || (n = tree_to_shwi (num)) <= 0 || (int)n != n)
+    {
+      error_at (loc,
+               "partial argument needs positive constant integer expression");
+      return list;
+    }
+
+  OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = num;
+
+  return c;
+}
+
 /* OpenMP 5.0:
    detach ( event-handle ) */

@@ -18042,6 +18108,14 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
                                            clauses);
          c_name = "enter";
          break;
+       case PRAGMA_OMP_CLAUSE_FULL:
+         c_name = "full";
+         clauses = c_parser_omp_clause_unroll_full (parser, clauses);
+         break;
+       case PRAGMA_OMP_CLAUSE_PARTIAL:
+         c_name = "partial";
+         clauses = c_parser_omp_clause_unroll_partial (parser, clauses);
+         break;
        default:
          c_parser_error (parser, "expected %<#pragma omp%> clause");
          goto saw_error;
@@ -20169,6 +20243,8 @@ c_parser_omp_scan_loop_body (c_parser *parser, bool open_brace_parsed)
                             "expected %<}%>");
 }

+static bool c_parser_nested_omp_unroll_clauses (c_parser *, tree &);
+
 /* Parse the restricted form of loop statements allowed by OpenACC and OpenMP.
    The real trick here is to determine the loop control variable early
    so that we can push a new decl if necessary to make it private.
@@ -20227,6 +20303,13 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (count);
   incrv = make_tree_vec (count);

+  if (c_parser_nested_omp_unroll_clauses (parser, clauses)
+      && count > 1)
+    {
+      error_at (loc, "collapse cannot be larger than 1 on an unrolled loop");
+      return NULL;
+    }
+
   if (!c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
@@ -23858,6 +23941,76 @@ c_parser_omp_taskloop (location_t loc, c_parser *parser,
   return ret;
 }

+#define OMP_UNROLL_CLAUSE_MASK                                 \
+       ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PARTIAL)      \
+         | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_FULL) )
+
+/* Parse zero or more '#pragma omp unroll' directives that follow
+   another directive that requires a canonical loop nest.  */
+
+static bool
+c_parser_nested_omp_unroll_clauses (c_parser *parser, tree &clauses)
+{
+  static const char *p_name = "#pragma omp unroll";
+  c_token *tok;
+  bool found_unroll = false;
+  while (c_parser_next_token_is (parser, CPP_PRAGMA)
+        && (tok = c_parser_peek_token (parser),
+            tok->pragma_kind == PRAGMA_OMP_UNROLL))
+    {
+      c_parser_consume_pragma (parser);
+      tree c = c_parser_omp_all_clauses (parser, OMP_UNROLL_CLAUSE_MASK,
+                                        p_name, true);
+      if (c)
+       {
+         gcc_assert (!TREE_CHAIN (c));
+         found_unroll = true;
+         if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_UNROLL_FULL)
+           {
+             error_at (tok->location, "%<full%> clause is invalid here; "
+                       "turns loop into non-loop");
+             continue;
+           }
+       }
+      else
+       {
+         error_at (tok->location, "%<#pragma omp unroll%> without "
+                                  "%<partial%> clause is invalid here; "
+                                  "turns loop into non-loop");
+         continue;
+       }
+
+      clauses = chainon (clauses, c);
+    }
+
+  return found_unroll;
+}
+
+static tree
+c_parser_omp_unroll (location_t loc, c_parser *parser, bool *if_p)
+{
+  tree block, ret;
+  static const char *p_name = "#pragma omp unroll";
+  omp_clause_mask mask = OMP_UNROLL_CLAUSE_MASK;
+
+  tree clauses = c_parser_omp_all_clauses (parser, mask, p_name, false);
+  c_parser_nested_omp_unroll_clauses (parser, clauses);
+
+  if (!clauses)
+    {
+      tree c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_NONE);
+      OMP_CLAUSE_CHAIN (c) = clauses;
+      clauses = c;
+    }
+
+  block = c_begin_compound_stmt (true);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_LOOP_TRANS, clauses, NULL, if_p);
+  block = c_end_compound_stmt (loc, block, true);
+  add_stmt (block);
+
+  return ret;
+}
+
 /* OpenMP 5.1
    #pragma omp nothing new-line  */

@@ -24249,6 +24402,7 @@ c_parser_omp_construct (c_parser *parser, bool *if_p)
   p_kind = c_parser_peek_token (parser)->pragma_kind;
   c_parser_consume_pragma (parser);

+  gcc_assert (parser->in_pragma);
   switch (p_kind)
     {
     case PRAGMA_OACC_ATOMIC:
@@ -24342,6 +24496,9 @@ c_parser_omp_construct (c_parser *parser, bool *if_p)
     case PRAGMA_OMP_ASSUME:
       c_parser_omp_assume (parser, if_p);
       return;
+    case PRAGMA_OMP_UNROLL:
+      stmt = c_parser_omp_unroll (loc, parser, if_p);
+      break;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 45bacc06c47..bffea79b441 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15916,6 +15916,14 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
          pc = &OMP_CLAUSE_CHAIN (c);
          continue;

+       case OMP_CLAUSE_UNROLL_FULL:
+         pc = &OMP_CLAUSE_CHAIN (c);
+         continue;
+
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         pc = &OMP_CLAUSE_CHAIN (c);
+         continue;
+
        case OMP_CLAUSE_INBRANCH:
        case OMP_CLAUSE_NOTINBRANCH:
          if (branch_seen)
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 4fecd5616bd..bf81097d780 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -638,6 +638,7 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
     case OMP_DISTRIBUTE:
     case OMP_LOOP:
     case OMP_TASKLOOP:
+    case OMP_LOOP_TRANS:
       ret = cp_gimplify_omp_for (expr_p, pre_p);
       break;

@@ -1097,6 +1098,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
     case OMP_DISTRIBUTE:
     case OMP_LOOP:
     case OMP_TASKLOOP:
+    case OMP_LOOP_TRANS:
     case OACC_LOOP:
       cp_walk_tree (&OMP_FOR_BODY (stmt), cp_fold_r, data, NULL);
       cp_walk_tree (&OMP_FOR_CLAUSES (stmt), cp_fold_r, data, NULL);
@@ -1855,6 +1857,7 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
     case OMP_FOR:
     case OMP_SIMD:
     case OMP_LOOP:
+    case OMP_LOOP_TRANS:
     case OACC_LOOP:
     case STATEMENT_LIST:
       /* These cases are handled by shared code.  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index a277003ea58..7034fdf49a4 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -37204,6 +37204,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
            result = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE;
          else if (!strcmp ("from", p))
            result = PRAGMA_OMP_CLAUSE_FROM;
+         else if (!strcmp ("full", p))
+           result = PRAGMA_OMP_CLAUSE_FULL;
          break;
        case 'g':
          if (!strcmp ("gang", p))
@@ -37278,6 +37280,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
        case 'p':
          if (!strcmp ("parallel", p))
            result = PRAGMA_OMP_CLAUSE_PARALLEL;
+         else if (!strcmp ("partial", p))
+           result = PRAGMA_OMP_CLAUSE_PARTIAL;
          else if (!strcmp ("present", p))
            result = PRAGMA_OACC_CLAUSE_PRESENT;
          else if (!strcmp ("present_or_copy", p)
@@ -37368,12 +37372,15 @@ cp_parser_omp_clause_name (cp_parser *parser)

 /* Validate that a clause of the given type does not already exist.  */

-static void
+static bool
 check_no_duplicate_clause (tree clauses, enum omp_clause_code code,
                           const char *name, location_t location)
 {
-  if (omp_find_clause (clauses, code))
+  bool found = omp_find_clause (clauses, code);
+  if (found)
     error_at (location, "too many %qs clauses", name);
+
+  return !found;
 }

 /* OpenMP 2.5:
@@ -39459,6 +39466,56 @@ cp_parser_omp_clause_thread_limit (cp_parser *parser, tree list,
   return c;
 }

+/* OpenMP 5.1
+   full */
+
+static tree
+cp_parser_omp_clause_unroll_full (tree list, location_t loc)
+{
+  if (!check_no_duplicate_clause (list, OMP_CLAUSE_UNROLL_FULL, "full", loc))
+    return list;
+
+  tree c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_FULL);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenMP 5.1
+   partial ( constant-expression ) */
+
+static tree
+cp_parser_omp_clause_unroll_partial (cp_parser *parser, tree list,
+                                    location_t loc)
+{
+  if (!check_no_duplicate_clause (list, OMP_CLAUSE_UNROLL_PARTIAL, "partial",
+                                 loc))
+    return list;
+
+  tree c, num = error_mark_node;
+  c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_PARTIAL);
+  OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = NULL_TREE;
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  if (!cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+    return c;
+
+  matching_parens parens;
+  parens.consume_open (parser);
+  num = cp_parser_constant_expression (parser);
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+                                        /*or_comma=*/false,
+                                        /*consume_paren=*/true);
+
+  if (num == error_mark_node)
+    return list;
+
+  mark_exp_read (num);
+  num = fold_non_dependent_expr (num);
+
+  OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = num;
+  return c;
+}
+
 /* OpenMP 4.0:
    aligned ( variable-list )
    aligned ( variable-list : constant-expression )  */
@@ -41441,6 +41498,15 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
                                            clauses);
          c_name = "enter";
          break;
+       case PRAGMA_OMP_CLAUSE_PARTIAL:
+         clauses = cp_parser_omp_clause_unroll_partial (parser, clauses,
+                                                        token->location);
+         c_name = "partial";
+         break;
+       case PRAGMA_OMP_CLAUSE_FULL:
+         clauses = cp_parser_omp_clause_unroll_full (clauses, token->location);
+         c_name = "full";
+         break;
        default:
          cp_parser_error (parser, "expected %<#pragma omp%> clause");
          goto saw_error;
@@ -43565,6 +43631,8 @@ cp_parser_omp_scan_loop_body (cp_parser *parser)
   braces.require_close (parser);
 }

+static bool cp_parser_nested_omp_unroll_clauses (cp_parser *, tree &);
+
 /* Parse the restricted form of the for statement allowed by OpenMP.  */

 static tree
@@ -43622,6 +43690,15 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,

   loc_first = cp_lexer_peek_token (parser->lexer)->location;

+  if (cp_parser_nested_omp_unroll_clauses (parser, clauses)
+      && count > 1)
+    {
+      error_at (loc_first,
+               "collapse cannot be larger than 1 on an unrolled loop");
+      return NULL;
+    }
+
+
   for (i = 0; i < count; i++)
     {
       int bracecount = 0;
@@ -45657,6 +45734,79 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok,
   return true;
 }

+#define OMP_UNROLL_CLAUSE_MASK                                 \
+       ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PARTIAL)      \
+         | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_FULL) )
+
+/* Parse zero or more '#pragma omp unroll' directives that follow
+   another directive that requires a canonical loop nest.  */
+
+static bool
+cp_parser_nested_omp_unroll_clauses (cp_parser *parser, tree &clauses)
+{
+  static const char *p_name = "#pragma omp unroll";
+  cp_token *tok;
+  bool unroll_found = false;
+  while (cp_lexer_next_token_is (parser->lexer, CPP_PRAGMA)
+        && (tok = cp_lexer_peek_token (parser->lexer),
+            cp_parser_pragma_kind (tok) == PRAGMA_OMP_UNROLL))
+    {
+      cp_lexer_consume_token (parser->lexer);
+      gcc_assert (tok->type == CPP_PRAGMA);
+      parser->lexer->in_pragma = true;
+      tree c = cp_parser_omp_all_clauses (parser, OMP_UNROLL_CLAUSE_MASK,
+                                         p_name, tok);
+      if (c)
+       {
+         gcc_assert (!TREE_CHAIN (c));
+         unroll_found = true;
+         if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_UNROLL_FULL)
+           {
+             error_at (tok->location, "%<full%> clause is invalid here; "
+                       "turns loop into non-loop");
+             continue;
+           }
+
+         c = finish_omp_clauses (c, C_ORT_OMP);
+       }
+      else
+       {
+         error_at (tok->location, "%<#pragma omp unroll%> without "
+                                  "%<partial%> clause is invalid here; "
+                                  "turns loop into non-loop");
+         continue;
+       }
+      clauses = chainon (clauses, c);
+    }
+  return unroll_found;
+}
+
+static tree
+cp_parser_omp_unroll (cp_parser *parser, cp_token *tok, bool *if_p)
+{
+  tree block, ret;
+  static const char *p_name = "#pragma omp unroll";
+  omp_clause_mask mask = OMP_UNROLL_CLAUSE_MASK;
+
+  tree clauses = cp_parser_omp_all_clauses (parser, mask, p_name, tok, false);
+
+  if (!clauses)
+    {
+      tree c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+      OMP_CLAUSE_CHAIN (c) = clauses;
+      clauses = c;
+    }
+
+  cp_parser_nested_omp_unroll_clauses (parser, clauses);
+
+  block = begin_omp_structured_block ();
+  ret = cp_parser_omp_for_loop (parser, OMP_LOOP_TRANS, clauses, NULL, if_p);
+  block = finish_omp_structured_block (block);
+  add_stmt (block);
+
+  return ret;
+}
+
 /* OpenACC 2.0:
    # pragma acc cache (variable-list) new-line
 */
@@ -48750,6 +48900,9 @@ cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
     case PRAGMA_OMP_ASSUME:
       cp_parser_omp_assume (parser, pragma_tok, if_p);
       return;
+    case PRAGMA_OMP_UNROLL:
+      stmt = cp_parser_omp_unroll (parser, pragma_tok, if_p);
+      break;
     default:
       gcc_unreachable ();
     }
@@ -49376,6 +49529,13 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
       cp_parser_omp_construct (parser, pragma_tok, if_p);
       pop_omp_privatization_clauses (stmt);
       return true;
+    case PRAGMA_OMP_UNROLL:
+      if (context != pragma_stmt && context != pragma_compound)
+       goto bad_stmt;
+      stmt = push_omp_privatization_clauses (false);
+      cp_parser_omp_construct (parser, pragma_tok, if_p);
+      pop_omp_privatization_clauses (stmt);
+      return true;

     case PRAGMA_OMP_REQUIRES:
       if (context != pragma_external)
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 40deedc9ba9..63b2d1f7a45 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18086,6 +18086,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
        case OMP_CLAUSE_ASYNC:
        case OMP_CLAUSE_WAIT:
        case OMP_CLAUSE_DETACH:
+       case OMP_CLAUSE_UNROLL_PARTIAL:
          OMP_CLAUSE_OPERAND (nc, 0)
            = tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, in_decl);
          break;
@@ -18169,6 +18170,8 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
        case OMP_CLAUSE_IF_PRESENT:
        case OMP_CLAUSE_FINALIZE:
        case OMP_CLAUSE_NOHOST:
+       case OMP_CLAUSE_UNROLL_FULL:
+       case OMP_CLAUSE_UNROLL_NONE:
          break;
        default:
          gcc_unreachable ();
@@ -19437,6 +19440,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
     case OMP_SIMD:
     case OMP_DISTRIBUTE:
     case OMP_TASKLOOP:
+    case OMP_LOOP_TRANS:
     case OACC_LOOP:
       {
        tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 99a76e3ed65..ac49502eea4 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -6779,6 +6779,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
   bool mergeable_seen = false;
   bool implicit_moved = false;
   bool target_in_reduction_seen = false;
+  bool unroll_full_seen = false;

   bitmap_obstack_initialize (NULL);
   bitmap_initialize (&generic_head, &bitmap_default_obstack);
@@ -8822,6 +8823,61 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
            }
          break;

+       case OMP_CLAUSE_UNROLL_FULL:
+         if (unroll_full_seen)
+           {
+             error_at (OMP_CLAUSE_LOCATION (c),
+                       "%<full%> appears more than once");
+             remove = true;
+           }
+         unroll_full_seen = true;
+         break;
+
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         {
+
+           tree t = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
+
+           if (!t)
+             break;
+
+           if (t == error_mark_node)
+             remove = true;
+           else if (!type_dependent_expression_p (t)
+                    && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
+             {
+               error_at (OMP_CLAUSE_LOCATION (c),
+                         "partial argument needs integral type");
+               remove = true;
+             }
+           else
+             {
+               t = mark_rvalue_use (t);
+               if (!processing_template_decl)
+                 {
+                   t = maybe_constant_value (t);
+
+                   HOST_WIDE_INT n;
+                   if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
+                       || !tree_fits_shwi_p (t)
+                       || (n = tree_to_shwi (t)) <= 0 || (int) n != n)
+                     {
+                       error_at (OMP_CLAUSE_LOCATION (c),
+                                 "partial argument needs positive constant "
+                                 "integer expression");
+                       remove = true;
+                     }
+                   t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
+                 }
+             }
+
+           OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = t;
+         }
+         break;
+
+       case OMP_CLAUSE_UNROLL_NONE:
+         break;
+
        default:
          gcc_unreachable ();
        }
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-1.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-1.c
new file mode 100644
index 00000000000..d496dc29053
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-1.c
@@ -0,0 +1,133 @@
+extern void dummy (int);
+
+void
+test1 ()
+{
+#pragma omp unroll partial
+  for (int i = 0; i < 100; ++i)
+    dummy (i);
+}
+
+void
+test2 ()
+{
+#pragma omp unroll partial(10)
+  for (int i = 0; i < 100; ++i)
+    dummy (i);
+}
+
+void
+test3 ()
+{
+#pragma omp unroll full
+  for (int i = 0; i < 100; ++i)
+    dummy (i);
+}
+
+void
+test4 ()
+{
+#pragma omp unroll full
+  for (int i = 0; i > 100; ++i)
+    dummy (i);
+}
+
+void
+test5 ()
+{
+#pragma omp unroll full
+  for (int i = 1; i <= 100; ++i)
+    dummy (i);
+}
+
+void
+test6 ()
+{
+#pragma omp unroll full
+  for (int i = 200; i >= 100; i--)
+    dummy (i);
+}
+
+void
+test7 ()
+{
+#pragma omp unroll full
+  for (int i = -100; i > 100; ++i)
+    dummy (i);
+}
+
+void
+test8 ()
+{
+#pragma omp unroll full
+  for (int i = 100; i > -200; --i)
+    dummy (i);
+}
+
+void
+test9 ()
+{
+#pragma omp unroll full
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+}
+
+void
+test10 ()
+{
+#pragma omp unroll full
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+}
+
+void
+test12 ()
+{
+#pragma omp unroll full
+#pragma omp unroll partial
+#pragma omp unroll partial
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+}
+
+void
+test13 ()
+{
+  for (int i = 0; i < 100; ++i)
+#pragma omp unroll full
+#pragma omp unroll partial
+#pragma omp unroll partial
+  for (int j = -300; j != 100; ++j)
+    dummy (i);
+}
+
+void
+test14 ()
+{
+  #pragma omp for
+  for (int i = 0; i < 100; ++i)
+#pragma omp unroll full
+#pragma omp unroll partial
+#pragma omp unroll partial
+  for (int j = -300; j != 100; ++j)
+    dummy (i);
+}
+
+void
+test15 ()
+{
+  #pragma omp for
+  for (int i = 0; i < 100; ++i)
+    {
+
+    dummy (i);
+
+#pragma omp unroll full
+#pragma omp unroll partial
+#pragma omp unroll partial
+  for (int j = -300; j != 100; ++j)
+    dummy (j);
+
+  dummy (i);
+    }
+ }
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
new file mode 100644
index 00000000000..8f7c3088a2e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
@@ -0,0 +1,99 @@
+/* { dg-prune-output "error: invalid controlling predicate" } */
+/* { dg-additional-options "-std=c++11" { target c++ } } */
+
+extern void dummy (int);
+
+void
+test ()
+{
+#pragma omp unroll partial
+#pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+#pragma omp unroll partial
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+#pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll partial partial /* { dg-error {too many 'partial' clauses} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp unroll full full /* { dg-error {too many 'full' clauses} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp unroll partial
+#pragma omp unroll /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+  int i;
+#pragma omp for
+#pragma omp unroll( /* { dg-error {expected '#pragma omp' clause before '\(' token} } */
+  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll foo /* { dg-error {expected '#pragma omp' clause before 'foo'} } */
+  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp unroll partial( /* { dg-error {expected expression before end of line} "" { target c } } */
+  /* { dg-error {expected primary-expression before end of line} "" { target c++ } .-1 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp unroll partial() /* { dg-error {expected expression before '\)' token} "" { target c } } */
+  /* { dg-error {expected primary-expression before '\)' token} "" { target c++ } .-1 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp unroll partial(i)
+ /* { dg-error {the value of 'i' is not usable in a constant expression} "" { target c++ } .-1 } */
+ /* { dg-error {partial argument needs positive constant integer expression} "" { target c } .-2 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp unroll parti /* { dg-error {expected '#pragma omp' clause before 'parti'} } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll partial(1)
+#pragma omp unroll parti /* { dg-error {expected '#pragma omp' clause before 'parti'} } */
+  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll partial(1)
+#pragma omp unroll parti /* { dg-error {expected '#pragma omp' clause before 'parti'} } */
+  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
+  for (int i = -300; i != 100; ++i)
+    dummy (i);
+
+int sum = 0;
+#pragma omp parallel for reduction(+ : sum) collapse(2) /* { dg-error {collapse cannot be larger than 1 on an unrolled loop} "" { target c } } */
+#pragma omp unroll partial(1) /* { dg-error {collapse cannot be larger than 1 on an unrolled loop} "" { target c++ } } */
+  for (int i = 3; i < 10; ++i)
+    for (int j = -2; j < 7; ++j)
+      sum++;
+}
+
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-3.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-3.c
new file mode 100644
index 00000000000..7ace5657b26
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-3.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-fdump-tree-omp_transform_loops" }
+ * { dg-additional-options "-fdump-tree-original" } */
+
+extern void dummy (int);
+
+void
+test1 ()
+{
+  int i;
+#pragma omp unroll full
+  for (int i = 0; i < 10; i++)
+    dummy (i);
+}
+
+ /* Loop should be removed with 10 copies of the body remaining
+  * { dg-final { scan-tree-dump-times "dummy" 10 "omp_transform_loops" } }
+  * { dg-final { scan-tree-dump "#pragma omp loop_transform" "original" } }
+  * { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-4.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-4.c
new file mode 100644
index 00000000000..5e473a099d3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-4.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-fdump-tree-omp_transform_loops" }
+ * { dg-additional-options "-fdump-tree-original" } */
+
+extern void dummy (int);
+
+void
+test1 ()
+{
+  int i;
+#pragma omp unroll
+  for (int i = 0; i < 100; i++)
+    dummy (i);
+}
+
+/* Loop should not be unrolled, but the internal representation should be lowered
+ * { dg-final { scan-tree-dump "#pragma omp loop_transform" "original" } }
+ * { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+ * { dg-final { scan-tree-dump-times "dummy" 1 "omp_transform_loops" } }
+ * { dg-final { scan-tree-dump-times {if \(i < .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-5.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-5.c
new file mode 100644
index 00000000000..9d5101bdc60
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-5.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+ * { dg-additional-options "-fdump-tree-original" } */
+
+extern void dummy (int);
+
+void
+test1 ()
+{
+  int i;
+#pragma omp unroll partial  /* { dg-optimized {'partial' clause without unrolling factor turned into 'partial\(5\)' clause} } */
+  for (int i = 0; i < 100; i++)
+    dummy (i);
+}
+
+/* Loop should be unrolled 5 times and the internal representation should be lowered
+ * { dg-final { scan-tree-dump "#pragma omp loop_transform" "original" } }
+ * { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+ * { dg-final { scan-tree-dump-times "dummy" 5 "omp_transform_loops" } }
+ * { dg-final { scan-tree-dump-times {if \(i < .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-6.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-6.c
new file mode 100644
index 00000000000..ee2d000239d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-6.c
@@ -0,0 +1,20 @@
+/* { dg-additional-options "--param=omp-unroll-default-factor=100" }
+ * { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+ * { dg-additional-options "-fdump-tree-original" } */
+
+extern void dummy (int);
+
+void
+test1 ()
+{
+  int i;
+#pragma omp unroll /* { dg-optimized {added 'partial\(100\)' clause to 'omp unroll' directive} } */
+  for (int i = 0; i < 100; i++)
+    dummy (i);
+}
+
+/* Loop should be unrolled 100 times and the internal representation should be lowered
+ * { dg-final { scan-tree-dump "#pragma omp loop_transform unroll_none" "original" } }
+ * { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+ * { dg-final { scan-tree-dump-times "dummy" 100 "omp_transform_loops" } }
+ * { dg-final { scan-tree-dump-times {if \(i < .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-7.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-7.c
new file mode 100644
index 00000000000..0458cb030a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-7.c
@@ -0,0 +1,144 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fopenmp-simd" } */
+
+#include <stdio.h>
+
+#define ASSERT_EQ(var, val) if (var != val) { fprintf (stderr, "%s:%d: Unexpected value %d\n", __FILE__, __LINE__, var); \
+    __builtin_abort (); }
+
+#define ASSERT_EQ_PTR(var, ptr) if (var != ptr) { fprintf (stderr, "%s:%d: Unexpected value %p\n", __FILE__, __LINE__, var); \
+    __builtin_abort (); }
+
+int
+test1 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp unroll partial(8)
+  for (i = data; i < data + 10 ; i++)
+       {
+         ASSERT_EQ (*i, data[iter]);
+         ASSERT_EQ_PTR (i, data + iter);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+test2 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp unroll partial(8)
+  for (i = data; i < data + 10 ; i=i+2)
+       {
+         ASSERT_EQ_PTR (i, data + 2 * iter);
+         ASSERT_EQ (*i, data[2 * iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+test3 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp unroll partial(8)
+  for (i = data; i <= data + 9 ; i=i+2)
+       {
+         ASSERT_EQ (*i, data[2 * iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+test4 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp unroll partial(8)
+  for (i = data; i != data + 10 ; i=i+1)
+       {
+         ASSERT_EQ (*i, data[iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+test5 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp unroll partial(7)
+  for (i = data + 9; i >= data ; i--)
+       {
+         ASSERT_EQ (*i, data[9 - iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+test6 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp unroll partial(7)
+  for (i = data + 9; i > data - 1 ; i--)
+       {
+         ASSERT_EQ (*i, data[9 - iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+test7 (int data[10])
+{
+  int iter = 0;
+  #pragma omp unroll partial(7)
+  for (int *i = data + 9; i != data - 1 ; i--)
+       {
+         ASSERT_EQ (*i, data[9 - iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+main ()
+{
+  int iter_count;
+  int data[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
+
+  iter_count = test1 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test2 (data);
+  ASSERT_EQ (iter_count, 5);
+
+  iter_count = test3 (data);
+  ASSERT_EQ (iter_count, 5);
+
+  iter_count = test4 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test5 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test6 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test7 (data);
+  ASSERT_EQ (iter_count, 10);
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-simd-1.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-simd-1.c
new file mode 100644
index 00000000000..1cd4d6e7322
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-simd-1.c
@@ -0,0 +1,84 @@
+/* { dg-options "-fno-openmp -fopenmp-simd" } */
+/* { dg-do run } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-additional-options "-fdump-tree-omp_transform_loops" } */
+
+#include <stdio.h>
+
+int compute_sum1 ()
+{
+  int sum = 0;
+  int i,j;
+
+#pragma omp simd reduction(+:sum)
+  for (i = 3; i < 10; ++i)
+  #pragma omp unroll full
+  for (j = -2; j < 7; ++j)
+    sum++;
+
+  if (j != 7)
+    __builtin_abort;
+
+  return sum;
+}
+
+int compute_sum2()
+{
+  int sum = 0;
+  int i,j;
+#pragma omp simd reduction(+:sum)
+#pragma omp unroll partial(5)
+  for (i = 3; i < 10; ++i)
+  for (j = -2; j < 7; ++j)
+    sum++;
+
+  if (j != 7)
+    __builtin_abort;
+
+  return sum;
+}
+
+int compute_sum3()
+{
+  int sum = 0;
+  int i,j;
+#pragma omp simd reduction(+:sum)
+#pragma omp unroll partial(1)
+  for (i = 3; i < 10; ++i)
+  for (j = -2; j < 7; ++j)
+    sum++;
+
+  if (j != 7)
+    __builtin_abort;
+
+  return sum;
+}
+
+int main ()
+{
+  int result = compute_sum1 ();
+  if (result != 7 * 9)
+    {
+      fprintf (stderr, "%d: Wrong result %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  result = compute_sum2 ();
+  if (result != 7 * 9)
+    {
+      fprintf (stderr, "%d: Wrong result %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  result = compute_sum3 ();
+  if (result != 7 * 9)
+    {
+      fprintf (stderr, "%d: Wrong result %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {omp loop_transform} "original" } } */
+/* { dg-final { scan-tree-dump-not {omp loop_transform} "omp_transform_loops" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-1.C b/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-1.C
new file mode 100644
index 00000000000..cba37c88ebe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-1.C
@@ -0,0 +1,42 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+#include <vector>
+
+extern void dummy (int);
+
+void
+test1 ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i < 1000; i++)
+    v.push_back (i);
+
+#pragma omp for
+  for (int i : v)
+    dummy (i);
+
+#pragma omp unroll partial(5)
+  for (int i : v)
+    dummy (i);
+}
+
+void
+test2 ()
+{
+  std::vector<std::vector<int>> v;
+
+  for (unsigned i = 0; i < 10; i++)
+    {
+      std::vector<int> u;
+      for (unsigned j = 0; j < 10; j++)
+       u.push_back (j);
+      v.push_back (u);
+    }
+
+#pragma omp for
+#pragma omp unroll partial(5)
+  for (auto u : v)
+    for (int i : u)
+      dummy (i);
+}
diff --git a/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-2.C b/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-2.C
new file mode 100644
index 00000000000..f606f3de757
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-2.C
@@ -0,0 +1,47 @@
+// { dg-do link }
+// { dg-additional-options "-std=c++11" }
+#include <vector>
+
+extern void dummy (int);
+
+template<class T, int U1, int U2, int U3> void
+test_template ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i < 1000; i++)
+    v.push_back (i);
+
+#pragma omp for
+  for (int i : v)
+    dummy (i);
+
+#pragma omp unroll partial(U1)
+  for (T i : v)
+    dummy (i);
+
+#pragma omp unroll partial(U2) // { dg-error {partial argument needs positive constant integer expression} }
+  for (T i : v)
+    dummy (i);
+
+#pragma omp unroll partial(U3) // { dg-error {partial argument needs positive constant integer expression} }
+  for (T i : v)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll partial(U1)
+  for (T i : v)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll partial(U2) // { dg-error {partial argument needs positive constant integer expression} }
+  for (T i : v)
+    dummy (i);
+
+#pragma omp for
+#pragma omp unroll partial(U3) // { dg-error {partial argument needs positive constant integer expression} }
+  for (T i : v)
+    dummy (i);
+}
+
+void test () { test_template <long, 5, -2, 0> (); }
diff --git a/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-3.C b/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-3.C
new file mode 100644
index 00000000000..ae9f5500360
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-3.C
@@ -0,0 +1,37 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+// { dg-additional-options "-fdump-tree-omp_transform_loops -fopt-info-omp-optimized-missed" }
+// { dg-additional-options "-fdump-tree-original" }
+#include <vector>
+
+extern void dummy (int);
+
+constexpr unsigned fib (unsigned n)
+{
+  return n <= 2 ? 1 : fib (n-1) + fib (n-2);
+}
+
+void
+test1 ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i < 1000; i++)
+    v.push_back (i);
+
+#pragma omp unroll partial(fib(10))
+  for (int i : v)
+    dummy (i);
+}
+
+
+// Loop should be unrolled fib(10) = 55 times
+// ! { dg-final { scan-tree-dump {#pragma omp loop_transform unroll_partial\(55\)} "original" } }
+// ! { dg-final { scan-tree-dump-not "#pragma omp" "omp_transform_loops" } }
+// ! { dg-final { scan-tree-dump-times "dummy" 55 "omp_transform_loops" } }
+
+// There should be one loop that fills the vector ...
+// ! { dg-final { scan-tree-dump-times {if \(i.*? <= .+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } }
+
+// ... and one resulting from the lowering of the unrolled loop
+// ! { dg-final { scan-tree-dump-times {if \(D\.[0-9]+ < retval.+?.+goto.+else goto.*?$} 1 "omp_transform_loops" } }
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-1.C b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-1.C
new file mode 100644
index 00000000000..004eef91649
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-1.C
@@ -0,0 +1,73 @@
+// { dg-additional-options "-std=c++11" }
+// { dg-additional-options "-O0" }
+
+#include <vector>
+#include <stdio.h>
+
+constexpr unsigned fib (unsigned n)
+{
+  return n <= 2 ? 1 : fib (n-1) + fib (n-2);
+}
+
+int
+test1 ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i <= 9; i++)
+    v.push_back (1);
+
+  int sum = 0;
+  for (int k = 0; k < 10; k++)
+#pragma omp unroll partial(fib(3))
+    for (int i : v) {
+      for (int j = 8; j != -2; --j)
+       sum = sum + i;
+    }
+
+  return sum;
+}
+
+int
+test2 ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i <= 10; i++)
+    v.push_back (i);
+
+  int sum = 0;
+#pragma omp parallel for reduction(+:sum)
+  for (int k = 0; k < 10; k++)
+#pragma omp unroll
+#pragma omp unroll partial(fib(4))
+  for (int i : v)
+    {
+      #pragma omp unroll full
+      for (int j = 8; j != -2; --j)
+       sum = sum + i;
+    }
+
+  return sum;
+}
+
+int
+main ()
+{
+  int result = test1 ();
+
+  if (result != 1000)
+    {
+      fprintf (stderr, "Wrong result: %d\n", result);
+    __builtin_abort ();
+    }
+
+  result = test2 ();
+  if (result != 5500)
+    {
+      fprintf (stderr, "Wrong result: %d\n", result);
+    __builtin_abort ();
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-2.C b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-2.C
new file mode 100644
index 00000000000..90d2775c95b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-2.C
@@ -0,0 +1,34 @@
+// { dg-do run }
+// { dg-additional-options "-std=c++11" }
+#include <vector>
+#include <iostream>
+
+int
+main ()
+{
+  std::vector<std::vector<int>> v;
+  std::vector<int> w;
+
+  for (unsigned i = 0; i < 10; i++)
+    {
+      std::vector<int> u;
+      for (unsigned j = 0; j < 10; j++)
+       u.push_back (j);
+      v.push_back (u);
+    }
+
+#pragma omp for
+#pragma omp unroll partial(7)
+  for (auto u : v)
+    for (int x : u)
+      w.push_back (x);
+
+  std::size_t l = w.size ();
+  for (std::size_t i = 0; i < l; i++)
+    {
+      if (w[i] != i % 10)
+       __builtin_abort ();
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
new file mode 100644
index 00000000000..2ac0fff16af
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
@@ -0,0 +1,76 @@
+#include <stdio.h>
+
+int compute_sum1 ()
+{
+  int sum = 0;
+  int i,j;
+#pragma omp parallel for reduction(+:sum) lastprivate(j)
+#pragma omp unroll partial
+  for (i = 3; i < 10; ++i)
+  for (j = -2; j < 7; ++j)
+    sum++;
+
+  if (j != 7)
+    __builtin_abort ();
+
+  return sum;
+}
+
+int compute_sum2()
+{
+  int sum = 0;
+  int i,j;
+#pragma omp parallel for reduction(+:sum) lastprivate(j)
+#pragma omp unroll partial(5)
+  for (i = 3; i < 10; ++i)
+  for (j = -2; j < 7; ++j)
+    sum++;
+
+  if (j != 7)
+    __builtin_abort ();
+
+  return sum;
+}
+
+int compute_sum3()
+{
+  int sum = 0;
+  int i,j;
+#pragma omp parallel for reduction(+:sum)
+#pragma omp unroll partial(1)
+  for (i = 3; i < 10; ++i)
+  for (j = -2; j < 7; ++j)
+    sum++;
+
+  if (j != 7)
+    __builtin_abort;
+
+  return sum;
+}
+
+int main ()
+{
+  int result;
+  result = compute_sum1 ();
+  if (result != 7 * 9)
+    {
+      fprintf (stderr, "%d: Wrong result %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  result = compute_sum2 ();
+  if (result != 7 * 9)
+    {
+      fprintf (stderr, "%d: Wrong result %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  result = compute_sum3 ();
+  if (result != 7 * 9)
+    {
+      fprintf (stderr, "%d: Wrong result %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  return 0;
+}
--
2.36.1
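
For readers of the unroll-5.c and unroll-7.c tests in this patch, the effect of "#pragma omp unroll partial(N)" can be pictured as a hand-written source transformation. The following is only a sketch of the intended semantics for a unit-stride loop. The names visit_rolled, visit_unrolled and FACTOR are made up for illustration, and this is not the code that the omp_transform_loops pass generates.

```c
#include <assert.h>

#define FACTOR 8 /* Hypothetical unroll factor, as in "partial(8)".  */

/* Reference loop: copy every element once, in order, and return the
   number of iterations executed.  */
static int
visit_rolled (const int *data, int n, int *out)
{
  int iter = 0;
  for (int i = 0; i < n; i++)
    out[iter++] = data[i];
  return iter;
}

/* Hand-unrolled equivalent: the outer loop advances by FACTOR and the
   inner loop stands for up to FACTOR copies of the body.  The bound
   check on i + k covers a trip count that is not a multiple of
   FACTOR.  */
static int
visit_unrolled (const int *data, int n, int *out)
{
  assert (n >= 0);
  int iter = 0;
  for (int i = 0; i < n; i += FACTOR)
    for (int k = 0; k < FACTOR && i + k < n; k++)
      out[iter++] = data[i + k];
  return iter;
}
```

Called on a ten-element array such as the one in unroll-7.c, both functions execute ten iterations and fill the output identically. The second outer iteration of visit_unrolled runs only two copies of the body.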

* [PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
  2023-03-24 15:30 ` [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive Frederik Harwath
  2023-03-24 15:30 ` [PATCH 2/7] openmp: Add C/C++ " Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-03-24 15:30 ` [PATCH 4/7] openmp: Add Fortran support for "omp tile" Frederik Harwath
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC (permalink / raw)
  To: gcc-patches, fortran, tobias, jakub, joseph, jason

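Rename the clause code of the OpenACC "tile" clause to
OMP_CLAUSE_OACC_TILE.  This frees the name OMP_CLAUSE_TILE, which will
be used for the OpenMP 5.1 loop transformation construct "omp tile".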

gcc/ChangeLog:

        * tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE
        to OMP_CLAUSE_OACC_TILE.
        * tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
        (OMP_CLAUSE_OACC_TILE_LIST): ... this.
        (OMP_CLAUSE_TILE_ITERVAR): Rename to ...
        (OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
        (OMP_CLAUSE_TILE_COUNT): Rename to ...
        (OMP_CLAUSE_OACC_TILE_COUNT): ... this.
        * gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
        (gimplify_adjust_omp_clauses): Likewise.
        (gimplify_omp_for): Likewise.
        * omp-general.cc (omp_extract_for_data): Likewise.
        * omp-low.cc (scan_sharing_clauses): Likewise.
        (lower_oacc_head_mark): Likewise.
        * tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
        (convert_local_omp_clauses): Likewise.
        * tree-pretty-print.cc (dump_omp_clause): Likewise.
        * tree.cc: Likewise.

gcc/c-family/ChangeLog:

        * c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.

gcc/c/ChangeLog:

        * c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
        (c_parser_oacc_clause_tile): Likewise.
        (c_parser_omp_for_loop): Likewise.
        * c-typeck.cc (c_finish_omp_clauses): Likewise.

gcc/cp/ChangeLog:

        * parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
        (cp_parser_omp_clause_collapse): Likewise.
        (cp_parser_omp_for_loop): Likewise.
        * pt.cc (tsubst_omp_clauses): Likewise.
        * semantics.cc (finish_omp_clauses): Likewise.
        (finish_omp_for): Likewise.

gcc/fortran/ChangeLog:

        * openmp.cc (enum omp_mask2): Adjust to renamings.
        (gfc_match_omp_clauses): Likewise.
        * trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
---
 gcc/c-family/c-omp.cc       |  2 +-
 gcc/c/c-parser.cc           | 12 ++++++------
 gcc/c/c-typeck.cc           |  2 +-
 gcc/cp/parser.cc            | 12 ++++++------
 gcc/cp/pt.cc                |  2 +-
 gcc/cp/semantics.cc         |  8 ++++----
 gcc/fortran/openmp.cc       |  6 +++---
 gcc/fortran/trans-openmp.cc |  4 ++--
 gcc/gimplify.cc             |  8 ++++----
 gcc/omp-general.cc          |  8 ++++----
 gcc/omp-low.cc              |  6 +++---
 gcc/tree-core.h             |  2 +-
 gcc/tree-nested.cc          |  4 ++--
 gcc/tree-pretty-print.cc    |  4 ++--
 gcc/tree.cc                 |  2 +-
 gcc/tree.h                  | 12 ++++++------
 16 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 85ba9c528c8..fec7f337772 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -1749,7 +1749,7 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses,
         {
          /* Loop clauses.  */
        case OMP_CLAUSE_COLLAPSE:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_GANG:
        case OMP_CLAUSE_WORKER:
        case OMP_CLAUSE_VECTOR:
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9d875befccc..e7c9da99552 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14183,7 +14183,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tree list)
   location_t loc;

   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");

   loc = c_parser_peek_token (parser)->location;
   matching_parens parens;
@@ -15349,7 +15349,7 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   location_t loc;
   tree tile = NULL_TREE;

-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");

   loc = c_parser_peek_token (parser)->location;
@@ -15401,9 +15401,9 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   /* Consume the trailing ')'.  */
   c_parser_consume_token (parser);

-  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  c = build_omp_clause (loc, OMP_CLAUSE_OACC_TILE);
   tile = nreverse (tile);
-  OMP_CLAUSE_TILE_LIST (c) = tile;
+  OMP_CLAUSE_OACC_TILE_LIST (c) = tile;
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -20270,10 +20270,10 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
       collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
-    else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+    else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
       {
        tiling = true;
-       collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+       collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
       }
     else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
             && OMP_CLAUSE_ORDERED_EXPR (cl))
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index bffea79b441..40df7bb0069 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15872,7 +15872,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
        case OMP_CLAUSE_GANG:
        case OMP_CLAUSE_WORKER:
        case OMP_CLAUSE_VECTOR:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_IF_PRESENT:
        case OMP_CLAUSE_FINALIZE:
        case OMP_CLAUSE_NOHOST:
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 7034fdf49a4..90af40c4dbc 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -37981,7 +37981,7 @@ cp_parser_oacc_clause_tile (cp_parser *parser, location_t clause_loc, tree list)
      so, but the spec authors never considered such a case and have
      differing opinions on what it might mean, including 'not
      allowed'.)  */
-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile", clause_loc);
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile", clause_loc);
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse",
                             clause_loc);

@@ -38010,9 +38010,9 @@ cp_parser_oacc_clause_tile (cp_parser *parser, location_t clause_loc, tree list)
   /* Consume the trailing ')'.  */
   cp_lexer_consume_token (parser->lexer);

-  c = build_omp_clause (clause_loc, OMP_CLAUSE_TILE);
+  c = build_omp_clause (clause_loc, OMP_CLAUSE_OACC_TILE);
   tile = nreverse (tile);
-  OMP_CLAUSE_TILE_LIST (c) = tile;
+  OMP_CLAUSE_OACC_TILE_LIST (c) = tile;
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -38125,7 +38125,7 @@ cp_parser_omp_clause_collapse (cp_parser *parser, tree list, location_t location
     }

   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse", location);
-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile", location);
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile", location);
   c = build_omp_clause (loc, OMP_CLAUSE_COLLAPSE);
   OMP_CLAUSE_CHAIN (c) = list;
   OMP_CLAUSE_COLLAPSE_EXPR (c) = num;
@@ -43654,10 +43654,10 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
       collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
-    else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+    else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
       {
        tiling = true;
-       collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+       collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
       }
     else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
             && OMP_CLAUSE_ORDERED_EXPR (cl))
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 63b2d1f7a45..16197b17e5a 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18061,7 +18061,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
              = tsubst_expr (OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR (oc), args,
                             complain, in_decl);
          /* FALLTHRU */
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_IF:
        case OMP_CLAUSE_NUM_THREADS:
        case OMP_CLAUSE_SCHEDULE:
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index ac49502eea4..c87e252ff06 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -8729,8 +8729,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
          mergeable_seen = true;
          break;

-       case OMP_CLAUSE_TILE:
-         for (tree list = OMP_CLAUSE_TILE_LIST (c); !remove && list;
+       case OMP_CLAUSE_OACC_TILE:
+         for (tree list = OMP_CLAUSE_OACC_TILE_LIST (c); !remove && list;
               list = TREE_CHAIN (list))
            {
              t = TREE_VALUE (list);
@@ -10498,9 +10498,9 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv,
     {
       tree c;

-      c = omp_find_clause (clauses, OMP_CLAUSE_TILE);
+      c = omp_find_clause (clauses, OMP_CLAUSE_OACC_TILE);
       if (c)
-       collapse = list_length (OMP_CLAUSE_TILE_LIST (c));
+       collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (c));
       else
        {
          c = omp_find_clause (clauses, OMP_CLAUSE_COLLAPSE);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index e54f016b170..ec707d977cd 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1075,7 +1075,7 @@ enum omp_mask2
   OMP_CLAUSE_WAIT,
   OMP_CLAUSE_DELETE,
   OMP_CLAUSE_AUTO,
-  OMP_CLAUSE_TILE,
+  OMP_CLAUSE_OACC_TILE,
   OMP_CLAUSE_IF_PRESENT,
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
@@ -3478,7 +3478,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
              c->threads = needs_space = true;
              continue;
            }
-         if ((mask & OMP_CLAUSE_TILE)
+         if ((mask & OMP_CLAUSE_OACC_TILE)
              && !c->tile_list
              && match_oacc_expr_list ("tile (", &c->tile_list,
                                       true) == MATCH_YES)
@@ -3677,7 +3677,7 @@ error:
   (omp_mask (OMP_CLAUSE_COLLAPSE) | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER              \
    | OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ | OMP_CLAUSE_INDEPENDENT             \
    | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_REDUCTION | OMP_CLAUSE_AUTO             \
-   | OMP_CLAUSE_TILE)
+   | OMP_CLAUSE_OACC_TILE)
 #define OACC_PARALLEL_LOOP_CLAUSES \
   (OACC_LOOP_CLAUSES | OACC_PARALLEL_CLAUSES)
 #define OACC_KERNELS_LOOP_CLAUSES \
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index c4a23f6e247..73c416c951d 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -4371,8 +4371,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       for (el = clauses->tile_list; el; el = el->next)
        vec_safe_push (tvec, gfc_convert_expr_to_tree (block, el->expr));

-      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_TILE);
-      OMP_CLAUSE_TILE_LIST (c) = build_tree_list_vec (tvec);
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_OACC_TILE);
+      OMP_CLAUSE_OACC_TILE_LIST (c) = build_tree_list_vec (tvec);
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
       tvec->truncate (0);
     }
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 2c160686533..14616eb5316 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -11923,7 +11923,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
        case OMP_CLAUSE_ORDERED:
        case OMP_CLAUSE_UNTIED:
        case OMP_CLAUSE_COLLAPSE:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_AUTO:
        case OMP_CLAUSE_SEQ:
        case OMP_CLAUSE_INDEPENDENT:
@@ -13071,7 +13071,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
        case OMP_CLAUSE_VECTOR:
        case OMP_CLAUSE_AUTO:
        case OMP_CLAUSE_SEQ:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_IF_PRESENT:
        case OMP_CLAUSE_FINALIZE:
        case OMP_CLAUSE_INCLUSIVE:
@@ -13970,9 +13970,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   c = omp_find_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_COLLAPSE);
   if (c)
     collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
-  c = omp_find_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_TILE);
+  c = omp_find_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_OACC_TILE);
   if (c)
-    tile = list_length (OMP_CLAUSE_TILE_LIST (c));
+    tile = list_length (OMP_CLAUSE_OACC_TILE_LIST (c));
   c = omp_find_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ALLOCATE);
   hash_set<tree> *allocate_uids = NULL;
   if (c)
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index e29d695dcba..0f326128874 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -271,12 +271,12 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
            collapse_count = &OMP_CLAUSE_COLLAPSE_COUNT (t);
          }
        break;
-      case OMP_CLAUSE_TILE:
-       fd->tiling = OMP_CLAUSE_TILE_LIST (t);
+      case OMP_CLAUSE_OACC_TILE:
+       fd->tiling = OMP_CLAUSE_OACC_TILE_LIST (t);
        fd->collapse = list_length (fd->tiling);
        gcc_assert (fd->collapse);
-       collapse_iter = &OMP_CLAUSE_TILE_ITERVAR (t);
-       collapse_count = &OMP_CLAUSE_TILE_COUNT (t);
+       collapse_iter = &OMP_CLAUSE_OACC_TILE_ITERVAR (t);
+       collapse_count = &OMP_CLAUSE_OACC_TILE_COUNT (t);
        break;
       case OMP_CLAUSE__REDUCTEMP_:
        fd->have_reductemp = true;
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 1818132830f..b5b2134ab17 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1744,7 +1744,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
        case OMP_CLAUSE_INDEPENDENT:
        case OMP_CLAUSE_AUTO:
        case OMP_CLAUSE_SEQ:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE__SIMT_:
        case OMP_CLAUSE_DEFAULT:
        case OMP_CLAUSE_NONTEMPORAL:
@@ -1963,7 +1963,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
        case OMP_CLAUSE_INDEPENDENT:
        case OMP_CLAUSE_AUTO:
        case OMP_CLAUSE_SEQ:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE__SIMT_:
        case OMP_CLAUSE_IF_PRESENT:
        case OMP_CLAUSE_FINALIZE:
@@ -8376,7 +8376,7 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
          tag |= OLF_INDEPENDENT;
          break;

-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
          tag |= OLF_TILE;
          break;

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e563408877e..f1429824158 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -515,7 +515,7 @@ enum omp_clause_code {
   OMP_CLAUSE_VECTOR_LENGTH,

   /* OpenACC clause: tile ( size-expr-list ).  */
-  OMP_CLAUSE_TILE,
+  OMP_CLAUSE_OACC_TILE,

   /* OpenACC clause: if_present.  */
   OMP_CLAUSE_IF_PRESENT,
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index 1418e1f7f56..ed115b5eb3f 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -1474,7 +1474,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
        case OMP_CLAUSE_DEFAULT:
        case OMP_CLAUSE_COPYIN:
        case OMP_CLAUSE_COLLAPSE:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_UNTIED:
        case OMP_CLAUSE_MERGEABLE:
        case OMP_CLAUSE_PROC_BIND:
@@ -2270,7 +2270,7 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
        case OMP_CLAUSE_DEFAULT:
        case OMP_CLAUSE_COPYIN:
        case OMP_CLAUSE_COLLAPSE:
-       case OMP_CLAUSE_TILE:
+       case OMP_CLAUSE_OACC_TILE:
        case OMP_CLAUSE_UNTIED:
        case OMP_CLAUSE_MERGEABLE:
        case OMP_CLAUSE_PROC_BIND:
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 588a992bcf3..cae81719e68 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -1416,9 +1416,9 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     case OMP_CLAUSE_INDEPENDENT:
       pp_string (pp, "independent");
       break;
-    case OMP_CLAUSE_TILE:
+    case OMP_CLAUSE_OACC_TILE:
       pp_string (pp, "tile(");
-      dump_generic_node (pp, OMP_CLAUSE_TILE_LIST (clause),
+      dump_generic_node (pp, OMP_CLAUSE_OACC_TILE_LIST (clause),
                         spc, flags, false);
       pp_right_paren (pp);
       break;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 53e44367977..fc7e22d352f 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -322,7 +322,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_NUM_GANGS  */
   1, /* OMP_CLAUSE_NUM_WORKERS  */
   1, /* OMP_CLAUSE_VECTOR_LENGTH  */
-  3, /* OMP_CLAUSE_TILE  */
+  3, /* OMP_CLAUSE_OACC_TILE  */
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
diff --git a/gcc/tree.h b/gcc/tree.h
index f33f815b712..6f7a6e7017a 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1963,12 +1963,12 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_ENTER_TO(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ENTER)->base.public_flag)

-#define OMP_CLAUSE_TILE_LIST(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
-#define OMP_CLAUSE_TILE_ITERVAR(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 1)
-#define OMP_CLAUSE_TILE_COUNT(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 2)
+#define OMP_CLAUSE_OACC_TILE_LIST(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_OACC_TILE), 0)
+#define OMP_CLAUSE_OACC_TILE_ITERVAR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_OACC_TILE), 1)
+#define OMP_CLAUSE_OACC_TILE_COUNT(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_OACC_TILE), 2)

 /* _CONDTEMP_ holding temporary with iteration count.  */
 #define OMP_CLAUSE__CONDTEMP__ITER(NODE) \
--
2.36.1



* [PATCH 4/7] openmp: Add Fortran support for "omp tile"
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
                   ` (2 preceding siblings ...)
  2023-03-24 15:30 ` [PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-03-24 15:30 ` [PATCH 5/7] openmp: Add C/C++ " Frederik Harwath
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC (permalink / raw)
  To: gcc-patches, tobias, fortran, jakub

This commit implements the Fortran front end support for the "omp
tile" directive and the corresponding middle end transformation.
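
For readers unfamiliar with the construct, the effect of tiling a single
loop can be sketched by hand in C.  This is only an illustration of the
semantics, not the code generated by the middle end: the function names
plain_sum and tiled_sum are made up for this example, and the tile size
is passed as a run-time parameter here, whereas the directive requires
constant positive sizes.

```c
#include <stddef.h>

/* Reference loop: visit indices 0..n-1 in order and accumulate a
   position-weighted sum.  */
long
plain_sum (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += (long) a[i] * (long) (i + 1);
  return sum;
}

/* Hand-written equivalent of tiling the loop above with one tile size.
   The outer loop steps from tile to tile, the inner loop walks the
   iterations of one tile.  The last tile may be partial.
   Assumption for this sketch: tile >= 1.  */
long
tiled_sum (const int *a, size_t n, size_t tile)
{
  long sum = 0;
  for (size_t start = 0; start < n; start += tile)
    {
      /* Clamp the end of the tile to the end of the iteration space.  */
      size_t end = start + tile < n ? start + tile : n;
      for (size_t i = start; i < end; ++i)
        sum += (long) a[i] * (long) (i + 1);
    }
  return sum;
}
```

Both functions visit every index exactly once, so they return the same
value for any tile size of at least 1, including sizes that do not
divide the iteration count and sizes larger than it.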

gcc/fortran/ChangeLog:

        * gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE.
        (enum gfc_exec_op): Add EXEC_OMP_TILE.
        (gfc_expr_list_len): New declaration.
        (struct gfc_omp_clauses): Add "tile_sizes" field.
        * dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping.
        (show_omp_node): Handle EXEC_OMP_TILE.
        (show_code_node): Likewise.
        * match.h (gfc_match_omp_tile): New declaration.
        * openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field.
        (match_tile_sizes): New function.
        (OMP_TILE_CLAUSES): New macro.
        (gfc_match_omp_tile): New function.
        (resolve_omp_do): Handle EXEC_OMP_TILE.
        (resolve_omp_tile): New function.
        (omp_code_to_statement): Handle EXEC_OMP_TILE.
        (gfc_resolve_omp_directive): Likewise.
        * parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE
        and ST_OMP_TILE.
        (next_statement): Handle ST_OMP_TILE.
        (gfc_ascii_statement): Likewise.
        (parse_omp_do): Likewise.
        (parse_executable): Likewise.
        * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE.
        (gfc_resolve_code): Likewise.
        * st.cc (gfc_free_statement): Likewise.
        * trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field.
        (loop_transform_p): New function.
        (gfc_expr_list_len): New function.
        (gfc_trans_omp_do): Handle EXEC_OMP_TILE.
        (gfc_trans_omp_directive): Likewise.
        * trans.cc (trans_code): Likewise.

gcc/ChangeLog:

        * gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
        (gimplify_adjust_omp_clauses): Likewise.
        (gimplify_omp_loop): Likewise.
        * omp-transform-loops.cc (walk_omp_for_loops): New declaration.
        (subst_var_in_op): New function.
        (subst_var): New function.
        (gomp_for_number_of_iterations): Adjust.
        (gomp_for_iter_count_type): New function.
        (gimple_assign_rhs_to_tree): New function.
        (subst_defs): New function.
        (gomp_for_uncollapse): Adjust.
        (transformation_clause_p): Add OMP_CLAUSE_TILE.
        (tile): New function.
        (transform_gomp_for): Handle OMP_CLAUSE_TILE.
        (optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
        * omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TILE.
        * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
        * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
        * tree.cc: Add OMP_CLAUSE_TILE.
        * tree.h (OMP_CLAUSE_TILE_SIZES): New macro.

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
        * testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.

gcc/testsuite/ChangeLog:

        * gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc                |  17 +-
 gcc/fortran/gfortran.h                        |   7 +-
 gcc/fortran/match.h                           |   1 +
 gcc/fortran/openmp.cc                         | 373 +++++++++++++-----
 gcc/fortran/parse.cc                          |  15 +
 gcc/fortran/resolve.cc                        |   3 +
 gcc/fortran/st.cc                             |   1 +
 gcc/fortran/trans-openmp.cc                   |  86 ++--
 gcc/fortran/trans.cc                          |   1 +
 gcc/gimplify.cc                               |   3 +
 gcc/omp-general.cc                            |   2 +-
 gcc/omp-transform-loops.cc                    | 340 +++++++++++++++-
 .../gomp/loop-transforms/tile-1.f90           | 163 ++++++++
 .../gomp/loop-transforms/tile-1a.f90          |  10 +
 .../gomp/loop-transforms/tile-2.f90           |  80 ++++
 .../gomp/loop-transforms/tile-3.f90           |  18 +
 .../gomp/loop-transforms/tile-4.f90           |  95 +++++
 .../gomp/loop-transforms/tile-unroll-1.f90    |  57 +++
 .../gomp/loop-transforms/unroll-tile-1.f90    |  37 ++
 .../gomp/loop-transforms/unroll-tile-2.f90    |  41 ++
 gcc/tree-core.h                               |   3 +
 gcc/tree-pretty-print.cc                      |   8 +
 gcc/tree.cc                                   |   7 +-
 gcc/tree.h                                    |   3 +
 .../loop-transforms/unroll-full-tile.C        |  84 ++++
 .../loop-transforms/tile-1.f90                |  71 ++++
 .../loop-transforms/tile-2.f90                | 117 ++++++
 .../loop-transforms/tile-unroll-1.f90         | 112 ++++++
 .../loop-transforms/tile-unroll-2.f90         |  71 ++++
 .../loop-transforms/tile-unroll-3.f90         |  77 ++++
 .../loop-transforms/tile-unroll-4.f90         |  75 ++++
 .../loop-transforms/unroll-tile-1.f90         | 112 ++++++
 .../loop-transforms/unroll-tile-2.f90         |  71 ++++
 33 files changed, 2042 insertions(+), 119 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index e069aca1f1d..82183285954 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2062,6 +2062,18 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
       if (omp_clauses->unroll_partial_factor > 0)
        fprintf (dumpfile, "(%u)", omp_clauses->unroll_partial_factor);
     }
+  if (omp_clauses->tile_sizes)
+    {
+      gfc_expr_list *sizes;
+      fputs (" TILE SIZES(", dumpfile);
+      for (sizes = omp_clauses->tile_sizes; sizes; sizes = sizes->next)
+       {
+         show_expr (sizes->expr);
+         if (sizes->next)
+           fputs (", ", dumpfile);
+       }
+      fputc (')', dumpfile);
+    }
 }

 /* Show a single OpenMP or OpenACC directive node and everything underneath it
@@ -2172,6 +2184,7 @@ show_omp_node (int level, gfc_code *c)
       name = "TEAMS DISTRIBUTE PARALLEL DO SIMD"; break;
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: name = "TEAMS DISTRIBUTE SIMD"; break;
     case EXEC_OMP_TEAMS_LOOP: name = "TEAMS LOOP"; break;
+    case EXEC_OMP_TILE: name = "TILE"; break;
     case EXEC_OMP_UNROLL: name = "UNROLL"; break;
     case EXEC_OMP_WORKSHARE: name = "WORKSHARE"; break;
     default:
@@ -2249,6 +2262,7 @@ show_omp_node (int level, gfc_code *c)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_LOOP:
+    case EXEC_OMP_TILE:
     case EXEC_OMP_UNROLL:
     case EXEC_OMP_WORKSHARE:
       omp_clauses = c->ext.omp_clauses;
@@ -2311,7 +2325,7 @@ show_omp_node (int level, gfc_code *c)
          d = d->block;
        }
     }
-  else if (c->op == EXEC_OMP_UNROLL)
+  else if (c->op == EXEC_OMP_UNROLL || c->op == EXEC_OMP_TILE)
     show_code (level + 1, c->block != NULL ? c->block->next : c->next);
   else
     show_code (level + 1, c->block->next);
@@ -3491,6 +3505,7 @@ show_code_node (int level, gfc_code *c)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_LOOP:
+    case EXEC_OMP_TILE:
     case EXEC_OMP_UNROLL:
     case EXEC_OMP_WORKSHARE:
       show_omp_node (level, c);
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5ef4a8907b0..8b4eadf9b4d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -320,7 +320,8 @@ enum gfc_statement
   ST_OMP_ERROR, ST_OMP_ASSUME, ST_OMP_END_ASSUME, ST_OMP_ASSUMES,
   /* Note: gfc_match_omp_nothing returns ST_NONE. */
   ST_OMP_NOTHING, ST_NONE,
-  ST_OMP_UNROLL, ST_OMP_END_UNROLL
+  ST_OMP_UNROLL, ST_OMP_END_UNROLL,
+  ST_OMP_TILE, ST_OMP_END_TILE
 };

 /* Types of interfaces that we can have.  Assignment interfaces are
@@ -1550,6 +1551,7 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *dist_chunk_size;
   struct gfc_expr *message;
   struct gfc_omp_assumptions *assume;
+  struct gfc_expr_list *tile_sizes;
   const char *critical_name;
   enum gfc_omp_default_sharing default_sharing;
   enum gfc_omp_atomic_op atomic_op;
@@ -2977,7 +2979,7 @@ enum gfc_exec_op
   EXEC_OMP_TARGET_TEAMS_LOOP, EXEC_OMP_MASKED, EXEC_OMP_PARALLEL_MASKED,
   EXEC_OMP_PARALLEL_MASKED_TASKLOOP, EXEC_OMP_PARALLEL_MASKED_TASKLOOP_SIMD,
   EXEC_OMP_MASKED_TASKLOOP, EXEC_OMP_MASKED_TASKLOOP_SIMD, EXEC_OMP_SCOPE,
-  EXEC_OMP_UNROLL,
+  EXEC_OMP_UNROLL, EXEC_OMP_TILE,
   EXEC_OMP_ERROR
 };

@@ -3874,6 +3876,7 @@ bool gfc_inline_intrinsic_function_p (gfc_expr *);

 /* trans-openmp.cc */
 bool loop_transform_p (gfc_exec_op op);
+int gfc_expr_list_len (gfc_expr_list *);

 /* bbt.cc */
 typedef int (*compare_fn) (void *, void *);
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 5640c725f09..d04e1cd66a4 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -226,6 +226,7 @@ match gfc_match_omp_teams_distribute_parallel_do_simd (void);
 match gfc_match_omp_teams_distribute_simd (void);
 match gfc_match_omp_teams_loop (void);
 match gfc_match_omp_threadprivate (void);
+match gfc_match_omp_tile (void);
 match gfc_match_omp_unroll (void);
 match gfc_match_omp_workshare (void);
 match gfc_match_omp_end_critical (void);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index ec707d977cd..1de61029768 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -191,6 +191,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
                           i == OMP_LIST_ALLOCATE);
   gfc_free_expr_list (c->wait_list);
   gfc_free_expr_list (c->tile_list);
+  gfc_free_expr_list (c->tile_sizes);
   free (CONST_CAST (char *, c->critical_name));
   if (c->assume)
     {
@@ -977,6 +978,76 @@ cleanup:
   return MATCH_ERROR;
 }

+static match
+match_tile_sizes (gfc_expr_list **list)
+{
+  gfc_expr_list *head, *tail, *p;
+  locus old_loc;
+  gfc_expr *expr;
+  match m;
+
+  head = tail = NULL;
+
+  old_loc = gfc_current_locus;
+
+  m = gfc_match_char ('(');
+  if (m != MATCH_YES)
+    goto syntax;
+
+  for (;;)
+    {
+      m = gfc_match_expr (&expr);
+      if (m == MATCH_YES)
+       {
+         p = gfc_get_expr_list ();
+         if (head == NULL)
+           head = tail = p;
+         else
+           {
+             tail->next = p;
+             tail = tail->next;
+           }
+         int size = 0;
+         if (m == MATCH_YES)
+           {
+             if (gfc_extract_int (expr, &size, 1))
+               goto cleanup;
+             else if (size < 1)
+               {
+                 gfc_error_now ("tile size not constant "
+                                "positive integer at %C");
+                 goto cleanup;
+               }
+           tail->expr = expr;
+           }
+         goto next_item;
+       }
+      if (m == MATCH_ERROR)
+       goto cleanup;
+      goto syntax;
+
+    next_item:
+      if (gfc_match_char (')') == MATCH_YES)
+       break;
+      if (gfc_match_char (',') != MATCH_YES)
+       goto syntax;
+    }
+
+  while (*list)
+    list = &(*list)->next;
+
+  *list = head;
+  return MATCH_YES;
+
+syntax:
+  gfc_error ("Syntax error in 'tile sizes' list at %C");
+
+cleanup:
+  gfc_free_expr_list (head);
+  gfc_current_locus = old_loc;
+  return MATCH_ERROR;
+}
+
 /* OpenMP clauses.  */
 enum omp_mask1
 {
@@ -1054,6 +1125,7 @@ enum omp_mask2
   OMP_CLAUSE_UNROLL_FULL,  /* OpenMP 5.1.  */
   OMP_CLAUSE_UNROLL_NONE,  /* OpenMP 5.1.  */
   OMP_CLAUSE_UNROLL_PARTIAL,  /* OpenMP 5.1.  */
+  OMP_CLAUSE_TILE,  /* OpenMP 5.1.  */
   OMP_CLAUSE_ASYNC,
   OMP_CLAUSE_NUM_GANGS,
   OMP_CLAUSE_NUM_WORKERS,
@@ -4310,7 +4382,8 @@ cleanup:
   omp_mask (OMP_CLAUSE_NOWAIT)
 #define OMP_UNROLL_CLAUSES \
   (omp_mask (OMP_CLAUSE_UNROLL_FULL) | OMP_CLAUSE_UNROLL_PARTIAL)
-
+#define OMP_TILE_CLAUSES \
+  (omp_mask (OMP_CLAUSE_TILE))

 static match
 match_omp (gfc_exec_op op, const omp_mask mask)
@@ -6409,6 +6482,16 @@ gfc_match_omp_teams_distribute_simd (void)
                    | OMP_SIMD_CLAUSES);
 }

+match
+gfc_match_omp_tile (void)
+{
+  gfc_omp_clauses *c = gfc_get_omp_clauses();
+  new_st.op = EXEC_OMP_TILE;
+  new_st.ext.omp_clauses = c;
+
+  return match_tile_sizes (&c->tile_sizes);
+}
+
 match
 gfc_match_omp_unroll (void)
 {
@@ -9289,75 +9372,6 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
     }
 }

-
-static bool
-omp_unroll_removes_loop_nest (gfc_code *code)
-{
-  gcc_assert (code->op == EXEC_OMP_UNROLL);
-  if (!code->ext.omp_clauses)
-    return true;
-
-  if (code->ext.omp_clauses->unroll_none)
-    {
-      gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loop "
-                  "into a non-loop",
-                  &code->loc);
-      return true;
-    }
-  if (code->ext.omp_clauses->unroll_full)
-    {
-      gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into a "
-                  "non-loop",
-                  &code->loc);
-      return true;
-    }
-  return false;
-}
-
-static void
-resolve_loop_transform_generic (gfc_code *code, const char *descr)
-{
-  gcc_assert (code->block);
-
-  if (code->block->op == EXEC_OMP_UNROLL
-       && !omp_unroll_removes_loop_nest (code->block))
-    return;
-
-  if (code->block->next->op == EXEC_OMP_UNROLL
-      && !omp_unroll_removes_loop_nest (code->block->next))
-    return;
-
-  if (code->block->next->op == EXEC_DO_WHILE)
-    {
-      gfc_error ("%s invalid around DO WHILE or DO without loop "
-                "control at %L", descr, &code->loc);
-      return;
-    }
-  if (code->block->next->op == EXEC_DO_CONCURRENT)
-    {
-      gfc_error ("%s invalid around DO CONCURRENT loop at %L",
-                descr, &code->loc);
-      return;
-    }
-
-  gfc_error ("missing canonical loop nest after %s at %L",
-            descr, &code->loc);
-
-}
-
-static void
-resolve_omp_unroll (gfc_code *code)
-{
-  if (!code->block || code->block->op == EXEC_DO)
-    return;
-
-  if (code->block->next->op == EXEC_DO)
-    return;
-
-  resolve_loop_transform_generic (code, "!$OMP UNROLL");
-}
-
-
 static void
 handle_local_var (gfc_symbol *sym)
 {
@@ -9488,6 +9502,106 @@ bound_expr_is_canonical (gfc_code *code, int depth, gfc_expr *expr,
   return false;
 }

+static bool
+omp_unroll_removes_loop_nest (gfc_code *code)
+{
+  gcc_assert (code->op == EXEC_OMP_UNROLL);
+  if (!code->ext.omp_clauses)
+    return true;
+
+  if (code->ext.omp_clauses->unroll_none)
+    {
+      gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loop "
+                  "into a non-loop",
+                  &code->loc);
+      return true;
+    }
+  if (code->ext.omp_clauses->unroll_full)
+    {
+      gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into a "
+                  "non-loop",
+                  &code->loc);
+      return true;
+    }
+  return false;
+}
+
+static gfc_code *
+resolve_nested_loop_transforms (gfc_code *code, const char *name,
+                               int required_depth, locus *loc)
+{
+  if (!code)
+    return code;
+
+  bool error = false;
+  while (loop_transform_p (code->op))
+    {
+      if (!error && code->op == EXEC_OMP_UNROLL)
+       {
+         if (omp_unroll_removes_loop_nest (code))
+           {
+             gfc_error ("missing canonical loop nest after %s at %L", name,
+                        loc);
+             error = true;
+           }
+         else if (required_depth > 1)
+           {
+             gfc_error ("loop nest depth after !$OMP UNROLL at %L is insufficient "
+                        "for outer %s", &code->loc, name);
+             error = true;
+           }
+       }
+      else if (!error && code->op == EXEC_OMP_TILE
+              && required_depth > gfc_expr_list_len (code->ext.omp_clauses->tile_sizes))
+       {
+             gfc_error ("loop nest depth after !$OMP TILE at %L is insufficient "
+                        "for outer %s", &code->loc, name);
+             error = true;
+       }
+
+      if (code->block)
+       code = code->block->next;
+      else
+       code = code->next;
+    }
+  gcc_assert (!loop_transform_p (code->op));
+
+  return code;
+}
+
+static void
+resolve_omp_unroll (gfc_code *code)
+{
+  const char *descr = "!$OMP UNROLL";
+  locus *loc = &code->loc;
+
+  if (!code->block || code->block->op == EXEC_DO)
+    return;
+
+  code = resolve_nested_loop_transforms (code->block->next, descr, 1,
+                                        &code->loc);
+
+  if (code->op == EXEC_DO)
+    return;
+
+  if (code->op == EXEC_DO_WHILE)
+    {
+      gfc_error ("%s invalid around DO WHILE or DO without loop "
+                "control at %L", descr, loc);
+      return;
+    }
+
+  if (code->op == EXEC_DO_CONCURRENT)
+    {
+      gfc_error ("%s invalid around DO CONCURRENT loop at %L",
+                descr, loc);
+      return;
+    }
+
+  gfc_error ("missing canonical loop nest after %s at %L",
+            descr, loc);
+}
+
 static void
 resolve_omp_do (gfc_code *code)
 {
@@ -9592,30 +9706,13 @@ resolve_omp_do (gfc_code *code)
       break;
     case EXEC_OMP_TEAMS_LOOP: name = "!$OMP TEAMS LOOP"; break;
     case EXEC_OMP_UNROLL: name = "!$OMP UNROLL"; break;
+    case EXEC_OMP_TILE: name = "!$OMP TILE"; break;
     default: gcc_unreachable ();
     }

   if (code->ext.omp_clauses)
     resolve_omp_clauses (code, code->ext.omp_clauses, NULL);

-  do_code = code->block->next;
-  /* Move forward over any loop transformation directives to find the loop. */
-  bool error = false;
-  while (do_code->op == EXEC_OMP_UNROLL)
-    {
-      if (!error && omp_unroll_removes_loop_nest (do_code))
-       {
-         gfc_error ("missing canonical loop nest after %s at %L", name,
-                    &code->loc);
-         error = true;
-       }
-      if (do_code->block)
-       do_code = do_code->block->next;
-      else
-       do_code = do_code->next;
-    }
-  gcc_assert (do_code->op != EXEC_OMP_UNROLL);
-
   if (code->ext.omp_clauses->orderedc)
     collapse = code->ext.omp_clauses->orderedc;
   else
@@ -9630,6 +9727,9 @@ resolve_omp_do (gfc_code *code)
      depth and treats any further inner loops as the final-loop-body.  So
      here we also check canonical loop nest form only for the number of
      outer loops specified by the COLLAPSE clause too.  */
+  do_code = resolve_nested_loop_transforms (code->block->next, name, collapse,
+                                           &code->loc);
+
   for (i = 1; i <= collapse; i++)
     {
       gfc_symbol *start_var = NULL, *end_var = NULL;
@@ -9745,6 +9845,98 @@ resolve_omp_do (gfc_code *code)
     }
 }

+static void
+resolve_omp_tile (gfc_code *code)
+{
+  gfc_code *do_code, *c;
+  gfc_symbol *dovar;
+  const char *name = "!$OMP TILE";
+
+  unsigned num_loops = 0;
+  gcc_assert (code->ext.omp_clauses->tile_sizes);
+  for (gfc_expr_list *el = code->ext.omp_clauses->tile_sizes; el;
+       el = el->next)
+    num_loops++;
+
+  do_code = resolve_nested_loop_transforms (code, name, num_loops, &code->loc);
+
+  for (unsigned i = 1; i <= num_loops; i++)
+    {
+      if (do_code->op == EXEC_DO_WHILE)
+       {
+         gfc_error ("%s cannot be a DO WHILE or DO without loop control "
+                    "at %L", name, &do_code->loc);
+         break;
+       }
+      if (do_code->op == EXEC_DO_CONCURRENT)
+       {
+         gfc_error ("%s cannot be a DO CONCURRENT loop at %L", name,
+                    &do_code->loc);
+         break;
+       }
+      if (do_code->op != EXEC_DO)
+       {
+         gfc_error ("%s must be DO loop at %L", name,
+                    &do_code->loc);
+         break;
+       }
+
+      gcc_assert (do_code->op != EXEC_OMP_UNROLL);
+      gcc_assert (do_code->op == EXEC_DO);
+      dovar = do_code->ext.iterator->var->symtree->n.sym;
+      if (i > 1)
+       {
+         gfc_code *do_code2 = code;
+         while (loop_transform_p (do_code2->op))
+           {
+             if (do_code2->block)
+               do_code2 = do_code2->block->next;
+             else
+               do_code2 = do_code2->next;
+           }
+         gcc_assert (!loop_transform_p (do_code2->op));
+
+         for (unsigned j = 1; j < i; j++)
+           {
+             gfc_symbol *ivar = do_code2->ext.iterator->var->symtree->n.sym;
+             if (dovar == ivar
+                 || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->start)
+                 || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->end)
+                 || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->step))
+               {
+                 gfc_error ("%s loops don't form rectangular "
+                            "iteration space at %L", name, &do_code->loc);
+                 break;
+               }
+             do_code2 = do_code2->block->next;
+           }
+       }
+      for (c = do_code->next; c; c = c->next)
+       if (c->op != EXEC_NOP && c->op != EXEC_CONTINUE)
+         {
+           gfc_error ("%s loops not perfectly nested at %L",
+                      name, &c->loc);
+           break;
+         }
+      if (i == num_loops || c)
+       break;
+      do_code = do_code->block;
+      if (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE)
+       {
+         gfc_error ("not enough DO loops for %s at %L",
+                    name, &code->loc);
+         break;
+       }
+      do_code = do_code->next;
+      if (do_code == NULL
+         || (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE))
+       {
+         gfc_error ("not enough DO loops for %s at %L",
+                    name, &code->loc);
+         break;
+       }
+    }
+}

 static gfc_statement
 omp_code_to_statement (gfc_code *code)
@@ -9889,6 +10081,8 @@ omp_code_to_statement (gfc_code *code)
       return ST_OMP_PARALLEL_LOOP;
     case EXEC_OMP_DEPOBJ:
       return ST_OMP_DEPOBJ;
+    case EXEC_OMP_TILE:
+      return ST_OMP_TILE;
     case EXEC_OMP_UNROLL:
       return ST_OMP_UNROLL;
     default:
@@ -10320,6 +10514,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_namespace *ns)
     case EXEC_OMP_TEAMS_LOOP:
       resolve_omp_do (code);
       break;
+    case EXEC_OMP_TILE:
+      resolve_omp_tile (code);
+      break;
     case EXEC_OMP_UNROLL:
       resolve_omp_unroll (code);
       break;
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 094678436b4..1cc5200f35a 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1009,6 +1009,7 @@ decode_omp_directive (void)
       matcho ("end teams loop", gfc_match_omp_eos_error, ST_OMP_END_TEAMS_LOOP);
       matcho ("end teams", gfc_match_omp_eos_error, ST_OMP_END_TEAMS);
       matchs ("end unroll", gfc_match_omp_eos_error, ST_OMP_END_UNROLL);
+      matchs ("end tile", gfc_match_omp_eos_error, ST_OMP_END_TILE);
       matcho ("end workshare", gfc_match_omp_end_nowait,
              ST_OMP_END_WORKSHARE);
       break;
@@ -1137,6 +1138,7 @@ decode_omp_directive (void)
       matcho ("teams", gfc_match_omp_teams, ST_OMP_TEAMS);
       matchdo ("threadprivate", gfc_match_omp_threadprivate,
               ST_OMP_THREADPRIVATE);
+      matchs ("tile sizes", gfc_match_omp_tile, ST_OMP_TILE);
       break;
     case 'u':
       matchs ("unroll", gfc_match_omp_unroll, ST_OMP_UNROLL);
@@ -1729,6 +1731,7 @@ next_statement (void)
   case ST_OMP_TARGET_PARALLEL_LOOP: case ST_OMP_TARGET_TEAMS_LOOP: \
   case ST_OMP_ASSUME: \
   case ST_OMP_UNROLL: \
+  case ST_OMP_TILE: \
   case ST_CRITICAL: \
   case ST_OACC_PARALLEL_LOOP: case ST_OACC_PARALLEL: case ST_OACC_KERNELS: \
   case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: \
@@ -2774,6 +2777,9 @@ gfc_ascii_statement (gfc_statement st, bool strip_sentinel)
     case ST_OMP_THREADPRIVATE:
       p = "!$OMP THREADPRIVATE";
       break;
+    case ST_OMP_TILE:
+      p = "!$OMP TILE";
+      break;
     case ST_OMP_UNROLL:
       p = "!$OMP UNROLL";
       break;
@@ -5214,6 +5220,11 @@ parse_omp_do (gfc_statement omp_st)
          num_unroll++;
          continue;
        }
+      else if (st == ST_OMP_TILE)
+       {
+         accept_statement (st);
+         continue;
+       }
       else
        unexpected_statement (st);
     }
@@ -5338,6 +5349,9 @@ parse_omp_do (gfc_statement omp_st)
     case ST_OMP_TEAMS_LOOP:
       omp_end_st = ST_OMP_END_TEAMS_LOOP;
       break;
+    case ST_OMP_TILE:
+      omp_end_st = ST_OMP_END_TILE;
+      break;
     case ST_OMP_UNROLL:
       omp_end_st = ST_OMP_END_UNROLL;
       break;
@@ -6025,6 +6039,7 @@ parse_executable (gfc_statement st)
        case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case ST_OMP_TEAMS_DISTRIBUTE_SIMD:
        case ST_OMP_TEAMS_LOOP:
+       case ST_OMP_TILE:
        case ST_OMP_UNROLL:
          st = parse_omp_do (st);
          if (st == ST_IMPLIED_ENDDO)
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 46988ff281d..182aa18053c 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11041,6 +11041,7 @@ gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns)
        case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case EXEC_OMP_TEAMS_LOOP:
        case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
+       case EXEC_OMP_TILE:
        case EXEC_OMP_UNROLL:
        case EXEC_OMP_WORKSHARE:
          break;
@@ -12198,6 +12199,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
            case EXEC_OMP_LOOP:
            case EXEC_OMP_SIMD:
            case EXEC_OMP_TARGET_SIMD:
+           case EXEC_OMP_TILE:
            case EXEC_OMP_UNROLL:
              gfc_resolve_omp_do_blocks (code, ns);
              break;
@@ -12695,6 +12697,7 @@ start:
        case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
        case EXEC_OMP_TEAMS_LOOP:
+       case EXEC_OMP_TILE:
        case EXEC_OMP_UNROLL:
        case EXEC_OMP_WORKSHARE:
          gfc_resolve_omp_directive (code, ns);
diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc
index 6112831e621..cea874e4474 100644
--- a/gcc/fortran/st.cc
+++ b/gcc/fortran/st.cc
@@ -277,6 +277,7 @@ gfc_free_statement (gfc_code *p)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
     case EXEC_OMP_TEAMS_LOOP:
+    case EXEC_OMP_TILE:
     case EXEC_OMP_UNROLL:
     case EXEC_OMP_WORKSHARE:
       gfc_free_omp_clauses (p->ext.omp_clauses);
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 73c416c951d..6936cd7f5ee 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3913,6 +3913,24 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }

+  if (clauses->tile_sizes)
+    {
+      vec<tree, va_gc> *tvec;
+      gfc_expr_list *el;
+
+      vec_alloc (tvec, 4);
+
+      for (el = clauses->tile_sizes; el; el = el->next)
+       vec_safe_push (tvec, gfc_convert_expr_to_tree (block, el->expr));
+
+      c = build_omp_clause (gfc_get_location (&where),
+                           OMP_CLAUSE_TILE);
+      OMP_CLAUSE_TILE_SIZES (c) = build_tree_list_vec (tvec);
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+
+      tvec->truncate (0);
+    }
+
   if (clauses->ordered)
     {
       c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_ORDERED);
@@ -5106,7 +5124,7 @@ gfc_trans_omp_cancel (gfc_code *code)
 bool
 loop_transform_p (gfc_exec_op op)
 {
-  return op == EXEC_OMP_UNROLL;
+  return op == EXEC_OMP_UNROLL || op == EXEC_OMP_TILE;
 }

 static tree
@@ -5280,6 +5298,16 @@ gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
   return true;
 }

+int
+gfc_expr_list_len (gfc_expr_list *list)
+{
+  unsigned len = 0;
+  for (; list; list = list->next)
+    len++;
+
+  return len;
+}
+
 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
                  gfc_omp_clauses *do_clauses, tree par_clauses)
@@ -5295,25 +5323,14 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   dovar_init *di;
   unsigned ix;
   vec<tree, va_heap, vl_embed> *saved_doacross_steps = doacross_steps;
-  gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list;
   gfc_code *orig_code = code;
   locus top_loc = code->loc;
-
-  /* Both collapsed and tiled loops are lowered the same way.  In
-     OpenACC, those clauses are not compatible, so prioritize the tile
-     clause, if present.  */
-  if (tile)
-    {
-      collapse = 0;
-      for (gfc_expr_list *el = tile; el; el = el->next)
-       collapse++;
-    }
-
-  doacross_steps = NULL;
-  if (clauses->orderedc)
-    collapse = clauses->orderedc;
-  if (collapse <= 0)
-    collapse = 1;
+  gfc_expr_list *oacc_tile
+      = do_clauses ? do_clauses->tile_list : clauses->tile_list;
+  gfc_expr_list *omp_tile
+      = do_clauses ? do_clauses->tile_sizes : clauses->tile_sizes;
+  gcc_assert (!omp_tile || op == EXEC_OMP_TILE);
+  gcc_assert (!(oacc_tile && omp_tile));

   if (pblock == NULL)
     {
@@ -5321,21 +5338,42 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
       pblock = &block;
     }
   code = code->block->next;
-  gcc_assert (code->op == EXEC_DO || code->op == EXEC_OMP_UNROLL);
+  gcc_assert (code->op == EXEC_DO || loop_transform_p (code->op));
   /* Loop transformation directives surrounding the associated loop of an "omp
      do" (or similar directive) are represented as clauses on the "omp do". */
   loop_transform_clauses = NULL;
-  while (code->op == EXEC_OMP_UNROLL)
+  int omp_tile_depth = gfc_expr_list_len (omp_tile);
+  while (loop_transform_p (code->op))
     {
       tree clauses = gfc_trans_omp_clauses (pblock, code->ext.omp_clauses,
                                            code->loc);
-      loop_transform_clauses = chainon (loop_transform_clauses, clauses);

+      /* There might be several "!$omp tile" transformations surrounding the
+        loop. Use the innermost one which must have the largest tiling depth.
+        If an inner directive has a smaller tiling depth than an outer
+        directive, an error will be emitted in pass-omp_transform_loops. */
+      omp_tile_depth = gfc_expr_list_len (code->ext.omp_clauses->tile_sizes);
+
+      loop_transform_clauses = chainon (loop_transform_clauses, clauses);
       code = code->block ? code->block->next : code->next;
     }
-  gcc_assert (code->op != EXEC_OMP_UNROLL);
+  gcc_assert (!loop_transform_p (code->op));
   gcc_assert (code->op == EXEC_DO);

+  /* Both collapsed and tiled loops are lowered the same way.  In
+     OpenACC, those clauses are not compatible, so prioritize the tile
+     clause, if present.  */
+  if (oacc_tile)
+    collapse = gfc_expr_list_len (oacc_tile);
+
+  doacross_steps = NULL;
+  if (clauses->orderedc)
+    collapse = clauses->orderedc;
+  if (collapse <= 0)
+    collapse = 1;
+
+  collapse = MAX (collapse, omp_tile_depth);
+
   init = make_tree_vec (collapse);
   cond = make_tree_vec (collapse);
   incr = make_tree_vec (collapse);
@@ -5346,7 +5384,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
      on the simd construct and DO's clauses are translated elsewhere.  */
   do_clauses->sched_simd = false;

-  if (op == EXEC_OMP_UNROLL)
+  if (loop_transform_p (op))
     {
       /* This is a loop transformation on a loop which is not associated with
         any other directive. Use the directive location instead of the loop
@@ -5695,6 +5733,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
     case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
     case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
     case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+    case EXEC_OMP_TILE: stmt = make_node (OMP_LOOP_TRANS); break;
     case EXEC_OMP_UNROLL: stmt = make_node (OMP_LOOP_TRANS); break;
     default: gcc_unreachable ();
     }
@@ -7793,6 +7832,7 @@ gfc_trans_omp_directive (gfc_code *code)
     case EXEC_OMP_LOOP:
     case EXEC_OMP_SIMD:
     case EXEC_OMP_TASKLOOP:
+    case EXEC_OMP_TILE:
     case EXEC_OMP_UNROLL:
       return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses,
                               NULL);
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 56ec59fe80e..94b23c3b77a 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -2520,6 +2520,7 @@ trans_code (gfc_code * code, tree cond)
        case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
        case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
        case EXEC_OMP_TEAMS_LOOP:
+       case EXEC_OMP_TILE:
        case EXEC_OMP_UNROLL:
        case EXEC_OMP_WORKSHARE:
          res = gfc_trans_omp_directive (code);
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 14616eb5316..4d504a12451 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -12105,6 +12105,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
        case OMP_CLAUSE_UNROLL_FULL:
        case OMP_CLAUSE_UNROLL_NONE:
        case OMP_CLAUSE_UNROLL_PARTIAL:
+       case OMP_CLAUSE_TILE:
          break;
        case OMP_CLAUSE_NOHOST:
        default:
@@ -13076,6 +13077,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
        case OMP_CLAUSE_FINALIZE:
        case OMP_CLAUSE_INCLUSIVE:
        case OMP_CLAUSE_EXCLUSIVE:
+       case OMP_CLAUSE_TILE:
        case OMP_CLAUSE_UNROLL_FULL:
        case OMP_CLAUSE_UNROLL_NONE:
        case OMP_CLAUSE_UNROLL_PARTIAL:
@@ -15134,6 +15136,7 @@ gimplify_omp_loop (tree *expr_p, gimple_seq *pre_p)
              }
            pc = &OMP_CLAUSE_CHAIN (*pc);
            break;
+         case OMP_CLAUSE_TILE:
          case OMP_CLAUSE_UNROLL_PARTIAL:
          case OMP_CLAUSE_UNROLL_FULL:
          case OMP_CLAUSE_UNROLL_NONE:
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 0f326128874..e568ba0703e 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -2264,7 +2264,7 @@ omp_loop_transform_clause_p (tree c)

   enum omp_clause_code code = OMP_CLAUSE_CODE (c);
   return (code == OMP_CLAUSE_UNROLL_FULL || code == OMP_CLAUSE_UNROLL_PARTIAL
-         || code == OMP_CLAUSE_UNROLL_NONE);
+         || code == OMP_CLAUSE_UNROLL_NONE || code == OMP_CLAUSE_TILE);
 }

 /* Try to resolve declare variant, return the variant decl if it should
diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index d845d0e4798..858a271261a 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -211,6 +211,9 @@ gomp_for_constant_iterations_p (gomp_for *omp_for,
   return true;
 }

+static gimple_seq
+expand_transformed_loop (gomp_for *omp_for);
+
 /* Split a gomp_for that represents a collapsed loop-nest into single
    loops. The result is a gomp_for of the same kind which is not collapsed
    (i.e. gimple_omp_for_collapse (OMP_FOR) == 1) and which contains nested,
@@ -220,7 +223,7 @@ gomp_for_constant_iterations_p (gomp_for *omp_for,
    FROM_DEPTH are left collapsed. */

 static gomp_for*
-gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0)
+gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0, bool expand = false)
 {
   int collapse = gimple_omp_for_collapse (omp_for);
   gcc_assert (from_depth < collapse);
@@ -251,7 +254,11 @@ gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0)
       gimple_omp_for_set_index (level_omp_for, 0,
                                gimple_omp_for_index (omp_for, level));

-      body = level_omp_for;
+
+      if (expand)
+       body = expand_transformed_loop (level_omp_for);
+      else
+       body = level_omp_for;
     }

   omp_for->collapse = from_depth;
@@ -808,6 +815,316 @@ canonicalize_conditions (gomp_for *omp_for)
   return new_decls;
 }

+/* Execute the tiling transformation for OMP_FOR with the given TILE_SIZES and
+   return the resulting gimple bind. TILE_SIZES must be a non-empty tree chain
+   of integer constants and the collapse of OMP_FOR must be at least the length
+   of TILE_SIZES. TRANSFORMATION_CLAUSES are the loop transformations that
+   must be applied to OMP_FOR. Those are applied on the result of the tiling
+   transformation. LOC is the location for diagnostic messages.
+
+   Example 1
+   ---------
+   ---------
+
+   Original loop
+   -------------
+
+   #pragma omp for
+   #pragma omp tile sizes(3)
+   for (i = 1; i <= n; i = i + 1)
+   {
+       body;
+   }
+
+   Internally, the tile directive is represented as a clause on the
+   omp for, i.e. as #pragma omp for tile_sizes(3).
+
+   Transformed loop
+   ----------------
+
+   #pragma omp for
+   for (.omp_tile_index = 1; .omp_tile_index < ceil(n/3); .omp_tile_index = .omp_tile_index + 3)
+   {
+       D.4287 = .omp_tile_index + 3 + 1
+       #pragma omp loop_transform
+       for (i = .omp_tile_index; i < D.4287; i = i + 1)
+       {
+           if (i.0 > n)
+               goto L_0
+           body;
+       }
+       L_0:
+    }
+
+   The outer loop is the "floor loop" and the inner loop is the "tile
+   loop".  The tile loop is never in canonical loop nest form and
+   hence it cannot be associated with any loop construct. The
+   GCC-internal "omp loop_transform" construct will be lowered after
+   the tiling transformation.
+ */
+
+static gimple_seq
+tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
+      tree transformation_clauses, walk_ctx *ctx)
+{
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS,
+                    dump_user_location_t::from_location_t (loc),
+                    "Executing tile transformation %T:\n %G\n",
+                    transformation_clauses, static_cast<gimple *> (omp_for));
+
+  gimple_seq tile_loops = copy_gimple_seq_and_replace_locals (omp_for);
+  gimple_seq floor_loops = copy_gimple_seq_and_replace_locals (omp_for);
+
+  size_t collapse = gimple_omp_for_collapse (omp_for);
+  size_t tiling_depth = list_length (tile_sizes);
+  tree clauses = gimple_omp_for_clauses (omp_for);
+  size_t clause_collapse = 1;
+  tree collapse_clause = NULL;
+
+  if (tree c = omp_find_clause (clauses, OMP_CLAUSE_ORDERED))
+    {
+      error_at (OMP_CLAUSE_LOCATION (c),
+               "%<ordered%> invalid in conjunction with %<omp tile%>");
+      return omp_for;
+    }
+
+  if (tree c = omp_find_clause (clauses, OMP_CLAUSE_COLLAPSE))
+    {
+      tree expr = OMP_CLAUSE_COLLAPSE_EXPR (c);
+      clause_collapse = tree_to_uhwi (expr);
+      collapse_clause = c;
+    }
+
+  /* The 'omp tile' construct creates a canonical loop-nest whose nesting depth
+     equals tiling_depth. The whole loop-nest has depth at least 2 *
+     tiling_depth, but the 'tile loops' at levels
+     tiling_depth+1...2*tiling_depth are not in canonical loop-nest form
+     and hence cannot be associated with a loop construct. */
+  if (clause_collapse > tiling_depth)
+    {
+      error_at (OMP_CLAUSE_LOCATION (collapse_clause),
+               "collapse cannot extend below the floor loops "
+               "generated by the %<omp tile%> construct");
+      OMP_CLAUSE_COLLAPSE_EXPR (collapse_clause)
+         = build_int_cst (unsigned_type_node, tiling_depth);
+      return transform_gomp_for (omp_for, NULL, ctx);
+    }
+
+  if (tiling_depth > collapse)
+    return transform_gomp_for (omp_for, NULL, ctx);
+
+  gcc_assert (collapse >= clause_collapse);
+
+  push_gimplify_context ();
+
+  /* Create the index variables for iterating over the tiles in the floor
+     loops, i.e. the first tiling_depth loops of the transformed loop nest. */
+  gimple_seq floor_loops_pre_body = NULL;
+  size_t tile_level = 0;
+  auto_vec<tree> sizes_vec;
+  for (tree el = tile_sizes; el; el = TREE_CHAIN (el), tile_level++)
+    {
+      size_t nest_level = tile_level;
+      tree index = gimple_omp_for_index (omp_for, nest_level);
+      tree init = gimple_omp_for_initial (omp_for, nest_level);
+      tree incr = gimple_omp_for_incr (omp_for, nest_level);
+      tree step = TREE_OPERAND (incr, 1);
+
+      /* Initialize original index variables in the pre-body.  The
+        loop lowering will not initialize them because of the changed
+        index variables. */
+      gimplify_assign (index, init, &floor_loops_pre_body);
+
+      tree tile_size = fold_convert (TREE_TYPE (step), TREE_VALUE (el));
+      sizes_vec.safe_push (tile_size);
+      tree tile_index = create_tmp_var (TREE_TYPE (index), ".omp_tile_index");
+      gimplify_assign (tile_index, init, &floor_loops_pre_body);
+
+      /* Floor loops */
+      step = fold_build2 (MULT_EXPR, TREE_TYPE (step), step, tile_size);
+      tree tile_step = step;
+      /* For combined constructs, step will be gimplified on the outer
+        gomp_for. */
+      if (!gimple_omp_for_combined_into_p (omp_for) && !TREE_CONSTANT (step))
+       {
+         tile_step = create_tmp_var (TREE_TYPE (step), ".omp_tile_step");
+         gimplify_assign (tile_step, step, &floor_loops_pre_body);
+       }
+      incr = fold_build2 (TREE_CODE (incr), TREE_TYPE (incr), tile_index,
+                         tile_step);
+      gimple_omp_for_set_incr (floor_loops, nest_level, incr);
+      gimple_omp_for_set_index (floor_loops, nest_level, tile_index);
+    }
+  gbind *result_bind = gimple_build_bind (NULL, NULL, NULL);
+  pop_gimplify_context (result_bind);
+  gimple_seq_add_seq (gimple_omp_for_pre_body_ptr (floor_loops),
+                     floor_loops_pre_body);
+
+  /* The tiling loops will not form a perfect loop nest because the
+     loop for each tiling dimension needs to check if the current tile
+     is incomplete and this check is intervening code. Since OpenMP
+     5.1 does not allow the collapse of the loop-nest to extend beyond
+     the floor loops, this is not a problem.
+
+     "Uncollapse" the tiling loop nest, i.e. split the loop nest into
+     separate nested gomp_for structures, one for each level. This allows
+     adding the incomplete-tile checks to the loop at each level. */
+
+  tile_loops = gomp_for_uncollapse (as_a <gomp_for *> (tile_loops));
+  gimple_omp_for_set_kind (as_a<gomp_for *> (tile_loops),
+                          GF_OMP_FOR_KIND_TRANSFORM_LOOP);
+  gimple_omp_for_set_clauses (tile_loops, NULL_TREE);
+  gimple_omp_for_set_pre_body (tile_loops, NULL);
+
+  /* Transform the loop bodies of the "uncollapsed" tiling loops and
+     add them to the body of the floor loops.  At this point, the
+     loop nest consists of perfectly nested gimple_omp_for constructs,
+     each representing a single loop. */
+  gimple_seq floor_loops_body = NULL;
+  gimple *level_loop = tile_loops;
+  gimple_seq_add_stmt (&floor_loops_body, tile_loops);
+  gimple_seq *surrounding_seq = &floor_loops_body;
+
+  push_gimplify_context ();
+
+  tree break_label = create_artificial_label (UNKNOWN_LOCATION);
+  gimple_seq_add_stmt (surrounding_seq, gimple_build_label (break_label));
+  for (size_t level = 0; level < tiling_depth; level++)
+    {
+      tree original_index = gimple_omp_for_index (omp_for, level);
+      tree original_final = gimple_omp_for_final (omp_for, level);
+
+      tree tile_index = gimple_omp_for_index (floor_loops, level);
+      tree tile_size = sizes_vec[level];
+      tree type = TREE_TYPE (tile_index);
+      tree plus_type = type;
+
+      tree incr = gimple_omp_for_incr (omp_for, level);
+      tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr);
+
+      gimple_seq *pre_body = gimple_omp_for_pre_body_ptr (level_loop);
+      gimple_seq level_body = gimple_omp_body (level_loop);
+      gcc_assert (gimple_omp_for_collapse (level_loop) == 1);
+      tree_code original_cond = gimple_omp_for_cond (omp_for, level);
+
+      gimple_omp_for_set_initial (level_loop, 0, tile_index);
+
+      tree tile_final = create_tmp_var (type);
+      tree scaled_tile_size = fold_build2 (MULT_EXPR, TREE_TYPE (tile_size),
+                                          tile_size, step);
+
+      tree_code plus_code = PLUS_EXPR;
+      if (POINTER_TYPE_P (TREE_TYPE (tile_index)))
+       {
+         plus_code = POINTER_PLUS_EXPR;
+         int unsignedp = TYPE_UNSIGNED (TREE_TYPE (scaled_tile_size));
+         plus_type = signed_or_unsigned_type_for (unsignedp, ptrdiff_type_node);
+       }
+
+      scaled_tile_size = fold_convert (plus_type, scaled_tile_size);
+      gimplify_assign (tile_final,
+                      fold_build2 (plus_code, type,
+                                   tile_index, scaled_tile_size),
+                      pre_body);
+      gimple_omp_for_set_final (level_loop, 0, tile_final);
+
+      /* Redefine the original loop index variable of OMP_FOR in terms of the
+        floor loop and the tiling loop index variable for the current
+        dimension/level at the top of the loop. */
+      gimple_seq level_preamble = NULL;
+
+      push_gimplify_context ();
+
+      tree body_label = create_artificial_label (UNKNOWN_LOCATION);
+
+      /* Handle partial tiles, i.e. add a check that breaks from the tile loop
+        if the new index value does not belong to the iteration space of the
+        original loop. */
+      gimple_seq_add_stmt (&level_preamble,
+                          gimple_build_cond (original_cond, original_index,
+                                             original_final, body_label,
+                                             break_label));
+      gimple_seq_add_stmt (&level_preamble, gimple_build_label (body_label));
+
+      auto gsi = gsi_start (level_body);
+      gsi_insert_seq_before (&gsi, level_preamble, GSI_SAME_STMT);
+      gbind *level_bind = gimple_build_bind (NULL, NULL, NULL);
+      pop_gimplify_context (level_bind);
+      gimple_bind_set_body (level_bind, level_body);
+      gimple_omp_set_body (level_loop, level_bind);
+
+      surrounding_seq = &level_body;
+      level_loop = gsi_stmt (gsi);
+
+      /* The label for jumping out of the loop at the next nesting
+        level. For the outermost level, the label is put after the
+        loop-nest; for the last level no label is necessary. */
+      if (level != tiling_depth - 1)
+       {
+         break_label = create_artificial_label (UNKNOWN_LOCATION);
+         gsi_insert_after (&gsi, gimple_build_label (break_label),
+                           GSI_NEW_STMT);
+       }
+    }
+
+  gbind *tile_loops_bind;
+  tile_loops_bind = gimple_build_bind (NULL, tile_loops, NULL);
+  pop_gimplify_context (tile_loops_bind);
+
+  gimple_omp_set_body (floor_loops, tile_loops_bind);
+
+  tree remaining_clauses = OMP_CLAUSE_CHAIN (transformation_clauses);
+
+  /* Collapsing of the OMP_FOR is used both for the "omp tile"
+     implementation and for the actual "collapse" clause. If the
+     tiling depth was greater than the collapse depth required by the
+     clauses on OMP_FOR, the collapse of OMP_FOR must be adjusted to
+     the latter value and all loops below the new collapse depth must
+     be transformed to GF_OMP_FOR_KIND_TRANSFORM_LOOP to ensure their
+     lowering in this pass. */
+  size_t new_collapse = clause_collapse;
+
+  /* Keep the omp_for collapsed if there are further transformations. */
+  if (remaining_clauses)
+    {
+      size_t next_transform_depth = 1;
+      if (OMP_CLAUSE_CODE (remaining_clauses) == OMP_CLAUSE_TILE)
+       next_transform_depth
+           = list_length (OMP_CLAUSE_TILE_SIZES (remaining_clauses));
+
+      /* The current "omp tile" transformation reduces the nesting depth
+        of the canonical loop-nest to TILING_DEPTH.
+        Hence the following "omp tile" transformation is invalid if
+        it requires a greater nesting depth. */
+      gcc_assert (next_transform_depth <= tiling_depth);
+      if (next_transform_depth > new_collapse)
+       new_collapse = next_transform_depth;
+    }
+
+  if (collapse > new_collapse)
+    floor_loops = gomp_for_uncollapse (as_a<gomp_for *> (floor_loops),
+                                      new_collapse, true);
+
+  /* Lower the uncollapsed tile loops. */
+  walk_omp_for_loops (gimple_bind_body_ptr (tile_loops_bind), ctx);
+
+  gcc_assert (remaining_clauses || !collapse_clause
+             || gimple_omp_for_collapse (floor_loops)
+             == (size_t)clause_collapse);
+
+  if (gimple_omp_for_combined_into_p (omp_for))
+    ctx->inner_combined_loop = as_a<gomp_for *> (floor_loops);
+
+  /* Apply remaining transformation clauses and assemble the transformation
+     result. */
+  gimple_bind_set_body (result_bind,
+                       transform_gomp_for (as_a<gomp_for *> (floor_loops),
+                                           remaining_clauses, ctx));
+
+  return result_bind;
+}
+
 /* Combined distribute or taskloop constructs are represented by two
    or more nested gomp_for constructs which are created during
    gimplification. Loop transformations on the combined construct are
@@ -999,6 +1316,10 @@ transform_gomp_for (gomp_for *omp_for, tree transformation, walk_ctx *ctx)
                                 ctx);
       }
       break;
+    case OMP_CLAUSE_TILE:
+      result = tile (omp_for, loc, OMP_CLAUSE_TILE_SIZES (transformation),
+                    transformation, ctx);
+      break;
     default:
       gcc_unreachable ();
     }
@@ -1177,6 +1498,21 @@ optimize_transformation_clauses (tree clauses)
              unroll_partial = c;
          }
          break;
+       case OMP_CLAUSE_TILE:
+         {
+           /* No optimization for those clauses yet, but they end any chain of
+              "unroll partial" clauses. */
+           if (merged_unroll_partial && dump_enabled_p ())
+             print_optimized_unroll_partial_msg (unroll_partial);
+
+           if (unroll_partial)
+             OMP_CLAUSE_CHAIN (unroll_partial) = c;
+
+           unroll_partial = NULL;
+           merged_unroll_partial = false;
+           last_non_unroll = c;
+         }
+         break;
        default:
          gcc_unreachable ();
        }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90
new file mode 100644
index 00000000000..84ea93300fa
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90
@@ -0,0 +1,163 @@
+subroutine test
+  implicit none
+  integer :: i, j, k
+
+  !$omp tile sizes(1)
+  do i = 1,100
+     call dummy(i)
+  end do
+
+  !$omp tile sizes(1)
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(2+3)
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(-21) ! { dg-error {tile size not constant positive integer at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(0) ! { dg-error {tile size not constant positive integer at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(i) ! { dg-error {Constant expression required at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes( ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(2 ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes() ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(2,) ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(,2) ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(,i) ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(i,) ! { dg-error {Constant expression required at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2)
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+     end do
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+  do i = 1,100
+     do j = 1,100
+        call dummy(i)
+     end do
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2,1)
+  do i = 1,100
+     do j = 1,100
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2,1)
+  do i = 1,100
+     do j = 1,100
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+     call dummy(i) ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} }
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2,1)
+  do i = 1,100
+     do j = 1,100
+        do k = 1,100
+           call dummy(i)
+        end do
+        call dummy(j) ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} }
+     end do
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+  do i = 1,100
+     call dummy(i)
+     do j = 1,100
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+  !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90
new file mode 100644
index 00000000000..29d7532bc37
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90
@@ -0,0 +1,10 @@
+
+subroutine test
+  !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+  do i = 1,100
+     do j = 1,100
+        call dummy(i)
+     end do
+  end do
+  !$end omp tile
+end subroutine test
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90
new file mode 100644
index 00000000000..8a5eae3a188
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90
@@ -0,0 +1,80 @@
+subroutine test1
+  implicit none
+  integer :: i, j, k
+
+  !$omp tile sizes (1,2)
+  !$omp tile sizes (1,2)
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+  !$omp tile sizes (8)
+  !$omp tile sizes (1,2)
+  !$omp tile sizes (1,2,3)
+  do i = 1,100
+     do j = 1,100
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test1
+
+subroutine test2
+  implicit none
+  integer :: i, j, k
+
+  !$omp taskloop collapse(2)
+  !$omp tile sizes (3,4)
+  !$omp tile sizes (1,2)
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+  !$omp end taskloop
+
+  !$omp taskloop simd
+  !$omp tile sizes (8)
+  !$omp tile sizes (1,2)
+  !$omp tile sizes (1,2,3)
+  do i = 1,100
+     do j = 1,100
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+  !$omp end taskloop simd
+end subroutine test2
+
+subroutine test3
+  implicit none
+  integer :: i, j, k
+
+  !$omp taskloop collapse(3) ! { dg-error {not enough DO loops for collapsed \!\$OMP TASKLOOP at \(1\)} }
+  !$omp tile sizes (1,2) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TASKLOOP} }
+  !$omp tile sizes (1,2)
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+  !$omp end taskloop
+end subroutine test3
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
new file mode 100644
index 00000000000..eaa7895eaa0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
@@ -0,0 +1,18 @@
+subroutine test
+  implicit none
+  integer :: i, j, k
+
+  !$omp parallel do collapse(2) ordered(2)
+  !$omp tile sizes (1,2)
+  do i = 1,100 ! { dg-error {'ordered' invalid in conjunction with 'omp tile'} }
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+  !$end omp target
+
+end subroutine test
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90
new file mode 100644
index 00000000000..b2dca0bbec6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90
@@ -0,0 +1,95 @@
+
+subroutine test1
+  implicit none
+  integer :: i, j, k
+
+  !$omp tile sizes (1,2)
+  !$omp tile sizes (1)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+end subroutine test1
+
+subroutine test2
+  implicit none
+  integer :: i, j, k
+
+  !$omp tile sizes (1,2)
+  !$omp tile sizes (1)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+end subroutine test2
+
+subroutine test3
+  implicit none
+  integer :: i, j, k
+
+  !$omp target teams distribute
+  !$omp tile sizes (1,2)
+  !$omp tile sizes (1)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+end subroutine test3
+
+subroutine test4
+  implicit none
+  integer :: i, j, k
+
+  !$omp target teams distribute collapse(2)
+  !$omp tile sizes (8)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TARGET TEAMS DISTRIBUTE} }
+  !$omp tile sizes (1,2)
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+
+end subroutine test4
+
+subroutine test5
+  implicit none
+  integer :: i, j, k
+
+  !$omp parallel do collapse(2) ordered(2)
+  !$omp tile sizes (8)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} }
+  !$omp tile sizes (1,2)
+  do i = 1,100
+     do j = 1,100
+        call dummy(j)
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+  !$end omp tile
+  !$end omp target
+
+end subroutine test5
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90
new file mode 100644
index 00000000000..27920701b36
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90
@@ -0,0 +1,57 @@
+function mult (a, b) result (c)
+  integer, allocatable, dimension (:,:) :: a,b,c
+  integer :: i, j, k, inner
+
+  allocate(c( n, m ))
+
+  !$omp parallel do collapse(2)
+  !$omp tile sizes (8,8)
+  !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP TILE} }
+  ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} "" { target *-*-*} .-1 }
+  do i = 1,m
+     do j = 1,n
+        inner = 0
+        do k = 1, n
+           inner = inner + a(k, i) * b(j, k)
+        end do
+        c(j, i) = inner
+     end do
+  end do
+
+  !$omp tile sizes (8,8)
+  !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP TILE} }
+  do i = 1,m
+     do j = 1,n
+        inner = 0
+        do k = 1, n
+           inner = inner + a(k, i) * b(j, k)
+        end do
+        c(j, i) = inner
+     end do
+  end do
+
+  !$omp tile sizes (8)
+  !$omp unroll partial(1)
+  do i = 1,m
+     do j = 1,n
+        inner = 0
+        do k = 1, n
+           inner = inner + a(k, i) * b(j, k)
+        end do
+        c(j, i) = inner
+     end do
+  end do
+
+  !$omp parallel do collapse(2) ! { dg-error {missing canonical loop nest after \!\$OMP PARALLEL DO at \(1\)} }
+  !$omp tile sizes (8,8) ! { dg-error {missing canonical loop nest after \!\$OMP TILE at \(1\)} }
+  !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} }
+  do i = 1,m
+     do j = 1,n
+        inner = 0
+        do k = 1, n
+           inner = inner + a(k, i) * b(j, k)
+        end do
+        c(j, i) = inner
+     end do
+  end do
+end function mult
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
new file mode 100644
index 00000000000..cda878f3037
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
@@ -0,0 +1,37 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+function mult (a, b) result (c)
+  integer, allocatable, dimension (:,:) :: a,b,c
+  integer :: i, j, k, inner
+
+  allocate(c( n, m ))
+
+  !$omp parallel do
+  !$omp unroll partial(1)
+  !$omp tile sizes (8,8)
+  do i = 1,m
+     do j = 1,n
+        inner = 0
+        do k = 1, n
+           inner = inner + a(k, i) * b(j, k)
+        end do
+        c(j, i) = inner
+     end do
+  end do
+end function mult
+
+! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(1\) tile sizes\(8, 8\)} 1 "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } }
+
+! Tiling adds two floor and two tile loops.
+
+! Number of conditional statements after tiling:
+!     5
+!  =  2    (lowering of 2 tile loops)
+!  +  2    (partial tile handling in 2 tile loops)
+!  +  1    (lowering of non-associated floor loop)
+
+! The unrolling with unroll factor 1 currently gets executed (TODO could/should be skipped?)
+
+! { dg-final { scan-tree-dump-times {if \([A-Za-z0-9_.]+ < } 5 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
new file mode 100644
index 00000000000..00615011856
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
@@ -0,0 +1,41 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+function mult (a, b) result (c)
+  integer, allocatable, dimension (:,:) :: a,b,c
+  integer :: i, j, k, inner
+
+  allocate(c( n, m ))
+  c = 0
+
+  !$omp target
+  !$omp parallel do
+  !$omp unroll partial(2)
+  !$omp tile sizes (8,8,4)
+  do i = 1,m
+     do j = 1,n
+        do k = 1, n
+           c(j,i) = c(j,i) + a(k, i) * b(j, k)
+        end do
+     end do
+  end do
+  !$omp end target
+end function mult
+
+! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(2\) tile sizes\(8, 8, 4\)} 1 "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } }
+
+! Check the number of loops
+
+! Tiling adds three tile and three floor loops.
+! The outermost floor loop is associated with the "!$omp parallel do"
+! and hence it isn't lowered in the transformation pass.
+! Number of conditional statements after tiling:
+!     8
+!  =  2    (inner floor loop lowering)
+!  +  3    (partial tile handling in 3 tile loops)
+!  +  3    (lowering of 3 tile loops)
+!
+! Unrolling creates 2 copies of the tiled loop nest.
+
+! { dg-final { scan-tree-dump-times {if \([A-Za-z0-9_.]+ < } 16 "omp_transform_loops" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index f1429824158..b241e144515 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -534,6 +534,9 @@ enum omp_clause_code {

   /* Internal representation for an "omp unroll partial" directive. */
   OMP_CLAUSE_UNROLL_PARTIAL,
+
+  /* Represents a "tile" directive internally. */
+  OMP_CLAUSE_TILE
 };

 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index cae81719e68..02c207d87a0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -521,6 +521,14 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
          pp_right_paren (pp);
        }
       break;
+    case OMP_CLAUSE_TILE:
+      pp_string (pp, "tile sizes");
+      pp_left_paren (pp);
+      gcc_assert (OMP_CLAUSE_TILE_SIZES (clause));
+      dump_generic_node (pp, OMP_CLAUSE_TILE_SIZES (clause), spc, flags,
+                        false);
+      pp_right_paren (pp);
+      break;
     case OMP_CLAUSE__LOOPTEMP_:
       name = "_looptemp_";
       goto print_remap;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index fc7e22d352f..893f509fa3a 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -327,8 +327,10 @@ unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
   0, /* OMP_CLAUSE_UNROLL_FULL */
+
   0, /* OMP_CLAUSE_UNROLL_NONE */
-  1 /* OMP_CLAUSE_UNROLL_PARTIAL */
+  1, /* OMP_CLAUSE_UNROLL_PARTIAL */
+  1  /* OMP_CLAUSE_TILE */
 };

 const char * const omp_clause_code_name[] =
@@ -422,7 +424,8 @@ const char * const omp_clause_code_name[] =
   "nohost",
   "unroll_full",
   "unroll_none",
-  "unroll_partial"
+  "unroll_partial",
+  "tile"
 };

 /* Unless specific to OpenACC, we tend to internally maintain OpenMP-centric
diff --git a/gcc/tree.h b/gcc/tree.h
index 6f7a6e7017a..8f4d2761d1a 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1790,6 +1790,9 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_UNROLL_PARTIAL_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_UNROLL_PARTIAL), 0)

+#define OMP_CLAUSE_TILE_SIZES(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
+
 #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind)

diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C
new file mode 100644
index 00000000000..8970bfa7fd8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C
@@ -0,0 +1,84 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+template <int dim0, int dim1>
+int sum ()
+{
+  int sum = 0;
+#pragma omp unroll full
+#pragma omp tile sizes(dim0, dim1)
+  for (unsigned i = 0; i < 4; i++)
+    for (unsigned j = 0; j < 5; j++)
+      sum++;
+
+  return sum;
+}
+
+int main ()
+{
+  if (sum <1,1> () != 20)
+    __builtin_abort ();
+  if (sum <1,2> () != 20)
+    __builtin_abort ();
+  if (sum <1,3> () != 20)
+    __builtin_abort ();
+  if (sum <1,4> () != 20)
+    __builtin_abort ();
+  if (sum <1,5> () != 20)
+    __builtin_abort ();
+
+  if (sum <2,1> () != 20)
+    __builtin_abort ();
+  if (sum <2,2> () != 20)
+    __builtin_abort ();
+  if (sum <2,3> () != 20)
+    __builtin_abort ();
+  if (sum <2,4> () != 20)
+    __builtin_abort ();
+  if (sum <2,5> () != 20)
+    __builtin_abort ();
+
+  if (sum <3,1> () != 20)
+    __builtin_abort ();
+  if (sum <3,2> () != 20)
+    __builtin_abort ();
+  if (sum <3,3> () != 20)
+    __builtin_abort ();
+  if (sum <3,4> () != 20)
+    __builtin_abort ();
+  if (sum <3,5> () != 20)
+    __builtin_abort ();
+
+  if (sum <4,1> () != 20)
+    __builtin_abort ();
+  if (sum <4,2> () != 20)
+    __builtin_abort ();
+  if (sum <4,3> () != 20)
+    __builtin_abort ();
+  if (sum <4,4> () != 20)
+    __builtin_abort ();
+  if (sum <4,5> () != 20)
+    __builtin_abort ();
+
+  if (sum <5,1> () != 20)
+    __builtin_abort ();
+  if (sum <5,2> () != 20)
+    __builtin_abort ();
+  if (sum <5,3> () != 20)
+    __builtin_abort ();
+  if (sum <5,4> () != 20)
+    __builtin_abort ();
+  if (sum <5,5> () != 20)
+    __builtin_abort ();
+
+  if (sum <6,1> () != 20)
+    __builtin_abort ();
+  if (sum <6,2> () != 20)
+    __builtin_abort ();
+  if (sum <6,3> () != 20)
+    __builtin_abort ();
+  if (sum <6,4> () != 20)
+    __builtin_abort ();
+  if (sum <6,5> () != 20)
+    __builtin_abort ();
+}
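Note on the test above: the loop structure that "omp tile" is expected to
produce for the 4x5 nest can be sketched by hand in plain C. This is only an
illustration of the floor-loop/tile-loop split and of the partial-tile check
described in the comments of tile (); the function name tiled_sum and its
parameters are made up for this note, and the code is not the output of the
omp_transform_loops pass.

```c
#include <assert.h>

/* Hypothetical hand-written equivalent of
     #pragma omp tile sizes(size0, size1)
   applied to a 4x5 loop nest.  All names are illustrative only.  */
int
tiled_sum (unsigned size0, unsigned size1)
{
  assert (size0 > 0 && size1 > 0);  /* Tile sizes must be positive.  */
  int sum = 0;

  /* Floor loops: advance through the iteration space tile by tile.  */
  for (unsigned fi = 0; fi < 4; fi += size0)
    for (unsigned fj = 0; fj < 5; fj += size1)
      /* Tile loops: visit the iterations inside the current tile.  */
      for (unsigned i = fi; i < fi + size0; i++)
        {
          if (!(i < 4))
            break;  /* Partial tile in dimension 0.  */
          for (unsigned j = fj; j < fj + size1; j++)
            {
              if (!(j < 5))
                break;  /* Partial tile in dimension 1.  */
              sum++;
            }
        }
  return sum;
}
```

The check against the original loop bound sits at the top of each tile loop
body, which is why the tile loops do not form a perfect nest. For any positive
tile sizes, including sizes larger than the trip count, the body should run 20
times, which is what the test asserts.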
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90
new file mode 100644
index 00000000000..bb48c31224e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90
@@ -0,0 +1,71 @@
+module matrix
+  implicit none
+  integer :: n = 10
+  integer :: m =  10
+
+contains
+  function mult (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    !$omp parallel do collapse(2) private(inner)
+    !$omp tile sizes (8, 1)
+    do i = 1,m
+       do j = 1,n
+          inner = 0
+          do k = 1, n
+             inner = inner + a(k, i) * b(j, k)
+          end do
+          c(j, i) = inner
+       end do
+    end do
+  end function mult
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = merge(1,0, i.eq.j)
+        b(j,i) = j
+     end do
+  end do
+
+  c = mult (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (b(i,j) .ne. c(i,j)) call abort ()
+     end do
+  end do
+
+
+end program main
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
new file mode 100644
index 00000000000..6aedbf4724f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -0,0 +1,117 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do run }
+
+module test_functions
+  contains
+  integer function compute_sum1() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    sum = 0
+    !$omp do
+    do i = 1,10,3
+       !$omp tile sizes(2)
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+
+  integer function compute_sum2() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    sum = 0
+    !$omp do
+    do i = 1,10,3
+       !$omp tile sizes(16)
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+
+  integer function compute_sum3() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    sum = 0
+    !$omp do
+    do i = 1,10,3
+       !$omp tile sizes(100)
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+
+  integer function compute_sum4() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    sum = 0
+    !$omp do
+    !$omp tile sizes(6,10)
+    do i = 1,10,3
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+
+  integer function compute_sum5() result(sum)
+    implicit none
+
+    integer :: i,j
+
+    sum = 0
+    !$omp parallel do collapse(2)
+    !$omp tile sizes(6,10)
+    do i = 1,10,3
+       do j = 1,10,3
+          sum = sum + 1
+       end do
+    end do
+  end function
+end module test_functions
+
+program test
+  use test_functions
+  implicit none
+
+  integer :: result
+
+  result = compute_sum1 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+
+  result = compute_sum2 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+
+  result = compute_sum3 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+
+  result = compute_sum4 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+
+  result = compute_sum5 ()
+  write (*,*) result
+  if (result .ne. 16) then
+     call abort
+  end if
+end program
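Note on the expected value 16 in the test above: each "do ... = 1,10,3" loop
runs 4 iterations (1, 4, 7, 10), so the nest runs 4 * 4 = 16 times regardless
of the tile size. The sketch below shows, for a single such loop written in C,
how a tile size can be scaled by the loop step (compare scaled_tile_size in
tile ()) and how the check against the original bound cuts off a partial tile.
The function name tiled_strided_count is hypothetical, and the code is a
hand-written approximation, not what the pass actually emits.

```c
#include <assert.h>

/* Hypothetical hand-tiled version of
     for (j = 1; j <= 10; j += 3)
   with one tile dimension.  All names are illustrative only.  */
int
tiled_strided_count (int tile_size)
{
  assert (tile_size > 0);  /* Tile sizes must be positive.  */
  const int step = 3;
  /* One tile covers TILE_SIZE iterations, i.e. TILE_SIZE * STEP
     index values.  */
  const int scaled_tile_size = tile_size * step;
  int count = 0;

  /* Floor loop: one iteration per tile.  */
  for (int floor_j = 1; floor_j <= 10; floor_j += scaled_tile_size)
    /* Tile loop: iterations of the current tile.  */
    for (int j = floor_j; j < floor_j + scaled_tile_size; j += step)
      {
        if (!(j <= 10))
          break;  /* Partial tile: index left the original range.  */
        count++;
      }
  return count;
}
```

With tile sizes 2, 16 and 100, as used in compute_sum1 to compute_sum3, this
function should return 4 each time, matching the per-loop trip count.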
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90
new file mode 100644
index 00000000000..2f2f014ead9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90
@@ -0,0 +1,112 @@
+module matrix
+  implicit none
+  integer :: n = 10
+  integer :: m =  10
+
+contains
+
+  function mult (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = 0
+       end do
+    end do
+
+    !$omp unroll partial(10)
+    !$omp tile sizes(1, 3)
+    do i = 1,10
+       do j = 1,n
+          do k = 1, n
+       write (*,*) i, j, k
+             c(j,i) = c(j,i) + a(k, i) * b(j, k)
+          end do
+       end do
+    end do
+  end function mult
+
+  function mult2 (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = 0
+       end do
+    end do
+
+    !$omp unroll partial(2)
+    !$omp tile sizes(1,2)
+    do i = 1,10
+       do j = 1,n
+          do k = 1, n
+       write (*,*) i, j, k
+             c(j,i) = c(j,i) + a(k, i) * b(j, k)
+          end do
+       end do
+    end do
+  end function mult2
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = merge(1,0, i.eq.j)
+        b(j,i) = j
+     end do
+  end do
+
+  ! c = mult (a, b)
+
+  ! call print_matrix (a)
+  ! call print_matrix (b)
+  ! call print_matrix (c)
+
+  ! do i = 1,n
+  !    do j = 1,m
+  !       if (b(i,j) .ne. c(i,j)) call abort ()
+  !    end do
+  ! end do
+
+
+  c = mult2 (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (b(i,j) .ne. c(i,j)) call abort ()
+     end do
+  end do
+
+end program main
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90
new file mode 100644
index 00000000000..1b5b623b838
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90
@@ -0,0 +1,71 @@
+module matrix
+  implicit none
+  integer :: n = 10
+  integer :: m =  10
+
+contains
+
+  function copy (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = 0
+       end do
+    end do
+
+    !$omp unroll partial(2)
+    !$omp tile sizes (1,5)
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = c(j,i) + a(j, i)
+       end do
+    end do
+  end function copy
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = 1
+     end do
+  end do
+
+  c = copy (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (c(i,j) .ne. a(i,j)) call abort ()
+     end do
+  end do
+
+end program main
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90
new file mode 100644
index 00000000000..518968f1335
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90
@@ -0,0 +1,77 @@
+module matrix
+  implicit none
+  integer :: n = 4
+  integer :: m = 4
+
+contains
+  function mult (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    ! omp do private(inner)
+    do i = 1,m
+       !$omp unroll partial(4)
+       !$omp tile sizes (5)
+       do j = 1,n
+          do k = 1, n
+             write (*,*) "i", i, "j", j, "k", k
+             if (k == 1) then
+                inner = 0
+             endif
+             inner = inner + a(k, i) * b(j, k)
+             if (k == n) then
+                c(j, i) = inner
+             endif
+          end do
+       end do
+    end do
+  end function mult
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = merge(1,0, i.eq.j)
+        b(j,i) = j
+     end do
+  end do
+
+  c = mult (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (b(i,j) .ne. c(i,j)) call abort ()
+     end do
+  end do
+
+
+end program main
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90
new file mode 100644
index 00000000000..807135df5e8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90
@@ -0,0 +1,75 @@
+module matrix
+  implicit none
+  integer :: n = 4
+  integer :: m = 4
+
+contains
+  function mult (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,m
+       do j = 1,n
+          c(j, i) = 0
+       end do
+    end do
+
+    !$omp parallel do
+    do i = 1,m
+       !$omp tile sizes (5,2)
+       do j = 1,n
+          do k = 1, n
+             c(j,i) = c(j,i) + a(k, i) * b(j, k)
+          end do
+       end do
+    end do
+  end function mult
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = merge(1,0, i.eq.j)
+        b(j,i) = j
+     end do
+  end do
+
+  c = mult (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (b(i,j) .ne. c(i,j)) call abort ()
+     end do
+  end do
+
+
+end program main
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90
new file mode 100644
index 00000000000..2f2f014ead9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90
@@ -0,0 +1,112 @@
+module matrix
+  implicit none
+  integer :: n = 10
+  integer :: m =  10
+
+contains
+
+  function mult (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = 0
+       end do
+    end do
+
+    !$omp unroll partial(10)
+    !$omp tile sizes(1, 3)
+    do i = 1,10
+       do j = 1,n
+          do k = 1, n
+       write (*,*) i, j, k
+             c(j,i) = c(j,i) + a(k, i) * b(j, k)
+          end do
+       end do
+    end do
+  end function mult
+
+  function mult2 (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = 0
+       end do
+    end do
+
+    !$omp unroll partial(2)
+    !$omp tile sizes(1,2)
+    do i = 1,10
+       do j = 1,n
+          do k = 1, n
+       write (*,*) i, j, k
+             c(j,i) = c(j,i) + a(k, i) * b(j, k)
+          end do
+       end do
+    end do
+  end function mult2
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = merge(1,0, i.eq.j)
+        b(j,i) = j
+     end do
+  end do
+
+  ! c = mult (a, b)
+
+  ! call print_matrix (a)
+  ! call print_matrix (b)
+  ! call print_matrix (c)
+
+  ! do i = 1,n
+  !    do j = 1,m
+  !       if (b(i,j) .ne. c(i,j)) call abort ()
+  !    end do
+  ! end do
+
+
+  c = mult2 (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (b(i,j) .ne. c(i,j)) call abort ()
+     end do
+  end do
+
+end program main
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90
new file mode 100644
index 00000000000..1b5b623b838
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90
@@ -0,0 +1,71 @@
+module matrix
+  implicit none
+  integer :: n = 10
+  integer :: m =  10
+
+contains
+
+  function copy (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = 0
+       end do
+    end do
+
+    !$omp unroll partial(2)
+    !$omp tile sizes (1,5)
+    do i = 1,10
+       do j = 1,n
+          c(j,i) = c(j,i) + a(j, i)
+       end do
+    end do
+  end function copy
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = 1
+     end do
+  end do
+
+  c = copy (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (c(i,j) .ne. a(i,j)) call abort ()
+     end do
+  end do
+
+end program main
--
2.36.1



* [PATCH 5/7] openmp: Add C/C++ support for "omp tile"
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
                   ` (3 preceding siblings ...)
  2023-03-24 15:30 ` [PATCH 4/7] openmp: Add Fortran support for "omp tile" Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-03-24 15:30 ` [PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops Frederik Harwath
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub, joseph, jason

This commit adds the C and C++ front end support for the "omp tile"
directive.
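
For readers unfamiliar with the construct, here is a rough sketch of what
"#pragma omp tile sizes(2, 3)" is meant to do to a two-deep loop nest.
The sketch is hand-written C and not output of this patch; the names
N, M, TILE_I, TILE_J, tiled_traversal and covers_each_iteration_once are
made up for illustration.  It only shows the intended iteration-space
decomposition into "floor" loops and "tile" loops and checks that every
iteration still runs exactly once.

```c
#include <string.h>

enum { N = 5, M = 7, TILE_I = 2, TILE_J = 3 };

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Hand-tiled equivalent of

     #pragma omp tile sizes(2, 3)
     for (int i = 0; i < N; i++)
       for (int j = 0; j < M; j++)
         visits[i][j]++;

   The two outer loops step from tile to tile, the two inner loops
   walk the iterations inside one tile.  Partial tiles at the border
   are handled by clamping the upper bounds.  Returns the number of
   executed iterations.  */
static int
tiled_traversal (int visits[N][M])
{
  int total = 0;
  memset (visits, 0, sizeof (int) * N * M);

  for (int i0 = 0; i0 < N; i0 += TILE_I)        /* floor loop over i */
    for (int j0 = 0; j0 < M; j0 += TILE_J)      /* floor loop over j */
      for (int i = i0; i < min_int (i0 + TILE_I, N); i++)    /* tile loop */
        for (int j = j0; j < min_int (j0 + TILE_J, M); j++)  /* tile loop */
          {
            visits[i][j]++;
            total++;
          }

  return total;
}

/* Returns 1 if the tiled traversal executed each (i, j) exactly once.  */
static int
covers_each_iteration_once (void)
{
  int visits[N][M];

  if (tiled_traversal (visits) != N * M)
    return 0;

  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      if (visits[i][j] != 1)
        return 0;

  return 1;
}
```

Tiling changes only the order in which the iterations are visited, which
is why the run-time tests in this series compare results against the
untransformed computation.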

gcc/c-family/ChangeLog:

        * c-omp.cc (c_omp_directives): Add PRAGMA_OMP_TILE.
        * c-pragma.cc (omp_pragmas_simd): Likewise.
        * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE.
        (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_TILE.

gcc/c/ChangeLog:

        * c-parser.cc (c_parser_nested_omp_unroll_clauses): Rename and
        generalize ...
        (c_parser_omp_nested_loop_transform_clauses): ... to this.
        (c_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
        (c_parser_omp_tile_sizes): Parse single "sizes" clause.
        (c_parser_omp_loop_transform_clause): New function.
        (c_parser_omp_tile): New function for parsing "omp tile".
        (c_parser_omp_unroll): Adjust to renaming.
        (c_parser_omp_construct): Handle PRAGMA_OMP_TILE.

gcc/cp/ChangeLog:

        * parser.cc (cp_parser_omp_clause_unroll_partial): Adjust.
        (cp_parser_nested_omp_unroll_clauses): Rename ...
        (cp_parser_omp_nested_loop_transform_clauses): ... to this.
        (cp_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
        (cp_parser_omp_tile_sizes): New function, parses single "sizes" clause.
        (cp_parser_omp_tile): New function for parsing "omp tile".
        (cp_parser_omp_loop_transform_clause): New function.
        (cp_parser_omp_unroll): Adjust to renaming.
        (cp_parser_omp_construct): Handle PRAGMA_OMP_TILE.
        (cp_parser_pragma): Likewise.
        * pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_TILE.
        * semantics.cc (finish_omp_clauses): Likewise.

gcc/ChangeLog:

        * gimplify.cc (omp_for_drop_tile_clauses): New function, ...
        (gimplify_omp_for): ... used here.

libgomp/ChangeLog:

        * testsuite/libgomp.c++/loop-transforms/tile-1.C: New test.
        * testsuite/libgomp.c++/loop-transforms/tile-2.C: New test.
        * testsuite/libgomp.c++/loop-transforms/tile-3.C: New test.

gcc/testsuite/ChangeLog:

        * c-c++-common/gomp/loop-transforms/tile-1.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-2.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-3.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-4.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-5.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-6.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-7.c: New test.
        * c-c++-common/gomp/loop-transforms/tile-8.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-2.c: Adapt.
        * g++.dg/gomp/loop-transforms/tile-1.h: New test.
        * g++.dg/gomp/loop-transforms/tile-1a.C: New test.
        * g++.dg/gomp/loop-transforms/tile-1b.C: New test.
---
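Note on the nesting-depth check: the rule enforced by
c_parser_omp_nested_loop_transform_clauses (and its C++ counterpart) can
be modelled outside the compiler as below.  This is a simplified,
standalone sketch and not code from the patch; check_transform_chain and
its parameters are hypothetical names.  Each transformation is reduced to
the loop nest depth it leaves behind: 1 for "unroll partial", the length
of the sizes list for "tile".  The error cases for "unroll full" and for
"unroll" without a clause are left out.

```c
/* REQUIRED_DEPTH is the depth demanded by the preceding directive,
   e.g. the "collapse" argument.  DEPTHS[0..N-1] are the depths left
   by the transformations in source order, outermost first.

   Returns the depth left by the outermost transformation, 0 if there
   is no transformation, or -1 if some transformation leaves fewer
   loops than the directive or transformation before it requires.  */
static int
check_transform_chain (int required_depth, const int *depths, int n)
{
  int transformed_depth = 0;
  int last_depth = required_depth;

  for (int k = 0; k < n; k++)
    {
      /* Each transformation must leave at least as many loops as the
         construct immediately outside of it needs.  */
      if (depths[k] < last_depth)
        return -1;

      last_depth = depths[k];

      /* Only the first (outermost) transformation determines the
         depth seen by the enclosing directive.  */
      if (!transformed_depth)
        transformed_depth = last_depth;
    }

  return transformed_depth;
}
```

Under this model, "collapse(2)" followed by "tile sizes(4, 4)" is
accepted, while "collapse(2)" followed by "unroll partial(2)" is
rejected because the unrolled loop nest is only one loop deep.
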
 gcc/c-family/c-omp.cc                         |   4 +-
 gcc/c-family/c-pragma.cc                      |   1 +
 gcc/c-family/c-pragma.h                       |   2 +
 gcc/c/c-parser.cc                             | 277 ++++++++++++---
 gcc/cp/parser.cc                              | 289 +++++++++++++---
 gcc/cp/pt.cc                                  |   1 +
 gcc/cp/semantics.cc                           |  40 +++
 gcc/gimplify.cc                               |  28 ++
 .../gomp/loop-transforms/tile-1.c             | 164 +++++++++
 .../gomp/loop-transforms/tile-2.c             | 183 ++++++++++
 .../gomp/loop-transforms/tile-3.c             | 117 +++++++
 .../gomp/loop-transforms/tile-4.c             | 322 ++++++++++++++++++
 .../gomp/loop-transforms/tile-5.c             | 150 ++++++++
 .../gomp/loop-transforms/tile-6.c             |  34 ++
 .../gomp/loop-transforms/tile-7.c             |  31 ++
 .../gomp/loop-transforms/tile-8.c             |  40 +++
 .../gomp/loop-transforms/unroll-2.c           |  12 +-
 .../g++.dg/gomp/loop-transforms/tile-1.h      |  27 ++
 .../g++.dg/gomp/loop-transforms/tile-1a.C     |  27 ++
 .../g++.dg/gomp/loop-transforms/tile-1b.C     |  27 ++
 .../libgomp.c++/loop-transforms/tile-1.C      |  52 +++
 .../libgomp.c++/loop-transforms/tile-2.C      |  69 ++++
 .../libgomp.c++/loop-transforms/tile-3.C      |  28 ++
 23 files changed, 1823 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1b.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-3.C

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index fec7f337772..2ab7faea2cc 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -3207,8 +3207,8 @@ const struct c_omp_directive c_omp_directives[] = {
     C_OMP_DIR_STANDALONE, false },
   { "taskyield", nullptr, nullptr, PRAGMA_OMP_TASKYIELD,
     C_OMP_DIR_STANDALONE, false },
-  /* { "tile", nullptr, nullptr, PRAGMA_OMP_TILE,
-    C_OMP_DIR_CONSTRUCT, false },  */
+  { "tile", nullptr, nullptr, PRAGMA_OMP_TILE,
+    C_OMP_DIR_CONSTRUCT, false },
   { "teams", nullptr, nullptr, PRAGMA_OMP_TEAMS,
     C_OMP_DIR_CONSTRUCT, true },
   { "threadprivate", nullptr, nullptr, PRAGMA_OMP_THREADPRIVATE,
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 96a28ac1b0c..75d5cabbafd 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1593,6 +1593,7 @@ static const struct omp_pragma_def omp_pragmas_simd[] = {
   { "target", PRAGMA_OMP_TARGET },
   { "taskloop", PRAGMA_OMP_TASKLOOP },
   { "teams", PRAGMA_OMP_TEAMS },
+  { "tile", PRAGMA_OMP_TILE },
   { "unroll", PRAGMA_OMP_UNROLL },
 };

diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6686abdc94d..c0476f74441 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -81,6 +81,7 @@ enum pragma_kind {
   PRAGMA_OMP_TASKYIELD,
   PRAGMA_OMP_THREADPRIVATE,
   PRAGMA_OMP_TEAMS,
+  PRAGMA_OMP_TILE,
   PRAGMA_OMP_UNROLL,
   /* PRAGMA_OMP__LAST_ should be equal to the last PRAGMA_OMP_* code.  */
   PRAGMA_OMP__LAST_ = PRAGMA_OMP_UNROLL,
@@ -157,6 +158,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_TASKGROUP,
   PRAGMA_OMP_CLAUSE_THREAD_LIMIT,
   PRAGMA_OMP_CLAUSE_THREADS,
+  PRAGMA_OMP_CLAUSE_TILE,
   PRAGMA_OMP_CLAUSE_TO,
   PRAGMA_OMP_CLAUSE_UNIFORM,
   PRAGMA_OMP_CLAUSE_UNTIED,
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index e7c9da99552..aac23dec9c0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -20243,7 +20243,8 @@ c_parser_omp_scan_loop_body (c_parser *parser, bool open_brace_parsed)
                             "expected %<}%>");
 }

-static bool c_parser_nested_omp_unroll_clauses (c_parser *, tree &);
+static int c_parser_omp_nested_loop_transform_clauses (c_parser *, tree &, int,
+                                                      const char *);

 /* Parse the restricted form of loop statements allowed by OpenACC and OpenMP.
    The real trick here is to determine the loop control variable early
@@ -20263,16 +20264,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   bool fail = false, open_brace_parsed = false;
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   location_t for_loc;
-  bool tiling = false;
+  bool oacc_tiling = false;
   bool inscan = false;
   vec<tree, va_gc> *for_block = make_tree_vector ();

   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
-      collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+      {
+       collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+      }
     else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
       {
-       tiling = true;
+       oacc_tiling = true;
        collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
       }
     else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
@@ -20295,21 +20298,31 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
       ordered = collapse;
     }

-  gcc_assert (tiling || (collapse >= 1 && ordered >= 0));
+  c_parser_omp_nested_loop_transform_clauses (parser, clauses, collapse,
+                                             "loop collapse");
+
+  /* Find the depth of the loop nest affected by "omp tile"
+     directives. There can be several such directives, but the tiling
+     depth of the outer ones may not be larger than the depth of the
+     innermost directive. */
+  int omp_tile_depth = 0;
+  for (tree c = clauses; c; c = TREE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_TILE)
+       continue;
+
+      omp_tile_depth = list_length (OMP_CLAUSE_TILE_SIZES (c));
+    }
+
+  gcc_assert (oacc_tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;
+  count = MAX (count, omp_tile_depth);

   declv = make_tree_vec (count);
   initv = make_tree_vec (count);
   condv = make_tree_vec (count);
   incrv = make_tree_vec (count);

-  if (c_parser_nested_omp_unroll_clauses (parser, clauses)
-      && count > 1)
-    {
-      error_at (loc, "collapse cannot be larger than 1 on an unrolled loop");
-      return NULL;
-    }
-
   if (!c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
@@ -23945,47 +23958,224 @@ c_parser_omp_taskloop (location_t loc, c_parser *parser,
        ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PARTIAL)      \
          | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_FULL) )

-/* Parse zero or more '#pragma omp unroll' that follow
-   another directive that requires a canonical loop nest. */
+/* OpenMP 5.1: Parse sizes list for "omp tile sizes"
+   sizes ( size-expr-list ) */
+static tree
+c_parser_omp_tile_sizes (c_parser *parser, location_t loc)
+{
+  tree sizes = NULL_TREE;

-static bool
-c_parser_nested_omp_unroll_clauses (c_parser *parser, tree &clauses)
+  c_token *tok = c_parser_peek_token (parser);
+  if (tok->type != CPP_NAME
+      || strcmp ("sizes", IDENTIFIER_POINTER (tok->value)))
+    {
+      c_parser_error (parser, "expected %<sizes%>");
+      return error_mark_node;
+    }
+  c_parser_consume_token (parser);
+
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+    return error_mark_node;
+
+  do
+    {
+      if (sizes && !c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
+       return error_mark_node;
+
+      location_t expr_loc = c_parser_peek_token (parser)->location;
+      c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
+      cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
+      tree expr = cexpr.value;
+
+      if (expr == error_mark_node)
+       {
+         c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+                                    "expected %<)%>");
+         return error_mark_node;
+       }
+
+      expr = c_fully_fold (expr, false, NULL);
+
+      if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)) || !tree_fits_shwi_p (expr)
+         || tree_to_shwi (expr) <= 0)
+       {
+         c_parser_error (parser, "%<tile sizes%> argument needs positive"
+                                 " integral constant");
+         expr = integer_zero_node;
+       }
+
+      sizes = tree_cons (NULL_TREE, expr, sizes);
+    }
+  while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
+  c_parser_consume_token (parser);
+
+  gcc_assert (sizes);
+  tree c  = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  OMP_CLAUSE_TILE_SIZES (c) = sizes;
+
+  return c;
+}
+
+/* Parse a single OpenMP loop transformation directive and return the
+   clause that is used internally to represent the directive. */
+
+static tree
+c_parser_omp_loop_transform_clause (c_parser *parser)
 {
-  static const char *p_name = "#pragma omp unroll";
-  c_token *tok;
-  bool found_unroll = false;
-  while (c_parser_next_token_is (parser, CPP_PRAGMA)
-        && (tok = c_parser_peek_token (parser),
-            tok->pragma_kind == PRAGMA_OMP_UNROLL))
+  c_token *tok = c_parser_peek_token (parser);
+  if (tok->type != CPP_PRAGMA)
+    return NULL_TREE;
+
+  tree c;
+  switch (tok->pragma_kind)
     {
+    case PRAGMA_OMP_UNROLL:
       c_parser_consume_pragma (parser);
-      tree c = c_parser_omp_all_clauses (parser, OMP_UNROLL_CLAUSE_MASK,
-                                        p_name, true);
-      if (c)
+      c = c_parser_omp_all_clauses (parser, OMP_UNROLL_CLAUSE_MASK,
+                                   "#pragma omp unroll", false, true);
+      if (!c)
        {
-         gcc_assert (!TREE_CHAIN (c));
-         found_unroll = true;
-         if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_UNROLL_FULL)
-           {
-             error_at (tok->location, "%<full%> clause is invalid here; "
-                       "turns loop into non-loop");
-             continue;
-           }
+         if (c_parser_next_token_is (parser, CPP_PRAGMA_EOL))
+           c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+         else
+           c = error_mark_node;
        }
-      else
+      c_parser_skip_to_pragma_eol (parser);
+      break;
+
+    case PRAGMA_OMP_TILE:
+      c_parser_consume_pragma (parser);
+      c = c_parser_omp_tile_sizes (parser, tok->location);
+      c_parser_skip_to_pragma_eol (parser);
+      break;
+
+    default:
+      c = NULL_TREE;
+      break;
+    }
+
+  gcc_assert (!c || !TREE_CHAIN (c));
+  return c;
+}
+
+/* Parse zero or more OpenMP loop transformation directives that
+   follow another directive that requires a canonical loop nest and
+   append all to CLAUSES.  Return the nesting depth
+   of the transformed loop nest.
+
+   REQUIRED_DEPTH is the nesting depth of the loop nest required by
+   the preceding directive.  OUTER_DESCR is a description of the
+   language construct that requires the loop nest depth (e.g. "loop
+   collapse", "outer transformation") that is used for error
+   messages. */
+
+static int
+c_parser_omp_nested_loop_transform_clauses (c_parser *parser, tree &clauses,
+                                           int required_depth,
+                                           const char *outer_descr)
+{
+  tree c = NULL_TREE;
+  tree last_c = tree_last (clauses);
+
+  /* The depth of the loop nest, counting from LEVEL, after the
+     transformations. That is, the nesting depth left by the outermost
+     transformation which is the first to be parsed, but the last to be
+     executed. */
+  int transformed_depth = 0;
+
+  /* The minimum nesting depth required by the last parsed transformation. */
+  int last_depth = required_depth;
+  while ((c = c_parser_omp_loop_transform_clause (parser)))
+    {
+      /* The nesting depth left after the current transformation */
+      int depth = 1;
+      if (TREE_CODE (c) == ERROR_MARK)
+       goto error;
+
+      gcc_assert (!TREE_CHAIN (c));
+      switch (OMP_CLAUSE_CODE (c))
        {
-         error_at (tok->location, "%<#pragma omp unroll%> without "
-                                  "%<partial%> clause is invalid here; "
-                                  "turns loop into non-loop");
-         continue;
+       case OMP_CLAUSE_UNROLL_FULL:
+         error_at (OMP_CLAUSE_LOCATION (c),
+                   "%<full%> clause is invalid here; "
+                   "turns loop into non-loop");
+         goto error;
+       case OMP_CLAUSE_UNROLL_NONE:
+         error_at (OMP_CLAUSE_LOCATION (c),
+                   "%<#pragma omp unroll%> without "
+                   "%<partial%> clause is invalid here; "
+                   "turns loop into non-loop");
+         goto error;
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         depth = 1;
+         break;
+       case OMP_CLAUSE_TILE:
+         depth = list_length (OMP_CLAUSE_TILE_SIZES (c));
+         break;
+       default:
+         gcc_unreachable ();
+       }
+
+      if (depth < last_depth)
+       {
+         bool is_outermost_clause = !transformed_depth;
+         error_at (OMP_CLAUSE_LOCATION (c),
+                   "nesting depth left after this transformation too low "
+                   "for %s",
+                   is_outermost_clause ? outer_descr
+                                       : "outer transformation");
+         goto error;
        }

-      clauses = chainon (clauses, c);
+      last_depth = depth;
+
+      if (!transformed_depth)
+       transformed_depth = last_depth;
+
+      if (!clauses)
+       clauses = c;
+      else if (last_c)
+       TREE_CHAIN (last_c) = c;
+
+      last_c = c;
     }

-  return found_unroll;
+  return transformed_depth;
+
+error:
+  while (c_parser_omp_loop_transform_clause (parser))
+    ;
+  clauses = NULL_TREE;
+  return -1;
 }

+/* OpenMP 5.1:
+   tile sizes ( size-expr-list ) */
+
+static tree
+c_parser_omp_tile (location_t loc, c_parser *parser, bool *if_p)
+{
+  tree block;
+  tree ret = error_mark_node;
+
+  tree clauses = c_parser_omp_tile_sizes (parser, loc);
+  c_parser_skip_to_pragma_eol (parser);
+
+  if (!clauses || clauses == error_mark_node)
+    return error_mark_node;
+
+  int required_depth = list_length (OMP_CLAUSE_TILE_SIZES (clauses));
+  c_parser_omp_nested_loop_transform_clauses (parser, clauses, required_depth,
+                                             "outer transformation");
+
+  block = c_begin_compound_stmt (true);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_LOOP_TRANS, clauses, NULL, if_p);
+  block = c_end_compound_stmt (loc, block, true);
+  add_stmt (block);
+
+  return ret;
+}
+
 static tree
 c_parser_omp_unroll (location_t loc, c_parser *parser, bool *if_p)
 {
@@ -23994,7 +24184,9 @@ c_parser_omp_unroll (location_t loc, c_parser *parser, bool *if_p)
   omp_clause_mask mask = OMP_UNROLL_CLAUSE_MASK;

   tree clauses = c_parser_omp_all_clauses (parser, mask, p_name, false);
-  c_parser_nested_omp_unroll_clauses (parser, clauses);
+  int required_depth = 1;
+  c_parser_omp_nested_loop_transform_clauses (parser, clauses, required_depth,
+                                             "outer transformation");

   if (!clauses)
     {
@@ -24496,6 +24688,9 @@ c_parser_omp_construct (c_parser *parser, bool *if_p)
     case PRAGMA_OMP_ASSUME:
       c_parser_omp_assume (parser, if_p);
       return;
+    case PRAGMA_OMP_TILE:
+      stmt = c_parser_omp_tile (loc, parser, if_p);
+      break;
     case PRAGMA_OMP_UNROLL:
       stmt = c_parser_omp_unroll (loc, parser, if_p);
       break;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 90af40c4dbc..084ecd3ada5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -43631,7 +43631,8 @@ cp_parser_omp_scan_loop_body (cp_parser *parser)
   braces.require_close (parser);
 }

-static bool cp_parser_nested_omp_unroll_clauses (cp_parser *, tree &);
+static int cp_parser_omp_nested_loop_transform_clauses (cp_parser *, tree &,
+                                                       int, const char *);

 /* Parse the restricted form of the for statement allowed by OpenMP.  */

@@ -43643,20 +43644,20 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
   tree orig_decl;
   tree real_decl, initv, condv, incrv, declv, orig_declv;
   tree this_pre_body, cl, ordered_cl = NULL_TREE;
-  location_t loc_first;
   bool collapse_err = false;
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   releasing_vec for_block;
   auto_vec<tree, 4> orig_inits;
-  bool tiling = false;
+  bool oacc_tiling = false;
   bool inscan = false;
+  location_t loc_first = cp_lexer_peek_token (parser->lexer)->location;

   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
       collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
     else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
       {
-       tiling = true;
+       oacc_tiling = true;
        collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
       }
     else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
@@ -43679,26 +43680,33 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       ordered = collapse;
     }

-  gcc_assert (tiling || (collapse >= 1 && ordered >= 0));
+
+  gcc_assert (oacc_tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;

+  cp_parser_omp_nested_loop_transform_clauses (parser, clauses, count,
+                                              "loop collapse");
+
+  /* Find the depth of the loop nest affected by "omp tile"
+     directives. There can be several such directives, but the tiling
+     depth of the outer ones may not be larger than the depth of the
+     innermost directive. */
+  int omp_tile_depth = 0;
+  for (tree c = clauses; c; c = TREE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_TILE)
+       continue;
+
+      omp_tile_depth = list_length (OMP_CLAUSE_TILE_SIZES (c));
+    }
+  count = MAX (count, omp_tile_depth);
+
   declv = make_tree_vec (count);
   initv = make_tree_vec (count);
   condv = make_tree_vec (count);
   incrv = make_tree_vec (count);
   orig_declv = NULL_TREE;

-  loc_first = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (cp_parser_nested_omp_unroll_clauses (parser, clauses)
-      && count > 1)
-    {
-      error_at (loc_first,
-               "collapse cannot be larger than 1 on an unrolled loop");
-      return NULL;
-    }
-
-
   for (i = 0; i < count; i++)
     {
       int bracecount = 0;
@@ -45734,51 +45742,224 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok,
   return true;
 }

+/* OpenMP 5.1: Parse sizes list for "omp tile sizes"
+   sizes ( size-expr-list ) */
+static tree
+cp_parser_omp_tile_sizes (cp_parser *parser, location_t loc)
+{
+  tree sizes = NULL_TREE;
+  cp_lexer *lexer = parser->lexer;
+
+  cp_token *tok = cp_lexer_peek_token (lexer);
+  if (tok->type != CPP_NAME
+      || strcmp ("sizes", IDENTIFIER_POINTER (tok->u.value)))
+    {
+      cp_parser_error (parser, "expected %<sizes%>");
+      return error_mark_node;
+    }
+  cp_lexer_consume_token (lexer);
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+    return error_mark_node;
+
+  do
+    {
+      if (sizes && !cp_parser_require (parser, CPP_COMMA, RT_COMMA))
+       return error_mark_node;
+
+      tree expr = cp_parser_constant_expression (parser);
+      if (expr == error_mark_node)
+       {
+         cp_parser_skip_to_closing_parenthesis (parser,
+                                                /*recovering=*/true,
+                                                /*or_comma=*/false,
+                                                /*consume_paren=*/
+                                                true);
+         return error_mark_node;
+       }
+
+      sizes = tree_cons (NULL_TREE, expr, sizes);
+    }
+  while (cp_lexer_next_token_is_not (lexer, CPP_CLOSE_PAREN));
+  cp_lexer_consume_token (lexer);
+
+  gcc_assert (sizes);
+  tree c  = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  OMP_CLAUSE_TILE_SIZES (c) = sizes;
+
+  return c;
+}
+
+/* OpenMP 5.1:
+   tile sizes ( size-expr-list ) */
+
+static tree
+cp_parser_omp_tile (cp_parser *parser, cp_token *tok, bool *if_p)
+{
+  tree block;
+  tree ret = error_mark_node;
+
+  tree clauses = cp_parser_omp_tile_sizes (parser, tok->location);
+  cp_parser_require_pragma_eol (parser, tok);
+
+  if (!clauses || clauses == error_mark_node)
+    return error_mark_node;
+
+  int required_depth = list_length (OMP_CLAUSE_TILE_SIZES (clauses));
+  cp_parser_omp_nested_loop_transform_clauses (
+      parser, clauses, required_depth, "outer transformation");
+
+  block = begin_omp_structured_block ();
+  clauses = finish_omp_clauses (clauses, C_ORT_OMP);
+
+  ret = cp_parser_omp_for_loop (parser, OMP_LOOP_TRANS, clauses, NULL, if_p);
+  block = finish_omp_structured_block (block);
+  add_stmt (block);
+
+  return ret;
+}
+
 #define OMP_UNROLL_CLAUSE_MASK                                 \
        ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PARTIAL)      \
          | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_FULL) )

-/* Parse zero or more '#pragma omp unroll' that follow
-   another directive that requires a canonical loop nest. */
+/* Parse a single OpenMP loop transformation directive and return the
+   clause that is used internally to represent the directive. */

-static bool
-cp_parser_nested_omp_unroll_clauses (cp_parser *parser, tree &clauses)
+static tree
+cp_parser_omp_loop_transform_clause (cp_parser *parser)
 {
-  static const char *p_name = "#pragma omp unroll";
-  cp_token *tok;
-  bool unroll_found = false;
-  while (cp_lexer_next_token_is (parser->lexer, CPP_PRAGMA)
-        && (tok = cp_lexer_peek_token (parser->lexer),
-            cp_parser_pragma_kind (tok) == PRAGMA_OMP_UNROLL))
+  cp_lexer *lexer = parser->lexer;
+  cp_token *tok = cp_lexer_peek_token (lexer);
+  if (tok->type != CPP_PRAGMA)
+    return NULL_TREE;
+
+  tree c;
+  switch (cp_parser_pragma_kind (tok))
     {
-      cp_lexer_consume_token (parser->lexer);
-      gcc_assert (tok->type == CPP_PRAGMA);
-      parser->lexer->in_pragma = true;
-      tree c = cp_parser_omp_all_clauses (parser, OMP_UNROLL_CLAUSE_MASK,
-                                         p_name, tok);
-      if (c)
-       {
-         gcc_assert (!TREE_CHAIN (c));
-         unroll_found = true;
-         if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_UNROLL_FULL)
-           {
-             error_at (tok->location, "%<full%> clause is invalid here; "
-                       "turns loop into non-loop");
-             continue;
-           }
+    case PRAGMA_OMP_UNROLL:
+      cp_lexer_consume_token (lexer);
+      lexer->in_pragma = true;
+      c = cp_parser_omp_all_clauses (parser, OMP_UNROLL_CLAUSE_MASK,
+                                    "#pragma omp unroll", tok,
+                                    false, true);
+      if (!c)
+       {
+         if (cp_lexer_next_token_is (lexer, CPP_PRAGMA_EOL))
+           c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+         else
+           c = error_mark_node;
+       }
+      cp_parser_skip_to_pragma_eol (parser, tok);
+      break;

-         c = finish_omp_clauses (c, C_ORT_OMP);
+    case PRAGMA_OMP_TILE:
+      cp_lexer_consume_token (lexer);
+      lexer->in_pragma = true;
+      c = cp_parser_omp_tile_sizes (parser, tok->location);
+      cp_parser_require_pragma_eol (parser, tok);
+      break;
+
+    default:
+      c = NULL_TREE;
+      break;
+    }
+
+  gcc_assert (!c || !TREE_CHAIN (c));
+  return c;
+}
+
+/* Parse zero or more OpenMP loop transformation directives that
+   follow another directive that requires a canonical loop nest and
+   append all to CLAUSES.  Return the nesting depth
+   of the transformed loop nest.
+
+   REQUIRED_DEPTH is the nesting depth of the loop nest required by
+   the preceding directive.  OUTER_DESCR is a description of the
+   language construct that requires the loop nest depth (e.g. "loop
+   collapse", "outer transformation") that is used for error
+   messages. */
+
+static int
+cp_parser_omp_nested_loop_transform_clauses (cp_parser *parser, tree &clauses,
+                                            int required_depth,
+                                            const char *outer_descr)
+{
+  tree c = NULL_TREE;
+  tree last_c = tree_last (clauses);
+
+  /* The depth of the loop nest after the transformations. That is,
+     the nesting depth left by the outermost transformation which is
+     the first to be parsed, but the last to be executed. */
+  int transformed_depth = 0;
+
+  /* The minimum nesting depth required by the last parsed transformation. */
+  int last_depth = required_depth;
+
+  while ((c = cp_parser_omp_loop_transform_clause (parser)))
+    {
+      /* The nesting depth left after the current transformation */
+      int depth = 1;
+      if (TREE_CODE (c) == ERROR_MARK)
+       goto error;
+
+      gcc_assert (!TREE_CHAIN (c));
+      switch (OMP_CLAUSE_CODE (c))
+       {
+       case OMP_CLAUSE_UNROLL_FULL:
+         error_at (OMP_CLAUSE_LOCATION (c),
+                   "%<full%> clause is invalid here; "
+                   "turns loop into non-loop");
+         goto error;
+       case OMP_CLAUSE_UNROLL_NONE:
+         error_at (OMP_CLAUSE_LOCATION (c),
+                   "%<#pragma omp unroll%> without "
+                   "%<partial%> clause is invalid here; "
+                   "turns loop into non-loop");
+         goto error;
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         depth = 1;
+         break;
+       case OMP_CLAUSE_TILE:
+         depth = list_length (OMP_CLAUSE_TILE_SIZES (c));
+         break;
+       default:
+         gcc_unreachable ();
        }
-      else
+
+      if (depth < last_depth)
        {
-         error_at (tok->location, "%<#pragma omp unroll%> without "
-                                  "%<partial%> clause is invalid here; "
-                                  "turns loop into non-loop");
-         continue;
+         bool is_outermost_clause = !transformed_depth;
+         error_at (OMP_CLAUSE_LOCATION (c),
+                   "nesting depth left after this transformation too low "
+                   "for %s",
+                   is_outermost_clause ? outer_descr
+                                       : "outer transformation");
+         goto error;
        }
-      clauses = chainon (clauses, c);
+
+      last_depth = depth;
+
+      if (!transformed_depth)
+       transformed_depth = last_depth;
+
+      c = finish_omp_clauses (c, C_ORT_OMP);
+
+      if (!clauses)
+       clauses = c;
+      else if (last_c)
+       TREE_CHAIN (last_c) = c;
+
+      last_c = c;
     }
-  return unroll_found;
+
+  return transformed_depth;
+
+error:
+  while (cp_parser_omp_loop_transform_clause (parser))
+    ;
+  clauses = NULL_TREE;
+  return -1;
 }

 static tree
@@ -45788,7 +45969,7 @@ cp_parser_omp_unroll (cp_parser *parser, cp_token *tok, bool *if_p)
   static const char *p_name = "#pragma omp unroll";
   omp_clause_mask mask = OMP_UNROLL_CLAUSE_MASK;

-  tree clauses = cp_parser_omp_all_clauses (parser, mask, p_name, tok, false);
+  tree clauses = cp_parser_omp_all_clauses (parser, mask, p_name, tok, true);

   if (!clauses)
     {
@@ -45797,7 +45978,9 @@ cp_parser_omp_unroll (cp_parser *parser, cp_token *tok, bool *if_p)
       clauses = c;
     }

-  cp_parser_nested_omp_unroll_clauses (parser, clauses);
+  int required_depth = 1;
+  cp_parser_omp_nested_loop_transform_clauses (
+      parser, clauses, required_depth, "outer transformation");

   block = begin_omp_structured_block ();
   ret = cp_parser_omp_for_loop (parser, OMP_LOOP_TRANS, clauses, NULL, if_p);
@@ -48900,6 +49083,9 @@ cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
     case PRAGMA_OMP_ASSUME:
       cp_parser_omp_assume (parser, pragma_tok, if_p);
       return;
+    case PRAGMA_OMP_TILE:
+      stmt = cp_parser_omp_tile (parser, pragma_tok, if_p);
+      break;
     case PRAGMA_OMP_UNROLL:
       stmt = cp_parser_omp_unroll (parser, pragma_tok, if_p);
       break;
@@ -49529,6 +49715,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
       cp_parser_omp_construct (parser, pragma_tok, if_p);
       pop_omp_privatization_clauses (stmt);
       return true;
+    case PRAGMA_OMP_TILE:
     case PRAGMA_OMP_UNROLL:
       if (context != pragma_stmt && context != pragma_compound)
        goto bad_stmt;
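As a reading aid for cp_parser_omp_nested_loop_transform_clauses above, the depth bookkeeping can be modelled outside the parser. This is a minimal sketch and not part of the patch: Transform, Directive and check_nesting are hypothetical names, and a plain vector stands in for the directives in parse order (outermost first).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the clause kinds the parser distinguishes.
enum class Transform { UnrollFull, UnrollNone, UnrollPartial, Tile };

struct Directive {
  Transform kind;
  int tile_sizes;  // number of entries in sizes(...); only read for Tile
};

// Returns the nesting depth left by the outermost transformation,
// 0 if there are no directives, or -1 if the nest is rejected.
int check_nesting(const std::vector<Directive> &directives, int required_depth) {
  assert(required_depth >= 1);
  int transformed_depth = 0;
  int last_depth = required_depth;  // depth demanded by the enclosing construct
  for (const Directive &d : directives) {
    int depth;
    switch (d.kind) {
      case Transform::UnrollPartial: depth = 1; break;
      case Transform::Tile: depth = d.tile_sizes; break;
      default: return -1;  // "full" or clause-less unroll leaves no loop
    }
    if (depth < last_depth) return -1;  // too shallow for the outer construct
    last_depth = depth;
    if (transformed_depth == 0) transformed_depth = depth;
  }
  return transformed_depth;
}
```

With this model, "collapse(3)" followed by "tile sizes(5, 6)" and "tile sizes(1, 2, 3)" is rejected at the first tile, while the same two tiles under "collapse(2)" are accepted, which agrees with the expectations in tile-3.c.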
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 16197b17e5a..a9d36d66caf 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18087,6 +18087,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
        case OMP_CLAUSE_WAIT:
        case OMP_CLAUSE_DETACH:
        case OMP_CLAUSE_UNROLL_PARTIAL:
+       case OMP_CLAUSE_TILE:
          OMP_CLAUSE_OPERAND (nc, 0)
            = tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, in_decl);
          break;
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index c87e252ff06..15f7c7e6dc4 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -8769,6 +8769,46 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
            }
          break;

+       case OMP_CLAUSE_TILE:
+         for (tree list = OMP_CLAUSE_TILE_SIZES (c); !remove && list;
+              list = TREE_CHAIN (list))
+           {
+             t = TREE_VALUE (list);
+
+             if (t == error_mark_node)
+               remove = true;
+             else if (!type_dependent_expression_p (t)
+                      && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
+               {
+                 error_at (OMP_CLAUSE_LOCATION (c),
+                           "%<tile sizes%> argument needs integral type");
+                 remove = true;
+               }
+             else
+               {
+                 t = mark_rvalue_use (t);
+                 if (!processing_template_decl)
+                   {
+                     t = maybe_constant_value (t);
+                     HOST_WIDE_INT n;
+                     if (!tree_fits_shwi_p (t)
+                         || !INTEGRAL_TYPE_P (TREE_TYPE (t))
+                         || (n = tree_to_shwi (t)) <= 0 || (int)n != n)
+                       {
+                         error_at (OMP_CLAUSE_LOCATION (c),
+                                   "%<tile sizes%> argument needs positive "
+                                   "integral constant");
+                         remove = true;
+                       }
+                     t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
+                   }
+               }
+
+             /* Update list item.  */
+             TREE_VALUE (list) = t;
+           }
+         break;
+
        case OMP_CLAUSE_ORDERED:
          ordered_seen = true;
          break;
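The OMP_CLAUSE_TILE case above accepts a size only if it folds to a positive constant that also fits in an int. A minimal sketch of that range check on a plain 64-bit value follows; valid_tile_size is a hypothetical helper, and std::int64_t is only assumed here as a stand-in for the wide integer the compiler works with.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if an already folded tile size is usable,
// i.e. strictly positive and representable as int.
bool valid_tile_size(std::int64_t value) {
  return value > 0 && value <= std::numeric_limits<int>::max();
}
```

Keeping the value in the wide type until after the comparison matters: narrowing first would make the "fits in int" test trivially true.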
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 4d504a12451..365897afb61 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13572,6 +13572,29 @@ find_standalone_omp_ordered (tree *tp, int *walk_subtrees, void *)
   return NULL_TREE;
 }

+static void omp_for_drop_tile_clauses (tree for_stmt)
+{
+  /* Drop erroneous loop transformation clauses to avoid follow up errors
+     in pass-omp_transform_loops. */
+  tree last_c = NULL_TREE;
+  for (tree c = OMP_FOR_CLAUSES (for_stmt); c;
+       c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_TILE)
+       {
+         last_c = c;
+         continue;
+       }
+      if (last_c)
+       TREE_CHAIN (last_c) = TREE_CHAIN (c);
+      else
+       OMP_FOR_CLAUSES (for_stmt) = TREE_CHAIN (c);
+      error_at (OMP_CLAUSE_LOCATION (c),
+               "'tile' loop transformation may not appear on "
+               "non-rectangular for");
+    }
+}
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */

 static enum gimplify_status
@@ -13763,6 +13786,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR:
       if (OMP_FOR_NON_RECTANGULAR (inner_for_stmt ? inner_for_stmt : for_stmt))
        {
+         omp_for_drop_tile_clauses (for_stmt);
+
          if (omp_find_clause (OMP_FOR_CLAUSES (for_stmt),
                               OMP_CLAUSE_SCHEDULE))
            error_at (EXPR_LOCATION (for_stmt),
@@ -13808,6 +13833,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       ort = ORT_SIMD;
       break;
     case OMP_LOOP_TRANS:
+      if (OMP_FOR_NON_RECTANGULAR (inner_for_stmt ? inner_for_stmt : for_stmt))
+       omp_for_drop_tile_clauses (for_stmt);
       break;
     default:
       gcc_unreachable ();
@@ -14693,6 +14720,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
          case OMP_CLAUSE_UNROLL_FULL:
          case OMP_CLAUSE_UNROLL_NONE:
          case OMP_CLAUSE_UNROLL_PARTIAL:
+         case OMP_CLAUSE_TILE:
            *gfor_clauses_ptr = c;
            gfor_clauses_ptr = &OMP_CLAUSE_CHAIN (c);
            break;
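omp_for_drop_tile_clauses above unlinks entries of one kind from a singly linked clause chain while the other clauses must stay attached. The following is a minimal sketch of that list operation on a hypothetical Clause struct, not on GCC trees; drop_clauses and the integer kind are assumptions made for illustration.

```cpp
// Hypothetical stand-in for a clause chain: a kind plus a next pointer.
struct Clause {
  int kind;
  Clause *next;
};

// Unlink every node whose kind equals `kind`; return how many were dropped.
// `link` always points at the pointer that refers to the current node, so
// removing the head and removing an interior node are the same assignment.
int drop_clauses(Clause *&head, int kind) {
  int dropped = 0;
  Clause **link = &head;
  while (*link != nullptr) {
    if ((*link)->kind == kind) {
      *link = (*link)->next;  // skip the node; `link` stays where it is
      ++dropped;
    } else {
      link = &(*link)->next;  // keep the node and advance
    }
  }
  return dropped;
}
```

The pointer-to-pointer form avoids a separate "previous kept node" variable, which otherwise has to be updated on every iteration that keeps a node.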
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c
new file mode 100644
index 00000000000..8a2f2126af4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c
@@ -0,0 +1,164 @@
+extern void dummy (int);
+
+void
+test ()
+{
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(0) /* { dg-error {'tile sizes' argument needs positive integral constant} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(-1) /* { dg-error {'tile sizes' argument needs positive integral constant} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes() /* { dg-error {expected expression before} "" { target c} } */
+    /* { dg-error {expected primary-expression before} "" { target c++ } .-1 } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(,) /* { dg-error {expected expression before} "" { target c } } */
+    /* { dg-error {expected primary-expression before} "" { target c++ } .-1 } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1,2 /* { dg-error {expected '\,' before end of line} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes /* { dg-error {expected '\(' before end of line} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1) sizes(1) /* { dg-error {expected end of line before 'sizes'} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1)
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1, 2)
+    #pragma omp tile sizes(1) /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp tile sizes(1, 2)
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp tile sizes(5, 6)
+    #pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+    for (int k = 0; k < 100; ++k)
+       dummy (i);
+
+    #pragma omp tile sizes(1)
+    #pragma omp unroll partia /* { dg-error {expected '#pragma omp' clause before 'partia'} } */
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1)
+    #pragma omp unroll /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1)
+    #pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1)
+    #pragma omp unroll partial
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(8,8)
+    #pragma omp unroll partial /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(8,8)
+    #pragma omp unroll partial /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp tile sizes(1, 2) /* { dg-error {'tile' loop transformation may not appear on non-rectangular for} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = i; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp tile sizes(1, 2) /* { dg-error {'tile' loop transformation may not appear on non-rectangular for} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 2; j < i; ++j)
+       dummy (i);
+
+    #pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+      for (int j = 0; j < 100; ++j)
+        dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+    /* { dg-error {'i' was not declared in this scope} "" { target c++ } .-2 } */
+
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+      {
+       dummy (i);
+        for (int j = 0; j < 100; ++j)
+          dummy (i);
+      }
+
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+      {
+        for (int j = 0; j < 100; ++j)
+           dummy (j);
+       dummy (i);
+      }
+
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+      {
+        dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+       /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+        for (int j = 0; j < 100; ++j)
+          dummy (j);
+      }
+
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+      {
+        for (int j = 0; j < 100; ++j)
+         dummy (j);
+       dummy (i); /* { dg-error {collapsed loops not perfectly nested before 'dummy'} "" { target c} } */
+       /* { dg-error {collapsed loops not perfectly nested} "" { target c++ } .-1 } */
+      }
+
+    int s;
+    #pragma omp tile sizes(s) /* { dg-error {'tile sizes' argument needs positive integral constant} "" { target { ! c++98_only } } } */
+    /* { dg-error {the value of 's' is not usable in a constant expression} "" { target { c++ && { ! c++98_only } } } .-1 } */
+    /* { dg-error {'s' cannot appear in a constant-expression} "" { target c++98_only } .-2 } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp tile sizes(42.0) /* { dg-error {'tile sizes' argument needs positive integral constant} "" { target c } } */
+    /* { dg-error {'tile sizes' argument needs integral type} "" { target c++ } .-1 } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c
new file mode 100644
index 00000000000..51d62552945
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c
@@ -0,0 +1,183 @@
+extern void dummy (int);
+
+void
+test ()
+{
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(0) /* { dg-error {'tile sizes' argument needs positive integral constant} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(-1) /* { dg-error {'tile sizes' argument needs positive integral constant} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes() /* { dg-error {expected expression before} "" { target c} } */
+    /* { dg-error {expected primary-expression before} "" { target c++ } .-1 } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(,) /* { dg-error {expected expression before} "" { target c } } */
+    /* { dg-error {expected primary-expression before} "" { target c++ } .-1 } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1,2 /* { dg-error {expected '\,' before end of line} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes /* { dg-error {expected '\(' before end of line} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1) sizes(1) /* { dg-error {expected end of line before 'sizes'} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2)
+    #pragma omp tile sizes(1) /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2)
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(5, 6)
+    #pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+    for (int k = 0; k < 100; ++k)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    #pragma omp unroll partia /* { dg-error {expected '#pragma omp' clause before 'partia'} } */
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    #pragma omp unroll /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    #pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    #pragma omp unroll partial
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(8,8)
+    #pragma omp unroll partial /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(8,8)
+    #pragma omp unroll partial /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2) /* { dg-error {'tile' loop transformation may not appear on non-rectangular for} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = i; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2) /* { dg-error {'tile' loop transformation may not appear on non-rectangular for} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 2; j < i; ++j)
+       dummy (i);
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+      for (int j = 0; j < 100; ++j)
+        dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+    /* { dg-error {'i' was not declared in this scope} "" { target c++ } .-2 } */
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+      {
+       dummy (i);
+        for (int j = 0; j < 100; ++j)
+          dummy (i);
+      }
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+      {
+        for (int j = 0; j < 100; ++j)
+           dummy (j);
+       dummy (i);
+      }
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+      {
+        dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+       /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+        for (int j = 0; j < 100; ++j)
+          dummy (j);
+      }
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+      {
+        for (int j = 0; j < 100; ++j)
+         dummy (j);
+       dummy (i); /* { dg-error {collapsed loops not perfectly nested before 'dummy'} "" { target c} } */
+       /* { dg-error {collapsed loops not perfectly nested} "" { target c++ } .-1 } */
+      }
+
+    #pragma omp parallel for
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c
new file mode 100644
index 00000000000..7fffc72b335
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c
@@ -0,0 +1,117 @@
+extern void dummy (int);
+
+void
+test ()
+{
+    #pragma omp for
+    #pragma omp tile sizes(1, 2) /* { dg-error {'tile' loop transformation may not appear on non-rectangular for} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = i; j < 100; ++j)
+       dummy (i);
+
+    #pragma omp for
+    #pragma omp tile sizes(1, 2) /* { dg-error {'tile' loop transformation may not appear on non-rectangular for} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < i; ++j)
+       dummy (i);
+
+
+#pragma omp for collapse(1)
+    #pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+#pragma omp for collapse(2)
+    #pragma omp tile sizes(1) /* { dg-error {nesting depth left after this transformation too low for loop collapse} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+#pragma omp for collapse(2)
+    #pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+#pragma omp for collapse(3)
+    #pragma omp tile sizes(1, 2) /* { dg-error {nesting depth left after this transformation too low for loop collapse} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+    /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } .-1 } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-2 } */
+    /* { dg-error {'i' was not declared in this scope} "" { target c++ } .-3 } */
+
+#pragma omp for collapse(1)
+#pragma omp tile sizes(1)
+#pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+#pragma omp for collapse(2)
+#pragma omp tile sizes(1, 2)
+#pragma omp tile sizes(1) /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    for (int i = 0; i < 100; ++i)
+       dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+
+#pragma omp for collapse(2)
+#pragma omp tile sizes(1, 2)
+#pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+       dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+
+#pragma omp for collapse(2)
+#pragma omp tile sizes(5, 6)
+#pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+       dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+
+
+#pragma omp for collapse(1)
+#pragma omp tile sizes(1)
+#pragma omp tile sizes(1)
+    for (int i = 0; i < 100; ++i)
+       dummy (i);
+
+#pragma omp for collapse(2)
+#pragma omp tile sizes(1, 2)
+#pragma omp tile sizes(1) /* { dg-error {nesting depth left after this transformation too low for outer transformation} } */
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+#pragma omp for collapse(2)
+#pragma omp tile sizes(1, 2)
+#pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i);
+
+#pragma omp for collapse(2)
+#pragma omp tile sizes(5, 6)
+#pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+    for (int k = 0; k < 100; ++k)
+       dummy (i);
+
+#pragma omp for collapse(3)
+#pragma omp tile sizes(1, 2) /* { dg-error {nesting depth left after this transformation too low for loop collapse} } */
+#pragma omp tile sizes(1, 2)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+       dummy (i); /* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } } */
+    /* { dg-error {not enough for loops to collapse} "" { target c++ } .-1 } */
+
+#pragma omp for collapse(3)
+#pragma omp tile sizes(5, 6) /* { dg-error {nesting depth left after this transformation too low for loop collapse} } */
+#pragma omp tile sizes(1, 2, 3)
+    for (int i = 0; i < 100; ++i)
+    for (int j = 0; j < 100; ++j)
+    for (int k = 0; k < 100; ++k)
+       dummy (i);
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c
new file mode 100644
index 00000000000..d46bb0cb642
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c
@@ -0,0 +1,322 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fopenmp-simd" } */
+
+#include <stdio.h>
+
+#define ASSERT_EQ(var, val) if (var != val) { fprintf (stderr, "%s:%d: Unexpected value %d, expected %d\n", __FILE__, __LINE__, var, val); \
+    __builtin_abort (); }
+
+int
+test1 ()
+{
+  int iter = 0;
+  int i;
+#pragma omp tile sizes(3)
+  for (i = 0; i < 10; i=i+2)
+       {
+         ASSERT_EQ (i, iter)
+         iter = iter + 2;
+       }
+
+  ASSERT_EQ (i, 10)
+  return iter;
+}
+
+int
+test2 ()
+{
+  int iter = 0;
+  int i;
+#pragma omp tile sizes(3)
+  for (i = 0; i < 10; i=i+2)
+       {
+         ASSERT_EQ (i, iter)
+         iter = iter + 2;
+       }
+
+  ASSERT_EQ (i, 10)
+  return iter;
+}
+
+int
+test3 ()
+{
+  int iter = 0;
+  int i;
+#pragma omp tile sizes(8)
+  for (i = 0; i < 10; i=i+2)
+       {
+         ASSERT_EQ (i, iter)
+         iter = iter + 2;
+       }
+
+  ASSERT_EQ (i, 10)
+  return iter;
+}
+
+int
+test4 ()
+{
+  int iter = 10;
+  int i;
+#pragma omp tile sizes(8)
+  for (i = 10; i > 0; i=i-2)
+       {
+         ASSERT_EQ (i, iter)
+         iter = iter - 2;
+       }
+  ASSERT_EQ (i, 0)
+  return iter;
+}
+
+int
+test5 ()
+{
+  int iter = 10;
+  int i;
+#pragma omp tile sizes(71)
+  for (i = 10; i > 0; i=i-2)
+       {
+         ASSERT_EQ (i, iter)
+         iter = iter - 2;
+       }
+
+  ASSERT_EQ (i, 0)
+  return iter;
+}
+
+int
+test6 ()
+{
+  int iter = 10;
+  int i;
+#pragma omp tile sizes(1)
+  for (i = 10; i > 0; i=i-2)
+       {
+         ASSERT_EQ (i, iter)
+         iter = iter - 2;
+       }
+  ASSERT_EQ (i, 0)
+  return iter;
+}
+
+int
+test7 ()
+{
+  int iter = 5;
+  int i;
+#pragma omp tile sizes(2)
+  for (i = 5; i < -5; i=i-3)
+       {
+         fprintf (stderr, "%d\n", i);
+         __builtin_abort ();
+         iter = iter - 3;
+       }
+
+  ASSERT_EQ (i, 5)
+
+  /* No iteration expected */
+  return iter;
+}
+
+int
+test8 ()
+{
+  int iter = 5;
+  int i;
+#pragma omp tile sizes(2)
+  for (i = 5; i > -5; i=i-3)
+       {
+         ASSERT_EQ (i, iter)
+         /* Expect only first iteration of the last tile to execute */
+         if (iter != -4)
+           iter = iter - 3;
+       }
+
+  ASSERT_EQ (i, -7)
+  return iter;
+}
+
+
+int
+test9 ()
+{
+  int iter = 5;
+  int i;
+#pragma omp tile sizes(5)
+  for (i = 5; i >= -5; i=i-4)
+       {
+         ASSERT_EQ (i, iter)
+         /* Expect only first iteration of the last tile to execute */
+         if (iter != - 3)
+           iter = iter - 4;
+       }
+
+  ASSERT_EQ (i, -7)
+  return iter;
+}
+
+int
+test10 ()
+{
+  int iter = 5;
+  int i;
+#pragma omp tile sizes(5)
+  for (i = 5; i >= -5; i--)
+       {
+         ASSERT_EQ (i, iter)
+         iter--;
+       }
+
+  ASSERT_EQ (i, -6)
+  return iter;
+}
+
+int
+test11 ()
+{
+  int iter = 5;
+  int i;
+#pragma omp tile sizes(15)
+  for (i = 5; i != -5; i--)
+       {
+         ASSERT_EQ (i, iter)
+         iter--;
+       }
+  ASSERT_EQ (i, -5)
+  return iter;
+}
+
+int
+test12 ()
+{
+  int iter = 0;
+  unsigned i;
+#pragma omp tile sizes(3)
+  for (i = 0; i != 5; i++)
+       {
+         ASSERT_EQ (i, iter)
+         iter++;
+       }
+
+  ASSERT_EQ (i, 5)
+  return iter;
+}
+
+int
+test13 ()
+{
+  int iter = -5;
+  long long unsigned int i;
+#pragma omp tile sizes(15)
+  for (int i = -5; i < 5; i=i+3)
+       {
+         ASSERT_EQ (i, iter)
+         iter++;
+       }
+
+  ASSERT_EQ (i, 5)
+  return iter;
+}
+
+int
+test14 (unsigned init, int step)
+{
+  int iter = init;
+  long long unsigned int i;
+#pragma omp tile sizes(8)
+  for (i = init; i < 2*init; i=i+step)
+    iter++;
+
+  ASSERT_EQ (i, 2*init)
+  return iter;
+}
+
+int
+test15 (unsigned init, int step)
+{
+  int iter = init;
+  int i;
+#pragma omp tile sizes(8)
+  for (unsigned i = init; i > 2* init; i=i+step)
+    iter++;
+
+  return iter;
+}
+
+int
+main ()
+{
+  int last_iter;
+
+  last_iter = test1 ();
+  ASSERT_EQ (last_iter, 10);
+
+  last_iter = test2 ();
+  ASSERT_EQ (last_iter, 10);
+
+  last_iter = test3 ();
+  ASSERT_EQ (last_iter, 10);
+
+  last_iter = test4 ();
+  ASSERT_EQ (last_iter, 0);
+
+  last_iter = test5 ();
+  ASSERT_EQ (last_iter, 0);
+
+  last_iter = test6 ();
+  ASSERT_EQ (last_iter, 0);
+
+  last_iter = test7 ();
+  ASSERT_EQ (last_iter, 5);
+
+  last_iter = test8 ();
+  ASSERT_EQ (last_iter, -4);
+
+  last_iter = test9 ();
+  ASSERT_EQ (last_iter, -3);
+
+  last_iter = test10 ();
+  ASSERT_EQ (last_iter, -6);
+  return 0;
+
+  last_iter = test11 ();
+  ASSERT_EQ (last_iter, -4);
+  return 0;
+
+  last_iter = test12 ();
+  ASSERT_EQ (last_iter, 5);
+  return 0;
+
+  last_iter = test13 ();
+  ASSERT_EQ (last_iter, 4);
+  return 0;
+
+  last_iter = test14 (0, 1);
+  ASSERT_EQ (last_iter, 0);
+  return 0;
+
+  last_iter = test14 (0, -1);
+  ASSERT_EQ (last_iter, 0);
+  return 0;
+
+  last_iter = test14 (8, 2);
+  ASSERT_EQ (last_iter, 16);
+  return 0;
+
+  last_iter = test14 (5, 3);
+  ASSERT_EQ (last_iter, 9);
+  return 0;
+
+  last_iter = test15 (8, -1);
+  ASSERT_EQ (last_iter, 9);
+  return 0;
+
+  last_iter = test15 (8, -2);
+  ASSERT_EQ (last_iter, 10);
+  return 0;
+
+  last_iter = test15 (5, -3);
+  ASSERT_EQ (last_iter, 6);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c
new file mode 100644
index 00000000000..815318ab27a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c
@@ -0,0 +1,150 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fopenmp-simd" } */
+
+#include <stdio.h>
+
+#define ASSERT_EQ(var, val) if (var != val) { fprintf (stderr, "%s:%d: Unexpected value %d\n", __FILE__, __LINE__, var); \
+    __builtin_abort (); }
+
+#define ASSERT_EQ_PTR(var, ptr) if (var != ptr) { fprintf (stderr, "%s:%d: Unexpected value %p\n", __FILE__, __LINE__, var); \
+    __builtin_abort (); }
+
+int
+test1 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(5)
+  for (i = data; i < data + 10 ; i++)
+       {
+         ASSERT_EQ (*i, data[iter]);
+         ASSERT_EQ_PTR (i, data + iter);
+         iter++;
+       }
+
+  ASSERT_EQ_PTR (i, data + 10)
+  return iter;
+}
+
+int
+test2 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(5)
+  for (i = data; i < data + 10 ; i=i+2)
+       {
+         ASSERT_EQ_PTR (i, data + 2 * iter);
+         ASSERT_EQ (*i, data[2 * iter]);
+         iter++;
+       }
+
+  ASSERT_EQ_PTR (i, data + 10)
+  return iter;
+}
+
+int
+test3 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(5)
+  for (i = data; i <= data + 9 ; i=i+2)
+       {
+         ASSERT_EQ (*i, data[2 * iter]);
+         iter++;
+       }
+
+  ASSERT_EQ_PTR (i, data + 10)
+  return iter;
+}
+
+int
+test4 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(5)
+  for (i = data; i != data + 10 ; i=i+1)
+       {
+         ASSERT_EQ (*i, data[iter]);
+         iter++;
+       }
+
+  ASSERT_EQ_PTR (i, data + 10)
+  return iter;
+}
+
+int
+test5 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(3)
+  for (i = data + 9; i >= data ; i--)
+       {
+         ASSERT_EQ (*i, data[9 - iter]);
+         iter++;
+       }
+
+  ASSERT_EQ_PTR (i, data - 1)
+  return iter;
+}
+
+int
+test6 (int data[10])
+{
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(3)
+  for (i = data + 9; i > data - 1 ; i--)
+       {
+         ASSERT_EQ (*i, data[9 - iter]);
+         iter++;
+       }
+
+  ASSERT_EQ_PTR (i, data - 1)
+  return iter;
+}
+
+int
+test7 (int data[10])
+{
+  int iter = 0;
+  #pragma omp tile sizes(1)
+  for (int *i = data + 9; i != data - 1 ; i--)
+       {
+         ASSERT_EQ (*i, data[9 - iter]);
+         iter++;
+       }
+
+  return iter;
+}
+
+int
+main ()
+{
+  int iter_count;
+  int data[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
+
+  iter_count = test1 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test2 (data);
+  ASSERT_EQ (iter_count, 5);
+
+  iter_count = test3 (data);
+  ASSERT_EQ (iter_count, 5);
+
+  iter_count = test4 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test5 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test6 (data);
+  ASSERT_EQ (iter_count, 10);
+
+  iter_count = test7 (data);
+  ASSERT_EQ (iter_count, 10);
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c
new file mode 100644
index 00000000000..8132128a5a8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fopenmp-simd" } */
+
+#include <stdio.h>
+
+int
+test1 ()
+{
+   int sum = 0;
+for (int k = 0; k < 10; k++)
+  {
+#pragma omp tile sizes(5,7)
+  for (int i = 0; i < 10; i++)
+  for (int j = 0; j < 10; j=j+2)
+       {
+         sum = sum + 1;
+       }
+  }
+
+  return sum;
+}
+
+int
+main ()
+{
+  int result = test1 ();
+
+  if (result != 500)
+    {
+      fprintf (stderr, "Wrong result: %d\n", result);
+      __builtin_abort ();
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c
new file mode 100644
index 00000000000..cd25a62c5c0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fopenmp-simd" } */
+
+#include <stdio.h>
+#define ASSERT_EQ(var, val) if (var != val) { fprintf (stderr, "%s:%d: Unexpected value %d\n", __FILE__, __LINE__, var); \
+    __builtin_abort (); }
+
+#define ASSERT_EQ_PTR(var, ptr) if (var != ptr) { fprintf (stderr, "%s:%d: Unexpected value %p\n", __FILE__, __LINE__, var); \
+    __builtin_abort (); }
+
+int
+main ()
+{
+  int iter_count;
+  int data[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
+
+  int iter = 0;
+  int *i;
+  #pragma omp tile sizes(1)
+  for (i = data; i < data + 10; i=i+2)
+       {
+         ASSERT_EQ_PTR (i, data + 2 * iter);
+         ASSERT_EQ (*i, data[2 * iter]);
+         iter++;
+       }
+
+  unsigned long real_iter_count = ((unsigned long)i - (unsigned long)data) / (sizeof (int) * 2);
+  ASSERT_EQ (real_iter_count, 5);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c
new file mode 100644
index 00000000000..c26e03d7e74
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fopenmp-simd" } */
+
+#include <stdio.h>
+
+#define ASSERT_EQ(var, val) if (var != val) { fprintf (stderr, "%s:%d: Unexpected value %d, expected %d\n", __FILE__, __LINE__, var, val); \
+    __builtin_abort (); }
+
+int
+main ()
+{
+  int iter_j = 0, iter_k = 0;
+  unsigned i, j, k;
+#pragma omp tile sizes(3,5,8)
+  for (i = 0; i < 2; i=i+2)
+  for (j = 0; j < 3; j=j+1)
+  for (k = 0; k < 5; k=k+3)
+       {
+         /* fprintf (stderr, "i=%d j=%d k=%d\n", i, j, k);
+          * fprintf (stderr, "iter_j=%d iter_k=%d\n", iter_j, iter_k); */
+         ASSERT_EQ (i, 0);
+         if (k == 0)
+           {
+             ASSERT_EQ (j, iter_j);
+             iter_k = 0;
+           }
+
+         ASSERT_EQ (k, iter_k);
+
+         iter_k = iter_k + 3;
+         if (k == 3)
+           iter_j++;
+       }
+
+  ASSERT_EQ (i, 2);
+  ASSERT_EQ (j, 3);
+  ASSERT_EQ (k, 6);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
index 8f7c3088a2e..e4fee72c04d 100644
--- a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c
@@ -19,7 +19,7 @@ test ()

 #pragma omp for
 #pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
-#pragma omp unroll full /* { dg-error {'full' clause is invalid here; turns loop into non-loop} } */
+#pragma omp unroll full
   for (int i = -300; i != 100; ++i)
     dummy (i);

@@ -45,13 +45,11 @@ test ()
   int i;
 #pragma omp for
 #pragma omp unroll( /* { dg-error {expected '#pragma omp' clause before '\(' token} } */
-  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
   for (int i = -300; i != 100; ++i)
     dummy (i);

 #pragma omp for
 #pragma omp unroll foo /* { dg-error {expected '#pragma omp' clause before 'foo'} } */
-  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
   for (int i = -300; i != 100; ++i)
     dummy (i);

@@ -67,7 +65,7 @@ test ()

 #pragma omp unroll partial(i)
  /* { dg-error {the value of 'i' is not usable in a constant expression} "" { target c++ } .-1 } */
- /* { dg-error {partial argument needs positive constant integer expression} "" { target c } .-2 } */
+ /* { dg-error {partial argument needs positive constant integer expression} "" { target *-*-* } .-2 } */
   for (int i = -300; i != 100; ++i)
     dummy (i);

@@ -78,20 +76,18 @@ test ()
 #pragma omp for
 #pragma omp unroll partial(1)
 #pragma omp unroll parti /* { dg-error {expected '#pragma omp' clause before 'parti'} } */
-  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
   for (int i = -300; i != 100; ++i)
     dummy (i);

 #pragma omp for
 #pragma omp unroll partial(1)
 #pragma omp unroll parti /* { dg-error {expected '#pragma omp' clause before 'parti'} } */
-  /* { dg-error {'#pragma omp unroll' without 'partial' clause is invalid here; turns loop into non-loop} "" { target *-*-* } .-1 } */
   for (int i = -300; i != 100; ++i)
     dummy (i);

 int sum = 0;
-#pragma omp parallel for reduction(+ : sum) collapse(2) /* { dg-error {collapse cannot be larger than 1 on an unrolled loop} "" { target c } } */
-#pragma omp unroll partial(1) /* { dg-error {collapse cannot be larger than 1 on an unrolled loop} "" { target c++ } } */
+#pragma omp parallel for reduction(+ : sum) collapse(2)
+#pragma omp unroll partial(1) /* { dg-error {nesting depth left after this transformation too low for loop collapse} } */
   for (int i = 3; i < 10; ++i)
     for (int j = -2; j < 7; ++j)
       sum++;
diff --git a/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h b/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h
new file mode 100644
index 00000000000..166d1d48677
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+
+#include <vector>
+
+extern void dummy (int);
+
+template<class T, int U, unsigned V> void
+test1_template ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i < 10; i++)
+    v.push_back (i);
+
+#pragma omp for
+  for (int i : v)
+    dummy (i);
+
+#pragma omp tile sizes (U, 10, V)
+  for (T i : v)
+  for (T j : v)
+  for (T k : v)
+    dummy (i);
+}
+
+void test () { test1_template <long, 5, 3> (); };
diff --git a/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C b/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C
new file mode 100644
index 00000000000..1ee76da3d4a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+
+#include <vector>
+
+extern void dummy (int);
+
+template<class T, int U, unsigned V> void
+test1_template ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i < 10; i++)
+    v.push_back (i);
+
+#pragma omp teams distribute parallel for num_teams(V)
+  for (int i : v)
+    dummy (i);
+
+#pragma omp tile sizes (V, U)
+  for (T i : v)
+  for (T j : v)
+  for (T k : v)
+    dummy (i);
+}
+
+void test () { test1_template <long, 5, 3> (); };
diff --git a/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1b.C b/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1b.C
new file mode 100644
index 00000000000..263c9b301c6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1b.C
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+
+#include <vector>
+
+extern void dummy (int);
+
+template<class T, int U, unsigned V> void
+test1_template ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i < 10; i++)
+    v.push_back (i);
+
+#pragma omp for
+  for (int i : v)
+    dummy (i);
+
+#pragma omp tile sizes (U, 10, V) // { dg-error {'tile sizes' argument needs positive integral constant} }
+  for (T i : v)
+  for (T j : v)
+  for (T k : v)
+    dummy (i);
+}
+
+void test () { test1_template <long, 5, 0> (); };
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C
new file mode 100644
index 00000000000..2a4d760720d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C
@@ -0,0 +1,52 @@
+#include <string.h>
+#include <stdio.h>
+#include <math.h>
+
+void
+mult (float *matrix1, float *matrix2, float *result, unsigned dim0,
+      unsigned dim1)
+{
+  memset (result, 0, sizeof (float) * dim0 * dim1);
+#pragma omp target parallel for collapse(3) map(tofrom:result[0:dim0*dim1]) map(to:matrix1[0:dim0*dim1], matrix2[0:dim0*dim1])
+#pragma omp tile sizes(8, 16, 4)
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      for (unsigned k = 0; k < dim1; k++)
+       result[i * dim1 + j] += matrix1[i * dim1 + k] * matrix2[k * dim0 + j];
+}
+
+int
+main ()
+{
+  unsigned dim0 = 20;
+  unsigned dim1 = 20;
+
+  float *result = (float *)malloc (sizeof (float) * dim0 * dim1);
+  float *matrix1 = (float *)malloc (sizeof (float) * dim0 * dim1);
+  float *matrix2 = (float *)malloc (sizeof (float) * dim0 * dim1);
+
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      matrix1[i * dim1 + j] = j;
+
+  for (unsigned i = 0; i < dim1; i++)
+    for (unsigned j = 0; j < dim0; j++)
+      if (i == j)
+       matrix2[i * dim0 + j] = 1;
+      else
+       matrix2[i * dim0 + j] = 0;
+
+  mult (matrix1, matrix2, result, dim0, dim1);
+
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      {
+       if (matrix1[i * dim1 + j] != result[i * dim1 + j])
+         {
+           printf ("ERROR at %d, %d\n", i, j);
+           __builtin_abort ();
+         }
+      }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/tile-2.C b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-2.C
new file mode 100644
index 00000000000..780421fa4c7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-2.C
@@ -0,0 +1,69 @@
+// { dg-additional-options "-std=c++11" }
+// { dg-additional-options "-O0" }
+
+#include <vector>
+#include <stdio.h>
+
+constexpr unsigned fib (unsigned n)
+{
+  return n <= 2 ? 1 : fib (n-1) + fib (n-2);
+}
+
+int
+test1 ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i <= 9; i++)
+    v.push_back (1);
+
+  int sum = 0;
+  for (int k = 0; k < 10; k++)
+    #pragma omp tile sizes(fib(4))
+    for (int i : v) {
+      for (int j = 8; j != -2; --j)
+       sum = sum + i;
+    }
+
+  return sum;
+}
+
+int
+test2 ()
+{
+  std::vector<int> v;
+
+  for (unsigned i = 0; i <= 10; i++)
+    v.push_back (i);
+
+  int sum = 0;
+  for (int k = 0; k < 10; k++)
+#pragma omp parallel for collapse(2) reduction(+:sum)
+#pragma omp tile sizes(fib(4), 1)
+  for (int i : v)
+    for (int j = 8; j > -2; --j)
+       sum = sum + i;
+
+  return sum;
+}
+
+int
+main ()
+{
+  int result = test1 ();
+
+  if (result != 1000)
+    {
+      fprintf (stderr, "%d: Wrong result: %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  result = test2 ();
+  if (result != 5500)
+    {
+      fprintf (stderr, "%d: Wrong result: %d\n", __LINE__, result);
+      __builtin_abort ();
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/tile-3.C b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-3.C
new file mode 100644
index 00000000000..91ec8f5c137
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-3.C
@@ -0,0 +1,28 @@
+// { dg-additional-options "-std=c++11" }
+// { dg-additional-options "-O0" }
+
+#include <vector>
+
+int
+main ()
+{
+  std::vector<int> v;
+  std::vector<int> w;
+
+  for (unsigned i = 0; i <= 9; i++)
+    v.push_back (i);
+
+  int iter = 0;
+#pragma omp for
+#pragma omp tile sizes(5)
+  for (int i : v)
+    {
+      w.push_back (iter);
+      iter++;
+    }
+
+  for (int i = 0; i < w.size (); i++)
+    if (w[i] != i)
+      __builtin_abort ();
+  return 0;
+}
--
2.36.1



* [PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
                   ` (4 preceding siblings ...)
  2023-03-24 15:30 ` [PATCH 5/7] openmp: Add C/C++ " Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-03-24 15:30 ` [PATCH 7/7] openmp: Add C/C++ " Frederik Harwath
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC (permalink / raw)
  To: gcc-patches, tobias, fortran, jakub

So far the implementation of the "omp tile" and "omp unroll"
directives has restricted their use to the outermost loop of a loop
nest.  This commit changes the Fortran front end to parse and verify
the directives on inner loops.  The transformation clauses are
extended to carry the level of the loop nest at which a transformation
should be applied.  The middle-end transformation pass is adjusted to
apply the transformations at the correct level of a loop nest and to
take their effect on the loop nest depth into account.
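
As a rough illustration of what applying "omp tile" to an inner loop
means for the iteration space, the following C sketch tiles the inner
loop of a two-deep nest by hand.  The helper name count_inner_tiled and
its parameters are made up for this example and are not part of the
patch; the loops that GCC actually generates may look different.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only.  The outer loop is left
   untouched; the inner loop is split by hand into a floor loop over
   tile origins and a tile loop over the elements of one tile.
   Returns the number of times the loop body runs.  */
static int
count_inner_tiled (int n_outer, int n_inner, int tile_size)
{
  assert (tile_size > 0);

  int count = 0;
  for (int i = 0; i < n_outer; i++)
    /* Floor loop: one iteration per tile of the inner loop.  */
    for (int jt = 0; jt < n_inner; jt += tile_size)
      /* Tile loop: the last tile may be partial, hence the second
         bound check against n_inner.  */
      for (int j = jt; j < jt + tile_size && j < n_inner; j++)
        count++;

  return count;
}
```

With a 4 x 10 nest and a tile size of 3, the body still runs for all
40 (i, j) pairs; only the order in which the inner iterations are
grouped changes, and the inner level now consists of two loops instead
of one.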

gcc/fortran/ChangeLog:

        * openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
        (resolve_loop_transform_generic): Remove, and ...
        (resolve_omp_unroll): ... inline and adapt here.  Move function.
        (find_nested_loop_in_block): New function.
        (find_nested_loop_in_chain): New function, used ...
        (is_outer_iteration_variable): ... here, and ...
        (expr_is_invariant): ... here.
        (resolve_omp_do): Adjust code for resolving loop transformations.
        (resolve_omp_tile): Likewise.
        * trans-openmp.cc (gfc_trans_omp_clauses): Set
        OMP_CLAUSE_TRANSFORM_LEVEL on new clause.
        (compute_transformed_depth): New function to compute the depth
        ("collapse") of a transformed loop nest, used ...
        (gfc_trans_omp_do): ... here.

gcc/ChangeLog:

        * omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix typo
        in comment.
        (gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
        (partial_unroll): Add argument for the loop nest level to be transformed.
        (tile): Likewise.
        (transform_gomp_for): Pass level to transformation functions.
        (optimize_transformation_clauses): Handle transformation clauses for all
        levels recursively.
        * tree-pretty-print.cc (dump_omp_clause): Print
        OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
        OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
        * tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
        OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
        * tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
        clause operand 0.
        (OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
        (OMP_CLAUSE_TILE_SIZES): Likewise.

gcc/cp/ChangeLog:

        * parser.cc (cp_parser_omp_clause_unroll_full): Set new
        OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
        (cp_parser_omp_clause_unroll_partial): Likewise.
        (cp_parser_omp_tile_sizes): Likewise.
        (cp_parser_omp_loop_transform_clause): Likewise.
        (cp_parser_omp_nested_loop_transform_clauses): Likewise.
        (cp_parser_omp_unroll): Likewise.
        * pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
        and OMP_CLAUSE_TILE handling to changed number of operands.

gcc/c/ChangeLog:

        * c-parser.cc (c_parser_omp_clause_unroll_full): Set new
        OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
        (c_parser_omp_clause_unroll_partial): Likewise.
        (c_parser_omp_tile_sizes): Likewise.
        (c_parser_omp_loop_transform_clause): Likewise.
        (c_parser_omp_nested_loop_transform_clauses): Likewise.
        (c_parser_omp_unroll): Likewise.

gcc/testsuite/ChangeLog:

        * gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
        * gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
        * gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
        * gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
        * gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
        * gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
        * gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
        changed diagnostic messages.

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
---
 gcc/c/c-parser.cc                             |  10 +-
 gcc/cp/parser.cc                              |  12 +-
 gcc/cp/pt.cc                                  |  12 +-
 gcc/fortran/openmp.cc                         | 173 ++++++++++++------
 gcc/fortran/trans-openmp.cc                   |  74 ++++++--
 gcc/omp-transform-loops.cc                    | 138 ++++++++------
 .../gomp/loop-transforms/inner-loops.f90      | 124 +++++++++++++
 .../gomp/loop-transforms/tile-3.f90           |   4 +-
 .../loop-transforms/tile-imperfect-nest.f90   |  93 ++++++++++
 .../loop-transforms/tile-inner-loops-1.f90    |  16 ++
 .../loop-transforms/tile-inner-loops-2.f90    |  23 +++
 .../loop-transforms/tile-inner-loops-3.f90    |  22 +++
 .../loop-transforms/tile-inner-loops-3a.f90   |  31 ++++
 .../loop-transforms/tile-inner-loops-4.f90    |  30 +++
 .../loop-transforms/tile-inner-loops-4a.f90   |  26 +++
 .../loop-transforms/tile-inner-loops-5.f90    | 123 +++++++++++++
 .../tile-non-rectangular-1.f90                |  71 +++++++
 .../tile-non-rectangular-2.f90                |  12 ++
 .../gomp/loop-transforms/unroll-8.f90         |   2 +-
 .../gomp/loop-transforms/unroll-9.f90         |   2 +-
 .../loop-transforms/unroll-inner-loop.f90     |  57 ++++++
 .../loop-transforms/unroll-non-rect-1.f90     |  31 ++++
 .../gomp/loop-transforms/unroll-tile-1.f90    |   2 +-
 .../gomp/loop-transforms/unroll-tile-2.f90    |   2 +-
 .../loop-transforms/unroll-tile-inner-1.f90   |  25 +++
 gcc/tree-pretty-print.cc                      |  24 +++
 gcc/tree.cc                                   |   8 +-
 gcc/tree.h                                    |   9 +-
 .../loop-transforms/inner-1.f90               |  77 ++++++++
 29 files changed, 1103 insertions(+), 130 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/inner-loops.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-non-rect-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/inner-1.f90

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index aac23dec9c0..41f9fb90037 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -17466,6 +17466,7 @@ c_parser_omp_clause_unroll_full (c_parser *parser, tree list)

   location_t loc = c_parser_peek_token (parser)->location;
   tree c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_FULL);
+  OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -17486,6 +17487,7 @@ c_parser_omp_clause_unroll_partial (c_parser *parser, tree list)
   loc = c_parser_peek_token (parser)->location;
   c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_PARTIAL);
   OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = NULL_TREE;
+  OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
   OMP_CLAUSE_CHAIN (c) = list;

   if (!c_parser_next_token_is (parser, CPP_OPEN_PAREN))
@@ -24011,6 +24013,7 @@ c_parser_omp_tile_sizes (c_parser *parser, location_t loc)

   gcc_assert (sizes);
   tree c  = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
   OMP_CLAUSE_TILE_SIZES (c) = sizes;

   return c;
@@ -24036,7 +24039,11 @@ c_parser_omp_loop_transform_clause (c_parser *parser)
       if (!c)
        {
          if (c_parser_next_token_is (parser, CPP_PRAGMA_EOL))
-           c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+           {
+             c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+             OMP_CLAUSE_TRANSFORM_LEVEL (c) =
+               build_int_cst (unsigned_type_node, 0);
+           }
          else
            c = error_mark_node;
        }
@@ -24191,6 +24198,7 @@ c_parser_omp_unroll (location_t loc, c_parser *parser, bool *if_p)
   if (!clauses)
     {
       tree c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_NONE);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
       OMP_CLAUSE_CHAIN (c) = clauses;
       clauses = c;
     }
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 084ecd3ada5..8219c476153 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -39476,6 +39476,7 @@ cp_parser_omp_clause_unroll_full (tree list, location_t loc)
     return list;

   tree c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_FULL);
+  OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -39494,6 +39495,7 @@ cp_parser_omp_clause_unroll_partial (cp_parser *parser, tree list,
   tree c, num = error_mark_node;
   c = build_omp_clause (loc, OMP_CLAUSE_UNROLL_PARTIAL);
   OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c) = NULL_TREE;
+  OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
   OMP_CLAUSE_CHAIN (c) = list;

   if (!cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
@@ -45786,6 +45788,8 @@ cp_parser_omp_tile_sizes (cp_parser *parser, location_t loc)
   gcc_assert (sizes);
   tree c  = build_omp_clause (loc, OMP_CLAUSE_TILE);
   OMP_CLAUSE_TILE_SIZES (c) = sizes;
+  OMP_CLAUSE_TRANSFORM_LEVEL (c)
+    = build_int_cst (unsigned_type_node, 0);

   return c;
 }
@@ -45846,7 +45850,11 @@ cp_parser_omp_loop_transform_clause (cp_parser *parser)
       if (!c)
        {
          if (cp_lexer_next_token_is (lexer, CPP_PRAGMA_EOL))
-           c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+           {
+             c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+             OMP_CLAUSE_TRANSFORM_LEVEL (c)
+               = build_int_cst (unsigned_type_node, 0);
+           }
          else
            c = error_mark_node;
        }
@@ -45926,6 +45934,7 @@ cp_parser_omp_nested_loop_transform_clauses (cp_parser *parser, tree &clauses,
        default:
          gcc_unreachable ();
        }
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);

       if (depth < last_depth)
        {
@@ -45974,6 +45983,7 @@ cp_parser_omp_unroll (cp_parser *parser, cp_token *tok, bool *if_p)
   if (!clauses)
     {
       tree c = build_omp_clause (tok->location, OMP_CLAUSE_UNROLL_NONE);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
       OMP_CLAUSE_CHAIN (c) = clauses;
       clauses = c;
     }
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a9d36d66caf..aeea36b24d7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18086,11 +18086,19 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
        case OMP_CLAUSE_ASYNC:
        case OMP_CLAUSE_WAIT:
        case OMP_CLAUSE_DETACH:
-       case OMP_CLAUSE_UNROLL_PARTIAL:
-       case OMP_CLAUSE_TILE:
          OMP_CLAUSE_OPERAND (nc, 0)
            = tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, in_decl);
          break;
+       case OMP_CLAUSE_UNROLL_PARTIAL:
+         OMP_CLAUSE_UNROLL_PARTIAL_EXPR (nc)
+           = tsubst_expr (OMP_CLAUSE_UNROLL_PARTIAL_EXPR (oc), args, complain,
+                          in_decl);
+         break;
+       case OMP_CLAUSE_TILE:
+         OMP_CLAUSE_TILE_SIZES (nc)
+           = tsubst_expr (OMP_CLAUSE_TILE_SIZES (oc), args, complain,
+                          in_decl);
+         break;
        case OMP_CLAUSE_REDUCTION:
        case OMP_CLAUSE_IN_REDUCTION:
        case OMP_CLAUSE_TASK_REDUCTION:
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1de61029768..86e9e4ead0e 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -9389,27 +9389,79 @@ gfc_resolve_omp_local_vars (gfc_namespace *ns)
     gfc_traverse_ns (ns, handle_local_var);
 }

+
+/* Forward declaration for mutually recursive functions.  */
+static gfc_code *
+find_nested_loop_in_block (gfc_code *block);
+
+/* Return the first nested DO loop in CHAIN, or NULL if there
+   isn't one.  Does no error checking on intervening code.  */
+
+static gfc_code *
+find_nested_loop_in_chain (gfc_code *chain)
+{
+  gfc_code *code;
+
+  if (!chain)
+    return NULL;
+
+  for (code = chain; code; code = code->next)
+    {
+      if (code->op == EXEC_DO)
+       return code;
+      else if (loop_transform_p (code->op) && code->block)
+       {
+         code = code->block;
+         continue;
+       }
+      else if (code->op == EXEC_BLOCK)
+       {
+         gfc_code *c = find_nested_loop_in_block (code);
+         if (c)
+           return c;
+       }
+    }
+  return NULL;
+}
+
+/* Return the first nested DO loop in BLOCK, or NULL if there
+   isn't one.  Does no error checking on intervening code.  */
+static gfc_code *
+find_nested_loop_in_block (gfc_code *block)
+{
+  gfc_namespace *ns;
+  gcc_assert (block->op == EXEC_BLOCK);
+  ns = block->ext.block.ns;
+  gcc_assert (ns);
+  return find_nested_loop_in_chain (ns->code);
+}
 /* CODE is an OMP loop construct.  Return true if VAR matches an iteration
    variable outer to level DEPTH.  */
 static bool
 is_outer_iteration_variable (gfc_code *code, int depth, gfc_symbol *var)
 {
   int i;
-  gfc_code *do_code = code->block->next;
-  while (loop_transform_p (do_code->op)) {
-    if (do_code->block)
-      do_code = do_code->block->next;
-    else
-      do_code = do_code->next;
-  }
-  gcc_assert (!loop_transform_p (do_code->op));
+  gfc_code *chain;
+  if (code->block)
+    chain = code->block->next;
+  else
+    {
+      gcc_assert (loop_transform_p (code->op));
+      chain = code;
+      while (loop_transform_p (chain->op))
+       chain = chain->next;
+    }

   for (i = 1; i < depth; i++)
     {
+      gfc_code *do_code = find_nested_loop_in_chain (chain);
+      gcc_assert (do_code != code);
+      gcc_assert (do_code && do_code->op == EXEC_DO);
       gfc_symbol *ivar = do_code->ext.iterator->var->symtree->n.sym;
       if (var == ivar)
        return true;
-      do_code = do_code->block->next;
+
+      chain = do_code->block->next;
     }
   return false;
 }
@@ -9420,21 +9472,22 @@ static bool
 expr_is_invariant (gfc_code *code, int depth, gfc_expr *expr)
 {
   int i;
-  gfc_code *do_code = code->block->next;
-  while (loop_transform_p (do_code->op)) {
-    if (do_code->block)
-      do_code = do_code->block->next;
-    else
-      do_code = do_code->next;
-  }
-  gcc_assert (!loop_transform_p (do_code->op));
+  gfc_code *do_code = code;
+
+  /* Move over loop transformations until the
+     loop is found. It may also be represented by a
+     transformation construct (but then with a block)
+     if it is not associated with any other construct. */
+  while (loop_transform_p (do_code->op) && !do_code->block)
+    do_code = do_code->next;

   for (i = 1; i < depth; i++)
     {
+      do_code = find_nested_loop_in_chain (do_code->block->next);
+      gcc_assert (do_code);
       gfc_symbol *ivar = do_code->ext.iterator->var->symtree->n.sym;
       if (gfc_find_sym_in_expr (ivar, expr))
        return false;
-      do_code = do_code->block->next;
     }
   return true;
 }
@@ -9828,6 +9881,8 @@ resolve_omp_do (gfc_code *code)
       if (i == collapse || c)
        break;
       do_code = do_code->block;
+      do_code = resolve_nested_loop_transforms (do_code, name, collapse - i,
+                                               &code->loc);
       if (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE)
        {
          gfc_error ("not enough DO loops for collapsed %s at %L",
@@ -9835,6 +9890,8 @@ resolve_omp_do (gfc_code *code)
          break;
        }
       do_code = do_code->next;
+      do_code = resolve_nested_loop_transforms (do_code, name, collapse - i,
+                                               &code->loc);
       if (do_code == NULL
          || (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE))
        {
@@ -9848,7 +9905,7 @@ resolve_omp_do (gfc_code *code)
 static void
 resolve_omp_tile (gfc_code *code)
 {
-  gfc_code *do_code, *c;
+  gfc_code *do_code, *next;
   gfc_symbol *dovar;
   const char *name = "!$OMP TILE";

@@ -9862,65 +9919,78 @@ resolve_omp_tile (gfc_code *code)

   for (unsigned i = 1; i <= num_loops; i++)
     {
+
+      gfc_symbol *start_var = NULL, *end_var = NULL;
+
       if (do_code->op == EXEC_DO_WHILE)
        {
          gfc_error ("%s cannot be a DO WHILE or DO without loop control "
                     "at %L", name, &do_code->loc);
-         break;
+         return;
        }
       if (do_code->op == EXEC_DO_CONCURRENT)
        {
          gfc_error ("%s cannot be a DO CONCURRENT loop at %L", name,
                     &do_code->loc);
-         break;
+         return;
        }
       if (do_code->op != EXEC_DO)
        {
          gfc_error ("%s must be DO loop at %L", name,
                     &do_code->loc);
-         break;
+         return;
        }

       gcc_assert (do_code->op != EXEC_OMP_UNROLL);
       gcc_assert (do_code->op == EXEC_DO);
       dovar = do_code->ext.iterator->var->symtree->n.sym;
-      if (i > 1)
+      if (is_outer_iteration_variable (code, i, dovar))
        {
-         gfc_code *do_code2 = code;
-         while (loop_transform_p (do_code2->op))
-           {
-             if (do_code2->block)
-               do_code2 = do_code2->block->next;
-             else
-               do_code2 = do_code2->next;
-           }
-         gcc_assert (!loop_transform_p (do_code2->op));
-
-         for (unsigned j = 1; j < i; j++)
-           {
-             gfc_symbol *ivar = do_code2->ext.iterator->var->symtree->n.sym;
-             if (dovar == ivar
-                 || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->start)
-                 || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->end)
-                 || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->step))
-               {
-                 gfc_error ("%s loops don't form rectangular "
-                            "iteration space at %L", name, &do_code->loc);
-                 break;
-               }
-             do_code2 = do_code2->block->next;
-           }
+         gfc_error ("%s iteration variable used in more than one loop at %L (depth %d)",
+                    name, &do_code->loc, i);
+         return;
        }
-      for (c = do_code->next; c; c = c->next)
-       if (c->op != EXEC_NOP && c->op != EXEC_CONTINUE)
+      else if (!bound_expr_is_canonical (code, i,
+                                        do_code->ext.iterator->start,
+                                        &start_var))
+       {
+         gfc_error ("%s loop start expression not in canonical form at %L",
+                    name, &do_code->loc);
+         return;
+       }
+      else if (!bound_expr_is_canonical (code, i,
+                                        do_code->ext.iterator->end,
+                                        &end_var))
+       {
+         gfc_error ("%s loop end expression not in canonical form at %L",
+                    name, &do_code->loc);
+         return;
+       }
+      else if (start_var && end_var && start_var != end_var)
+       {
+         gfc_error ("%s loop bounds reference different "
+                    "iteration variables at %L", name, &do_code->loc);
+         return;
+       }
+      else if (!expr_is_invariant (code, i, do_code->ext.iterator->step))
+       {
+         gfc_error ("%s loop increment not in canonical form at %L",
+                    name, &do_code->loc);
+         return;
+       }
+      if (start_var || end_var)
+       code->ext.omp_clauses->non_rectangular = 1;
+      for (next = do_code->next; next; next = next->next)
+       if (next->op != EXEC_NOP && next->op != EXEC_CONTINUE)
          {
            gfc_error ("%s loops not perfectly nested at %L",
-                      name, &c->loc);
+                      name, &next->loc);
            break;
          }
-      if (i == num_loops || c)
+      if (i == num_loops || next)
        break;
       do_code = do_code->block;
+      do_code = resolve_nested_loop_transforms (do_code, name, num_loops - i, &code->loc);
       if (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE)
        {
          gfc_error ("not enough DO loops for %s at %L",
@@ -9928,6 +9998,7 @@ resolve_omp_tile (gfc_code *code)
          break;
        }
       do_code = do_code->next;
+      do_code = resolve_nested_loop_transforms (do_code, name, num_loops - i, &code->loc);
       if (do_code == NULL
          || (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE))
        {
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 6936cd7f5ee..0cef3a8ba3a 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3893,12 +3893,14 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
   if (clauses->unroll_full)
     {
       c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_UNROLL_FULL);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }

   if (clauses->unroll_none)
     {
       c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_UNROLL_NONE);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }

@@ -3906,6 +3908,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
     {
       c = build_omp_clause (gfc_get_location (&where),
                            OMP_CLAUSE_UNROLL_PARTIAL);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
       OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c)
          = clauses->unroll_partial_factor ? build_int_cst (
                integer_type_node, clauses->unroll_partial_factor)
@@ -3926,6 +3929,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       c = build_omp_clause (gfc_get_location (&where),
                            OMP_CLAUSE_TILE);
       OMP_CLAUSE_TILE_SIZES (c) = build_tree_list_vec (tvec);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);

       tvec->truncate (0);
@@ -5308,6 +5312,29 @@ gfc_expr_list_len (gfc_expr_list *list)
   return len;
 }

+/* Traverse the loops with nesting depth at most
+   COLLAPSE from CODE and determine the largest
+   loop nest depth required by the loop transformations
+   found on the loops. */
+int compute_transformed_depth (gfc_code *code, int collapse)
+{
+  int new_collapse = collapse;
+  for (int i = 0; i < new_collapse; i++)
+    {
+      gcc_assert (code->op == EXEC_DO || loop_transform_p (code->op));
+      while (loop_transform_p (code->op))
+       {
+         int tile_depth
+             = gfc_expr_list_len (code->ext.omp_clauses->tile_sizes);
+         new_collapse = MAX (new_collapse, i + tile_depth);
+         code = code->block ? code->block->next : code->next;
+       }
+      code = code->block->next;
+    }
+
+  return new_collapse;
+}
+
 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
                  gfc_omp_clauses *do_clauses, tree par_clauses)
@@ -5343,6 +5370,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
      do" (or similar directive) are represented as clauses on the "omp do". */
   loop_transform_clauses = NULL;
   int omp_tile_depth = gfc_expr_list_len (omp_tile);
+  tree clauses_tail = NULL;
   while (loop_transform_p (code->op))
     {
       tree clauses = gfc_trans_omp_clauses (pblock, code->ext.omp_clauses,
@@ -5354,7 +5382,14 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
         directive, an error will be emitted in pass-omp_transform_loops. */
       omp_tile_depth = gfc_expr_list_len (code->ext.omp_clauses->tile_sizes);

-      loop_transform_clauses = chainon (loop_transform_clauses, clauses);
+      if (!loop_transform_clauses)
+       {
+         loop_transform_clauses = clauses;
+         clauses_tail = tree_last (clauses);
+       }
+      else
+       clauses_tail = chainon (clauses_tail, clauses);
+
       code = code->block ? code->block->next : code->next;
     }
   gcc_assert (!loop_transform_p (code->op));
@@ -5371,9 +5406,12 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
     collapse = clauses->orderedc;
   if (collapse <= 0)
     collapse = 1;
-
   collapse = MAX (collapse, omp_tile_depth);
+  gfc_code *first_loop = loop_transform_p (orig_code->op) ?
+    orig_code : orig_code->block->next;
+  int transform_depth = compute_transformed_depth (first_loop, collapse);

+  collapse = transform_depth;
   init = make_tree_vec (collapse);
   cond = make_tree_vec (collapse);
   incr = make_tree_vec (collapse);
@@ -5384,15 +5422,8 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
      on the simd construct and DO's clauses are translated elsewhere.  */
   do_clauses->sched_simd = false;

-  if (loop_transform_p (op))
-    {
-      /* This is a loop transformation on a loop which is not associated with
-        any other directive. Use the directive location instead of the loop
-        location for the clauses. */
-      omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, top_loc);
-    }
-  else
-    omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, code->loc);
+  omp_clauses = NULL;
+  omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, top_loc);
   omp_clauses = chainon (omp_clauses, loop_transform_clauses);

   for (i = 0; i < collapse; i++)
@@ -5665,7 +5696,26 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
        }

       if (i + 1 < collapse)
-       code = code->block->next;
+       {
+         code = code->block->next;
+
+         loop_transform_clauses = NULL;
+         clauses_tail = omp_clauses;
+         while (loop_transform_p (code->op))
+           {
+             loop_transform_clauses = gfc_trans_omp_clauses (
+                 pblock, code->ext.omp_clauses, code->loc);
+             for (tree c = loop_transform_clauses; c;
+                  c = OMP_CLAUSE_CHAIN (c))
+               OMP_CLAUSE_TRANSFORM_LEVEL (c)
+                   = build_int_cst (unsigned_type_node, i + 1);
+
+             clauses_tail = chainon (clauses_tail, loop_transform_clauses);
+             clauses_tail = tree_last (loop_transform_clauses);
+
+             code = code->block ? code->block->next : code->next;
+           }
+       }
     }

   if (pblock != &block)
diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 858a271261a..517faea537c 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -127,7 +127,7 @@ extern tree
 gimple_assign_rhs_to_tree (gimple *stmt);

 /* Substitute all definitions from SEQ bottom-up into EXPR. This is used to
-   reconstruct a tree for a gimplified expression for determinig whether or not
+   reconstruct a tree from a gimplified expression for determining whether or not
    the number of iterations of a loop is constant. */

 tree
@@ -227,6 +227,7 @@ gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0, bool expand = false)
 {
   int collapse = gimple_omp_for_collapse (omp_for);
   gcc_assert (from_depth < collapse);
+  gcc_assert (from_depth >= 0);

   if (collapse <= 1)
     return omp_for;
@@ -266,6 +267,7 @@ gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0, bool expand = false)
   if (from_depth > 0)
     {
       gimple_omp_set_body (omp_for, body);
+      omp_for->collapse = from_depth;
       return omp_for;
     }

@@ -453,7 +455,7 @@ after transform: Misc 6.0: Loop transformations #3440") in the
 non-public OpenMP spec repository. */

 static gimple_seq
-partial_unroll (gomp_for *omp_for, tree unroll_factor,
+partial_unroll (gomp_for *omp_for, size_t level, tree unroll_factor,
                location_t loc, tree transformation_clauses, walk_ctx *ctx)
 {
   gcc_assert (unroll_factor);
@@ -463,7 +465,7 @@ partial_unroll (gomp_for *omp_for, tree unroll_factor,

   /* Partial unrolling reduces the loop nest depth of a canonical loop nest to 1
      hence outer directives cannot require a greater collapse. */
-  gcc_assert (gimple_omp_for_collapse (omp_for) <= 1);
+  gcc_assert (gimple_omp_for_collapse (omp_for) <= level + 1);

   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS,
@@ -473,12 +475,12 @@ partial_unroll (gomp_for *omp_for, tree unroll_factor,

   gomp_for *unrolled_for = as_a<gomp_for *> (copy_gimple_seq_and_replace_locals (omp_for));

-  tree final = gimple_omp_for_final (unrolled_for, 0);
-  tree incr = gimple_omp_for_incr (unrolled_for, 0);
-  tree index = gimple_omp_for_index (unrolled_for, 0);
+  tree final = gimple_omp_for_final (unrolled_for, level);
+  tree incr = gimple_omp_for_incr (unrolled_for, level);
+  tree index = gimple_omp_for_index (unrolled_for, level);
   gimple_seq body = gimple_omp_body (unrolled_for);

-  tree_code cond = gimple_omp_for_cond (unrolled_for, 0);
+  tree_code cond = gimple_omp_for_cond (unrolled_for, level);
   tree step = TREE_OPERAND (incr, 1);
   gimple_omp_set_body (unrolled_for,
                       build_unroll_body (body, unroll_factor, index, incr,
@@ -503,7 +505,7 @@ partial_unroll (gomp_for *omp_for, tree unroll_factor,
       scaled_step = var;
     }
   TREE_OPERAND (incr, 1) = scaled_step;
-  gimple_omp_for_set_incr (unrolled_for, 0, incr);
+  gimple_omp_for_set_incr (unrolled_for, level, incr);

   pop_gimplify_context (result_bind);

@@ -864,7 +866,7 @@ canonicalize_conditions (gomp_for *omp_for)
  */

 static gimple_seq
-tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
+tile (gomp_for *omp_for, location_t loc, size_t start_level, tree tile_sizes,
       tree transformation_clauses, walk_ctx *ctx)
 {
   if (dump_enabled_p ())
@@ -896,22 +898,21 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
       collapse_clause = c;
     }

-  /* The 'omp tile' construct creates a canonical loop-nest whose nesting depth
-     equals tiling_depth. The whole loop-nest has depth at least 2 *
-     omp_tile_depth, but the 'tile loops' at levels
-     omp_tile_depth+1...2*omp_tile_depth are not in canonical loop-nest form
-     and hence cannot be associated with a loop construct. */
-  if (clause_collapse > tiling_depth)
+  /* The tiled loop nest is a canonical loop nest with nesting depth
+     tiling_depth. The tile loops below that level are not in
+     canonical loop nest form and hence cannot be associated with a
+     loop construct. */
+  if (clause_collapse > tiling_depth + start_level)
     {
       error_at (OMP_CLAUSE_LOCATION (collapse_clause),
                "collapse cannot extend below the floor loops "
                "generated by the %<omp tile%> construct");
       OMP_CLAUSE_COLLAPSE_EXPR (collapse_clause)
-         = build_int_cst (unsigned_type_node, tiling_depth);
+         = build_int_cst (unsigned_type_node, start_level + tiling_depth);
       return transform_gomp_for (omp_for, NULL, ctx);
     }

-  if (tiling_depth > collapse)
+  if (start_level + tiling_depth > collapse)
     return transform_gomp_for (omp_for, NULL, ctx);

   gcc_assert (collapse >= clause_collapse);
@@ -919,13 +920,15 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
   push_gimplify_context ();

   /* Create the index variables for iterating the tiles in the floor
-     loops first tiling_depth loops transformed loop nest. */
+     loops, which will be the loops at levels start_level ...
+     start_level + tiling_depth - 1 of the transformed loop nest.  The
+     loops at levels 0 ... start_level - 1 are left unchanged.  */
   gimple_seq floor_loops_pre_body = NULL;
   size_t tile_level = 0;
   auto_vec<tree> sizes_vec;
   for (tree el = tile_sizes; el; el = TREE_CHAIN (el), tile_level++)
     {
-      size_t nest_level = tile_level;
+      size_t nest_level = start_level + tile_level;
       tree index = gimple_omp_for_index (omp_for, nest_level);
       tree init = gimple_omp_for_initial (omp_for, nest_level);
       tree incr = gimple_omp_for_incr (omp_for, nest_level);
@@ -956,6 +959,7 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
       gimple_omp_for_set_incr (floor_loops, nest_level, incr);
       gimple_omp_for_set_index (floor_loops, nest_level, tile_index);
     }
+
   gbind *result_bind = gimple_build_bind (NULL, NULL, NULL);
   pop_gimplify_context (result_bind);
   gimple_seq_add_seq (gimple_omp_for_pre_body_ptr (floor_loops),
@@ -972,6 +976,9 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
      to add the incomplete tile checks to each level loop. */

   tile_loops = gomp_for_uncollapse (as_a <gomp_for *> (tile_loops));
+  for (size_t i = 0; i < start_level; i++)
+    tile_loops = gimple_omp_body (tile_loops);
+
   gimple_omp_for_set_kind (as_a<gomp_for *> (tile_loops),
                           GF_OMP_FOR_KIND_TRANSFORM_LOOP);
   gimple_omp_for_set_clauses (tile_loops, NULL_TREE);
@@ -990,50 +997,51 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,

   tree break_label = create_artificial_label (UNKNOWN_LOCATION);
   gimple_seq_add_stmt (surrounding_seq, gimple_build_label (break_label));
-  for (size_t level = 0; level < tiling_depth; level++)
+  for (size_t tile_level = 0; tile_level < tiling_depth; tile_level++)
     {
-      tree original_index = gimple_omp_for_index (omp_for, level);
-      tree original_final = gimple_omp_for_final (omp_for, level);
+      gimple_seq level_preamble = NULL;
+      gimple_seq level_body = gimple_omp_body (level_loop);
+      auto gsi = gsi_start (level_body);

-      tree tile_index = gimple_omp_for_index (floor_loops, level);
-      tree tile_size = sizes_vec[level];
+      int nest_level = start_level + tile_level;
+      tree original_index = gimple_omp_for_index (omp_for, nest_level);
+      tree original_final = gimple_omp_for_final (omp_for, nest_level);
+
+      tree tile_index
+         = gimple_omp_for_index (floor_loops, nest_level);
+      tree tile_size = sizes_vec[tile_level];
       tree type = TREE_TYPE (tile_index);
       tree plus_type = type;

-      tree incr = gimple_omp_for_incr (omp_for, level);
+      tree incr = gimple_omp_for_incr (omp_for, nest_level);
       tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr);

       gimple_seq *pre_body = gimple_omp_for_pre_body_ptr (level_loop);
-      gimple_seq level_body = gimple_omp_body (level_loop);
       gcc_assert (gimple_omp_for_collapse (level_loop) == 1);
-      tree_code original_cond = gimple_omp_for_cond (omp_for, level);
+      tree_code original_cond = gimple_omp_for_cond (omp_for, nest_level);

       gimple_omp_for_set_initial (level_loop, 0, tile_index);

       tree tile_final = create_tmp_var (type);
-      tree scaled_tile_size = fold_build2 (MULT_EXPR, TREE_TYPE (tile_size),
-                                          tile_size, step);
+      tree scaled_tile_size
+         = fold_build2 (MULT_EXPR, TREE_TYPE (tile_size), tile_size, step);

       tree_code plus_code = PLUS_EXPR;
       if (POINTER_TYPE_P (TREE_TYPE (tile_index)))
        {
          plus_code = POINTER_PLUS_EXPR;
          int unsignedp = TYPE_UNSIGNED (TREE_TYPE (scaled_tile_size));
-         plus_type = signed_or_unsigned_type_for (unsignedp, ptrdiff_type_node);
+         plus_type
+             = signed_or_unsigned_type_for (unsignedp, ptrdiff_type_node);
        }

       scaled_tile_size = fold_convert (plus_type, scaled_tile_size);
-      gimplify_assign (tile_final,
-                      fold_build2 (plus_code, type,
-                                   tile_index, scaled_tile_size),
-                      pre_body);
+      gimplify_assign (
+         tile_final,
+         fold_build2 (plus_code, type, tile_index, scaled_tile_size),
+         pre_body);
       gimple_omp_for_set_final (level_loop, 0, tile_final);

-      /* Redefine the original loop index variable of OMP_FOR in terms of the
-        floor loop and the tiling loop index variable for the current
-        dimension/level at the top of the loop. */
-      gimple_seq level_preamble = NULL;
-
       push_gimplify_context ();

       tree body_label = create_artificial_label (UNKNOWN_LOCATION);
@@ -1047,7 +1055,6 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
                                              break_label));
       gimple_seq_add_stmt (&level_preamble, gimple_build_label (body_label));

-      auto gsi = gsi_start (level_body);
       gsi_insert_seq_before (&gsi, level_preamble, GSI_SAME_STMT);
       gbind *level_bind = gimple_build_bind (NULL, NULL, NULL);
       pop_gimplify_context (level_bind);
@@ -1057,10 +1064,10 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
       surrounding_seq = &level_body;
       level_loop = gsi_stmt (gsi);

-      /* The label for jumping out of the loop at the next nesting
-        level. For the outermost level, the label is put after the
-        loop-nest, for the last one it is not necessary. */
-      if (level != tiling_depth - 1)
+      /* The label for jumping out of the loop at the next
+        nesting level. For the outermost level, the label is put
+        after the loop-nest, for the last one it is not necessary. */
+      if (tile_level != tiling_depth - 1)
        {
          break_label = create_artificial_label (UNKNOWN_LOCATION);
          gsi_insert_after (&gsi, gimple_build_label (break_label),
@@ -1093,13 +1100,15 @@ tile (gomp_for *omp_for, location_t loc, tree tile_sizes,
        next_transform_depth
            = list_length (OMP_CLAUSE_TILE_SIZES (remaining_clauses));

+      size_t next_level
+         = tree_to_uhwi (OMP_CLAUSE_TRANSFORM_LEVEL (remaining_clauses));
       /* The current "omp tile" transformation reduces the nesting depth
         of the canonical loop-nest to TILING_DEPTH.
         Hence the following "omp tile" transformation is invalid if
         it requires a greater nesting depth. */
-      gcc_assert (next_transform_depth <= tiling_depth);
-      if (next_transform_depth > new_collapse)
-       new_collapse = next_transform_depth;
+      gcc_assert (next_level + next_transform_depth <= start_level + tiling_depth);
+      if (next_level + next_transform_depth > new_collapse)
+       new_collapse = next_level + next_transform_depth;
     }

   if (collapse > new_collapse)
@@ -1260,14 +1269,17 @@ transform_gomp_for (gomp_for *omp_for, tree transformation, walk_ctx *ctx)
   gimple_seq result = NULL;
   location_t loc = OMP_CLAUSE_LOCATION (transformation);
   auto dump_loc = dump_user_location_t::from_location_t (loc);
+  size_t level = tree_to_uhwi (OMP_CLAUSE_TRANSFORM_LEVEL (transformation));
   switch (OMP_CLAUSE_CODE (transformation))
     {
     case OMP_CLAUSE_UNROLL_FULL:
       gcc_assert (TREE_CHAIN (transformation) == NULL);
+      gcc_assert (level == 0);
       result = full_unroll (omp_for, loc, ctx);
       break;
     case OMP_CLAUSE_UNROLL_NONE:
       gcc_assert (TREE_CHAIN (transformation) == NULL);
+      gcc_assert (level == 0);
       if (assign_unroll_full_clause_p (omp_for, transformation))
        {
          result = full_unroll (omp_for, loc, ctx);
@@ -1275,7 +1287,7 @@ transform_gomp_for (gomp_for *omp_for, tree transformation, walk_ctx *ctx)
       else if (tree unroll_factor
               = assign_unroll_partial_clause_p (omp_for, transformation))
        {
-         result = partial_unroll (omp_for, unroll_factor, loc,
+         result = partial_unroll (omp_for, level, unroll_factor, loc,
                                   transformation, ctx);
        }
       else {
@@ -1312,12 +1324,14 @@ transform_gomp_for (gomp_for *omp_for, tree transformation, walk_ctx *ctx)
                               "factor turned into %<partial(%u)%> clause\n",
                               factor);
          }
-       result = partial_unroll (omp_for, unroll_factor, loc, transformation,
-                                ctx);
+
+       result = partial_unroll (omp_for, level,
+                                unroll_factor, loc, transformation, ctx);
       }
       break;
     case OMP_CLAUSE_TILE:
-      result = tile (omp_for, loc, OMP_CLAUSE_TILE_SIZES (transformation),
+      result = tile (omp_for, loc, level,
+                    OMP_CLAUSE_TILE_SIZES (transformation),
                     transformation, ctx);
       break;
     default:
@@ -1418,6 +1432,9 @@ print_optimized_unroll_partial_msg (tree c)
 static tree
 optimize_transformation_clauses (tree clauses)
 {
+  if (!clauses)
+    return NULL_TREE;
+
   /* The last unroll_partial clause seen in clauses, if any,
      or the last merged unroll partial clause. */
   tree unroll_partial = NULL;
@@ -1429,6 +1446,7 @@ optimize_transformation_clauses (tree clauses)
      since last_non_unroll was seen. */
   bool merged_unroll_partial = false;

+  size_t level = tree_to_uhwi (OMP_CLAUSE_TRANSFORM_LEVEL (clauses));
   for (tree c = clauses; c != NULL_TREE; c = OMP_CLAUSE_CHAIN (c))
     {
       enum omp_clause_code code = OMP_CLAUSE_CODE (c);
@@ -1516,6 +1534,24 @@ optimize_transformation_clauses (tree clauses)
        default:
          gcc_unreachable ();
        }
+
+      /* The transformations are ordered by the level of the loop-nest to which
+        they apply in decreasing order. Handle the different levels separately
+        as long as we do not implement optimizations across the levels. */
+      tree next_c = OMP_CLAUSE_CHAIN (c);
+      if (!next_c)
+       break;
+
+      size_t next_level = tree_to_uhwi (OMP_CLAUSE_TRANSFORM_LEVEL (next_c));
+      if (next_level != level)
+       {
+         gcc_assert (next_level < level);
+         tree tail = optimize_transformation_clauses (next_c);
+         OMP_CLAUSE_CHAIN (c) = tail;
+         break;
+       }
+      else level = next_level;
+
     }

   if (merged_unroll_partial && dump_enabled_p ())
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/inner-loops.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/inner-loops.f90
new file mode 100644
index 00000000000..f9ee5184dab
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/inner-loops.f90
@@ -0,0 +1,124 @@
+subroutine test1
+  !$omp parallel do collapse(2)
+  do i=0,100
+     !$omp unroll partial(2)
+     do j=-300,100
+        call dummy (j)
+     end do
+  end do
+end subroutine test1
+
+subroutine test2
+  !$omp parallel do collapse(3)
+  do i=0,100
+     !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} }
+     do j=-300,100
+        do k=-300,100
+           call dummy (k)
+        end do
+    end do
+   end do
+end subroutine test2
+
+subroutine test3
+!$omp parallel do collapse(3)
+do i=0,100
+    do j=-300,100
+    !$omp unroll partial(2)
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test3
+
+subroutine test4
+!$omp parallel do collapse(3)
+do i=0,100
+   !$omp tile sizes(3) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} }
+    do j=-300,100
+    !$omp unroll partial(2)
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test4
+
+subroutine test5
+  !$omp parallel do collapse(3)
+  !$omp tile sizes(3,2) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} }
+  do i=0,100
+     do j=-300,100
+        do k=-300,100
+           call dummy (k)
+        end do
+     end do
+  end do
+end subroutine test5
+
+subroutine test6
+!$omp parallel do collapse(3)
+do i=0,100
+   !$omp tile sizes(3,2)
+    do j=-300,100
+    !$omp unroll partial(2)
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test6
+
+subroutine test7
+!$omp parallel do collapse(3)
+do i=0,100
+   !$omp tile sizes(3,3)
+    do j=-300,100
+    !$omp tile sizes(5)
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test7
+
+subroutine test8
+!$omp parallel do collapse(1)
+do i=0,100
+   !$omp tile sizes(3,3)
+    do j=-300,100
+    !$omp tile sizes(5)
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test8
+
+subroutine test9
+!$omp parallel do collapse(3)
+do i=0,100
+   !$omp tile sizes(3,3,3) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+    do j=-300,100
+    !$omp tile sizes(5) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test9
+
+subroutine test10
+!$omp parallel do
+do i=0,100
+   !$omp tile sizes(3,3,3) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} }
+    do j=-300,100
+    !$omp tile sizes(5) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+    do k=-300,100
+        call dummy (k)
+    end do
+end do
+end do
+end subroutine test10
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
index eaa7895eaa0..308e3b3e4d0 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90
@@ -2,9 +2,9 @@ subroutine test
   implicit none
   integer :: i, j, k

-  !$omp parallel do collapse(2) ordered(2)
+  !$omp parallel do collapse(2) ordered(2) ! { dg-error {'ordered' invalid in conjunction with 'omp tile'} }
   !$omp tile sizes (1,2)
-  do i = 1,100 ! { dg-error {'ordered' invalid in conjunction with 'omp tile'} }
+  do i = 1,100
      do j = 1,100
         call dummy(j)
         do k = 1,100
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90
new file mode 100644
index 00000000000..3ec1671f01f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90
@@ -0,0 +1,93 @@
+subroutine test0
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+    !$omp parallel do collapse(2) private(inner)
+    !$omp tile sizes (8, 1)
+    do i = 1,m
+       !$omp tile sizes (8, 1)
+       do j = 1,n
+          !$omp unroll partial(10)
+          do k = 1, n
+             if (k == 1) then
+                inner = 0
+             endif
+          end do
+       end do
+    end do
+end subroutine test0
+
+subroutine test0m
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+    !$omp parallel do collapse(2) private(inner)
+    do i = 1,m
+       !$omp tile sizes (8, 1)
+       do j = 1,n
+          do k = 1, n
+             if (k == 1) then
+                inner = 0
+             endif
+             inner = inner + a(k, i) * b(j, k)
+          end do
+          c(j, i) = inner ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} }
+       end do
+    end do
+end subroutine test0m
+
+subroutine test1
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+    !$omp parallel do collapse(2) private(inner)
+    !$omp tile sizes (8, 1)
+    do i = 1,m
+       !$omp tile sizes (8, 1)
+       do j = 1,n
+          !$omp unroll partial(10)
+          do k = 1, n
+             if (k == 1) then
+                inner = 0
+             endif
+             inner = inner + a(k, i) * b(j, k)
+          end do
+          c(j, i) = inner ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} "TODO Fix with upcoming imperfect loop nest handling" { xfail *-*-* } }
+       end do
+    end do
+end subroutine test1
+
+
+subroutine test2
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+    !$omp parallel do collapse(2) private(inner)
+    !$omp tile sizes (8, 1)
+    do i = 1,m
+       !$omp tile sizes (8, 1)
+       do j = 1,n
+          do k = 1, n
+             if (k == 1) then
+                inner = 0
+             endif
+             inner = inner + a(k, i) * b(j, k)
+          end do
+          c(j, i) = inner ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} }
+       end do
+    end do
+end subroutine test2
+
+subroutine test3
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+    !$omp parallel do collapse(2) private(inner)
+    do i = 1,m
+       !$omp tile sizes (8, 1)
+       do j = 1,n
+          do k = 1, n
+             if (k == 1) then
+                inner = 0
+             endif
+             inner = inner + a(k, i) * b(j, k)
+          end do
+          c(j, i) = inner ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} }
+       end do
+    end do
+end subroutine test3
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90
new file mode 100644
index 00000000000..6474b9da1e2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90
@@ -0,0 +1,16 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+subroutine test1
+  !$omp parallel do collapse(2)
+  do i=0,100
+     !$omp tile sizes(4)
+     do j=-300,100
+        call dummy (j)
+     end do
+  end do
+end subroutine test1
+
+! Collapse of the gimple_omp_for should be unaffected by the transformation
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait collapse\(2\) tile sizes\(4\).1\n +for \(i = 0; i <= 100; i = i \+ 1\)\n +for \(j = -300; j <= 100; j = j \+ 1\)} 1 "original" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait collapse\(2\) private\(j.0\) private\(j\)\n +for \(i = 0; i < 101; i = i \+ 1\)\n +for \(.omp_tile_index.\d = -300; .omp_tile_index.\d < 101; .omp_tile_index.\d = .omp_tile_index.\d \+ 4\)} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90
new file mode 100644
index 00000000000..0d462debd72
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90
@@ -0,0 +1,23 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+subroutine test2
+  !$omp parallel do
+  !$omp tile sizes(3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(3,3)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test2
+
+! One gimple_omp_for should cover the outer two loops, another the inner two loops
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait tile sizes\(3, 3\)@0\n +for \(i = 0; i <= 100; i = i \+ 1\)\n +for \(j = -300; j <= 100; j = j \+ 1\)\n} 1 "original" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp loop_transform tile sizes\(3, 3\)@0\n +for \(k = -300; k <= 100; k = k \+ 1\)\n +for \(l = 0; l <= 100; l = l \+ 1\)} 1 "original" } }
+! Collapse after the transformations should be 1
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait\n +for \(.omp_tile_index.\d = 0; .omp_tile_index.\d < 101; .omp_tile_index.\d = .omp_tile_index.\d \+ \d\)} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90
new file mode 100644
index 00000000000..3ce87ad8a4b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90
@@ -0,0 +1,22 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+subroutine test3
+  !$omp parallel do
+  !$omp tile sizes(3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(3,3)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test3
+
+! gimple_omp_for collapse should be extended to cover all loops affected by the transformations (i.e. 4)
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait tile sizes\(3, 3, 3\)@0 tile sizes\(3, 3\)@2\n +for \(i = 0; i <= 100; i = i \+ 1\)\n +for \(j = -300; j <= 100; j = j \+ 1\)\n +for \(k = -300; k <= 100; k = k \+ 1\)\n +for \(l = 0; l <= 100; l = l \+ 1\)} 1 "original" } }
+! Collapse after the transformations should be 1
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait private\(l.0\) private\(k\)\n +for \(.omp_tile_index.\d = 0; .omp_tile_index.\d < 101; .omp_tile_index.\d = .omp_tile_index.\d \+ \d\)} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90
new file mode 100644
index 00000000000..2c06d2094ba
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90
@@ -0,0 +1,31 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+subroutine test
+  !$omp tile sizes(3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(3,3)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test
+
+! gimple_omp_for collapse should be extended to cover all loops affected by the transformations (i.e. 4)
+! { dg-final { scan-tree-dump-times {\#pragma omp loop_transform tile sizes\(3, 3, 3\)@0 tile sizes\(3, 3\)@2\n +for \(i = 0; i <= 100; i = i \+ 1\)\n +for \(j = -300; j <= 100; j = j \+ 1\)\n +for \(k = -300; k <= 100; k = k \+ 1\)\n +for \(l = 0; l <= 100; l = l \+ 1\)} 1 "original" } }
+
+! The loops should be lowered after the tiling transformations
+! { dg-final { scan-tree-dump-not {\#pragma omp} "omp_transform_loops" } }
+
+! Third level is tiled first by the inner construct. The resulting floor loop is tiled by the outer construct.
+! { dg-final { scan-tree-dump-times {if \(.omp_tile_index.1} 2 "omp_transform_loops" } }
+
+! All other levels are tiled once
+! { dg-final { scan-tree-dump-times {if \(.omp_tile_index.2} 1 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {if \(.omp_tile_index.3} 1 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {if \(.omp_tile_index.4} 1 "omp_transform_loops" } }
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90
new file mode 100644
index 00000000000..355d977fe35
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90
@@ -0,0 +1,30 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+subroutine test3
+  !$omp parallel do
+  !$omp tile sizes(3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(3,3)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test3
+
+! The outer gimple_omp_for should not cover the loop with the tile transformation
+! { dg-final { scan-tree-dump-times {\#pragma omp for nowait tile sizes\(3\)@0\n +for \(i = 0; i <= 100; i = i \+ 1\)\n} 1 "original" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp loop_transform tile sizes\(3, 3\)@0\n +for \(k = -300; k <= 100; k = k \+ 1\)\n +for \(l = 0; l <= 100; l = l \+ 1\)} 1 "original" } }
+
+
+! After transformations, the outer loop should be a floor loop created
+! by the tiling and the outer construct type and non-transformation
+! clauses should be unaffected by the tiling
+! { dg-final { scan-tree-dump {\#pragma omp for nowait\n +for \(.omp_tile_index.\d = 0; .omp_tile_index.\d < 101; .omp_tile_index.\d = .omp_tile_index.\d \+ 3\)} "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp} 2 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp parallel} 1 "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp for} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90
new file mode 100644
index 00000000000..0c83da660f5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90
@@ -0,0 +1,26 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+subroutine test3
+  !$omp tile sizes(3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(3,3)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test3
+
+! There should be separate gimple_omp_for constructs for the tile constructs because the tiling depth
+! of the outer construct does not reach the level of the inner construct
+! { dg-final { scan-tree-dump-times {\#pragma omp loop_transform tile sizes\(3\)@0\n +for \(i = 0; i <= 100; i = i \+ 1\)\n} 1 "original" } }
+! { dg-final { scan-tree-dump-times {\#pragma omp loop_transform tile sizes\(3, 3\)@0\n +for \(k = -300; k <= 100; k = k \+ 1\)\n +for \(l = 0; l <= 100; l = l \+ 1\)} 1 "original" } }
+
+
+! The loops should be lowered after the tiling transformations
+! { dg-final { scan-tree-dump-not {\#pragma omp} "omp_transform_loops" } }
+! { dg-final { scan-tree-dump-times {if \(.omp_tile_index} 3 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90
new file mode 100644
index 00000000000..670e14caa12
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90
@@ -0,0 +1,123 @@
+subroutine test1a
+  !$omp parallel do
+  !$omp tile sizes(3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5)
+        do k=-300,100
+           call dummy (k)
+        end do
+     end do
+  end do
+end subroutine test1a
+
+subroutine test2a
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5,5)
+        do k=-300,100
+           do l=-300,100
+              do m=-300,100
+                 call dummy (m)
+              end do
+           end do
+        end do
+     end do
+  end do
+end subroutine test2a
+
+subroutine test3a
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+        do k=-300,100
+           do l=-300,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test3a
+
+subroutine test4a
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5,5)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+        do k=-300,100
+           do l=-300,100
+           do m=-300,100
+              call dummy (m)
+           end do
+           end do
+        end do
+     end do
+  end do
+end subroutine test4a
+
+subroutine test1b
+  !$omp parallel do
+  !$omp tile sizes(3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5)
+        do k=-300,100
+           call dummy (k)
+        end do
+     end do
+  end do
+end subroutine test1b
+
+subroutine test2b
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5,5)
+        do k=-300,100
+           do l=-300,100
+              do m=-300,100
+                 call dummy (m)
+              end do
+           end do
+        end do
+     end do
+  end do
+end subroutine test2b
+
+subroutine test3b
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+        do k=-300,100
+           do l=-300,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test3b
+
+subroutine test4b
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp tile sizes(5,5)  ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} }
+        do k=-300,100
+           do l=-300,100
+           do m=-300,100
+              call dummy (m)
+           end do
+           end do
+        end do
+     end do
+  end do
+end subroutine test4b
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-1.f90
new file mode 100644
index 00000000000..169c2b10e54
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-1.f90
@@ -0,0 +1,71 @@
+subroutine test1
+  !$omp tile sizes(1)
+  do i = 1,100
+     do j = 1,i
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test1
+
+subroutine test2
+  !$omp tile sizes(1,2) ! { dg-error {'tile' loop transformation may not appear on non-rectangular for} }
+  do i = 1,100
+     do j = 1,i
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test2
+
+subroutine test3
+  !$omp tile sizes(1,2,1) ! { dg-error {'tile' loop transformation may not appear on non-rectangular for} }
+  do i = 1,100
+     do j = 1,i
+        do k = 1,100
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test3
+
+subroutine test4
+  !$omp tile sizes(1,2,1) ! { dg-error {'tile' loop transformation may not appear on non-rectangular for} }
+  do i = 1,100
+     do j = 1,100
+        do k = 1,i
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test4
+
+subroutine test5
+  !$omp tile sizes(1,2)
+  do i = 1,100
+     do j = 1,100
+        do k = 1,j
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test5
+
+subroutine test6
+  !$omp tile sizes(1,2,1) ! { dg-error {'tile' loop transformation may not appear on non-rectangular for} }
+  do i = 1,100
+     do j = 1,100
+        do k = 1,j
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test6
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-2.f90
new file mode 100644
index 00000000000..d5352e5a117
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-2.f90
@@ -0,0 +1,12 @@
+subroutine test
+  !$omp tile sizes(1,2,1) ! { dg-error {'tile' loop transformation may not appear on non-rectangular for} } ! { dg-error {'tile' loop transformation may not appear on non-rectangular for} }
} }
+  do i = 1,100
+     do j = 1,100
+        do k = 1,i
+           call dummy(i)
+        end do
+     end do
+  end do
+  !$end omp tile
+end subroutine test
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
index 9b91e5c5f98..fd687890ee6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
@@ -16,7 +16,7 @@ end subroutine test1

 ! Loop should be unrolled 1 * 2 * 3 * 4 = 24 times

-! { dg-final { scan-tree-dump {#pragma omp for nowait collapse\(1\) unroll_partial\(4\) unroll_partial\(3\) unroll_partial\(2\) unroll_partial\(1\)} "original" } }
+! { dg-final { scan-tree-dump {#pragma omp for nowait collapse\(1\) unroll_partial\(4\).0 unroll_partial\(3\).0 unroll_partial\(2\).0 unroll_partial\(1\)} "original" } }
 ! { dg-final { scan-tree-dump-not "#pragma omp loop_transform" "omp_transform_loops" } }
 ! { dg-final { scan-tree-dump-times "dummy" 24 "omp_transform_loops" } }
 ! { dg-final { scan-tree-dump-times {#pragma omp for} 1 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
index 849d4e77984..928ca44e811 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
@@ -13,6 +13,6 @@ subroutine test1
   end do
 end subroutine test1

-! { dg-final { scan-tree-dump {#pragma omp loop_transform unroll_full unroll_partial\(3\) unroll_partial\(2\) unroll_partial\(1\)} "original" } }
+! { dg-final { scan-tree-dump {#pragma omp loop_transform unroll_full.0 unroll_partial\(3\).0 unroll_partial\(2\).0 unroll_partial\(1\).0} "original" } }
 ! { dg-final { scan-tree-dump-not "#pragma omp unroll" "omp_transform_loops" } }
 ! { dg-final { scan-tree-dump-times "dummy" 100 "omp_transform_loops" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90
new file mode 100644
index 00000000000..efcc691185d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90
@@ -0,0 +1,57 @@
+subroutine test1a
+  !$omp parallel do
+  !$omp tile sizes(3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp unroll partial(5)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test1a
+
+subroutine test1b
+  !$omp tile sizes(3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp unroll partial(5)
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test1b
+
+subroutine test2a
+  !$omp parallel do
+  !$omp tile sizes(3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp unroll partial(5)  ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP TILE} }
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test2a
+
+subroutine test2b
+  !$omp tile sizes(3,3,3,3)
+  do i=0,100
+     do j=-300,100
+        !$omp unroll partial(5) ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP TILE} }
+        do k=-300,100
+           do l=0,100
+              call dummy (l)
+           end do
+        end do
+     end do
+  end do
+end subroutine test2b
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-non-rect-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-non-rect-1.f90
new file mode 100644
index 00000000000..3da99158cc0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-non-rect-1.f90
@@ -0,0 +1,31 @@
+subroutine test
+  implicit none
+
+  integer :: i, j, k
+  !$omp target parallel do collapse(2) ! { dg-error {invalid OpenMP non-rectangular loop step; '\(2 - 1\) \* 1' is not a multiple of loop 2 step '5'} }
+  do i = -300, 100
+    !$omp unroll partial
+    do j = i,i*2
+      call dummy (i)
+    end do
+   end do
+
+  !$omp target parallel do collapse(3) ! { dg-error {invalid OpenMP non-rectangular loop step; '\(2 - 1\) \* 1' is not a multiple of loop 3 step '5'} }
+  do i = -300, 100
+    do j = 1,10
+       !$omp unroll partial
+       do k = j,j*2 + 1
+      call dummy (i)
+    end do
+   end do
+  end do
+
+  !$omp unroll full
+  do i = -3, 5
+    do j = 1,10
+       do k = j,j*2 + 1
+      call dummy (i)
+    end do
+   end do
+  end do
+end subroutine
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
index cda878f3037..20617e25105 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90
@@ -21,7 +21,7 @@ function mult (a, b) result (c)
   end do
 end function mult

-! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(1\) tile sizes\(8, 8\)} 1 "original" } }
+! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(1\)@0 tile sizes\(8, 8\)@0} 1 "original" } }
 ! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } }

 ! Tiling adds two floor and two tile loops.
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
index 00615011856..c1e7f356a87 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90
@@ -22,7 +22,7 @@ function mult (a, b) result (c)
   !$omp end target
 end function mult

-! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(2\) tile sizes\(8, 8, 4\)} 1 "original" } }
+! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(2\)@0 tile sizes\(8, 8, 4\)@0} 1 "original" } }
 ! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } }

 ! Check the number of loops
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90
new file mode 100644
index 00000000000..bc7a890df17
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90
@@ -0,0 +1,25 @@
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-omp_transform_loops" }
+
+function mult (a, b) result (c)
+  integer, allocatable, dimension (:,:) :: a,b,c
+  integer :: i, j, k, inner
+
+  allocate(c( n, m ))
+
+  !$omp parallel do collapse(2)
+  !$omp tile sizes (8,8)
+  do i = 1,m
+     do j = 1,n
+        inner = 0
+        !$omp unroll partial(10)
+        do k = 1, n
+           inner = inner + a(k, i) * b(j, k)
+        end do
+        c(j, i) = inner
+     end do
+  end do
+end function mult
+
+! { dg-final { scan-tree-dump-times "#pragma omp loop_transform unroll_partial" 1 "original" } }
+! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } }
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 02c207d87a0..510f65311b5 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -507,9 +507,21 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       goto print_remap;
     case OMP_CLAUSE_UNROLL_FULL:
       pp_string (pp, "unroll_full");
+      if (OMP_CLAUSE_TRANSFORM_LEVEL (clause))
+       {
+         pp_string (pp, "@");
+         dump_generic_node (pp, OMP_CLAUSE_TRANSFORM_LEVEL (clause),
+                            spc, flags, false);
+       }
       break;
     case OMP_CLAUSE_UNROLL_NONE:
       pp_string (pp, "unroll_none");
+      if (OMP_CLAUSE_TRANSFORM_LEVEL (clause))
+       {
+         pp_string (pp, "@");
+         dump_generic_node (pp, OMP_CLAUSE_TRANSFORM_LEVEL (clause),
+                            spc, flags, false);
+       }
       break;
     case OMP_CLAUSE_UNROLL_PARTIAL:
       pp_string (pp, "unroll_partial");
@@ -520,6 +532,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
                             false);
          pp_right_paren (pp);
        }
+      if (OMP_CLAUSE_TRANSFORM_LEVEL (clause))
+       {
+         pp_string (pp, "@");
+         dump_generic_node (pp, OMP_CLAUSE_TRANSFORM_LEVEL (clause),
+                            spc, flags, false);
+       }
       break;
     case OMP_CLAUSE_TILE:
       pp_string (pp, "tile sizes");
@@ -528,6 +546,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       dump_generic_node (pp, OMP_CLAUSE_TILE_SIZES (clause), spc, flags,
                         false);
       pp_right_paren (pp);
+      if (OMP_CLAUSE_TRANSFORM_LEVEL (clause))
+       {
+         pp_string (pp, "@");
+         dump_generic_node (pp, OMP_CLAUSE_TRANSFORM_LEVEL (clause),
+                            spc, flags, false);
+       }
       break;
     case OMP_CLAUSE__LOOPTEMP_:
       name = "_looptemp_";
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 893f509fa3a..38478a0ad46 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -326,11 +326,11 @@ unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
-  0, /* OMP_CLAUSE_UNROLL_FULL */
+  1, /* OMP_CLAUSE_UNROLL_FULL */

-  0, /* OMP_CLAUSE_UNROLL_NONE */
-  1, /* OMP_CLAUSE_UNROLL_PARTIAL */
-  1  /* OMP_CLAUSE_TILE */
+  1, /* OMP_CLAUSE_UNROLL_NONE */
+  2, /* OMP_CLAUSE_UNROLL_PARTIAL */
+  2  /* OMP_CLAUSE_TILE */
 };

 const char * const omp_clause_code_name[] =
diff --git a/gcc/tree.h b/gcc/tree.h
index 8f4d2761d1a..0f8aebab89f 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1787,11 +1787,16 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_USE_DEVICE_PTR)->base.public_flag)

+/* The level of a collapsed loop nest at which the transformation represented
+   by this clause should be applied.  */
+#define OMP_CLAUSE_TRANSFORM_LEVEL(NODE) \
+  OMP_CLAUSE_OPERAND (NODE, 0)
+
 #define OMP_CLAUSE_UNROLL_PARTIAL_EXPR(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_UNROLL_PARTIAL), 0)
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_UNROLL_PARTIAL), 1)

 #define OMP_CLAUSE_TILE_SIZES(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 1)

 #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind)
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/inner-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/inner-1.f90
new file mode 100644
index 00000000000..1db97feb34d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/inner-1.f90
@@ -0,0 +1,77 @@
+module matrix
+  implicit none
+  integer :: n = 10
+  integer :: m =  10
+
+contains
+  function mult (a, b) result (c)
+    integer, allocatable, dimension (:,:) :: a,b,c
+    integer :: i, j, k, inner
+
+    allocate(c( n, m ))
+    !$omp target parallel do collapse(2) private(inner) map(to:a,b) map(from:c)
+    !$omp tile sizes (8, 1)
+    do i = 1,m
+       !$omp tile sizes (8)
+       do j = 1,n
+          !$omp unroll partial(10)
+          do k = 1, n
+             if (k == 1) then
+                inner = 0
+             endif
+             inner = inner + a(k, i) * b(j, k)
+             if (k == n) then
+                c(j, i) = inner
+             endif
+          end do
+       end do
+    end do
+  end function mult
+
+  subroutine print_matrix (m)
+    integer, allocatable :: m(:,:)
+    integer :: i, j, n
+
+    n = size (m, 1)
+    do i = 1,n
+       do j = 1,n
+          write (*, fmt="(i4)", advance='no') m(j, i)
+       end do
+      write (*, *)  ""
+   end do
+      write (*, *)  ""
+   end subroutine
+
+end module matrix
+
+program main
+  use matrix
+  implicit none
+
+  integer, allocatable :: a(:,:),b(:,:),c(:,:)
+  integer :: i,j
+
+  allocate(a( n, m ))
+  allocate(b( n, m ))
+
+  do i = 1,n
+     do j = 1,m
+        a(j,i) = merge(1,0, i.eq.j)
+        b(j,i) = j
+     end do
+  end do
+
+  c = mult (a, b)
+
+  call print_matrix (a)
+  call print_matrix (b)
+  call print_matrix (c)
+
+  do i = 1,n
+     do j = 1,m
+        if (b(i,j) .ne. c(i,j)) call abort ()
+     end do
+  end do
+
+
+end program main
--
2.36.1



* [PATCH 7/7] openmp: Add C/C++ support for loop transformations on inner loops
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
                   ` (5 preceding siblings ...)
  2023-03-24 15:30 ` [PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops Frederik Harwath
@ 2023-03-24 15:30 ` Frederik Harwath
  2023-05-15 10:19 ` [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Jakub Jelinek
  2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
  8 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-03-24 15:30 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub, joseph, jason

Add the parsing of loop transformations on inner loops of a loop-nest.
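
As a minimal sketch of what this enables at the source level (the function name and the particular combination of directives are illustrative assumptions, not taken from the new tests), transformation directives can now sit between the loops of a collapsed nest rather than only in front of the outermost loop:

```c
#include <assert.h>
#include <stddef.h>

/* Multiply two n x n row-major matrices.  "tile" is placed on the
   second loop of the collapsed nest and "unroll" on the third loop,
   i.e. both directives appear on inner loops.  Compiled without
   -fopenmp the pragmas are ignored and the loops run serially.  */
void
mult_inner_transforms (const float *a, const float *b, float *c, unsigned n)
{
  assert (a && b && c);
  for (size_t x = 0; x < (size_t) n * n; x++)
    c[x] = 0.0f;

#pragma omp parallel for collapse(2)
  for (unsigned i = 0; i < n; i++)
#pragma omp tile sizes(4)
    for (unsigned j = 0; j < n; j++)
#pragma omp unroll partial(2)
      for (unsigned k = 0; k < n; k++)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

Multiplying by an identity matrix and comparing the result with the input, as the matrix-* tests do, is a simple way to check that the transformed nest still visits every iteration exactly once.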

gcc/c/ChangeLog:

        * c-parser.cc (c_parser_omp_nested_loop_transform_clauses):
        Add argument for the level of loop-nest at which the clauses
        appear, ...
        (c_parser_omp_tile): ... adjust use here,
        (c_parser_omp_unroll): ... and here,
        (c_parser_omp_for_loop): ... and here.  Stop treating loop
        transformations like intervening code, parse them, and adjust
        the loop-nest depth if necessary for tiling.

gcc/cp/ChangeLog:

        * parser.cc (cp_parser_is_pragma): New function.
        (cp_parser_omp_nested_loop_transform_clauses):
        Add argument for the level of loop-nest at which the clauses
        appear, ...
        (cp_parser_omp_tile): ... adjust use here,
        (cp_parser_omp_unroll): ... and here,
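        (cp_parser_omp_for_loop): ... and here.  Stop treating loop
        transformations like intervening code, parse them, and adjust
        the loop-nest depth if necessary for tiling.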

gcc/testsuite/ChangeLog:

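        * c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-inner-1.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-inner-2.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c: New test.
        * c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c: New test.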

libgomp/ChangeLog:

        * testsuite/libgomp.c++/loop-transforms/tile-1.C: Deleted, replaced by
        matrix-* tests.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h:
        New header file for new tests.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
        New test.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h:
        New header file for new tests.
        * testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
        New test.
---
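
The depth adjustment made in c_parser_omp_for_loop and cp_parser_omp_for_loop can be modelled in isolation. The sketch below is not code from the patch: the helper name is hypothetical, and it assumes that the value returned by the *_nested_loop_transform_clauses functions is the number of loops the transformations found after loop I require (for example 2 for "tile sizes(2, 3)").

```c
#include <assert.h>

/* Hypothetical model of the adjustment: COUNT is the number of loops
   the parser currently expects, I the zero-based index of the loop
   that has just been parsed, DEPTH the nesting depth required by the
   transformations that follow it.  Returns the new loop count.  */
int
adjusted_loop_count (int count, int i, int depth)
{
  assert (i >= 0 && i < count && depth >= 0);
  int needed = i + 1 + depth;
  return needed > count ? needed : count;
}
```

Under these assumptions, "collapse(2)" followed by "tile sizes(2, 3)" on the second loop gives adjusted_loop_count (2, 0, 2) == 3, which is consistent with unroll-inner-2.c expecting a third loop in that case, while "tile sizes(2)" leaves the count at 2.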
 gcc/c/c-parser.cc                             |  35 +++-
 gcc/cp/parser.cc                              |  88 ++++++--
 .../loop-transforms/imperfect-loop-nest.c     |  12 ++
 .../gomp/loop-transforms/unroll-inner-1.c     |  15 ++
 .../gomp/loop-transforms/unroll-inner-2.c     |  31 +++
 .../gomp/loop-transforms/unroll-non-rect-1.c  |  37 ++++
 .../gomp/loop-transforms/unroll-non-rect-2.c  |  22 ++
 .../libgomp.c++/loop-transforms/tile-1.C      |  52 -----
 .../loop-transforms/matrix-1.h                |  70 +++++++
 .../loop-transforms/matrix-constant-iter.h    |  71 +++++++
 .../loop-transforms/matrix-helper.h           |  19 ++
 .../loop-transforms/matrix-no-directive-1.c   |  11 +
 .../matrix-no-directive-unroll-full-1.c       |  13 ++
 .../matrix-omp-distribute-parallel-for-1.c    |   6 +
 .../loop-transforms/matrix-omp-for-1.c        |  13 ++
 .../matrix-omp-parallel-for-1.c               |  13 ++
 .../matrix-omp-parallel-masked-taskloop-1.c   |   6 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |   6 +
 .../matrix-omp-target-parallel-for-1.c        |  13 ++
 ...p-target-teams-distribute-parallel-for-1.c |   6 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |   6 +
 ...trix-omp-teams-distribute-parallel-for-1.c |   6 +
 .../loop-transforms/matrix-simd-1.c           |   6 +
 .../matrix-transform-variants-1.h             | 191 ++++++++++++++++++
 .../loop-transforms/unroll-non-rect-1.c       | 129 ++++++++++++
 25 files changed, 801 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c
 delete mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 41f9fb90037..b32f5f7547f 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -20246,7 +20246,7 @@ c_parser_omp_scan_loop_body (c_parser *parser, bool open_brace_parsed)
 }

 static int c_parser_omp_nested_loop_transform_clauses (c_parser *, tree &, int,
-                                                      const char *);
+                                                      int, const char *);

 /* Parse the restricted form of loop statements allowed by OpenACC and OpenMP.
    The real trick here is to determine the loop control variable early
@@ -20300,7 +20300,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
       ordered = collapse;
     }

-  c_parser_omp_nested_loop_transform_clauses (parser, clauses, collapse,
+  c_parser_omp_nested_loop_transform_clauses (parser, clauses, 0, collapse,
                                              "loop collapse");

   /* Find the depth of the loop nest affected by "omp tile"
@@ -20489,6 +20489,22 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
          else if (bracecount
                   && c_parser_next_token_is (parser, CPP_SEMICOLON))
            c_parser_consume_token (parser);
+         else if (c_parser_peek_token (parser)->pragma_kind
+                      == PRAGMA_OMP_UNROLL
+                  || c_parser_peek_token (parser)->pragma_kind
+                         == PRAGMA_OMP_TILE)
+           {
+             int depth = c_parser_omp_nested_loop_transform_clauses (
+                 parser, clauses, i + 1, count - i - 1, "loop collapse");
+             if (i + 1 + depth > count)
+               {
+                 count = i + 1 + depth;
+                 declv = grow_tree_vec (declv, count);
+                 initv = grow_tree_vec (initv, count);
+                 condv = grow_tree_vec (condv, count);
+                 incrv = grow_tree_vec (incrv, count);
+               }
+           }
          else
            {
              c_parser_error (parser, "not enough perfectly nested loops");
@@ -20500,7 +20516,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
              fail = true;
              count = 0;
              break;
-           }
+             }
        }
       while (1);

@@ -24066,9 +24082,9 @@ c_parser_omp_loop_transform_clause (c_parser *parser)
 }

 /* Parse zero or more OpenMP loop transformation directives that
-   follow another directive that requires a canonical loop nest and
-   append all to CLAUSES.  Return the nesting depth
-   of the transformed loop nest.
+   follow another directive that requires a canonical loop nest,
+   append all to CLAUSES, and record in each clause the LEVEL at which
+   it appears in the loop nest.  Return the depth of the transformed nest.

    REQUIRED_DEPTH is the nesting depth of the loop nest required by
    the preceding directive.  OUTER_DESCR is a description of the
@@ -24078,7 +24094,7 @@ c_parser_omp_loop_transform_clause (c_parser *parser)

 static int
 c_parser_omp_nested_loop_transform_clauses (c_parser *parser, tree &clauses,
-                                           int required_depth,
+                                           int level, int required_depth,
                                            const char *outer_descr)
 {
   tree c = NULL_TREE;
@@ -24139,6 +24155,7 @@ c_parser_omp_nested_loop_transform_clauses (c_parser *parser, tree &clauses,
       if (!transformed_depth)
        transformed_depth = last_depth;

+      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, level);
       if (!clauses)
        clauses = c;
       else if (last_c)
@@ -24172,7 +24189,7 @@ c_parser_omp_tile (location_t loc, c_parser *parser, bool *if_p)
     return error_mark_node;

   int required_depth = list_length (OMP_CLAUSE_TILE_SIZES (clauses));
-  c_parser_omp_nested_loop_transform_clauses (parser, clauses, required_depth,
+  c_parser_omp_nested_loop_transform_clauses (parser, clauses, 0, required_depth,
                                              "outer transformation");

   block = c_begin_compound_stmt (true);
@@ -24192,7 +24209,7 @@ c_parser_omp_unroll (location_t loc, c_parser *parser, bool *if_p)

   tree clauses = c_parser_omp_all_clauses (parser, mask, p_name, false);
   int required_depth = 1;
-  c_parser_omp_nested_loop_transform_clauses (parser, clauses, required_depth,
+  c_parser_omp_nested_loop_transform_clauses (parser, clauses, 0, required_depth,
                                              "outer transformation");

   if (!clauses)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 8219c476153..2b65ce909fb 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2974,6 +2974,14 @@ cp_parser_is_keyword (cp_token* token, enum rid keyword)
   return token->keyword == keyword;
 }

+/* Returns nonzero if TOKEN is a pragma of the indicated KIND.  */
+
+static bool
+cp_parser_is_pragma (cp_token* token, enum pragma_kind kind)
+{
+  return cp_parser_pragma_kind (token) == kind;
+}
+
 /* Helper function for cp_parser_error.
    Having peeked a token of kind TOK1_KIND that might signify
    a conflict marker, peek successor tokens to determine
@@ -43634,7 +43642,8 @@ cp_parser_omp_scan_loop_body (cp_parser *parser)
 }

 static int cp_parser_omp_nested_loop_transform_clauses (cp_parser *, tree &,
-                                                       int, const char *);
+                                                       int, int,
+                                                       const char *);

 /* Parse the restricted form of the for statement allowed by OpenMP.  */

@@ -43686,7 +43695,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
   gcc_assert (oacc_tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;

-  cp_parser_omp_nested_loop_transform_clauses (parser, clauses, count,
+  cp_parser_omp_nested_loop_transform_clauses (parser, clauses, 0, count,
                                               "loop collapse");

   /* Find the depth of the loop nest affected by "omp tile"
@@ -43956,19 +43965,42 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       cp_parser_parse_tentatively (parser);
       for (;;)
        {
-         if (cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+         cp_token *tok = cp_lexer_peek_token (parser->lexer);
+         if (cp_parser_is_keyword (tok, RID_FOR))
            break;
-         else if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+         else if (tok->type == CPP_OPEN_BRACE)
            {
              cp_lexer_consume_token (parser->lexer);
              bracecount++;
            }
-         else if (bracecount
-                  && cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+         else if (bracecount && tok->type == CPP_SEMICOLON)
            cp_lexer_consume_token (parser->lexer);
+         else if (cp_parser_is_pragma (tok, PRAGMA_OMP_UNROLL)
+                  || cp_parser_is_pragma (tok, PRAGMA_OMP_TILE))
+           {
+             int depth = cp_parser_omp_nested_loop_transform_clauses (
+                 parser, clauses, i + 1, count - i - 1, "loop collapse");
+
+             /* Adjust the loop nest depth to the requirements of the
+                loop transformations.  The collapse will be reduced
+                to the value requested by the "collapse" and "ordered"
+                clauses after the execution of the loop transformations
+                in the middle end.  */
+             if (i + 1 + depth > count)
+               {
+                 count = i + 1 + depth;
+                 if (declv)
+                   declv = grow_tree_vec (declv, count);
+                 initv = grow_tree_vec (initv, count);
+                 condv = grow_tree_vec (condv, count);
+                 incrv = grow_tree_vec (incrv, count);
+                 if (orig_declv)
+                   orig_declv = grow_tree_vec (orig_declv, count);
+               }
+           }
          else
            {
-             loc = cp_lexer_peek_token (parser->lexer)->location;
+             loc = tok->location;
              error_at (loc, "not enough for loops to collapse");
              collapse_err = true;
              cp_parser_abort_tentative_parse (parser);
@@ -44027,6 +44059,27 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
        }
       else if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
        cp_lexer_consume_token (parser->lexer);
+      else if (cp_parser_is_pragma (cp_lexer_peek_token (parser->lexer),
+                                   PRAGMA_OMP_UNROLL)
+              || cp_parser_is_pragma (cp_lexer_peek_token (parser->lexer),
+                                      PRAGMA_OMP_TILE))
+       {
+         int depth =
+           cp_parser_omp_nested_loop_transform_clauses (parser, clauses,
+                                                        i + 1, count - i -1,
+                                                        "loop collapse");
+         if (i + 1 + depth > count)
+           {
+             count = i + 1 + depth;
+             if (declv)
+               declv = grow_tree_vec (declv, count);
+             initv = grow_tree_vec (initv, count);
+             condv = grow_tree_vec (condv, count);
+             incrv = grow_tree_vec (incrv, count);
+             if (orig_declv)
+               orig_declv = grow_tree_vec (orig_declv, count);
+           }
+       }
       else
        {
          if (!collapse_err)
@@ -45787,6 +45840,7 @@ cp_parser_omp_tile_sizes (cp_parser *parser, location_t loc)

   gcc_assert (sizes);
   tree c  = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
   OMP_CLAUSE_TILE_SIZES (c) = sizes;
   OMP_CLAUSE_TRANSFORM_LEVEL (c)
     = build_int_cst (unsigned_type_node, 0);
@@ -45810,8 +45864,9 @@ cp_parser_omp_tile (cp_parser *parser, cp_token *tok, bool *if_p)
     return error_mark_node;

   int required_depth = list_length (OMP_CLAUSE_TILE_SIZES (clauses));
-  cp_parser_omp_nested_loop_transform_clauses (
-      parser, clauses, required_depth, "outer transformation");
+  cp_parser_omp_nested_loop_transform_clauses (parser, clauses, 0,
+                                              required_depth,
+                                              "outer transformation");

   block = begin_omp_structured_block ();
   clauses = finish_omp_clauses (clauses, C_ORT_OMP);
@@ -45878,8 +45933,9 @@ cp_parser_omp_loop_transform_clause (cp_parser *parser)
 }

 /* Parse zero or more OpenMP loop transformation directives that
-   follow another directive that requires a canonical loop nest and
-   append all to CLAUSES.  Return the nesting depth
+   follow another directive that requires a canonical loop nest,
+   append all to CLAUSES, and record in each clause the level at which
+   it appears in the loop nest.  Return the nesting depth
    of the transformed loop nest.

    REQUIRED_DEPTH is the nesting depth of the loop nest required by
@@ -45890,7 +45946,7 @@ cp_parser_omp_loop_transform_clause (cp_parser *parser)

 static int
 cp_parser_omp_nested_loop_transform_clauses (cp_parser *parser, tree &clauses,
-                                            int required_depth,
+                                            int level, int required_depth,
                                             const char *outer_descr)
 {
   tree c = NULL_TREE;
@@ -45934,7 +45990,8 @@ cp_parser_omp_nested_loop_transform_clauses (cp_parser *parser, tree &clauses,
        default:
          gcc_unreachable ();
        }
-      OMP_CLAUSE_TRANSFORM_LEVEL (c) = build_int_cst (unsigned_type_node, 0);
+      OMP_CLAUSE_TRANSFORM_LEVEL (c)
+         = build_int_cst (unsigned_type_node, level);

       if (depth < last_depth)
        {
@@ -45989,8 +46046,9 @@ cp_parser_omp_unroll (cp_parser *parser, cp_token *tok, bool *if_p)
     }

   int required_depth = 1;
-  cp_parser_omp_nested_loop_transform_clauses (
-      parser, clauses, required_depth, "outer transformation");
+  cp_parser_omp_nested_loop_transform_clauses (parser, clauses, 0,
+                                              required_depth,
+                                              "outer transformation");

   block = begin_omp_structured_block ();
   ret = cp_parser_omp_for_loop (parser, OMP_LOOP_TRANS, clauses, NULL, if_p);
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c
new file mode 100644
index 00000000000..57e72dffa03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c
@@ -0,0 +1,12 @@
+void test ()
+{
+#pragma omp tile sizes (2,4,6)
+  for (unsigned i = 0; i < 10; i++)
+    for (unsigned j = 0; j < 10; j++)
+      {
+       float intervening_decl = 0; /* { dg-bogus "not enough for loops to collapse" "TODO C/C++ imperfect loop nest handling" { xfail c++ } } */
+       /* { dg-bogus "not enough perfectly nested loops" "TODO C/C++ imperfect loop nest handling" { xfail c } .-1 } */
+#pragma omp unroll partial(2)
+       for (unsigned k = 0; k < 10; k++);
+      }
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c
new file mode 100644
index 00000000000..c365d942591
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c
@@ -0,0 +1,15 @@
+/* { dg-additional-options "-std=c++11" { target c++} } */
+
+extern void dummy (int);
+
+void
+test ()
+{
+
+#pragma omp target parallel for collapse(2)
+  for (int i = -300; i != 100; ++i)
+    #pragma omp unroll partial
+    for (int j = 0; j != 100; ++j)
+      dummy (i);
+}
+
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c
new file mode 100644
index 00000000000..3f8fbf2d45a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c
@@ -0,0 +1,31 @@
+/* { dg-additional-options "-std=c++11" { target c++} } */
+
+extern void dummy (int);
+
+void
+test ()
+{
+
+#pragma omp target parallel for collapse(2)
+  for (int i = -300; i != 100; ++i)
+#pragma omp tile sizes(2)
+    for (int j = 0; j != 100; ++j)
+      dummy (i);
+
+#pragma omp target parallel for collapse(2)
+  for (int i = -300; i != 100; ++i)
+#pragma omp tile sizes(2, 3)
+    for (int j = 0; j != 100; ++j)
+      dummy (i); /* { dg-error {not enough for loops to collapse} "" { target c++ } } */
+/* { dg-error {'i' was not declared in this scope} "" { target c++ } .-1 } */
+/* { dg-error {not enough perfectly nested loops before 'dummy'} "" { target c } .-2 } */
+
+#pragma omp target parallel for collapse(2)
+  for (int i = -300; i != 100; ++i)
+#pragma omp tile sizes(2, 3)
+    for (int j = 0; j != 100; ++j)
+      for (int k = 0; k != 100; ++k)
+       dummy (i);
+}
+
+
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c
new file mode 100644
index 00000000000..40e7f8e4bfb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c
@@ -0,0 +1,37 @@
+extern void dummy (int);
+
+void
+test1 ()
+{
+#pragma omp target parallel for collapse(2)
+  for (int i = -300; i != 100; ++i)
+#pragma omp unroll partial(2)
+    for (int j = i * 2; j <= i * 4 + 1; ++j)
+      dummy (i);
+
+#pragma omp target parallel for collapse(3)
+  for (int i = -300; i != 100; ++i)
+    for (int j = i; j != i * 2; ++j)
+    #pragma omp unroll partial
+    for (int k = 2; k != 100; ++k)
+      dummy (i);
+
+#pragma omp unroll full
+  for (int i = -300; i != 100; ++i)
+    for (int j = i; j != i * 2; ++j)
+    for (int k = 2; k != 100; ++k)
+      dummy (i);
+
+  for (int i = -300; i != 100; ++i)
+#pragma omp unroll full
+    for (int j = i; j != i + 10; ++j)
+    for (int k = 2; k != 100; ++k)
+      dummy (i);
+
+  for (int i = -300; i != 100; ++i)
+#pragma omp unroll full
+    for (int j = i; j != i + 10; ++j)
+    for (int k = j; k != 100; ++k)
+      dummy (i);
+}
+
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c
new file mode 100644
index 00000000000..7696e5d5fab
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c
@@ -0,0 +1,22 @@
+extern void dummy (int);
+
+void
+test1 ()
+{
+#pragma omp target parallel for collapse(2) /* { dg-error {invalid OpenMP non-rectangular loop step; \'\(1 - 0\) \* 1\' is not a multiple of loop 2 step \'5\'} "" { target c } } */
+  for (int i = -300; i != 100; ++i) /* { dg-error {invalid OpenMP non-rectangular loop step; \'\(1 - 0\) \* 1\' is not a multiple of loop 2 step \'5\'} "" { target c++ } } */
+#pragma omp unroll partial
+    for (int j = 2; j != i; ++j)
+      dummy (i);
+}
+
+void
+test2 ()
+{
+  int i,j;
+#pragma omp target parallel for collapse(2)
+  for (i = -300; i != 100; ++i)
+    #pragma omp unroll partial
+    for (j = 2; j != i; ++j)
+      dummy (i);
+}
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C b/libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C
deleted file mode 100644
index 2a4d760720d..00000000000
--- a/libgomp/testsuite/libgomp.c++/loop-transforms/tile-1.C
+++ /dev/null
@@ -1,52 +0,0 @@
-#include <string.h>
-#include <stdio.h>
-#include <math.h>
-
-void
-mult (float *matrix1, float *matrix2, float *result, unsigned dim0,
-      unsigned dim1)
-{
-  memset (result, 0, sizeof (float) * dim0 * dim1);
-#pragma omp target parallel for collapse(3) map(tofrom:result[0:dim0*dim1]) map(to:matrix1[0:dim0*dim1], matrix2[0:dim0*dim1])
-#pragma omp tile sizes(8, 16, 4)
-  for (unsigned i = 0; i < dim0; i++)
-    for (unsigned j = 0; j < dim1; j++)
-      for (unsigned k = 0; k < dim1; k++)
-       result[i * dim1 + j] += matrix1[i * dim1 + k] * matrix2[k * dim0 + j];
-}
-
-int
-main ()
-{
-  unsigned dim0 = 20;
-  unsigned dim1 = 20;
-
-  float *result = (float *)malloc (sizeof (float) * dim0 * dim1);
-  float *matrix1 = (float *)malloc (sizeof (float) * dim0 * dim1);
-  float *matrix2 = (float *)malloc (sizeof (float) * dim0 * dim1);
-
-  for (unsigned i = 0; i < dim0; i++)
-    for (unsigned j = 0; j < dim1; j++)
-      matrix1[i * dim1 + j] = j;
-
-  for (unsigned i = 0; i < dim1; i++)
-    for (unsigned j = 0; j < dim0; j++)
-      if (i == j)
-       matrix2[i * dim0 + j] = 1;
-      else
-       matrix2[i * dim0 + j] = 0;
-
-  mult (matrix1, matrix2, result, dim0, dim1);
-
-  for (unsigned i = 0; i < dim0; i++)
-    for (unsigned j = 0; j < dim1; j++)
-      {
-       if (matrix1[i * dim1 + j] != result[i * dim1 + j])
-         {
-           printf ("ERROR at %d, %d\n", i, j);
-           __builtin_abort ();
-         }
-      }
-
-  return 0;
-}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h
new file mode 100644
index 00000000000..b9b865cf554
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h
@@ -0,0 +1,70 @@
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <math.h>
+
+#ifndef FUN_NAME_SUFFIX
+#define FUN_NAME_SUFFIX
+#endif
+
+#ifdef MULT
+#undef MULT
+#endif
+#define MULT CAT(mult, FUN_NAME_SUFFIX)
+
+#ifdef MAIN
+#undef MAIN
+#endif
+#define MAIN CAT(main, FUN_NAME_SUFFIX)
+
+void MULT (float *matrix1, float *matrix2, float *result,
+          unsigned dim0, unsigned dim1)
+{
+  unsigned i;
+
+  memset (result, 0, sizeof (float) * dim0 * dim1);
+  DIRECTIVE
+  TRANSFORMATION1
+  for (i = 0; i < dim0; i++)
+    TRANSFORMATION2
+    for (unsigned j = 0; j < dim1; j++)
+      TRANSFORMATION3
+      for (unsigned k = 0; k < dim1; k++)
+       result[i * dim1 + j] += matrix1[i * dim1 + k] * matrix2[k * dim0 + j];
+}
+
+int MAIN ()
+{
+  unsigned dim0 = 20;
+  unsigned dim1 = 20;
+
+  float *result = (float *)malloc (sizeof (float) * dim0 * dim1);
+  float *matrix1 = (float *)malloc (sizeof (float) * dim0 * dim1);
+  float *matrix2 = (float *)malloc (sizeof (float) * dim0 * dim1);
+
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      matrix1[i * dim1 + j] = j;
+
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      if (i == j)
+       matrix2[i * dim1 + j] = 1;
+      else
+       matrix2[i * dim1 + j] = 0;
+
+   MULT (matrix1, matrix2, result, dim0, dim1);
+
+   for (unsigned i = 0; i < dim0; i++)
+     for (unsigned j = 0; j < dim1; j++) {
+       if (matrix1[i * dim1 + j] != result[i * dim1 + j]) {
+        print_matrix (matrix1, dim0, dim1);
+        print_matrix (matrix2, dim0, dim1);
+        print_matrix (result, dim0, dim1);
+        fprintf(stderr, "%s: ERROR at %d, %d\n", __FUNCTION__, i, j);
+        abort();
+       }
+     }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h
new file mode 100644
index 00000000000..769c04044c3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h
@@ -0,0 +1,71 @@
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <math.h>
+
+#ifndef FUN_NAME_SUFFIX
+#define FUN_NAME_SUFFIX
+#endif
+
+#ifdef MULT
+#undef MULT
+#endif
+#define MULT CAT(mult, FUN_NAME_SUFFIX)
+
+#ifdef MAIN
+#undef MAIN
+#endif
+#define MAIN CAT(main, FUN_NAME_SUFFIX)
+
+void MULT (float *matrix1, float *matrix2, float *result)
+{
+  const unsigned dim0 = 20;
+  const unsigned dim1 = 20;
+
+  memset (result, 0, sizeof (float) * dim0 * dim1);
+  DIRECTIVE
+  TRANSFORMATION1
+  for (unsigned i = 0; i < dim0; i++)
+    TRANSFORMATION2
+    for (unsigned j = 0; j < dim1; j++)
+      TRANSFORMATION3
+      for (unsigned k = 0; k < dim1; k++)
+       result[i * dim1 + j] += matrix1[i * dim1 + k] * matrix2[k * dim0 + j];
+}
+
+int MAIN ()
+{
+  const unsigned dim0 = 20;
+  const unsigned dim1 = 20;
+
+  float *result = (float *)malloc (sizeof (float) * dim0 * dim1);
+  float *matrix1 = (float *)malloc (sizeof (float) * dim0 * dim1);
+  float *matrix2 = (float *)malloc (sizeof (float) * dim0 * dim1);
+
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      matrix1[i * dim1 + j] = j;
+
+  for (unsigned i = 0; i < dim0; i++)
+    for (unsigned j = 0; j < dim1; j++)
+      if (i == j)
+       matrix2[i * dim1 + j] = 1;
+      else
+       matrix2[i * dim1 + j] = 0;
+
+   MULT (matrix1, matrix2, result);
+
+   for (unsigned i = 0; i < dim0; i++)
+     for (unsigned j = 0; j < dim1; j++) {
+       if (matrix1[i * dim1 + j] != result[i * dim1 + j]) {
+        __builtin_printf("%s: error at %d, %d\n", __FUNCTION__, i, j);
+        print_matrix (matrix1, dim0, dim1);
+        print_matrix (matrix2, dim0, dim1);
+        print_matrix (result, dim0, dim1);
+        __builtin_printf("\n");
+        __builtin_abort();
+       }
+     }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h
new file mode 100644
index 00000000000..4f69463d9dd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h
@@ -0,0 +1,19 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#define CAT(x,y) XCAT(x,y)
+#define XCAT(x,y) x ## y
+#define DO_PRAGMA(x) XDO_PRAGMA(x)
+#define XDO_PRAGMA(x) _Pragma (#x)
+
+
+void print_matrix (float *matrix, unsigned dim0, unsigned dim1)
+{
+  for (unsigned i = 0; i < dim0; i++)
+    {
+      for (unsigned j = 0; j < dim1; j++)
+       fprintf (stderr, "%f ", matrix[i * dim1 + j]);
+      fprintf (stderr, "\n");
+    }
+  fprintf (stderr, "\n");
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
new file mode 100644
index 00000000000..9f7f02041b0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
@@ -0,0 +1,11 @@
+/* { dg-additional-options {-fdump-tree-original} } */
+
+#define COMMON_DIRECTIVE
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3 collapse(3)
+
+#include "matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "unroll_partial" 12 "original" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
new file mode 100644
index 00000000000..5dd0b5d2989
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
@@ -0,0 +1,13 @@
+/* { dg-additional-options {-fdump-tree-original} } */
+
+#define COMMON_DIRECTIVE
+#define COMMON_TOP_TRANSFORM omp unroll full
+#define COLLAPSE_1
+#define COLLAPSE_2
+#define COLLAPSE_3
+#define IMPLEMENTATION_FILE "matrix-constant-iter.h"
+
+#include "matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "unroll_full" 13 "original" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
new file mode 100644
index 00000000000..d855857e5ee
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE "omp teams distribute parallel for"
+#define COLLAPSE_1 "collapse(1)"
+#define COLLAPSE_2 "collapse(2)"
+#define COLLAPSE_3 "collapse(3)"
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
new file mode 100644
index 00000000000..f2a2b80b2fd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
@@ -0,0 +1,13 @@
+/* { dg-additional-options {-fdump-tree-original} } */
+
+#define COMMON_DIRECTIVE omp for
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3 collapse(3)
+
+#include "matrix-transform-variants-1.h"
+
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "omp for" 13 "original" } } */
+/* { dg-final { scan-tree-dump-times "collapse" 12 "original" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
new file mode 100644
index 00000000000..2c5701efca4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
@@ -0,0 +1,13 @@
+/* { dg-additional-options {-fdump-tree-original} } */
+
+#define COMMON_DIRECTIVE omp parallel for
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3
+
+#include "matrix-transform-variants-1.h"
+
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "omp parallel" 13 "original" } } */
+/* { dg-final { scan-tree-dump-times "collapse" 9 "original" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
new file mode 100644
index 00000000000..e2def212725
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE omp parallel masked taskloop
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
new file mode 100644
index 00000000000..ce601555cfb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE omp parallel masked taskloop simd
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
new file mode 100644
index 00000000000..365b39ba385
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
@@ -0,0 +1,13 @@
+/* { dg-additional-options {-fdump-tree-original} } */
+
+#define COMMON_DIRECTIVE omp target parallel for map(tofrom:result[0:dim0*dim1]) map(to:matrix1[0:dim0*dim1], matrix2[0:dim0*dim1])
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3
+
+#include "matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "omp target" 13 "original" } } */
+/* { dg-final { scan-tree-dump-times "collapse" 9 "original" } } */
+/* { dg-final { scan-tree-dump-times "unroll_partial" 12 "original" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
new file mode 100644
index 00000000000..8afe34874c9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE omp target teams distribute parallel for map(tofrom:result[:dim0*dim1]) map(to:matrix1[0:dim0*dim1], matrix2[0:dim0*dim1])
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
new file mode 100644
index 00000000000..bbc78b39db0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE omp taskloop
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3 collapse(3)
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
new file mode 100644
index 00000000000..3a58e479374
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE omp teams distribute parallel for
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
new file mode 100644
index 00000000000..e5155dcf76d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
@@ -0,0 +1,6 @@
+#define COMMON_DIRECTIVE omp simd
+#define COLLAPSE_1 collapse(1)
+#define COLLAPSE_2 collapse(2)
+#define COLLAPSE_3 collapse(3)
+
+#include "matrix-transform-variants-1.h"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h
new file mode 100644
index 00000000000..24c3d073024
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h
@@ -0,0 +1,191 @@
+#include "matrix-helper.h"
+
+#ifndef COMMON_TOP_TRANSFORM
+#define COMMON_TOP_TRANSFORM
+#endif
+
+#ifndef IMPLEMENTATION_FILE
+#define IMPLEMENTATION_FILE "matrix-1.h"
+#endif
+
+#define FUN_NAME_SUFFIX 1
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp unroll partial(2)") _Pragma("omp tile sizes(10)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 2
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_3)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(8,16,4)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 3
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_2)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(8, 8)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 4
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_1)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(8, 8)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 5
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_1)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(8, 8, 8)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 6
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_1)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(10)") _Pragma("omp unroll partial(2)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 7
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_2)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(7, 11)")
+#define TRANSFORMATION2 _Pragma("omp unroll partial(7)")
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 8
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_2)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(7, 11)")
+#define TRANSFORMATION2 _Pragma("omp tile sizes(7)") _Pragma("omp unroll partial(7)")
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 9
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_2)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp tile sizes(7, 11)")
+#define TRANSFORMATION2 _Pragma("omp tile sizes(7)") _Pragma("omp unroll partial(3)") _Pragma("omp tile sizes(7)")
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 10
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_1)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM) _Pragma("omp unroll partial(5)") _Pragma("omp tile sizes(7)") _Pragma("omp unroll partial(3)") _Pragma("omp tile sizes(7)")
+#define TRANSFORMATION2
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 11
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_2)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM)
+#define TRANSFORMATION2 _Pragma("omp unroll partial(5)") _Pragma("omp tile sizes(7)") _Pragma("omp unroll partial(3)") _Pragma("omp tile sizes(7)")
+#define TRANSFORMATION3
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 12
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_3)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM)
+#define TRANSFORMATION2
+#define TRANSFORMATION3 _Pragma("omp unroll partial(5)") _Pragma("omp tile sizes(7)") _Pragma("omp unroll partial(3)") _Pragma("omp tile sizes(7)")
+#include IMPLEMENTATION_FILE
+
+#undef DIRECTIVE
+#undef TRANSFORMATION1
+#undef TRANSFORMATION2
+#undef TRANSFORMATION3
+#undef FUN_NAME_SUFFIX
+
+#define FUN_NAME_SUFFIX 13
+#define DIRECTIVE DO_PRAGMA(COMMON_DIRECTIVE COLLAPSE_3)
+#define TRANSFORMATION1 DO_PRAGMA(COMMON_TOP_TRANSFORM)
+#define TRANSFORMATION2 _Pragma("omp tile sizes(7,8)")
+#define TRANSFORMATION3 _Pragma("omp unroll partial(3)") _Pragma("omp tile sizes(7)")
+#include IMPLEMENTATION_FILE
+
+int main ()
+{
+  main1 ();
+  main2 ();
+  main3 ();
+  main4 ();
+  main5 ();
+  main6 ();
+  main7 ();
+  main8 ();
+  main9 ();
+  main10 ();
+  main11 ();
+  main12 ();
+  main13 ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c
new file mode 100644
index 00000000000..2f9924aea1f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c
@@ -0,0 +1,129 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+void test1 ()
+{
+  int sum = 0;
+  for (int i = -3; i != 1; ++i)
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+void test2 ()
+{
+  int sum = 0;
+  #pragma omp unroll partial
+  for (int i = -3; i != 1; ++i)
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+void test3 ()
+{
+  int sum = 0;
+  #pragma omp unroll partial
+  for (int i = -3; i != 1; ++i)
+  #pragma omp unroll partial
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+void test4 ()
+{
+  int sum = 0;
+#pragma omp for
+#pragma omp unroll partial(5)
+  for (int i = -3; i != 1; ++i)
+#pragma omp unroll partial(2)
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+void test5 ()
+{
+  int sum = 0;
+#pragma omp parallel for reduction(+:sum)
+#pragma omp unroll partial(2)
+  for (int i = -3; i != 1; ++i)
+#pragma omp unroll partial(2)
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+void test6 ()
+{
+  int sum = 0;
+#pragma omp target parallel for reduction(+:sum)
+#pragma omp unroll partial(7)
+  for (int i = -3; i != 1; ++i)
+#pragma omp unroll partial(2)
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+void test7 ()
+{
+  int sum = 0;
+#pragma omp target teams distribute parallel for reduction(+:sum)
+#pragma omp unroll partial(7)
+  for (int i = -3; i != 1; ++i)
+#pragma omp unroll partial(2)
+    for (int j = -2; j < i * -1; ++j)
+      sum++;
+
+  if (sum != 14)
+    {
+      fprintf (stderr, "%s: Wrong sum: %d\n", __FUNCTION__, sum);
+      abort ();
+    }
+}
+
+int
+main ()
+{
+  test1 ();
+  test2 ();
+  test3 ();
+  test4 ();
+  test5 ();
+  test6 ();
+  test7 ();
+
+  return 0;
+}
--
2.36.1

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955


* Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive
  2023-03-24 15:30 ` [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive Frederik Harwath
@ 2023-04-01  8:42   ` Thomas Schwinge
  2023-04-06 13:07     ` Frederik Harwath
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Schwinge @ 2023-04-01  8:42 UTC (permalink / raw)
  To: Frederik Harwath; +Cc: gcc-patches, tobias, fortran, jakub

Hi Frederik!

Thanks for including a good number of test cases with your code changes!

This new test case:

On 2023-03-24T16:30:39+0100, Frederik Harwath <frederik@codesourcery.com> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
> @@ -0,0 +1,52 @@
> +! { dg-additional-options "-fdump-tree-original" }
> +! { dg-do run }
> +
> +module test_functions
> +  contains
> +  integer function compute_sum() result(sum)
> +    implicit none
> +
> +    integer :: i,j
> +
> +    !$omp do
> +    do i = 1,10,3
> +       !$omp unroll full
> +       do j = 1,10,3
> +          sum = sum + 1
> +       end do
> +    end do
> +  end function
> +
> +  integer function compute_sum2() result(sum)
> +    implicit none
> +
> +    integer :: i,j
> +
> +    !$omp parallel do reduction(+:sum)
> +    !$omp unroll partial(2)
> +    do i = 1,10,3
> +       do j = 1,10,3
> +          sum = sum + 1
> +       end do
> +    end do
> +  end function
> +end module test_functions
> +
> +program test
> +  use test_functions
> +  implicit none
> +
> +  integer :: result
> +
> +  result = compute_sum ()
> +  write (*,*) result
> +  if (result .ne. 16) then
> +     call abort
> +  end if
> +
> +  result = compute_sum2 ()
> +  write (*,*) result
> +  if (result .ne. 16) then
> +     call abort
> +  end if
> +end program

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):

    spawn [open ...]
      1437822992

    Program aborted. Backtrace:
    #0  0x8048df0 in ???
    #1  0x8048ea6 in ???
    #2  0x559a3af2 in ???
    #3  0x8048bc0 in ???
    FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test

All other variants PASS with:

    spawn [open ...]
              16
              16

And similarly, this new test case:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
> @@ -0,0 +1,33 @@
> +! { dg-options "-fno-openmp -fopenmp-simd" }
> +! { dg-additional-options "-fdump-tree-original" }
> +! { dg-do run }
> +
> +module test_functions
> +  contains
> +  integer function compute_sum() result(sum)
> +    implicit none
> +
> +    integer :: i,j
> +
> +    !$omp simd
> +    do i = 1,10,3
> +       !$omp unroll full
> +       do j = 1,10,3
> +          sum = sum + 1
> +       end do
> +    end do
> +  end function compute_sum
> +end module test_functions
> +
> +program test
> +  use test_functions
> +  implicit none
> +
> +  integer :: result
> +
> +  result = compute_sum ()
> +  write (*,*) result
> +  if (result .ne. 16) then
> +     call abort
> +  end if
> +end program

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):

    spawn [open ...]
              41

    Program aborted. Backtrace:
    #0  0x8048c35 in ???
    #1  0x8048c72 in ???
    #2  0x55977af2 in ???
    #3  0x8048a60 in ???
    FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution test

All other variants PASS with:

    spawn [open ...]
              16


Grüße
 Thomas




* Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive
  2023-04-01  8:42   ` Thomas Schwinge
@ 2023-04-06 13:07     ` Frederik Harwath
  0 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-04-06 13:07 UTC (permalink / raw)
  To: Thomas Schwinge, Frederik Harwath; +Cc: gcc-patches, tobias, fortran, jakub

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

Hi Thomas,

On 01.04.23 10:42, Thomas Schwinge wrote:
> ... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
> offloading), '-O0' (only):
>    
[...]
>      FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test
[...]
>      FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution test


Thank you for reporting the failures! They are caused by mistakes in the 
test code, not the implementation. I have attached a patch which fixes 
the failures.

I have been able to reproduce the failures with -m32. With the patch
they went away, even after 100 repeated test executions ;-).
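
As an illustration of the kind of mistake fixed here, the following is a
minimal C sketch (not part of the attached patch) of the same pattern as
compute_sum in unroll-simd-1.f90.  The translation to C and the function
name are assumptions made for illustration only.  The point is that the
accumulator has to be initialized and, when the loop is associated with
an OpenMP construct, listed in a reduction clause.

```c
#include <stdio.h>

/* Hypothetical C analogue of compute_sum from unroll-simd-1.f90.
   Compiles with or without -fopenmp; without it the pragma is ignored.  */
int
compute_sum (void)
{
  /* Without this initialization the value read below is indeterminate,
     which matches the garbage printed by the failing -O0 runs.  */
  int sum = 0;

  /* The reduction clause tells OpenMP that "sum" is accumulated across
     iterations instead of being updated independently by each one.  */
#pragma omp simd reduction(+:sum)
  for (int i = 1; i <= 10; i += 3)
    for (int j = 1; j <= 10; j += 3)
      sum = sum + 1;

  return sum;
}

/* Print the result in the same way the Fortran test writes it.  */
void
report_sum (void)
{
  printf ("%d\n", compute_sum ());
}
```

Both loops run for the values 1, 4, 7 and 10, so compute_sum () returns
16, which is the value the Fortran tests compare against.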


Best regards,

Frederik

[-- Attachment #2: 0001-openmp-Fix-loop-transformation-tests.patch --]
[-- Type: text/x-patch, Size: 2453 bytes --]

From 3f471ed293d2e97198a65447d2f0d2bb69a2f305 Mon Sep 17 00:00:00 2001
From: Frederik Harwath <frederik@codesourcery.com>
Date: Thu, 6 Apr 2023 14:52:07 +0200
Subject: [PATCH] openmp: Fix loop transformation tests

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
	* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
	* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
	and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90          | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
     integer :: i,j
 
     sum = 0
-    !$omp parallel do collapse(2)
+    !$omp parallel do collapse(2) reduction(+:sum)
     !$omp tile sizes(6,10)
     do i = 1,10,3
        do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions
 
     integer :: i,j
 
+    sum = 0
     !$omp do
     do i = 1,10,3
        !$omp unroll full
@@ -22,6 +23,7 @@ module test_functions
 
     integer :: i,j
 
+    sum = 0
     !$omp parallel do reduction(+:sum)
     !$omp unroll partial(2)
     do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions
 
     integer :: i,j
 
-    !$omp simd
+    sum = 0
+    !$omp simd reduction(+:sum)
     do i = 1,10,3
        !$omp unroll full
        do j = 1,10,3
-- 
2.36.1



* Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
                   ` (6 preceding siblings ...)
  2023-03-24 15:30 ` [PATCH 7/7] openmp: Add C/C++ " Frederik Harwath
@ 2023-05-15 10:19 ` Jakub Jelinek
  2023-05-15 11:03   ` Jakub Jelinek
  2023-05-16  9:45   ` Frederik Harwath
  2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
  8 siblings, 2 replies; 21+ messages in thread
From: Jakub Jelinek @ 2023-05-15 10:19 UTC (permalink / raw)
  To: Frederik Harwath; +Cc: gcc-patches, fortran, tobias, joseph, jason

On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:
> this patch series implements the OpenMP 5.1 "unroll" and "tile"
> constructs.  It includes changes to the C,C++, and Fortran front end
> for parsing the new constructs and a new middle-end
> "omp_transform_loops" pass which implements the transformations in a
> source language agnostic way.

I'm afraid we can't do it this way, at least not completely.

The OpenMP requirements and what is being discussed for further loop
transformations pretty much require parts of it to be done as early as possible.
My understanding is that this is where other implementations do it too,
and I would also prefer GCC not to be the only implementation that takes a
significantly different decision from the others, as happened e.g. in the
offloading case (where all other implementations
preprocess/parse etc. the source multiple times, compared to GCC splitting stuff
only at IPA time; this affects what can be done with metadirectives,
declare variant etc.).
Now, e.g. data sharing is done almost exclusively during gimplification,
and the proposed pass runs later than that; the loop transformations need
to be done before the data sharing.  Ditto doacross handling.
The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
already need to know given their collapse/ordered how many loops they are
actually associated with and the loop transformation constructs can change
that.
So, I think we need to do the loop transformations in the FEs.  That doesn't
mean we need to write everything 3 times, once for each frontend.
Already now, e.g., various stuff is shared between the C and C++ FEs in
c-family, though how much can be shared between c-family and Fortran remains
to be discovered.

Or at least partially, to the extent that we compute how many canonical
loops the loop transformations result in, what artificial iterators they
will use etc., so that during gimplification we can take all that into
account and then can do the actual transformations later.
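
A rough sketch of such an early analysis, written as standalone C rather
than against GCC's internal data structures, might look as follows.  All
names (transform_kind, canonical_loops_after, collapse_is_satisfied) are
hypothetical.  The rules are a simplification taken from the examples
later in this message: a tile with n sizes leaves n canonical loops, a
partial unroll leaves 1, and a full unroll leaves none.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transformation kinds; these are not GCC's internal names.  */
enum transform_kind
{
  TRANSFORM_TILE,
  TRANSFORM_UNROLL_PARTIAL,
  TRANSFORM_UNROLL_FULL
};

struct transform
{
  enum transform_kind kind;
  int num_sizes; /* Number of "sizes" arguments; only used for tile.  */
};

/* TRANSFORMS lists the directives in source order, outermost first, and
   DEPTH is the depth of the loop nest they are written on.  Directives
   are applied from the innermost (last) one to the outermost (first).
   Return the number of canonical loop form loops that remain, or -1 if
   some directive does not find enough loops to work on.  */
int
canonical_loops_after (const struct transform *transforms, size_t n,
                       int depth)
{
  int loops = depth;

  for (size_t i = n; i-- > 0;)
    {
      const struct transform *t = &transforms[i];

      switch (t->kind)
        {
        case TRANSFORM_TILE:
          if (loops < t->num_sizes)
            return -1;
          loops = t->num_sizes; /* Only the outer tile loops remain.  */
          break;
        case TRANSFORM_UNROLL_PARTIAL:
          if (loops < 1)
            return -1;
          loops = 1;            /* One loop with the copies inside it.  */
          break;
        case TRANSFORM_UNROLL_FULL:
          if (loops < 1)
            return -1;
          loops = 0;            /* No loop is left at all.  */
          break;
        }
    }

  return loops;
}

/* Check the result against the argument of a collapse clause.  */
bool
collapse_is_satisfied (const struct transform *transforms, size_t n,
                       int depth, int collapse)
{
  return canonical_loops_after (transforms, n, depth) >= collapse;
}
```

With this, two stacked 3-dimensional tiles on a nest of depth 3 would be
accepted under collapse(3), while an additional "unroll partial" on top
of them would leave only one loop and be rejected under collapse(2).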

For C, I think the lowering of loop transformation constructs, or at least
determining what it means, can be done right after we actually parse it and
before we finalize the OMP_FOR etc. that wraps it, if any.  As discussed last
week at F2F, I think we want to remember in OMP_FOR_ORIG_DECLS the user
iterators on the loop transformation constructs and take it into account
for data sharing purposes.

For C++ in templates we obviously need to defer that until instantiations,
the constants in the clauses etc. could be template parameters etc.

For Fortran during resolving.

>  The "unroll" and "tile" directives are
> internally implemented as clauses.  This fits the representation of

So perhaps just use OMP_UNROLL/OMP_TILE as GENERIC constructs like
OMP_FOR etc. but with some argument where you can remember the important
results of the early loop transformation analysis: whether the loop
transformation results in a canonical loop nest or not, and in the former
case with how many nested loops.

And then handle the actual transformation IMHO best at gimplification
time, find them in the OMP_FOR etc. body if they are nested in there,
let the transformation happen on GENERIC before the containing OMP_FOR
etc. if any is actually finalized and from the transformation remember
the original user decls and what should happen with them for data sharing
(e.g. lastprivate/lastprivate conditional).
From the slides I saw last week, a lot of other transformations are in the
planning, like loop reversal etc.
And, I think even in OpenMP 5.1 nothing prevents e.g.
#pragma omp for collapse(3) // etc.
#pragma omp tile sizes (4, 2, 2)
#pragma omp tile sizes (4, 8, 16)
for (int i = 0; i < 64; ++i)
  for (int j = 0; j < 64; ++j)
    for (int k = 0; k < 64; ++k)
      body;
where the inner tile takes the i, j and k loops and makes
for (int i1 = 0; i1 < 64; i1 += 4)
  for (int j1 = 0; j1 < 64; j1 += 8)
    for (int k1 = 0; k1 < 64; k1 += 16)
      for (int i2 = 0; i2 < 4; i2++)
	{
	  int i = i1 + i2;
	  for (int j2 = 0; j2 < 8; j2++)
	    {
	      int j = j1 + j2;
	      for (int k2 = 0; k2 < 16; k2++)
		{
		  int k = k1 + k2;
		  body;
		}
	    }
	}
out of it with 3 outer loops which have canonical loop form (the rest
doesn't).  And then the second tile takes the outermost 3 of those generated
loops and tiles them again, making it into again 3 canonical loop form
loops plus stuff inside of it.
Or one can replace the
#pragma omp for collapse(3) // etc.
with
#pragma omp for
#pragma omp unroll partial(2)
which furthermore unrolls the outermost generated loop from the outer tile
turning it into 1 canonical loop form loop plus stuff in it.
Or of course as you have in your testcases, some loop transformation
constructs could be used on more nested loops, not necessarily before
the outermost one.  But still, in all cases you need to know quite early
how many canonical loop form nested loops you get from each loop
transformation, so that it can be e.g. checked against the collapse/ordered
clauses.
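To make the tiling shown earlier in this message concrete, here is a
minimal, self-contained sketch in plain C (written by hand without OpenMP,
not compiler output).  It checks that the nest produced by
tile sizes (4, 8, 16) executes every (i, j, k) of the 64x64x64 iteration
space exactly once.  The names visits, run_tiled and
tiling_covers_iteration_space are illustrative only.

```c
#include <string.h>

#define N 64

/* visits[i][j][k] counts how often the body ran for (i, j, k).  */
static unsigned char visits[N][N][N];

/* Hand-written equivalent of "tile sizes (4, 8, 16)" applied to a
   64x64x64 nest: three tile loops outside, three intra-tile loops inside.
   Only the three outer loops are in canonical loop form.  */
static void
run_tiled (void)
{
  for (int i1 = 0; i1 < N; i1 += 4)
    for (int j1 = 0; j1 < N; j1 += 8)
      for (int k1 = 0; k1 < N; k1 += 16)
        for (int i2 = 0; i2 < 4; i2++)
          for (int j2 = 0; j2 < 8; j2++)
            for (int k2 = 0; k2 < 16; k2++)
              visits[i1 + i2][j1 + j2][k1 + k2]++;
}

/* Return 1 if every iteration was executed exactly once, 0 otherwise.  */
static int
tiling_covers_iteration_space (void)
{
  memset (visits, 0, sizeof visits);
  run_tiled ();
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        if (visits[i][j][k] != 1)
          return 0;
  return 1;
}
```

Since 64 is a multiple of each tile size, no remainder handling is needed
in this sketch; a general implementation would have to clamp the
intra-tile bounds.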

Feel free to disagree if you think your approach is able to handle all of
this, just give details on why you think so.

	Jakub



* Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
  2023-05-15 10:19 ` [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Jakub Jelinek
@ 2023-05-15 11:03   ` Jakub Jelinek
  2023-05-16  9:45   ` Frederik Harwath
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Jelinek @ 2023-05-15 11:03 UTC (permalink / raw)
  To: Frederik Harwath, gcc-patches, fortran, tobias, joseph, jason

On Mon, May 15, 2023 at 12:19:00PM +0200, Jakub Jelinek via Gcc-patches wrote:
> For C++ in templates we obviously need to defer that until instantiations,
> the constants in the clauses etc. could be template parameters etc.

Even in C++, the question of how many canonical loop nest form loops this
transformation generates can probably be answered during parsing, at least
for the 5.1/5.2 loop transformations.
I think we don't really allow
template <int ...args>
void foo ()
{
  #pragma omp for collapse(2)
  #pragma omp tile sizes(args...)
  for (int i = 0; i < 64; i++)
    for (int j = 0; j < 64; j++)
      for (int k = 0; k < 64; k++)
	;
}
where the number of arguments of the sizes clause would be determined only
after instantiation.  Of course, we don't know the exact values...

	Jakub



* Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
  2023-05-15 10:19 ` [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Jakub Jelinek
  2023-05-15 11:03   ` Jakub Jelinek
@ 2023-05-16  9:45   ` Frederik Harwath
  2023-05-16 11:00     ` Jakub Jelinek
  1 sibling, 1 reply; 21+ messages in thread
From: Frederik Harwath @ 2023-05-16  9:45 UTC (permalink / raw)
  To: Jakub Jelinek, Frederik Harwath
  Cc: gcc-patches, fortran, tobias, joseph, jason

Hi Jakub,

On 15.05.23 12:19, Jakub Jelinek wrote:
> On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:
>> this patch series implements the OpenMP 5.1 "unroll" and "tile"
>> constructs.  It includes changes to the C,C++, and Fortran front end
>> for parsing the new constructs and a new middle-end
>> "omp_transform_loops" pass which implements the transformations in a
>> source language agnostic way.
> I'm afraid we can't do it this way, at least not completely.
>
> The OpenMP requirements and what is being discussed for further loop
> transformations pretty much requires parts of it to be done as soon as possible.
> My understanding is that that is where other implementations implement that
> too and would also prefer GCC not to be the only implementation that takes
> significantly different decision in that case from other implementations

The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two 
compilers (another one and GCC with this patch series) transformed the 
loops in the middle end after the handling of data sharing, one planned 
to do so. Yet another vendor had not yet decided where it will be 
implemented. Clang currently does everything in the front end, but it 
was mentioned that this might change in the future e.g. for code sharing 
with Flang. Implementing the loop transformations late could potentially
complicate the implementation of transformations which require 
adjustments of the data sharing clauses, but this is known and 
consequently, no such transformations are planned for OpenMP 6.0. In
particular, the "apply" clause therefore only permits loop-transforming 
constructs to be applied to the loops generated from other loop
transformations in TR11.

> The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
> already need to know given their collapse/ordered how many loops they are
> actually associated with and the loop transformation constructs can change
> that.
> So, I think we need to do the loop transformations in the FEs, that doesn't
> mean we need to write everything 3 times, once for each frontend.
> Already now, e.g. various stuff is shared between C and C++ FEs in c-family,
> though how much can be shared between c-family and Fortran is to be
> discovered.
> Or at least partially, to the extent that we compute how many canonical
> loops the loop transformations result in, what artificial iterators they
> will use etc., so that during gimplification we can take all that into
> account and then can do the actual transformations later.

The patches in this patch series already do compute how many canonical
loop nests result from the loop transformations in the front end.
This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as  "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the
loop transformations may also appear on inner loops of a collapsed
loop nest (i.e. within the collapsed depth), representing the
transformation by OMP_FOR-like constructs would imply that a collapsed
loop nest would have to be broken apart into single loops. Perhaps this
could be handled somehow, but the collapsed loop nest would have to be
re-assembled to meet the expectations of e.g. gimplification.
The clause representation is also much better suited for the upcoming
OpenMP "apply" clause where the transformations will not appear
as directives in front of actual loops but inside of other clauses.
In fact, the loop transformation clauses in the implementation already
specify the level of a loop nest to which they apply and it could
be possible to re-use this handling for "apply".

My initial reaction also was to implement the loop transformations
as OMP_FOR-like constructs and the patch actually introduces an
OMP_LOOP_TRANS construct which is used to represent loops that
are not going to be associated with another OpenMP directive after
the transformation, e.g.

void foo () {
   #pragma omp tile sizes (4, 8, 16)
   for (int i = 0; i < 64; ++i)
   {
     ...
   }

}

You suggest to implement the loop transformations during gimplification.
I am not sure if gimplification is actually well-suited to implement the 
depth-first evaluation of the loop transformations. I also believe that 
gimplification already handles too many things which conceptually are 
not related to the translation to GIMPLE. Having a separate pass seems 
to be the right move to achieve a better separation of concerns. I think 
this will be even more important in the future as the size of the loop 
transformation implementation keeps growing. As you mention below, 
several new constructs are already planned.

> For C, I think the lowering of loop transformation constructs or at least
> determining what it means can be done right after we actually parse it and
> before we finalize the OMP_FOR etc. that wraps it if any.  As discussed last
> week at F2F, I think we want to remember in OMP_FOR_ORIG_DECLS the user
> iterators on the loop transformation constructs and take it into account
> for data sharing purposes.
>
> For C++ in templates we obviously need to defer that until instantiations,
> the constants in the clauses etc. could be template parameters etc.
>
> For Fortran during resolving.
>
>>   The "unroll" and "tile" directives are
>> internally implemented as clauses.  This fits the representation of
> So perhaps just use OMP_UNROLL/OMP_TILE as GENERIC constructs like
> OMP_FOR etc. but with some argument where from the early loop
> transformation analysis you can remember the important stuff, whether
> does the loop transformation result in a canonical loop nest or not
> and in the former case with how many nested loops.
>
> And then handle the actual transformation IMHO best at gimplification
> time, find them in the OMP_FOR etc. body if they are nested in there,
> let the transformation happen on GENERIC before the containing OMP_FOR
> etc. if any is actually finalized and from the transformation remember
> the original user decls and what should happen with them for data sharing
> (e.g. lastprivate/lastprivate conditional).
>  From the slides I saw last week, a lot of other transformations are in the
> planning, like loop reversal etc.

Correct, more transformations are planned. I have been following the 
design discussions and I do not think that any of those constructs will 
cause problems to the implementation contained in this patch series. Or 
do you see one that would be problematic?

> And, I think even in OpenMP 5.1 nothing prevents e.g.
> #pragma omp for collapse(3) // etc.
> #pragma omp tile sizes (4, 2, 2)
> #pragma omp tile sizes (4, 8, 16)
> for (int i = 0; i < 64; ++i)
>    for (int j = 0; j < 64; ++j)
>      for (int k = 0; k < 64; ++k)
>        body;
> where the inner tile takes the i, j and k loops and makes
> for (int i1 = 0; i1 < 64; i1 += 4)
>    for (int j1 = 0; j1 < 64; j1 += 8)
>      for (int k1 = 0; k1 < 64; k1 += 16)
>        for (int i2 = 0; i2 < 4; i2++)
> 	{
> 	  int i = i1 + i2;
> 	  for (int j2 = 0; j2 < 8; j2++)
> 	    {
> 	      int j = j1 + j2;
> 	      for (int k2 = 0; k2 < 16; k2++)
> 		{
> 		  int k = k1 + k2;
> 		  body;
> 		}
> 	    }
> 	}
> out of it with 3 outer loops which have canonical loop form (the rest
> doesn't).  And then the second tile takes the outermost 3 of those generated
> loops and tiles them again, making it into again 3 canonical loop form
> loops plus stuff inside of it.
> Or one can replace the
> #pragma omp for collapse(3) // etc.
> with
> #pragma omp for
> #pragma omp unroll partial(2)
> which furthermore unrolls the outermost generated loop from the outer tile
> turning it into 1 canonical loop form loop plus stuff in it.

You are right, this is valid and the implementation in this patch series 
is able to cope with this. The patches contain corresponding tests. The 
most extensive testing for different combinations of loop 
transformations, collapse values, and other directives is done in the 
libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-* tests.

> Or of course as you have in your testcases, some loop transformation
> constructs could be used on more nested loops, not necessarily before
> the outermost one.  But still, in all cases you need to know quite early
> how many canonical loop form nested loops you get from each loop
> transformation, so that it can be e.g. checked against the collapse/ordered
> clauses.

You are right and this happens in the implementation. The front ends 
extend the "collapse" (i.e. the loop nest depth) of the OMP_FOR-like 
constructs as necessary for the transformations (and reject invalid
combinations, e.g. "collapse" larger than the tiled depth, early) and 
the transformation pass reduces the "collapse" of the gomp_for to the 
value requested by the user code after the transformations have been 
executed.
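
The check described above can be sketched as follows.  This is a toy
model, not code from the patch series: the names transform_kind,
transform, generated_canonical_loops and collapse_is_valid are
hypothetical, and it assumes that a tile with n sizes generates n
canonical loops, that "unroll partial" generates one, and that
"unroll full" generates none.

```c
#include <stddef.h>

/* Hypothetical model of a loop transformation directive; these names do
   not correspond to GCC internals.  */
enum transform_kind
{
  TRANSFORM_TILE,
  TRANSFORM_UNROLL_PARTIAL,
  TRANSFORM_UNROLL_FULL
};

struct transform
{
  enum transform_kind kind;
  int num_sizes;   /* Number of "sizes" arguments; only used for tile.  */
};

/* Number of canonical-form loops that a transformation generates.  */
static int
generated_canonical_loops (const struct transform *t)
{
  switch (t->kind)
    {
    case TRANSFORM_TILE:
      return t->num_sizes;   /* The outer tile loops.  */
    case TRANSFORM_UNROLL_PARTIAL:
      return 1;
    case TRANSFORM_UNROLL_FULL:
      return 0;              /* Assumption: no loop remains.  */
    }
  return 0;
}

/* CHAIN lists the transformations outermost first; each one consumes the
   loops generated by the next one, and the last one consumes the loop
   nest written in the source, which has depth SOURCE_DEPTH.  Return 1 if
   every transformation finds enough loops and COLLAPSE does not exceed
   the number of canonical loops left at the end, 0 otherwise.  */
static int
collapse_is_valid (const struct transform *chain, size_t n,
                   int source_depth, int collapse)
{
  int available = source_depth;
  for (size_t i = n; i-- > 0;)
    {
      int needed = chain[i].kind == TRANSFORM_TILE ? chain[i].num_sizes : 1;
      if (needed > available)
        return 0;
      available = generated_canonical_loops (&chain[i]);
    }
  return collapse <= available;
}
```

With the two nested tiles from the example quoted above, collapse(3) is
accepted by this model, while putting "unroll partial" on top of them
leaves a single canonical loop, so only collapse(1) would pass.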

> Feel free to disagree if you think your approach is able to handle all of
> this, just put details in why do you think so.

I think it is able to handle everything, as I described above.


Thanks,

Frederik


* Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
  2023-05-16  9:45   ` Frederik Harwath
@ 2023-05-16 11:00     ` Jakub Jelinek
  2023-05-17 11:55       ` Frederik Harwath
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2023-05-16 11:00 UTC (permalink / raw)
  To: Frederik Harwath
  Cc: Frederik Harwath, gcc-patches, fortran, tobias, joseph, jason

On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:
> The place where different compilers implement the loop transformations
> was discussed in an OpenMP loop transformation meeting last year. Two
> compilers (another one and GCC with this patch series) transformed the loops
> in the middle end after the handling of data sharing, one planned to do so.
> Yet another vendor had not yet decided where it will be implemented. Clang
> currently does everything in the front end, but it was mentioned that this
> might change in the future e.g. for code sharing with Flang. Implementing
> the loop transformations late could potentially
> complicate the implementation of transformations which require adjustments
> of the data sharing clauses, but this is known and consequentially, no such

When already in the FE we determine how many canonical loops a particular
loop transformation creates, I think the primary change I'd like to see is
to really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and to
consider where the best spot to lower them is.  I believe for data sharing it is best
done during gimplification before the containing loops are handled, it is
already shared code among all the FEs, I think will make it easier to handle
data sharing right and gimplification is also where doacross processing is
done.  While there is restriction that ordered clause is incompatible with
generated loops from tile construct, there isn't one for unroll (unless
"The ordered clause must not appear on a worksharing-loop directive if the associated loops
include the generated loops of a tile directive."
means unroll partial implicitly because partial unroll tiles the loop, but
it doesn't say it acts as if it was a tile construct), so we'd have to handle
#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
  #pragma omp unroll partial(4)
  for (int j = 0; j < 64; j++)
    {
      #pragma omp ordered depend (sink: i - 1, j - 2)
      #pragma omp ordered depend (source)
    }
and I think handling it after gimplification is going to be increasingly
harder.  Of course another possibility is to ask the language committee to
clarify, unless it has been clarified already in 6.0 (but in TR11 it is not).
Also, I think creating temporaries is easier to be done during
gimplification than later.
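
One way to see why the doacross case above is awkward is a small sketch,
assuming (this is an assumption, not something the spec text quoted here
states) that "unroll partial(4)" is realized as a generated loop with
stride 4 whose body holds 4 copies of the original body.  The helper
names are illustrative only.  A "sink: j - 2" dependence then sometimes
stays within one iteration of the generated loop and sometimes refers to
the previous one:

```c
/* Assumption: "unroll partial(4)" on "for (j = 0; j < 64; j++)" becomes a
   generated loop with stride 4 containing 4 copies of the original body.  */
enum { UNROLL_FACTOR = 4 };

/* Index of the generated-loop iteration that executes original
   iteration J (J >= 0).  */
static int
generated_iteration (int j)
{
  return j / UNROLL_FACTOR;
}

/* Return 1 if a "sink: j - DIST" dependence of original iteration J refers
   to a different iteration of the generated loop, 0 if source and sink end
   up in the same generated iteration.  Requires 0 <= DIST <= J.  */
static int
sink_crosses_generated_iteration (int j, int dist)
{
  return generated_iteration (j - dist) != generated_iteration (j);
}
```

For example, with a distance of 2, original iteration j = 5 depends on
j = 3, which lives in the previous generated iteration, while j = 7
depends on j = 5 in the same generated iteration.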

Another option is, as you implemented, a separate pre-omp-lowering pass,
and another one would be to do it in the omplower pass, which actually has
several subpasses internally, in its scan phase.  The disadvantage of
a completely separate pass is that we have to walk the whole IL again,
while doing it in the scan phase means we avoid that cost.  We already
do similar transformations there: scan_omp_simd transforms simd constructs
into if (...) simd else simt and then we process it with normal scan_omp_for
on what we've created.  So, if you insist on doing it after gimplification,
perhaps for compatibility with other non-LLVM compilers, I'd prefer to
do it there rather than in a completely separate pass.

> transformations are planned for OpenMP 6.0. In particular, the "apply"
> clause therefore only permits loop-transforming constructs to be applied to
> the loops generated from other loop
> transformations in TR11.
> 
> > The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
> > already need to know given their collapse/ordered how many loops they are
> > actually associated with and the loop transformation constructs can change
> > that.
> > So, I think we need to do the loop transformations in the FEs, that doesn't
> > mean we need to write everything 3 times, once for each frontend.
> > Already now, e.g. various stuff is shared between C and C++ FEs in c-family,
> > though how much can be shared between c-family and Fortran is to be
> > discovered.
> > Or at least partially, to the extent that we compute how many canonical
> > loops the loop transformations result in, what artificial iterators they
> > will use etc., so that during gimplification we can take all that into
> > account and then can do the actual transformations later.
> 
> The patches in this patch series already do compute how many canonical
> loop nests result from the loop transformations in the front end.

Good.

> This is necessary to represent the loop nest that is affected by the
> loop transformations by a single OMP_FOR to meet the expectations
> of all later OpenMP code transformations. This is also the major
> reason why the loop transformations are represented by clauses
> instead of representing them as  "OMP_UNROLL/OMP_TILE as
> GENERIC constructs like OMP_FOR" as you suggest below. Since the

I really don't see why.  We try to represent what we see in the source
as OpenMP constructs as those constructs.  We already have a precedent
with composite loop constructs, where for the combined constructs which
aren't innermost we temporarily use NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}
vectors to stand for this will be some loop, but the details for it aren't
known yet, to be filled up later.  So, why can't we similarly represent
#pragma omp for collapse(3)
#pragma omp tile sizes (4, 2, 2)
#pragma omp tile sizes (4, 8, 16)
for (int i = 0; i < 64; ++i)
  for (int j = 0; j < 64; ++j)
    for (int k = 0; k < 64; ++k)
      body;
as OMP_FOR with NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}
with the appropriate clauses on it, with
OMP_TILE (again, right clauses, NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS})
and another OMP_TILE, this time with all the vectors filled in in GENERIC?

#pragma omp for collapse(2)
for (int i = 0; i < 64; ++i)
#pragma omp tile sizes (4)
  for (int j = 0; j < 64; ++j)
would be represented by non-NULL vectors which would have all the inner
entries NULL (the outer loop is not generated loop, the inner one is
generated) with OMP_TILE inside of it.

Then depending on where the loop transformation is actually performed,
we'd either need to preserve such shape from gimplification until the
loop transformations are applied, or would be solely on GENERIC and
GIMPLE would already have the transformed loops.

Clauses e.g. have the disadvantage that generally they aren't ordered.
If it is separate statements, it is e.g. easier to print it right in
original dump, so that people can compare the loops before the
transformation and after it.

> You suggest to implement the loop transformations during gimplification.
> I am not sure if gimplification is actually well-suited to implement the
> depth-first evaluation of the loop transformations. I also believe that

Why not?  The loop transformation constructs can't be deeply nested in the
bodies, they need to be close.
gimplify_omp_for already searches the body for the case of composite
constructs - if (OMP_FOR_INIT (for_stmt) == NULL_TREE) early in it.
So, this would just mean that, if that condition is true, it would also look
for loop transformation constructs (if they are found, pass in the
containing OMP_{FOR,SIMD,LOOP,DISTRIBUTE,TASKLOOP} if any to a routine
that handles the transformation, such that it can update the containing
looping construct if any during the transformation).
That alone would handle the case where the looping construct should work
solely with the generated loop.  It would need to do the same thing
also if OMP_FOR_INIT (for_stmt) is non-NULL but
TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) - 1)
is NULL to handle the case where generated loops are just some of the inner
ones.
And then when gimplify_omp_for would encounter an OMP_TILE/OMP_UNROLL
loop on its own (i.e. not nested inside some other loop), it would similarly
find further transform constructs in it like the above but then would just
normally do the loop transformation, with NULL_TREE for the containing loop,
meaning it is a sequential stuff.
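
The condition described above can be modelled with a toy data structure.
This is only a sketch: struct loop_stmt and
must_search_body_for_transforms are hypothetical stand-ins, not GCC's
tree API.  The init member plays the role of OMP_FOR_INIT: NULL means no
loop is known yet, and a NULL last entry means the innermost associated
loops are still to be generated by a nested transformation.

```c
#include <stddef.h>

/* Toy stand-in for an OMP_FOR-like statement.  INIT is either NULL or an
   array of N entries, one per associated loop; a NULL entry marks a loop
   that a nested loop transformation still has to generate.  */
struct loop_stmt
{
  size_t n;
  const char **init;
};

/* Return 1 if the body has to be searched for nested loop transformation
   constructs before the statement itself can be processed.  */
static int
must_search_body_for_transforms (const struct loop_stmt *stmt)
{
  /* No loops filled in at all: everything comes from nested constructs.  */
  if (stmt->init == NULL)
    return 1;
  /* Some outer loops are known, but the last (innermost) one is not.  */
  return stmt->n > 0 && stmt->init[stmt->n - 1] == NULL;
}
```

A statement whose entries are all filled in needs no search in this
model and can be handled as an ordinary loop nest.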

	Jakub



* Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
  2023-05-16 11:00     ` Jakub Jelinek
@ 2023-05-17 11:55       ` Frederik Harwath
  2023-05-22 14:20         ` Jakub Jelinek
  0 siblings, 1 reply; 21+ messages in thread
From: Frederik Harwath @ 2023-05-17 11:55 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Frederik Harwath, gcc-patches, fortran, tobias, joseph, jason

Hi Jakub,

On 16.05.23 13:00, Jakub Jelinek wrote:
> On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:
>> The place where different compilers implement the loop transformations
>> was discussed in an OpenMP loop transformation meeting last year. Two
>> compilers (another one and GCC with this patch series) transformed the loops
>> in the middle end after the handling of data sharing, one planned to do so.
>> Yet another vendor had not yet decided where it will be implemented. Clang
>> currently does everything in the front end, but it was mentioned that this
>> might change in the future e.g. for code sharing with Flang. Implementing
>> the loop transformations late could potentially
>> complicate the implementation of transformations which require adjustments
>> of the data sharing clauses, but this is known and consequentially, no such
> When already in the FE we determine how many canonical loops a particular
> loop transformation creates, I think the primary changes I'd like to see is
> really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and consider
> where is the best spot to lower it. I believe for data sharing it is best
> done during gimplification before the containing loops are handled, it is
> already shared code among all the FEs, I think will make it easier to handle
> data sharing right and gimplification is also where doacross processing is
> done. While there is restriction that ordered clause is incompatible with
> generated loops from tile construct, there isn't one for unroll (unless
> "The ordered clause must not appear on a worksharing-loop directive if the associated loops
> include the generated loops of a tile directive."
> means unroll partial implicitly because partial unroll tiles the loop, but
> it doesn't say it acts as if it was a tile construct), so we'd have to handle
> #pragma omp for ordered(2)
> for (int i = 0; i < 64; i++)
>   #pragma omp unroll partial(4)
>   for (int j = 0; j < 64; j++)
>     {
>       #pragma omp ordered depend (sink: i - 1, j - 2)
>       #pragma omp ordered depend (source)
>     }
> and I think handling it after gimplification is going to be increasingly
> harder. Of course another possibility is ask lang committee to clarify
> unless it has been clarified already in 6.0 (but in TR11 it is not).

I do not really expect that we will have to handle this. Questions concerning
the correctness of code after applying loop transformations came up several
times since I have been following the design meetings and the result was
always either that nothing will be changed, because the loop transformations
are not expected to ensure the correctness of enclosing directives, or that
the use of the problematic construct in conjunction with loop transformations
will be forbidden. Concerning the use of "ordered" on transformed loops, the
latter approach was suggested for all transformations, cf. issue #3494 in the
private OpenMP spec repository. I see that you have already asked for
clarification on unroll. I suppose this could also be fixed after
gimplification with reasonable effort. But let's just wait for the result of
that discussion before we continue worrying about this.

> Also, I think creating temporaries is easier to be done during
> gimplification than later.

This has not caused problems with the current approach.

> Another option is as you implemented a separate pre-omp-lowering pass,
> and another one would be do it in the omplower pass, which has actually
> several subpasses internally, do it in the scan phase. Disadvantage of
> a completely separate pass is that we have to walk the whole IL again,
> while doing it in the scan phase means we avoid that cost. We already
> do there similar transformations, scan_omp_simd transforms simd constructs
> into if (...) simd else simt and then we process it with normal scan_omp_for
> on what we've created. So, if you insist doing it after gimplification
> perhaps for compatibility with other non-LLVM compilers, I'd prefer to
> do it there rather than in a completely separate pass.

I see. This would be possible. My current approach is indeed rather
wasteful because the pass is not restricted to functions that actually
use loop transformations. I could add an attribute to such functions
that could be used to avoid the execution of the pass and hence
the gimple walk on functions that do not use transformations.

>> This is necessary to represent the loop nest that is affected by the
>> loop transformations by a single OMP_FOR to meet the expectations
>> of all later OpenMP code transformations. This is also the major
>> reason why the loop transformations are represented by clauses
>> instead of representing them as  "OMP_UNROLL/OMP_TILE as
>> GENERIC constructs like OMP_FOR" as you suggest below. Since the
> I really don't see why. We try to represent what we see in the source
> as OpenMP constructs as those constructs. We already have a precedent
> with composite loop constructs, where for the combined constructs which
> aren't innermost we temporarily use NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}
> vectors to stand for this will be some loop, but the details for it aren't
> known yet, to be filled up later. So, why can't we similarly represent
> #pragma omp for collapse(3)
> #pragma omp tile sizes (4, 2, 2)
> #pragma omp tile sizes (4, 8, 16)
> for (int i = 0; i < 64; ++i)
>   for (int j = 0; j < 64; ++j)
>     for (int k = 0; k < 64; ++k)
>       body;
> as OMP_FOR with NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}
> with the appropriate clauses on it, with
> OMP_TILE (again, right clauses, NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS})
> and another OMP_TILE, this time with all the vectors filled in in GENERIC?
>
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; ++i)
> #pragma omp tile sizes (4)
>   for (int j = 0; j < 64; ++j)
> would be represented by non-NULL vectors which would have all the inner
> entries NULL (the outer loop is not generated loop, the inner one is
> generated) with OMP_TILE inside of it.
>
> Then depending on where the loop transformation is actually performed,
> we'd either need to preserve such shape from gimplification until the
> loop transformations are applied, or would be solely on GENERIC and
> GIMPLE would already have the transformed loops.

Thanks for the explanation! I think now I understand how you would do this.

> Clauses e.g. have the disadvantage that generally they aren't ordered.
> If it is separate statements, it is e.g. easier to print it right in
> original dump, so that people can compare the loops before the
> transformation and after it.
>
You mean, the clauses are not ordered at the level of the specification?
In the implementation they are of course ordered and the order has
proved to be sufficiently stable. But perhaps you mean that you would
like to avoid introducing code that relies on the ordering of the clauses?
In this case, I could move the transformations to a separate chain which
could be accessed e.g. by OMP_FOR_TRANSFORMS and by
gimple_omp_for_transforms per level. This would also allow to print the
transformations in the pretty printing functions on the corresponding levels
of the loop nest. This would also be possible somehow with the present
representation. But right now the transformations are just printed together
with the other clauses on the directive. I considered this to be acceptable
because I suppose the dumps will be mostly read by GCC developers.
There are also other clauses that are only used internally.
>> You suggest to implement the loop transformations during gimplification.
>> I am not sure if gimplification is actually well-suited to implement the
>> depth-first evaluation of the loop transformations. I also believe that
> Why not? The loop transformation constructs can't be deeply nested in the
> bodies, they need to be close.
> gimplify_omp_for already searches the body for the case of composite
> constructs - if (OMP_FOR_INIT (for_stmt) == NULL_TREE) early in it.
> So, this would just mean doing it if that condition is true also looking
> for loop transformation constructs (if they are found, pass in the
> containing OMP_{FOR,SIMD,LOOP,DISTRIBUTE,TASKLOOP} if any to a routine
> that handles the transformation, such that it can update the containing
> looping construct if any during the transformation).
> That alone would handle the case where the looping construct should work
> solely with the generated loop. It would need to do the same thing
> also if OMP_FOR_INIT (for_stmt) is non-NULL but
> TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) - 1)
> is NULL to handle the case where generated loops are just some of the inner
> ones.
> And then when gimplify_omp_for would encounter an OMP_TILE/OMP_UNROLL
> loop on its own (i.e. not nested inside some other loop), it would similarly
> find further transform constructs in it like the above but then would just
> normally do the loop transformation, with NULL_TREE for the containing loop,
> meaning it is a sequential stuff.

Thanks for the explanation. But actually doing this would require a
complete rewrite which would almost certainly imply that mainline GCC
would not support the loop transformations for a long time.


Best regards,

Frederik



* Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
  2023-05-17 11:55       ` Frederik Harwath
@ 2023-05-22 14:20         ` Jakub Jelinek
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Jelinek @ 2023-05-22 14:20 UTC (permalink / raw)
  To: Frederik Harwath
  Cc: Frederik Harwath, gcc-patches, fortran, tobias, joseph, jason

On Wed, May 17, 2023 at 01:55:00PM +0200, Frederik Harwath wrote:
> Thanks for the explanation. But actually doing this would require a
> complete rewrite which would almost certainly imply that mainline GCC
> would not support the loop transformations for a long time.

I don't think it needs complete rewrite, the change to use
OMP_UNROLL/OMP_TILE should actually simplify stuff when you already have
some other extra construct to handle the clauses if it isn't nested into
something else, so I wouldn't expect it needs more than 2-3 hours of work.
It is true that doing the transformation on trees rather than high gimple
is something different, but again it doesn't require everything to be
rewritten and we have code to do code copying both on trees and high and low
gimple in tree-inline.cc, so the unrolling can just use different APIs
to perform it.

I'd still prefer to do it like that, I think it will pay back in
maintenance costs.

If you don't get to this within say 2 weeks, I'll try to do the conversion
myself.

	Jakub


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/4] openmp: loop transformation fixes
  2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
                   ` (7 preceding siblings ...)
  2023-05-15 10:19 ` [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Jakub Jelinek
@ 2023-07-28 13:04 ` Frederik Harwath
  2023-07-28 13:04   ` [PATCH 1/4] openmp: Fix loop transformation tests Frederik Harwath
                     ` (3 more replies)
  8 siblings, 4 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-07-28 13:04 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub

Hi,
the following patches contain some fixes from the devel/omp/gcc-13 branch
to the patches that implement the OpenMP 5.1 loop transformation directives
which I posted in March 2023.

Frederik



Frederik Harwath (4):
  openmp: Fix loop transformation tests
  openmp: Fix initialization for 'unroll full'
  openmp: Fix diagnostic message for "omp unroll"
  openmp: Fix number of iterations computation for "omp unroll full"

 gcc/omp-transform-loops.cc                    | 99 ++++++++++++++-----
 .../gomp/loop-transforms/unroll-8.c           | 76 ++++++++++++++
 .../gomp/loop-transforms/unroll-8.f90         |  2 +-
 .../gomp/loop-transforms/unroll-9.f90         |  2 +-
 .../matrix-no-directive-unroll-full-1.C       | 13 +++
 .../loop-transforms/matrix-no-directive-1.c   |  2 +-
 .../matrix-no-directive-unroll-full-1.c       |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c    |  2 +
 .../loop-transforms/matrix-omp-for-1.c        |  2 +-
 .../matrix-omp-parallel-for-1.c               |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c   |  2 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |  2 +
 .../matrix-omp-target-parallel-for-1.c        |  2 +-
 ...p-target-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |  2 +
 ...trix-omp-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-simd-1.c           |  2 +
 .../loop-transforms/unroll-1.c                |  8 +-
 .../loop-transforms/unroll-non-rect-1.c       |  2 +
 .../loop-transforms/tile-2.f90                |  2 +-
 .../loop-transforms/unroll-1.f90              |  2 +
 .../loop-transforms/unroll-6.f90              |  4 +-
 .../loop-transforms/unroll-simd-1.f90         |  3 +-
 23 files changed, 197 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

--
2.36.1

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] openmp: Fix loop transformation tests
  2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
@ 2023-07-28 13:04   ` Frederik Harwath
  2023-07-28 13:04   ` [PATCH 2/4] openmp: Fix initialization for 'unroll full' Frederik Harwath
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-07-28 13:04 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
        * testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
        * testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
        and initialization.
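
For illustration, why the reduction clause is needed: presumably the loop
body accumulates into "sum", and without reduction(+:sum) the threads of the
parallel region would update the shared variable concurrently, which is a
data race.  A minimal C analogue of the Fortran test loop follows.  It is a
sketch only: the "omp tile" directive is left out so that it also builds with
compilers lacking the new directives, and the function name is made up.

```c
/* Counts the iterations of a collapsed two-level loop nest.  Each thread
   accumulates into a private copy of "sum"; the copies are added up at
   the end of the region because of the reduction clause.  Without
   -fopenmp the pragma is ignored and the loop runs sequentially.  */
int
count_iterations (void)
{
  int sum = 0;

#pragma omp parallel for collapse(2) reduction(+:sum)
  for (int i = 1; i <= 10; i += 3)      /* Fortran: do i = 1,10,3 */
    for (int j = 1; j <= 10; j += 3)    /* Fortran: do j = 1,10,3 */
      sum += 1;

  return sum;
}
```

Both loops run for i, j = 1, 4, 7, 10, so the function should return the same
count with and without OpenMP enabled.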
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90          | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
     integer :: i,j

     sum = 0
-    !$omp parallel do collapse(2)
+    !$omp parallel do collapse(2) reduction(+:sum)
     !$omp tile sizes(6,10)
     do i = 1,10,3
        do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions

     integer :: i,j

+    sum = 0
     !$omp do
     do i = 1,10,3
        !$omp unroll full
@@ -22,6 +23,7 @@ module test_functions

     integer :: i,j

+    sum = 0
     !$omp parallel do reduction(+:sum)
     !$omp unroll partial(2)
     do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions

     integer :: i,j

-    !$omp simd
+    sum = 0
+    !$omp simd reduction(+:sum)
     do i = 1,10,3
        !$omp unroll full
        do j = 1,10,3
--
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/4] openmp: Fix initialization for 'unroll full'
  2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
  2023-07-28 13:04   ` [PATCH 1/4] openmp: Fix loop transformation tests Frederik Harwath
@ 2023-07-28 13:04   ` Frederik Harwath
  2023-07-28 13:04   ` [PATCH 3/4] openmp: Fix diagnostic message for "omp unroll" Frederik Harwath
  2023-07-28 13:04   ` [PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full" Frederik Harwath
  3 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-07-28 13:04 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub

The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.

Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.
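
To illustrate the problem: a fully unrolled loop has no loop header left, so
the index variable must be assigned its initial value explicitly before the
first copy of the body.  The following hand-written C sketch shows the idea.
It is not the GIMPLE that the pass emits, and the function names are made up
for illustration.

```c
/* Reference: the loop as written in the source.  */
int
loop_sum (void)
{
  int sum = 0;
  for (int i = 1; i < 11; i += 3)
    sum += i;
  return sum;
}

/* Hand-unrolled equivalent of the loop above.  */
int
unrolled_sum (void)
{
  int sum = 0;
  int i;

  i = 1;              /* Index initialization; if this assignment is
                         missing, every read of i below uses an
                         uninitialized variable.  */
  sum += i; i += 3;   /* i == 1 */
  sum += i; i += 3;   /* i == 4 */
  sum += i; i += 3;   /* i == 7 */
  sum += i; i += 3;   /* i == 10 */
  return sum;
}
```

Both functions should compute the same sum; dropping the "i = 1" line
corresponds to the situation that the added gimplify_assign call fixes.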

gcc/ChangeLog:

        * omp-transform-loops.cc (full_unroll): Add initialization of index variable.

libgomp/ChangeLog:

        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
        Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty pragmas.
        Use -O2.
        * testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
        Copy of testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
        but using -O0 which works only for C++.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use -Wall
        and use -Wno-unknown-pragmas to disable warnings about empty pragmas.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
        Likewise.
        * testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
        Likewise and fix broken function calls found by -Wall.
---
 gcc/omp-transform-loops.cc                          |  1 +
 .../matrix-no-directive-unroll-full-1.C             | 13 +++++++++++++
 .../loop-transforms/matrix-no-directive-1.c         |  2 +-
 .../matrix-no-directive-unroll-full-1.c             |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c          |  2 ++
 .../loop-transforms/matrix-omp-for-1.c              |  2 +-
 .../loop-transforms/matrix-omp-parallel-for-1.c     |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c         |  2 ++
 .../matrix-omp-parallel-masked-taskloop-simd-1.c    |  2 ++
 .../matrix-omp-target-parallel-for-1.c              |  2 +-
 ...rix-omp-target-teams-distribute-parallel-for-1.c |  2 ++
 .../loop-transforms/matrix-omp-taskloop-1.c         |  2 ++
 .../matrix-omp-teams-distribute-parallel-for-1.c    |  2 ++
 .../loop-transforms/matrix-simd-1.c                 |  2 ++
 .../libgomp.c-c++-common/loop-transforms/unroll-1.c |  8 +++++---
 .../loop-transforms/unroll-non-rect-1.c             |  2 ++
 16 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 517faea537c..275a5260dae 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -548,6 +548,7 @@ full_unroll (gomp_for *omp_for, location_t loc, walk_ctx *ctx ATTRIBUTE_UNUSED)

   gimple_seq unrolled = NULL;
   gimple_seq_add_seq (&unrolled, gimple_omp_for_pre_body (omp_for));
+  gimplify_assign (index, init, &unrolled);
   push_gimplify_context ();
   gimple_seq_add_seq (&unrolled,
                      build_unroll_body (body, unroll_factor, index, incr));
diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
new file mode 100644
index 00000000000..3a684219627
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
@@ -0,0 +1,13 @@
+/* { dg-additional-options { -O0 -fdump-tree-original -Wall -Wno-unknown-pragmas } } */
+
+#define COMMON_DIRECTIVE
+#define COMMON_TOP_TRANSFORM omp unroll full
+#define COLLAPSE_1
+#define COLLAPSE_2
+#define COLLAPSE_3
+#define IMPLEMENTATION_FILE "../../libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h"
+
+#include "../../libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "unroll_full" 13 "original" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
index 9f7f02041b0..7904a5617f3 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options {-fdump-tree-original} } */
+/* { dg-additional-options { -fdump-tree-original -Wall -Wno-unknown-pragmas } } */

 #define COMMON_DIRECTIVE
 #define COLLAPSE_1 collapse(1)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
index 5dd0b5d2989..bd431a25102 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options {-fdump-tree-original} } */
+/* { dg-additional-options { -O2 -fdump-tree-original -Wall -Wno-unknown-pragmas } } */

 #define COMMON_DIRECTIVE
 #define COMMON_TOP_TRANSFORM omp unroll full
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
index d855857e5ee..3875014dc96 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE "omp teams distribute parallel for"
 #define COLLAPSE_1 "collapse(1)"
 #define COLLAPSE_2 "collapse(2)"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
index f2a2b80b2fd..671396cd533 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options {-fdump-tree-original} } */
+/* { dg-additional-options { -fdump-tree-original -Wall -Wno-unknown-pragmas } } */

 #define COMMON_DIRECTIVE omp for
 #define COLLAPSE_1 collapse(1)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
index 2c5701efca4..cc66df42679 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options {-fdump-tree-original} } */
+/* { dg-additional-options { -fdump-tree-original -Wall -Wno-unknown-pragmas } } */

 #define COMMON_DIRECTIVE omp parallel for
 #define COLLAPSE_1 collapse(1)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
index e2def212725..890b460f374 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE omp parallel masked taskloop
 #define COLLAPSE_1 collapse(1)
 #define COLLAPSE_2 collapse(2)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
index ce601555cfb..74f6271504a 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE omp parallel masked taskloop simd
 #define COLLAPSE_1 collapse(1)
 #define COLLAPSE_2 collapse(2)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
index 365b39ba385..8138ea57f38 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options {-fdump-tree-original} } */
+/* { dg-additional-options { -fdump-tree-original -Wall -Wno-unknown-pragmas } } */

 #define COMMON_DIRECTIVE omp target parallel for map(tofrom:result[0:dim0*dim1]) map(to:matrix1[0:dim0*dim1], matrix2[0:dim0*dim1])
 #define COLLAPSE_1 collapse(1)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
index 8afe34874c9..d4d162d9c2b 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE omp target teams distribute parallel for map(tofrom:result[:dim0*dim1]) map(to:matrix1[0:dim0*dim1], matrix2[0:dim0*dim1])
 #define COLLAPSE_1 collapse(1)
 #define COLLAPSE_2 collapse(2)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
index bbc78b39db0..28edb6ce83e 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE omp taskloop
 #define COLLAPSE_1 collapse(1)
 #define COLLAPSE_2 collapse(2)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
index 3a58e479374..481a20a18d0 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE omp teams distribute parallel for
 #define COLLAPSE_1 collapse(1)
 #define COLLAPSE_2 collapse(2)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
index e5155dcf76d..200ddd859f5 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #define COMMON_DIRECTIVE omp simd
 #define COLLAPSE_1 collapse(1)
 #define COLLAPSE_2 collapse(2)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
index 2ac0fff16af..eb5d3d77eb8 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #include <stdio.h>

 int compute_sum1 ()
@@ -11,7 +13,7 @@ int compute_sum1 ()
     sum++;

   if (j != 7)
-    __builtin_abort;
+    __builtin_abort ();

   return sum;
 }
@@ -27,7 +29,7 @@ int compute_sum2()
     sum++;

   if (j != 7)
-    __builtin_abort;
+    __builtin_abort ();

   return sum;
 }
@@ -43,7 +45,7 @@ int compute_sum3()
     sum++;

   if (j != 7)
-    __builtin_abort;
+    __builtin_abort ();

   return sum;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c
index 2f9924aea1f..7bd9b906235 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options { -Wall -Wno-unknown-pragmas } } */
+
 #include <stdio.h>
 #include <stdlib.h>

--
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/4] openmp: Fix diagnostic message for "omp unroll"
  2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
  2023-07-28 13:04   ` [PATCH 1/4] openmp: Fix loop transformation tests Frederik Harwath
  2023-07-28 13:04   ` [PATCH 2/4] openmp: Fix initialization for 'unroll full' Frederik Harwath
@ 2023-07-28 13:04   ` Frederik Harwath
  2023-07-28 13:04   ` [PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full" Frederik Harwath
  3 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-07-28 13:04 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub

gcc/ChangeLog:

        * omp-transform-loops.cc (print_optimized_unroll_partial_msg):
        Output "omp unroll partial" instead of "omp unroll auto".
        (optimize_transformation_clauses): Likewise.

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: Adjust.

gcc/testsuite/ChangeLog:

        * gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
        * gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
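
For reference, the adjusted test expectations suggest that consecutive
"omp unroll partial" directives are merged into a single directive whose
factor is the product of the individual factors (4, 3, 2, 1 gives 24;
5, 10 gives 50).  This is inferred from the expected messages only.  A
minimal C sketch of that rule, with a hypothetical helper name that does not
exist in omp-transform-loops.cc:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the factor of the single
   "omp unroll partial" directive that replaces COUNT consecutive ones
   with the given FACTORS, assuming the factors simply multiply.  */
unsigned long
combined_unroll_factor (const unsigned long *factors, size_t count)
{
  unsigned long combined = 1;
  for (size_t i = 0; i < count; i++)
    {
      assert (factors[i] > 0);   /* Unroll factors are positive.  */
      combined *= factors[i];
    }
  return combined;
}
```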
---
 gcc/omp-transform-loops.cc                                    | 4 ++--
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90   | 2 +-
 .../testsuite/libgomp.fortran/loop-transforms/unroll-6.f90    | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 275a5260dae..c8853bcee89 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -1423,7 +1423,7 @@ print_optimized_unroll_partial_msg (tree c)
   tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc,
                   "replaced consecutive %<omp unroll%> directives by "
-                  "%<omp unroll auto(" HOST_WIDE_INT_PRINT_UNSIGNED
+                  "%<omp unroll partial(" HOST_WIDE_INT_PRINT_UNSIGNED
                   ")%>\n", tree_to_uhwi (unroll_factor));
 }

@@ -1483,7 +1483,7 @@ optimize_transformation_clauses (tree clauses)

                  dump_printf_loc (
                      MSG_OPTIMIZED_LOCATIONS, dump_loc,
-                     "removed useless %<omp unroll auto%> directives "
+                     "removed useless %<omp unroll partial%> directives "
                      "preceding 'omp unroll full'\n");
                }
            }
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
index fd687890ee6..dab3f0fb5cf 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
@@ -5,7 +5,7 @@ subroutine test1
   implicit none
   integer :: i
   !$omp parallel do collapse(1)
-  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll auto\(24\)'} }
+  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll partial\(24\)'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
index 928ca44e811..91e13ff1b37 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
@@ -4,7 +4,7 @@
 subroutine test1
   implicit none
   integer :: i
-  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' directives preceding 'omp unroll full'} }
+  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll partial' directives preceding 'omp unroll full'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
index 1df8ce8d5bb..b953ce31b5b 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
@@ -22,7 +22,7 @@ contains

     sum = 0
     !$omp parallel do reduction(+:sum) lastprivate(i)
-    !$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll auto\(50\)'} }
+    !$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll partial\(50\)'} }
     !$omp unroll partial(10)
     do i = 1,n,step
        sum = sum + 1
@@ -36,7 +36,7 @@ contains
     sum = 0
     !$omp parallel do reduction(+:sum) lastprivate(i)
     do i = 1,n,step
-       !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' directives preceding 'omp unroll full'} }
+       !$omp unroll full ! { dg-optimized {removed useless 'omp unroll partial' directives preceding 'omp unroll full'} }
        !$omp unroll partial(10)
        do j = 1, 1000
           sum = sum + 1
--
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full"
  2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
                     ` (2 preceding siblings ...)
  2023-07-28 13:04   ` [PATCH 3/4] openmp: Fix diagnostic message for "omp unroll" Frederik Harwath
@ 2023-07-28 13:04   ` Frederik Harwath
  3 siblings, 0 replies; 21+ messages in thread
From: Frederik Harwath @ 2023-07-28 13:04 UTC (permalink / raw)
  To: gcc-patches, tobias, jakub

gcc/ChangeLog:

        * omp-transform-loops.cc (gomp_for_number_of_iterations):
        Always compute "final - init" and do not take absolute value.
        Identify non-iterating and infinite loops for constant init,
        final, step values for better diagnostic messages, consistent
        behaviour in those corner cases, and better testability.
        (gomp_for_constant_iterations_p): Add new argument to pass
        on information about infinite loops, and ...
        (full_unroll): ... use from here to emit a warning and remove
        unrolled, known infinite loops consistently.
        (process_omp_for): Only print dump message if loop has not
        been removed by transformation.

gcc/testsuite/ChangeLog:

        * c-c++-common/gomp/loop-transforms/unroll-8.c: New test.
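
The computation described for gomp_for_number_of_iterations can be sketched
on plain integers as follows.  This is a simplified model for constant
init, final, and step values only; the type, enum, and function names are
assumptions made for the example and are not GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>

enum loop_cond { COND_LT, COND_GT };

/* Integer division rounding toward positive infinity.  */
static long long
ceil_div (long long a, long long b)
{
  long long q = a / b, r = a % b;
  if (r != 0 && ((r < 0) == (b < 0)))
    q++;
  return q;
}

/* Integer division rounding toward negative infinity.  */
static long long
floor_div (long long a, long long b)
{
  long long q = a / b, r = a % b;
  if (r != 0 && ((r < 0) != (b < 0)))
    q--;
  return q;
}

/* Iteration count of "for (i = init; i COND final_value; i += step)".
   A negative result means the loop was recognized as infinite.  */
long long
number_of_iterations (long long init, long long final_value,
                      enum loop_cond cond, long long step)
{
  assert (step != 0);

  /* The loop does not iterate at all; handled separately because the
     division below could not tell this apart from an infinite loop.  */
  if ((cond == COND_GT && init <= final_value)
      || (cond == COND_LT && final_value <= init))
    return 0;

  /* Always "final - init", no absolute value.  */
  long long diff = final_value - init;

  /* Opposite signs give a negative quotient; round it away from zero
     so that it cannot collapse to 0 and hide the infinite loop.  */
  bool diff_is_neg = diff < 0;
  bool step_is_neg = step < 0;
  if (diff_is_neg != step_is_neg)
    return floor_div (diff, step);
  return ceil_div (diff, step);
}
```

For example, "for (i = 1; i < 11; i += 3)" yields 4 iterations, while
"for (i = 0; i < 10; i -= 1)" yields a negative value, which full_unroll
can then report as an infinite loop.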
---
 gcc/omp-transform-loops.cc                    | 94 ++++++++++++++-----
 .../gomp/loop-transforms/unroll-8.c           | 76 +++++++++++++++
 2 files changed, 146 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index c8853bcee89..b0645397641 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -153,20 +153,27 @@ subst_defs (tree expr, gimple_seq seq)
   return expr;
 }

-/* Return an expression for the number of iterations of the outermost loop of
-   OMP_FOR. */
+/* Return an expression for the number of iterations of the loop at
+   the given LEVEL of OMP_FOR.
+
+   If the expression is a negative constant, this means that the loop
+   is infinite. This can only be recognized for loops with constant
+   initial, final, and step values.  In general, according to the
+   OpenMP specification, the behaviour is unspecified if the number of
+   iterations does not fit the types used for their computation, and
+   hence in particular if the loop is infinite. */

 tree
 gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level)
 {
   gcc_assert (!non_rectangular_p (omp_for));
-
   tree init = gimple_omp_for_initial (omp_for, level);
   tree final = gimple_omp_for_final (omp_for, level);
   tree_code cond = gimple_omp_for_cond (omp_for, level);
   tree index = gimple_omp_for_index (omp_for, level);
   tree type = gomp_for_iter_count_type (index, final);
-  tree step = TREE_OPERAND (gimple_omp_for_incr (omp_for, level), 1);
+  tree incr = gimple_omp_for_incr (omp_for, level);
+  tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr);

   init = subst_defs (init, gimple_omp_for_pre_body (omp_for));
   init = fold (init);
@@ -181,34 +188,64 @@ gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level)
       diff_type = ptrdiff_type_node;
     }

-  tree diff;
-  if (cond == GT_EXPR)
-    diff = fold_build2 (minus_code, diff_type, init, final);
-  else if (cond == LT_EXPR)
-    diff = fold_build2 (minus_code, diff_type, final, init);
-  else
-    gcc_unreachable ();

-  diff = fold_build2 (CEIL_DIV_EXPR, type, diff, step);
-  diff = fold_build1 (ABS_EXPR, type, diff);
+  /* Identify a simple case in which the loop does not iterate. The
+     computation below could not tell this apart from an infinite
+     loop, hence we handle this separately for better diagnostic
+     messages. */
+  gcc_assert (cond == GT_EXPR || cond == LT_EXPR);
+  if (TREE_CONSTANT (init) && TREE_CONSTANT (final)
+      && ((cond == GT_EXPR && tree_int_cst_le (init, final))
+         || (cond == LT_EXPR && tree_int_cst_le (final, init))))
+    return build_int_cst (diff_type, 0);
+
+  tree diff = fold_build2 (minus_code, diff_type, final, init);
+
+  /* Divide diff by the step.
+
+     We could always use CEIL_DIV_EXPR since only non-negative results
+     correspond to valid number of iterations and the behaviour is
+     unspecified by the spec otherwise. But we try to get the rounding
+     right for constant negative values to identify infinite loops
+     more precisely for better warnings. */
+  tree_code div_expr = CEIL_DIV_EXPR;
+  if (TREE_CONSTANT (diff) && TREE_CONSTANT (step))
+    {
+      bool diff_is_neg = tree_int_cst_lt (diff, size_zero_node);
+      bool step_is_neg = tree_int_cst_lt (step, size_zero_node);
+      if ((diff_is_neg && !step_is_neg)
+         || (!diff_is_neg && step_is_neg))
+       div_expr = FLOOR_DIV_EXPR;
+    }

+  diff = fold_build2 (div_expr, type, diff, step);
   return diff;
 }

-/* Return true if the expression representing the number of iterations for
-   OMP_FOR is a constant expression, false otherwise. */
+/* Return true if the expression representing the number of iterations
+   for OMP_FOR is a non-negative constant and set ITERATIONS to the
+   value of that expression. Otherwise, return false.  Set INFINITE to
+   true if the number of iterations was recognized to be infinite. */

 bool
 gomp_for_constant_iterations_p (gomp_for *omp_for,
-                               unsigned HOST_WIDE_INT *iterations)
+                               unsigned HOST_WIDE_INT *iterations,
+                               bool *infinite = NULL)
 {
   tree t = gomp_for_number_of_iterations (omp_for, 0);
-  if (!TREE_CONSTANT (t)
-      || !tree_fits_uhwi_p (t))
+  if (!TREE_CONSTANT (t))
     return false;

-  *iterations = tree_to_uhwi (t);
-  return true;
+  if (infinite
+      && tree_int_cst_lt (t, size_zero_node))
+    *infinite = true;
+  else if (tree_fits_uhwi_p (t))
+    {
+      *iterations = tree_to_uhwi (t);
+      return true;
+    }
+
+  return false;
 }

 static gimple_seq
@@ -525,10 +562,18 @@ full_unroll (gomp_for *omp_for, location_t loc, walk_ctx *ctx ATTRIBUTE_UNUSED)
 {
   tree init = gimple_omp_for_initial (omp_for, 0);
   unsigned HOST_WIDE_INT niter = 0;
-  if (!gomp_for_constant_iterations_p (omp_for, &niter))
+  bool infinite = false;
+  bool constant = gomp_for_constant_iterations_p (omp_for, &niter, &infinite);
+
+  if (infinite)
+    {
+      warning_at (loc, 0, "Cannot apply full unrolling to infinite loop");
+      return NULL;
+    }
+  if (!constant)
     {
       error_at (loc, "Cannot apply full unrolling to loop with "
-                    "non-constant number of iterations");
+               "non-constant number of iterations");
       return omp_for;
     }

@@ -1595,8 +1640,9 @@ process_omp_for (gomp_for *omp_for, gimple_seq *containing_seq, walk_ctx *ctx)
   if (!dump_enabled_p () || !(dump_flags & TDF_DETAILS))
     return;

-  dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, transformed,
-                  "Transformed loop: %G\n\n", transformed);
+  if (transformed)
+    dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, transformed,
+                    "Transformed loop: %G\n\n", transformed);
 }

 /* Traverse SEQ in depth-first order and apply the loop transformation
diff --git a/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
new file mode 100644
index 00000000000..d49d7c42c87
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
@@ -0,0 +1,76 @@
+extern void dummy(int);
+
+void
+test1 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = 101; i > 100; i++)
+    dummy (i);
+}
+
+
+void
+test2 ()
+{
+#pragma omp unroll full
+  for (int i = 101; i != 100; i++)
+    dummy (i);
+}
+
+void
+test3 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = 0; i <= 0; i--)
+    dummy (i);
+}
+
+void
+test4 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = 101; i > 100; i=i+2)
+    dummy (i);
+}
+
+void
+test5 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = -101; i < 100; i=i-10)
+    dummy (i);
+}
+
+void
+test6 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = -101; i < 100; i=i-300)
+    dummy (i);
+}
+
+void
+test7 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = 101; i > -100; i=i+300)
+    dummy (i);
+
+  /* Loop does not iterate, hence no warning. */
+#pragma omp unroll full
+  for (int i = 101; i > 101; i=i+300)
+    dummy (i);
+}
+
+void
+test8 ()
+{
+#pragma omp unroll full /* { dg-warning "Cannot apply full unrolling to infinite loop" } */
+  for (int i = -21; i < -20; i=i-40)
+    dummy (i);
+
+  /* Loop does not iterate, hence no warning. */
+#pragma omp unroll full
+  for (int i = -21; i > 20; i=i-40)
+    dummy (i);
+}
--
2.36.1
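
For readers following the rounding logic in gomp_for_number_of_iterations, here is a minimal standalone sketch of the same decision using plain long values instead of GCC trees. It is not part of the patch. The names iteration_count, floor_div, ceil_div and the cond enum are hypothetical, and the sketch assumes a non-zero step and a loop condition already normalized to "<" or ">". A negative result plays the role of the "infinite loop" signal.

```c
#include <stdio.h>

/* Hypothetical stand-ins for LT_EXPR and GT_EXPR.  */
enum cond { COND_LT, COND_GT };

/* Division rounding toward negative infinity (models FLOOR_DIV_EXPR).
   Assumes b != 0.  */
static long
floor_div (long a, long b)
{
  long q = a / b;	/* C division truncates toward zero.  */
  if (a % b != 0 && (a < 0) != (b < 0))
    --q;
  return q;
}

/* Division rounding toward positive infinity (models CEIL_DIV_EXPR).
   Assumes b != 0.  */
static long
ceil_div (long a, long b)
{
  long q = a / b;
  if (a % b != 0 && (a < 0) == (b < 0))
    ++q;
  return q;
}

/* Model of the iteration count: 0 if the loop body never runs,
   a positive count for a terminating loop, and a negative value
   if the induction variable moves away from the bound.  */
static long
iteration_count (long init, long final, long step, enum cond cond)
{
  /* The loop does not iterate at all.  */
  if ((cond == COND_GT && init <= final)
      || (cond == COND_LT && final <= init))
    return 0;

  long diff = final - init;

  /* Signs differ: the quotient is negative, so round down to keep
     it strictly negative instead of letting it collapse to 0.  */
  if ((diff < 0) != (step < 0))
    return floor_div (diff, step);

  return ceil_div (diff, step);
}
```

With these definitions, iteration_count (101, 100, 2, COND_GT), which corresponds to test4, yields -1, whereas ceiling division alone would yield 0 and the loop would be indistinguishable from one that does not iterate. An ordinary loop such as iteration_count (0, 10, 3, COND_LT) yields 4.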

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: Registergericht München, HRB 106955

Thread overview: 21+ messages
2023-03-24 15:30 [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Frederik Harwath
2023-03-24 15:30 ` [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive Frederik Harwath
2023-04-01  8:42   ` Thomas Schwinge
2023-04-06 13:07     ` Frederik Harwath
2023-03-24 15:30 ` [PATCH 2/7] openmp: Add C/C++ " Frederik Harwath
2023-03-24 15:30 ` [PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE Frederik Harwath
2023-03-24 15:30 ` [PATCH 4/7] openmp: Add Fortran support for "omp tile" Frederik Harwath
2023-03-24 15:30 ` [PATCH 5/7] openmp: Add C/C++ " Frederik Harwath
2023-03-24 15:30 ` [PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops Frederik Harwath
2023-03-24 15:30 ` [PATCH 7/7] openmp: Add C/C++ " Frederik Harwath
2023-05-15 10:19 ` [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives Jakub Jelinek
2023-05-15 11:03   ` Jakub Jelinek
2023-05-16  9:45   ` Frederik Harwath
2023-05-16 11:00     ` Jakub Jelinek
2023-05-17 11:55       ` Frederik Harwath
2023-05-22 14:20         ` Jakub Jelinek
2023-07-28 13:04 ` [PATCH 0/4] openmp: loop transformation fixes Frederik Harwath
2023-07-28 13:04   ` [PATCH 1/4] openmp: Fix loop transformation tests Frederik Harwath
2023-07-28 13:04   ` [PATCH 2/4] openmp: Fix initialization for 'unroll full' Frederik Harwath
2023-07-28 13:04   ` [PATCH 3/4] openmp: Fix diagnostic message for "omp unroll" Frederik Harwath
2023-07-28 13:04   ` [PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full" Frederik Harwath