From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 4641A3858C50; Fri, 24 Mar 2023 15:51:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4641A3858C50 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.98,288,1673942400"; d="scan'208";a="274531" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 24 Mar 2023 07:31:21 -0800 IronPort-SDR: 7JtpxPoxVWZGGnlJ8/XVAKNkjDX4KRJWys6wszrpD70XOKGncTINR1M48i5Jbyk+lhEqnH8mlQ wbXWV0qe5KnSXzffU2A5yyffn43EgGqBOqD+RGC7z4SBwcT37rx9OgiWcjSN9ILenMzjD6wEWj WIlFm3lvg98muRN6FL+QPR1yt5cELEG1hz139gmj5ThhJUUF3NRH7rdOZdUxIOnWgMsmu7BBuv 1VTTUk+3mJq3NfWOuvpqlZno1SENZ06sUphyzi+PWjd5ZDNJNcVUAY/SGcLuJVqWzRW+qfg3bl CvQ= From: Frederik Harwath To: , , , Subject: [PATCH 4/7] openmp: Add Fortran support for "omp tile" Date: Fri, 24 Mar 2023 16:30:42 +0100 Message-ID: <20230324153046.3996092-5-frederik@codesourcery.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20230324153046.3996092-1-frederik@codesourcery.com> References: <20230324153046.3996092-1-frederik@codesourcery.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This commit implements the Fortran front end support for the "omp tile" directive and the corresponding middle end transformation. gcc/fortran/ChangeLog: * gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE= . (enum gfc_exec_op): Add EXEC_OMP_TILE. (loop_transform_p): New declaration. (struct gfc_omp_clauses): Add "tile_sizes" field. * dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumpin= g. (show_omp_node): Handle EXEC_OMP_TILE. (show_code_node): Likewise. * match.h (gfc_match_omp_tile): New declaration. * openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field. (match_tile_sizes): New function. (OMP_TILE_CLAUSES): New macro. (gfc_match_omp_tile): New function. (resolve_omp_do): Handle EXEC_OMP_TILE. (resolve_omp_tile): New function. (omp_code_to_statement): Handle EXEC_OMP_TILE. (gfc_resolve_omp_directive): Likewise. * parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE and ST_OMP_TILE. (next_statement): Handle ST_OMP_TILE. (gfc_ascii_statement): Likewise. (parse_omp_do): Likewise. (parse_executable): Likewise. * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE. (gfc_resolve_code): Likewise. * st.cc (gfc_free_statement): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" fiel= d. (loop_transform_p): New function. (gfc_expr_list_len): New function. (gfc_trans_omp_do): Handle EXEC_OMP_TILE. (gfc_trans_omp_directive): Likewise. * trans.cc (trans_code): Likewise. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE. (gimplify_adjust_omp_clauses): Likewise. (gimplify_omp_loop): Likewise. * omp-transform-loops.cc (walk_omp_for_loops): New declaration. (subst_var_in_op): New function. (subst_var): New function. (gomp_for_number_of_iterations): Adjust. (gomp_for_iter_count_type): New function. (gimple_assign_rhs_to_tree): New function. (subst_defs): New function. (gomp_for_uncollapse): Adjust. (transformation_clause_p): Add OMP_CLAUSE_TILE. (tile): New function. (transform_gomp_for): Handle OMP_CLAUSE_TILE. (optimize_transformation_clauses): Handle OMP_CLAUSE_TILE. * omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TIL= E. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE. * tree.cc: Add OMP_CLAUSE_TILE. * tree.h (OMP_CLAUSE_TILE_SIZES): New macro. libgomp/ChangeLog: * testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New = test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New = test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New = test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New = test. * testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New = test. * testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New = test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/loop-transforms/tile-1.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-2.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-3.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-4.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test. * gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test. * gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test. --- gcc/fortran/dump-parse-tree.cc | 17 +- gcc/fortran/gfortran.h | 7 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 373 +++++++++++++----- gcc/fortran/parse.cc | 15 + gcc/fortran/resolve.cc | 3 + gcc/fortran/st.cc | 1 + gcc/fortran/trans-openmp.cc | 86 ++-- gcc/fortran/trans.cc | 1 + gcc/gimplify.cc | 3 + gcc/omp-general.cc | 2 +- gcc/omp-transform-loops.cc | 340 +++++++++++++++- .../gomp/loop-transforms/tile-1.f90 | 163 ++++++++ .../gomp/loop-transforms/tile-1a.f90 | 10 + .../gomp/loop-transforms/tile-2.f90 | 80 ++++ .../gomp/loop-transforms/tile-3.f90 | 18 + .../gomp/loop-transforms/tile-4.f90 | 95 +++++ .../gomp/loop-transforms/tile-unroll-1.f90 | 57 +++ .../gomp/loop-transforms/unroll-tile-1.f90 | 37 ++ .../gomp/loop-transforms/unroll-tile-2.f90 | 41 ++ gcc/tree-core.h | 3 + gcc/tree-pretty-print.cc | 8 + gcc/tree.cc | 7 +- gcc/tree.h | 3 + .../loop-transforms/unroll-full-tile.C | 84 ++++ .../loop-transforms/tile-1.f90 | 71 ++++ .../loop-transforms/tile-2.f90 | 117 ++++++ .../loop-transforms/tile-unroll-1.f90 | 112 ++++++ .../loop-transforms/tile-unroll-2.f90 | 71 ++++ .../loop-transforms/tile-unroll-3.f90 | 77 ++++ .../loop-transforms/tile-unroll-4.f90 | 75 ++++ .../loop-transforms/unroll-tile-1.f90 | 112 ++++++ .../loop-transforms/unroll-tile-2.f90 | 71 ++++ 33 files changed, 2042 insertions(+), 119 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f= 90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.= f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f= 90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f= 90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f= 90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unr= oll-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-t= ile-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-t= ile-2.f90 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-fu= ll-tile.C create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-= 1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-= 2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-= unroll-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-= unroll-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-= unroll-3.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-= unroll-4.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unrol= l-tile-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unrol= l-tile-2.f90 diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.c= c index e069aca1f1d..82183285954 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -2062,6 +2062,18 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) if (omp_clauses->unroll_partial_factor > 0) fprintf (dumpfile, "(%u)", omp_clauses->unroll_partial_factor); } + if (omp_clauses->tile_sizes) + { + gfc_expr_list *sizes; + fputs (" TILE SIZES(", dumpfile); + for (sizes =3D omp_clauses->tile_sizes; sizes; sizes =3D sizes->next= ) + { + show_expr (sizes->expr); + if (sizes->next) + fputs (", ", dumpfile); + } + fputc (')', dumpfile); + } } /* Show a single OpenMP or OpenACC directive node and everything underneat= h it @@ -2172,6 +2184,7 @@ show_omp_node (int level, gfc_code *c) name =3D "TEAMS DISTRIBUTE PARALLEL DO SIMD"; break; case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: name =3D "TEAMS DISTRIBUTE SIMD";= break; case EXEC_OMP_TEAMS_LOOP: name =3D "TEAMS LOOP"; break; + case EXEC_OMP_TILE: name =3D "TILE"; break; case EXEC_OMP_UNROLL: name =3D "UNROLL"; break; case EXEC_OMP_WORKSHARE: name =3D "WORKSHARE"; break; default: @@ -2249,6 +2262,7 @@ show_omp_node (int level, gfc_code *c) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: omp_clauses =3D c->ext.omp_clauses; @@ -2311,7 +2325,7 @@ show_omp_node (int level, gfc_code *c) d =3D d->block; } } - else if (c->op =3D=3D EXEC_OMP_UNROLL) + else if (c->op =3D=3D EXEC_OMP_UNROLL || c->op =3D=3D EXEC_OMP_TILE) show_code (level + 1, c->block !=3D NULL ? c->block->next : c->next); else show_code (level + 1, c->block->next); @@ -3491,6 +3505,7 @@ show_code_node (int level, gfc_code *c) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: show_omp_node (level, c); diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 5ef4a8907b0..8b4eadf9b4d 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -320,7 +320,8 @@ enum gfc_statement ST_OMP_ERROR, ST_OMP_ASSUME, ST_OMP_END_ASSUME, ST_OMP_ASSUMES, /* Note: gfc_match_omp_nothing returns ST_NONE. */ ST_OMP_NOTHING, ST_NONE, - ST_OMP_UNROLL, ST_OMP_END_UNROLL + ST_OMP_UNROLL, ST_OMP_END_UNROLL, + ST_OMP_TILE, ST_OMP_END_TILE }; /* Types of interfaces that we can have. Assignment interfaces are @@ -1550,6 +1551,7 @@ typedef struct gfc_omp_clauses struct gfc_expr *dist_chunk_size; struct gfc_expr *message; struct gfc_omp_assumptions *assume; + struct gfc_expr_list *tile_sizes; const char *critical_name; enum gfc_omp_default_sharing default_sharing; enum gfc_omp_atomic_op atomic_op; @@ -2977,7 +2979,7 @@ enum gfc_exec_op EXEC_OMP_TARGET_TEAMS_LOOP, EXEC_OMP_MASKED, EXEC_OMP_PARALLEL_MASKED, EXEC_OMP_PARALLEL_MASKED_TASKLOOP, EXEC_OMP_PARALLEL_MASKED_TASKLOOP_SIM= D, EXEC_OMP_MASKED_TASKLOOP, EXEC_OMP_MASKED_TASKLOOP_SIMD, EXEC_OMP_SCOPE, - EXEC_OMP_UNROLL, + EXEC_OMP_UNROLL, EXEC_OMP_TILE, EXEC_OMP_ERROR }; @@ -3874,6 +3876,7 @@ bool gfc_inline_intrinsic_function_p (gfc_expr *); /* trans-openmp.cc */ bool loop_transform_p (gfc_exec_op op); +int gfc_expr_list_len (gfc_expr_list *); /* bbt.cc */ typedef int (*compare_fn) (void *, void *); diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 5640c725f09..d04e1cd66a4 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -226,6 +226,7 @@ match gfc_match_omp_teams_distribute_parallel_do_simd (= void); match gfc_match_omp_teams_distribute_simd (void); match gfc_match_omp_teams_loop (void); match gfc_match_omp_threadprivate (void); +match gfc_match_omp_tile (void); match gfc_match_omp_unroll (void); match gfc_match_omp_workshare (void); match gfc_match_omp_end_critical (void); diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index ec707d977cd..1de61029768 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -191,6 +191,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c) i =3D=3D OMP_LIST_ALLOCATE); gfc_free_expr_list (c->wait_list); gfc_free_expr_list (c->tile_list); + gfc_free_expr_list (c->tile_sizes); free (CONST_CAST (char *, c->critical_name)); if (c->assume) { @@ -977,6 +978,76 @@ cleanup: return MATCH_ERROR; } +static match +match_tile_sizes (gfc_expr_list **list) +{ + gfc_expr_list *head, *tail, *p; + locus old_loc; + gfc_expr *expr; + match m; + + head =3D tail =3D NULL; + + old_loc =3D gfc_current_locus; + + m =3D gfc_match_char ('('); + if (m !=3D MATCH_YES) + goto syntax; + + for (;;) + { + m =3D gfc_match_expr (&expr); + if (m =3D=3D MATCH_YES) + { + p =3D gfc_get_expr_list (); + if (head =3D=3D NULL) + head =3D tail =3D p; + else + { + tail->next =3D p; + tail =3D tail->next; + } + int size =3D 0; + if (m =3D=3D MATCH_YES) + { + if (gfc_extract_int (expr, &size, 1)) + goto cleanup; + else if (size < 1) + { + gfc_error_now ("tile size not constant " + "positive integer at %C"); + goto cleanup; + } + tail->expr =3D expr; + } + goto next_item; + } + if (m =3D=3D MATCH_ERROR) + goto cleanup; + goto syntax; + + next_item: + if (gfc_match_char (')') =3D=3D MATCH_YES) + break; + if (gfc_match_char (',') !=3D MATCH_YES) + goto syntax; + } + + while (*list) + list =3D &(*list)->next; + + *list =3D head; + return MATCH_YES; + +syntax: + gfc_error ("Syntax error in 'tile sizes' list at %C"); + +cleanup: + gfc_free_expr_list (head); + gfc_current_locus =3D old_loc; + return MATCH_ERROR; +} + /* OpenMP clauses. */ enum omp_mask1 { @@ -1054,6 +1125,7 @@ enum omp_mask2 OMP_CLAUSE_UNROLL_FULL, /* OpenMP 5.1. */ OMP_CLAUSE_UNROLL_NONE, /* OpenMP 5.1. */ OMP_CLAUSE_UNROLL_PARTIAL, /* OpenMP 5.1. */ + OMP_CLAUSE_TILE, /* OpenMP 5.1. */ OMP_CLAUSE_ASYNC, OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, @@ -4310,7 +4382,8 @@ cleanup: omp_mask (OMP_CLAUSE_NOWAIT) #define OMP_UNROLL_CLAUSES \ (omp_mask (OMP_CLAUSE_UNROLL_FULL) | OMP_CLAUSE_UNROLL_PARTIAL) - +#define OMP_TILE_CLAUSES \ + (omp_mask (OMP_CLAUSE_TILE)) static match match_omp (gfc_exec_op op, const omp_mask mask) @@ -6409,6 +6482,16 @@ gfc_match_omp_teams_distribute_simd (void) | OMP_SIMD_CLAUSES); } +match +gfc_match_omp_tile (void) +{ + gfc_omp_clauses *c =3D gfc_get_omp_clauses(); + new_st.op =3D EXEC_OMP_TILE; + new_st.ext.omp_clauses =3D c; + + return match_tile_sizes (&c->tile_sizes); +} + match gfc_match_omp_unroll (void) { @@ -9289,75 +9372,6 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol = *sym, bool add_clause) } } - -static bool -omp_unroll_removes_loop_nest (gfc_code *code) -{ - gcc_assert (code->op =3D=3D EXEC_OMP_UNROLL); - if (!code->ext.omp_clauses) - return true; - - if (code->ext.omp_clauses->unroll_none) - { - gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loo= p " - "into a non-loop", - &code->loc); - return true; - } - if (code->ext.omp_clauses->unroll_full) - { - gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into= a " - "non-loop", - &code->loc); - return true; - } - return false; -} - -static void -resolve_loop_transform_generic (gfc_code *code, const char *descr) -{ - gcc_assert (code->block); - - if (code->block->op =3D=3D EXEC_OMP_UNROLL - && !omp_unroll_removes_loop_nest (code->block)) - return; - - if (code->block->next->op =3D=3D EXEC_OMP_UNROLL - && !omp_unroll_removes_loop_nest (code->block->next)) - return; - - if (code->block->next->op =3D=3D EXEC_DO_WHILE) - { - gfc_error ("%s invalid around DO WHILE or DO without loop " - "control at %L", descr, &code->loc); - return; - } - if (code->block->next->op =3D=3D EXEC_DO_CONCURRENT) - { - gfc_error ("%s invalid around DO CONCURRENT loop at %L", - descr, &code->loc); - return; - } - - gfc_error ("missing canonical loop nest after %s at %L", - descr, &code->loc); - -} - -static void -resolve_omp_unroll (gfc_code *code) -{ - if (!code->block || code->block->op =3D=3D EXEC_DO) - return; - - if (code->block->next->op =3D=3D EXEC_DO) - return; - - resolve_loop_transform_generic (code, "!$OMP UNROLL"); -} - - static void handle_local_var (gfc_symbol *sym) { @@ -9488,6 +9502,106 @@ bound_expr_is_canonical (gfc_code *code, int depth,= gfc_expr *expr, return false; } +static bool +omp_unroll_removes_loop_nest (gfc_code *code) +{ + gcc_assert (code->op =3D=3D EXEC_OMP_UNROLL); + if (!code->ext.omp_clauses) + return true; + + if (code->ext.omp_clauses->unroll_none) + { + gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loo= p " + "into a non-loop", + &code->loc); + return true; + } + if (code->ext.omp_clauses->unroll_full) + { + gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into= a " + "non-loop", + &code->loc); + return true; + } + return false; +} + +static gfc_code * +resolve_nested_loop_transforms (gfc_code *code, const char *name, + int required_depth, locus *loc) +{ + if (!code) + return code; + + bool error =3D false; + while (loop_transform_p (code->op)) + { + if (!error && code->op =3D=3D EXEC_OMP_UNROLL) + { + if (omp_unroll_removes_loop_nest (code)) + { + gfc_error ("missing canonical loop nest after %s at %L", name= , + loc); + error =3D true; + } + else if (required_depth > 1) + { + gfc_error ("loop nest depth after !$OMP UNROLL at %L is insuf= ficient " + "for outer %s", &code->loc, name); + error =3D true; + } + } + else if (!error && code->op =3D=3D EXEC_OMP_TILE + && required_depth > gfc_expr_list_len (code->ext.omp_clauses= ->tile_sizes)) + { + gfc_error ("loop nest depth after !$OMP TILE at %L is insuffi= cient " + "for outer %s", &code->loc, name); + error =3D true; + } + + if (code->block) + code =3D code->block->next; + else + code =3D code->next; + } + gcc_assert (!loop_transform_p (code->op)); + + return code; +} + +static void +resolve_omp_unroll (gfc_code *code) +{ + const char *descr =3D "!$OMP UNROLL"; + locus *loc =3D &code->loc; + + if (!code->block || code->block->op =3D=3D EXEC_DO) + return; + + code =3D resolve_nested_loop_transforms (code->block->next, descr, 1, + &code->loc); + + if (code->op =3D=3D EXEC_DO) + return; + + if (code->op =3D=3D EXEC_DO_WHILE) + { + gfc_error ("%s invalid around DO WHILE or DO without loop " + "control at %L", descr, loc); + return; + } + + if (code->op =3D=3D EXEC_DO_CONCURRENT) + { + gfc_error ("%s invalid around DO CONCURRENT loop at %L", + descr, loc); + return; + } + + gfc_error ("missing canonical loop nest after %s at %L", + descr, loc); +} + static void resolve_omp_do (gfc_code *code) { @@ -9592,30 +9706,13 @@ resolve_omp_do (gfc_code *code) break; case EXEC_OMP_TEAMS_LOOP: name =3D "!$OMP TEAMS LOOP"; break; case EXEC_OMP_UNROLL: name =3D "!$OMP UNROLL"; break; + case EXEC_OMP_TILE: name =3D "!$OMP TILE"; break; default: gcc_unreachable (); } if (code->ext.omp_clauses) resolve_omp_clauses (code, code->ext.omp_clauses, NULL); - do_code =3D code->block->next; - /* Move forward over any loop transformation directives to find the loop= . */ - bool error =3D false; - while (do_code->op =3D=3D EXEC_OMP_UNROLL) - { - if (!error && omp_unroll_removes_loop_nest (do_code)) - { - gfc_error ("missing canonical loop nest after %s at %L", name, - &code->loc); - error =3D true; - } - if (do_code->block) - do_code =3D do_code->block->next; - else - do_code =3D do_code->next; - } - gcc_assert (do_code->op !=3D EXEC_OMP_UNROLL); - if (code->ext.omp_clauses->orderedc) collapse =3D code->ext.omp_clauses->orderedc; else @@ -9630,6 +9727,9 @@ resolve_omp_do (gfc_code *code) depth and treats any further inner loops as the final-loop-body. So here we also check canonical loop nest form only for the number of outer loops specified by the COLLAPSE clause too. */ + do_code =3D resolve_nested_loop_transforms (code->block->next, name, col= lapse, + &code->loc); + for (i =3D 1; i <=3D collapse; i++) { gfc_symbol *start_var =3D NULL, *end_var =3D NULL; @@ -9745,6 +9845,98 @@ resolve_omp_do (gfc_code *code) } } +static void +resolve_omp_tile (gfc_code *code) +{ + gfc_code *do_code, *c; + gfc_symbol *dovar; + const char *name =3D "!$OMP TILE"; + + unsigned num_loops =3D 0; + gcc_assert (code->ext.omp_clauses->tile_sizes); + for (gfc_expr_list *el =3D code->ext.omp_clauses->tile_sizes; el; + el =3D el->next) + num_loops++; + + do_code =3D resolve_nested_loop_transforms (code, name, num_loops, &code= ->loc); + + for (unsigned i =3D 1; i <=3D num_loops; i++) + { + if (do_code->op =3D=3D EXEC_DO_WHILE) + { + gfc_error ("%s cannot be a DO WHILE or DO without loop control " + "at %L", name, &do_code->loc); + break; + } + if (do_code->op =3D=3D EXEC_DO_CONCURRENT) + { + gfc_error ("%s cannot be a DO CONCURRENT loop at %L", name, + &do_code->loc); + break; + } + if (do_code->op !=3D EXEC_DO) + { + gfc_error ("%s must be DO loop at %L", name, + &do_code->loc); + break; + } + + gcc_assert (do_code->op !=3D EXEC_OMP_UNROLL); + gcc_assert (do_code->op =3D=3D EXEC_DO); + dovar =3D do_code->ext.iterator->var->symtree->n.sym; + if (i > 1) + { + gfc_code *do_code2 =3D code; + while (loop_transform_p (do_code2->op)) + { + if (do_code2->block) + do_code2 =3D do_code2->block->next; + else + do_code2 =3D do_code2->next; + } + gcc_assert (!loop_transform_p (do_code2->op)); + + for (unsigned j =3D 1; j < i; j++) + { + gfc_symbol *ivar =3D do_code2->ext.iterator->var->symtree->n.= sym; + if (dovar =3D=3D ivar + || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->sta= rt) + || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->end= ) + || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->ste= p)) + { + gfc_error ("%s loops don't form rectangular " + "iteration space at %L", name, &do_code->loc); + break; + } + do_code2 =3D do_code2->block->next; + } + } + for (c =3D do_code->next; c; c =3D c->next) + if (c->op !=3D EXEC_NOP && c->op !=3D EXEC_CONTINUE) + { + gfc_error ("%s loops not perfectly nested at %L", + name, &c->loc); + break; + } + if (i =3D=3D num_loops || c) + break; + do_code =3D do_code->block; + if (do_code->op !=3D EXEC_DO && do_code->op !=3D EXEC_DO_WHILE) + { + gfc_error ("not enough DO loops for %s at %L", + name, &code->loc); + break; + } + do_code =3D do_code->next; + if (do_code =3D=3D NULL + || (do_code->op !=3D EXEC_DO && do_code->op !=3D EXEC_DO_WHILE)) + { + gfc_error ("not enough DO loops for %s at %L", + name, &code->loc); + break; + } + } +} static gfc_statement omp_code_to_statement (gfc_code *code) @@ -9889,6 +10081,8 @@ omp_code_to_statement (gfc_code *code) return ST_OMP_PARALLEL_LOOP; case EXEC_OMP_DEPOBJ: return ST_OMP_DEPOBJ; + case EXEC_OMP_TILE: + return ST_OMP_TILE; case EXEC_OMP_UNROLL: return ST_OMP_UNROLL; default: @@ -10320,6 +10514,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_name= space *ns) case EXEC_OMP_TEAMS_LOOP: resolve_omp_do (code); break; + case EXEC_OMP_TILE: + resolve_omp_tile (code); + break; case EXEC_OMP_UNROLL: resolve_omp_unroll (code); break; diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc index 094678436b4..1cc5200f35a 100644 --- a/gcc/fortran/parse.cc +++ b/gcc/fortran/parse.cc @@ -1009,6 +1009,7 @@ decode_omp_directive (void) matcho ("end teams loop", gfc_match_omp_eos_error, ST_OMP_END_TEAMS_= LOOP); matcho ("end teams", gfc_match_omp_eos_error, ST_OMP_END_TEAMS); matchs ("end unroll", gfc_match_omp_eos_error, ST_OMP_END_UNROLL); + matchs ("end tile", gfc_match_omp_eos_error, ST_OMP_END_TILE); matcho ("end workshare", gfc_match_omp_end_nowait, ST_OMP_END_WORKSHARE); break; @@ -1137,6 +1138,7 @@ decode_omp_directive (void) matcho ("teams", gfc_match_omp_teams, ST_OMP_TEAMS); matchdo ("threadprivate", gfc_match_omp_threadprivate, ST_OMP_THREADPRIVATE); + matchs ("tile sizes", gfc_match_omp_tile, ST_OMP_TILE); break; case 'u': matchs ("unroll", gfc_match_omp_unroll, ST_OMP_UNROLL); @@ -1729,6 +1731,7 @@ next_statement (void) case ST_OMP_TARGET_PARALLEL_LOOP: case ST_OMP_TARGET_TEAMS_LOOP: \ case ST_OMP_ASSUME: \ case ST_OMP_UNROLL: \ + case ST_OMP_TILE: \ case ST_CRITICAL: \ case ST_OACC_PARALLEL_LOOP: case ST_OACC_PARALLEL: case ST_OACC_KERNELS:= \ case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: \ @@ -2774,6 +2777,9 @@ gfc_ascii_statement (gfc_statement st, bool strip_sen= tinel) case ST_OMP_THREADPRIVATE: p =3D "!$OMP THREADPRIVATE"; break; + case ST_OMP_TILE: + p =3D "!$OMP TILE"; + break; case ST_OMP_UNROLL: p =3D "!$OMP UNROLL"; break; @@ -5214,6 +5220,11 @@ parse_omp_do (gfc_statement omp_st) num_unroll++; continue; } + else if (st =3D=3D ST_OMP_TILE) + { + accept_statement (st); + continue; + } else unexpected_statement (st); } @@ -5338,6 +5349,9 @@ parse_omp_do (gfc_statement omp_st) case ST_OMP_TEAMS_LOOP: omp_end_st =3D ST_OMP_END_TEAMS_LOOP; break; + case ST_OMP_TILE: + omp_end_st =3D ST_OMP_END_TILE; + break; case ST_OMP_UNROLL: omp_end_st =3D ST_OMP_END_UNROLL; break; @@ -6025,6 +6039,7 @@ parse_executable (gfc_statement st) case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case ST_OMP_TEAMS_DISTRIBUTE_SIMD: case ST_OMP_TEAMS_LOOP: + case ST_OMP_TILE: case ST_OMP_UNROLL: st =3D parse_omp_do (st); if (st =3D=3D ST_IMPLIED_ENDDO) diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index 46988ff281d..182aa18053c 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -11041,6 +11041,7 @@ gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_LOOP: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: break; @@ -12198,6 +12199,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns= ) case EXEC_OMP_LOOP: case EXEC_OMP_SIMD: case EXEC_OMP_TARGET_SIMD: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: gfc_resolve_omp_do_blocks (code, ns); break; @@ -12695,6 +12697,7 @@ start: case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: gfc_resolve_omp_directive (code, ns); diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc index 6112831e621..cea874e4474 100644 --- a/gcc/fortran/st.cc +++ b/gcc/fortran/st.cc @@ -277,6 +277,7 @@ gfc_free_statement (gfc_code *p) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: gfc_free_omp_clauses (p->ext.omp_clauses); diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 73c416c951d..6936cd7f5ee 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -3913,6 +3913,24 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_c= lauses *clauses, omp_clauses =3D gfc_trans_add_clause (c, omp_clauses); } + if (clauses->tile_sizes) + { + vec *tvec; + gfc_expr_list *el; + + vec_alloc (tvec, 4); + + for (el =3D clauses->tile_sizes; el; el =3D el->next) + vec_safe_push (tvec, gfc_convert_expr_to_tree (block, el->expr)); + + c =3D build_omp_clause (gfc_get_location (&where), + OMP_CLAUSE_TILE); + OMP_CLAUSE_TILE_SIZES (c) =3D build_tree_list_vec (tvec); + omp_clauses =3D gfc_trans_add_clause (c, omp_clauses); + + tvec->truncate (0); + } + if (clauses->ordered) { c =3D build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_ORDERE= D); @@ -5106,7 +5124,7 @@ gfc_trans_omp_cancel (gfc_code *code) bool loop_transform_p (gfc_exec_op op) { - return op =3D=3D EXEC_OMP_UNROLL; + return op =3D=3D EXEC_OMP_UNROLL || op =3D=3D EXEC_OMP_TILE; } static tree @@ -5280,6 +5298,16 @@ gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *= sep, int loop_n, return true; } +int +gfc_expr_list_len (gfc_expr_list *list) +{ + unsigned len =3D 0; + for (; list; list =3D list->next) + len++; + + return len; +} + static tree gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, gfc_omp_clauses *do_clauses, tree par_clauses) @@ -5295,25 +5323,14 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, s= tmtblock_t *pblock, dovar_init *di; unsigned ix; vec *saved_doacross_steps =3D doacross_steps; - gfc_expr_list *tile =3D do_clauses ? do_clauses->tile_list : clauses->ti= le_list; gfc_code *orig_code =3D code; locus top_loc =3D code->loc; - - /* Both collapsed and tiled loops are lowered the same way. In - OpenACC, those clauses are not compatible, so prioritize the tile - clause, if present. */ - if (tile) - { - collapse =3D 0; - for (gfc_expr_list *el =3D tile; el; el =3D el->next) - collapse++; - } - - doacross_steps =3D NULL; - if (clauses->orderedc) - collapse =3D clauses->orderedc; - if (collapse <=3D 0) - collapse =3D 1; + gfc_expr_list *oacc_tile + =3D do_clauses ? do_clauses->tile_list : clauses->tile_list; + gfc_expr_list *omp_tile + =3D do_clauses ? do_clauses->tile_sizes : clauses->tile_sizes; + gcc_assert (!omp_tile || op =3D=3D EXEC_OMP_TILE); + gcc_assert (!(oacc_tile && omp_tile)); if (pblock =3D=3D NULL) { @@ -5321,21 +5338,42 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, s= tmtblock_t *pblock, pblock =3D █ } code =3D code->block->next; - gcc_assert (code->op =3D=3D EXEC_DO || code->op =3D=3D EXEC_OMP_UNROLL); + gcc_assert (code->op =3D=3D EXEC_DO || loop_transform_p (code->op)); /* Loop transformation directives surrounding the associated loop of an = "omp do" (or similar directive) are represented as clauses on the "omp do"= . */ loop_transform_clauses =3D NULL; - while (code->op =3D=3D EXEC_OMP_UNROLL) + int omp_tile_depth =3D gfc_expr_list_len (omp_tile); + while (loop_transform_p (code->op)) { tree clauses =3D gfc_trans_omp_clauses (pblock, code->ext.omp_clause= s, code->loc); - loop_transform_clauses =3D chainon (loop_transform_clauses, clauses)= ; + /* There might be several "!$omp tile" transformations surrounding t= he + loop. Use the innermost one which must have the largest tiling dep= th. + If an inner directive has a smaller tiling depth than an outer + directive, an error will be emitted in pass-omp_transform_loops. *= / + omp_tile_depth =3D gfc_expr_list_len (code->ext.omp_clauses->tile_si= zes); + + loop_transform_clauses =3D chainon (loop_transform_clauses, clauses)= ; code =3D code->block ? code->block->next : code->next; } - gcc_assert (code->op !=3D EXEC_OMP_UNROLL); + gcc_assert (!loop_transform_p (code->op)); gcc_assert (code->op =3D=3D EXEC_DO); + /* Both collapsed and tiled loops are lowered the same way. In + OpenACC, those clauses are not compatible, so prioritize the tile + clause, if present. */ + if (oacc_tile) + collapse =3D gfc_expr_list_len (oacc_tile); + + doacross_steps =3D NULL; + if (clauses->orderedc) + collapse =3D clauses->orderedc; + if (collapse <=3D 0) + collapse =3D 1; + + collapse =3D MAX (collapse, omp_tile_depth); + init =3D make_tree_vec (collapse); cond =3D make_tree_vec (collapse); incr =3D make_tree_vec (collapse); @@ -5346,7 +5384,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stm= tblock_t *pblock, on the simd construct and DO's clauses are translated elsewhere. */ do_clauses->sched_simd =3D false; - if (op =3D=3D EXEC_OMP_UNROLL) + if (loop_transform_p (op)) { /* This is a loop transformation on a loop which is not associated w= ith any other directive. Use the directive location instead of the loo= p @@ -5695,6 +5733,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stm= tblock_t *pblock, case EXEC_OMP_LOOP: stmt =3D make_node (OMP_LOOP); break; case EXEC_OMP_TASKLOOP: stmt =3D make_node (OMP_TASKLOOP); break; case EXEC_OACC_LOOP: stmt =3D make_node (OACC_LOOP); break; + case EXEC_OMP_TILE: stmt =3D make_node (OMP_LOOP_TRANS); break; case EXEC_OMP_UNROLL: stmt =3D make_node (OMP_LOOP_TRANS); break; default: gcc_unreachable (); } @@ -7793,6 +7832,7 @@ gfc_trans_omp_directive (gfc_code *code) case EXEC_OMP_LOOP: case EXEC_OMP_SIMD: case EXEC_OMP_TASKLOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses= , NULL); diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc index 56ec59fe80e..94b23c3b77a 100644 --- a/gcc/fortran/trans.cc +++ b/gcc/fortran/trans.cc @@ -2520,6 +2520,7 @@ trans_code (gfc_code * code, tree cond) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: res =3D gfc_trans_omp_directive (code); diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 14616eb5316..4d504a12451 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -12105,6 +12105,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq= *pre_p, case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: case OMP_CLAUSE_UNROLL_PARTIAL: + case OMP_CLAUSE_TILE: break; case OMP_CLAUSE_NOHOST: default: @@ -13076,6 +13077,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gim= ple_seq body, tree *list_p, case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_INCLUSIVE: case OMP_CLAUSE_EXCLUSIVE: + case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: case OMP_CLAUSE_UNROLL_PARTIAL: @@ -15134,6 +15136,7 @@ gimplify_omp_loop (tree *expr_p, gimple_seq *pre_p) } pc =3D &OMP_CLAUSE_CHAIN (*pc); break; + case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNROLL_PARTIAL: case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index 0f326128874..e568ba0703e 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -2264,7 +2264,7 @@ omp_loop_transform_clause_p (tree c) enum omp_clause_code code =3D OMP_CLAUSE_CODE (c); return (code =3D=3D OMP_CLAUSE_UNROLL_FULL || code =3D=3D OMP_CLAUSE_UNR= OLL_PARTIAL - || code =3D=3D OMP_CLAUSE_UNROLL_NONE); + || code =3D=3D OMP_CLAUSE_UNROLL_NONE || code =3D=3D OMP_CLAUSE_T= ILE); } /* Try to resolve declare variant, return the variant decl if it should diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc index d845d0e4798..858a271261a 100644 --- a/gcc/omp-transform-loops.cc +++ b/gcc/omp-transform-loops.cc @@ -211,6 +211,9 @@ gomp_for_constant_iterations_p (gomp_for *omp_for, return true; } +static gimple_seq +expand_transformed_loop (gomp_for *omp_for); + /* Split a gomp_for that represents a collapsed loop-nest into single loops. The result is a gomp_for of the same kind which is not collapsed (i.e. gimple_omp_for_collapse (OMP_FOR) =3D=3D 1) and which contains ne= sted, @@ -220,7 +223,7 @@ gomp_for_constant_iterations_p (gomp_for *omp_for, FROM_DEPTH are left collapsed. */ static gomp_for* -gomp_for_uncollapse (gomp_for *omp_for, int from_depth =3D 0) +gomp_for_uncollapse (gomp_for *omp_for, int from_depth =3D 0, bool expand = =3D false) { int collapse =3D gimple_omp_for_collapse (omp_for); gcc_assert (from_depth < collapse); @@ -251,7 +254,11 @@ gomp_for_uncollapse (gomp_for *omp_for, int from_depth= =3D 0) gimple_omp_for_set_index (level_omp_for, 0, gimple_omp_for_index (omp_for, level)); - body =3D level_omp_for; + + if (expand) + body =3D expand_transformed_loop (level_omp_for); + else + body =3D level_omp_for; } omp_for->collapse =3D from_depth; @@ -808,6 +815,316 @@ canonicalize_conditions (gomp_for *omp_for) return new_decls; } +/* Execute the tiling transformation for OMP_FOR with the given TILE_SIZES= and + return the resulting gimple bind. TILE_SIZES must be a non-empty tree c= hain + of integer constants and the collapse of OMP_FOR must be at least the l= ength + of TILE_SIZES. TRANSFORMATION_CLAUSES are the loop transformations that + must be applied to OMP_FOR. Those are applied on the result of the tili= ng + transformation. LOC is the location for diagnostic messages. + + Example 1 + --------- + --------- + + Original loop + ------------- + + #pragma omp for + #pragma omp tile sizes(3) + for (i =3D 1; i <=3D n; i =3D i + 1) + { + body; + } + + Internally, the tile directive is represented as a clause on the + omp for, i.e. as #pragma omp for tile_sizes(3). + + Transformed loop + ---------------- + + #pragma omp for + for (.omp_tile_index =3D 1; .omp_tile_index < ceil(n/3); .omp_tile_inde= x =3D .omp_tile_index + 3) + { + D.4287 =3D .omp_tile_index + 3 + 1 + #pragma omp loop_transform + for (i =3D .omp_tile_index; i < D.4287; i =3D i + 1) + { + if (i.0 > n) + goto L.0 + body; + } + L_0: + } + + The outer loop is the "floor loop" and the inner loop is the "tile + loop". The tile loop is never in canonical loop nest form and + hence it cannot be associated with any loop construct. The + GCC-internal "omp loop transform" construct will be lowered after + the tiling transformation. + */ + +static gimple_seq +tile (gomp_for *omp_for, location_t loc, tree tile_sizes, + tree transformation_clauses, walk_ctx *ctx) +{ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, + dump_user_location_t::from_location_t (loc), + "Executing tile transformation %T:\n %G\n", + transformation_clauses, static_cast (omp_for= )); + + gimple_seq tile_loops =3D copy_gimple_seq_and_replace_locals (omp_for); + gimple_seq floor_loops =3D copy_gimple_seq_and_replace_locals (omp_for); + + size_t collapse =3D gimple_omp_for_collapse (omp_for); + size_t tiling_depth =3D list_length (tile_sizes); + tree clauses =3D gimple_omp_for_clauses (omp_for); + size_t clause_collapse =3D 1; + tree collapse_clause =3D NULL; + + if (tree c =3D omp_find_clause (clauses, OMP_CLAUSE_ORDERED)) + { + error_at (OMP_CLAUSE_LOCATION (c), + "% invalid in conjunction with %"); + return omp_for; + } + + if (tree c =3D omp_find_clause (clauses, OMP_CLAUSE_COLLAPSE)) + { + tree expr =3D OMP_CLAUSE_COLLAPSE_EXPR (c); + clause_collapse =3D tree_to_uhwi (expr); + collapse_clause =3D c; + } + + /* The 'omp tile' construct creates a canonical loop-nest whose nesting = depth + equals tiling_depth. The whole loop-nest has depth at least 2 * + omp_tile_depth, but the 'tile loops' at levels + omp_tile_depth+1...2*omp_tile_depth are not in canonical loop-nest fo= rm + and hence cannot be associated with a loop construct. */ + if (clause_collapse > tiling_depth) + { + error_at (OMP_CLAUSE_LOCATION (collapse_clause), + "collapse cannot extend below the floor loops " + "generated by the % construct"); + OMP_CLAUSE_COLLAPSE_EXPR (collapse_clause) + =3D build_int_cst (unsigned_type_node, tiling_depth); + return transform_gomp_for (omp_for, NULL, ctx); + } + + if (tiling_depth > collapse) + return transform_gomp_for (omp_for, NULL, ctx); + + gcc_assert (collapse >=3D clause_collapse); + + push_gimplify_context (); + + /* Create the index variables for iterating the tiles in the floor + loops first tiling_depth loops transformed loop nest. */ + gimple_seq floor_loops_pre_body =3D NULL; + size_t tile_level =3D 0; + auto_vec sizes_vec; + for (tree el =3D tile_sizes; el; el =3D TREE_CHAIN (el), tile_level++) + { + size_t nest_level =3D tile_level; + tree index =3D gimple_omp_for_index (omp_for, nest_level); + tree init =3D gimple_omp_for_initial (omp_for, nest_level); + tree incr =3D gimple_omp_for_incr (omp_for, nest_level); + tree step =3D TREE_OPERAND (incr, 1); + + /* Initialize original index variables in the pre-body. The + loop lowering will not initialize them because of the changed + index variables. */ + gimplify_assign (index, init, &floor_loops_pre_body); + + tree tile_size =3D fold_convert (TREE_TYPE (step), TREE_VALUE (el)); + sizes_vec.safe_push (tile_size); + tree tile_index =3D create_tmp_var (TREE_TYPE (index), ".omp_tile_in= dex"); + gimplify_assign (tile_index, init, &floor_loops_pre_body); + + /* Floor loops */ + step =3D fold_build2 (MULT_EXPR, TREE_TYPE (step), step, tile_size); + tree tile_step =3D step; + /* For combined constructs, step will be gimplified on the outer + gomp_for. */ + if (!gimple_omp_for_combined_into_p (omp_for) && !TREE_CONSTANT (ste= p)) + { + tile_step =3D create_tmp_var (TREE_TYPE (step), ".omp_tile_step")= ; + gimplify_assign (tile_step, step, &floor_loops_pre_body); + } + incr =3D fold_build2 (TREE_CODE (incr), TREE_TYPE (incr), tile_index= , + tile_step); + gimple_omp_for_set_incr (floor_loops, nest_level, incr); + gimple_omp_for_set_index (floor_loops, nest_level, tile_index); + } + gbind *result_bind =3D gimple_build_bind (NULL, NULL, NULL); + pop_gimplify_context (result_bind); + gimple_seq_add_seq (gimple_omp_for_pre_body_ptr (floor_loops), + floor_loops_pre_body); + + /* The tiling loops will not form a perfect loop nest because the + loop for each tiling dimension needs to check if the current tile + is incomplete and this check is intervening code. Since OpenMP + 5.1 does not allow the collapse of the loop-nest to extend beyond + the floor loops, this is not a problem. + + "Uncollapse" the tiling loop nest, i.e. split the loop nest into + nested separate gomp_for structures for each level. This allows + to add the incomplete tile checks to each level loop. */ + + tile_loops =3D gomp_for_uncollapse (as_a (tile_loops)); + gimple_omp_for_set_kind (as_a (tile_loops), + GF_OMP_FOR_KIND_TRANSFORM_LOOP); + gimple_omp_for_set_clauses (tile_loops, NULL_TREE); + gimple_omp_for_set_pre_body (tile_loops, NULL); + + /* Transform the loop bodies of the "uncollapsed" tiling loops and + add them to the body of the floor loops. At this point, the + loop nest consists of perfectly nested gimple_omp_for constructs, + each representing a single loop. */ + gimple_seq floor_loops_body =3D NULL; + gimple *level_loop =3D tile_loops; + gimple_seq_add_stmt (&floor_loops_body, tile_loops); + gimple_seq *surrounding_seq =3D &floor_loops_body; + + push_gimplify_context (); + + tree break_label =3D create_artificial_label (UNKNOWN_LOCATION); + gimple_seq_add_stmt (surrounding_seq, gimple_build_label (break_label)); + for (size_t level =3D 0; level < tiling_depth; level++) + { + tree original_index =3D gimple_omp_for_index (omp_for, level); + tree original_final =3D gimple_omp_for_final (omp_for, level); + + tree tile_index =3D gimple_omp_for_index (floor_loops, level); + tree tile_size =3D sizes_vec[level]; + tree type =3D TREE_TYPE (tile_index); + tree plus_type =3D type; + + tree incr =3D gimple_omp_for_incr (omp_for, level); + tree step =3D omp_get_for_step_from_incr (gimple_location (omp_for),= incr); + + gimple_seq *pre_body =3D gimple_omp_for_pre_body_ptr (level_loop); + gimple_seq level_body =3D gimple_omp_body (level_loop); + gcc_assert (gimple_omp_for_collapse (level_loop) =3D=3D 1); + tree_code original_cond =3D gimple_omp_for_cond (omp_for, level); + + gimple_omp_for_set_initial (level_loop, 0, tile_index); + + tree tile_final =3D create_tmp_var (type); + tree scaled_tile_size =3D fold_build2 (MULT_EXPR, TREE_TYPE (tile_si= ze), + tile_size, step); + + tree_code plus_code =3D PLUS_EXPR; + if (POINTER_TYPE_P (TREE_TYPE (tile_index))) + { + plus_code =3D POINTER_PLUS_EXPR; + int unsignedp =3D TYPE_UNSIGNED (TREE_TYPE (scaled_tile_size)); + plus_type =3D signed_or_unsigned_type_for (unsignedp, ptrdiff_typ= e_node); + } + + scaled_tile_size =3D fold_convert (plus_type, scaled_tile_size); + gimplify_assign (tile_final, + fold_build2 (plus_code, type, + tile_index, scaled_tile_size), + pre_body); + gimple_omp_for_set_final (level_loop, 0, tile_final); + + /* Redefine the original loop index variable of OMP_FOR in terms of = the + floor loop and the tiling loop index variable for the current + dimension/level at the top of the loop. */ + gimple_seq level_preamble =3D NULL; + + push_gimplify_context (); + + tree body_label =3D create_artificial_label (UNKNOWN_LOCATION); + + /* Handle partial tiles, i.e. add a check that breaks from the tile = loop + if the new index value does not belong to the iteration space of t= he + original loop. */ + gimple_seq_add_stmt (&level_preamble, + gimple_build_cond (original_cond, original_index= , + original_final, body_label, + break_label)); + gimple_seq_add_stmt (&level_preamble, gimple_build_label (body_label= )); + + auto gsi =3D gsi_start (level_body); + gsi_insert_seq_before (&gsi, level_preamble, GSI_SAME_STMT); + gbind *level_bind =3D gimple_build_bind (NULL, NULL, NULL); + pop_gimplify_context (level_bind); + gimple_bind_set_body (level_bind, level_body); + gimple_omp_set_body (level_loop, level_bind); + + surrounding_seq =3D &level_body; + level_loop =3D gsi_stmt (gsi); + + /* The label for jumping out of the loop at the next nesting + level. For the outermost level, the label is put after the + loop-nest, for the last one it is not necessary. */ + if (level !=3D tiling_depth - 1) + { + break_label =3D create_artificial_label (UNKNOWN_LOCATION); + gsi_insert_after (&gsi, gimple_build_label (break_label), + GSI_NEW_STMT); + } + } + + gbind *tile_loops_bind; + tile_loops_bind =3D gimple_build_bind (NULL, tile_loops, NULL); + pop_gimplify_context (tile_loops_bind); + + gimple_omp_set_body (floor_loops, tile_loops_bind); + + tree remaining_clauses =3D OMP_CLAUSE_CHAIN (transformation_clauses); + + /* Collapsing of the OMP_FOR is used both for the "omp tile" + implementation and for the actual "collapse" clause. If the + tiling depth was greater than the collapse depth required by the + clauses on OMP_FOR, the collapse of OMP_FOR must be adjusted to + the latter value and all loops below the new collapse depth must + be transformed to GF_OMP_FOR_KIND_TRANSFORM_LOOP to ensure their + lowering in this pass. */ + size_t new_collapse =3D clause_collapse; + + /* Keep the omp_for collapsed if there are further transformations */ + if (remaining_clauses) + { + size_t next_transform_depth =3D 1; + if (OMP_CLAUSE_CODE (remaining_clauses) =3D=3D OMP_CLAUSE_TILE) + next_transform_depth + =3D list_length (OMP_CLAUSE_TILE_SIZES (remaining_clauses)); + + /* The current "omp tile" transformation reduces the nesting depth + of the canonical loop-nest to TILING_DEPTH. + Hence the following "omp tile" transformation is invalid if + it requires a greater nesting depth. */ + gcc_assert (next_transform_depth <=3D tiling_depth); + if (next_transform_depth > new_collapse) + new_collapse =3D next_transform_depth; + } + + if (collapse > new_collapse) + floor_loops =3D gomp_for_uncollapse (as_a (floor_loops), + new_collapse, true); + + /* Lower the uncollapsed tile loops. */ + walk_omp_for_loops (gimple_bind_body_ptr (tile_loops_bind), ctx); + + gcc_assert (remaining_clauses || !collapse_clause + || gimple_omp_for_collapse (floor_loops) + =3D=3D (size_t)clause_collapse); + + if (gimple_omp_for_combined_into_p (omp_for)) + ctx->inner_combined_loop =3D as_a (floor_loops); + + /* Apply remaining transformation clauses and assemble the transformatio= n + result. */ + gimple_bind_set_body (result_bind, + transform_gomp_for (as_a (floor_loops), + remaining_clauses, ctx)); + + return result_bind; +} + /* Combined distribute or taskloop constructs are represented by two or more nested gomp_for constructs which are created during gimplification. Loop transformations on the combined construct are @@ -999,6 +1316,10 @@ transform_gomp_for (gomp_for *omp_for, tree transform= ation, walk_ctx *ctx) ctx); } break; + case OMP_CLAUSE_TILE: + result =3D tile (omp_for, loc, OMP_CLAUSE_TILE_SIZES (transformation= ), + transformation, ctx); + break; default: gcc_unreachable (); } @@ -1177,6 +1498,21 @@ optimize_transformation_clauses (tree clauses) unroll_partial =3D c; } break; + case OMP_CLAUSE_TILE: + { + /* No optimization for those clauses yet, but they end any chai= n of + "unroll partial" clauses. */ + if (merged_unroll_partial && dump_enabled_p ()) + print_optimized_unroll_partial_msg (unroll_partial); + + if (unroll_partial) + OMP_CLAUSE_CHAIN (unroll_partial) =3D c; + + unroll_partial =3D NULL; + merged_unroll_partial =3D false; + last_non_unroll =3D c; + } + break; default: gcc_unreachable (); } diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 b/gc= c/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 new file mode 100644 index 00000000000..84ea93300fa --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 @@ -0,0 +1,163 @@ +subroutine test + implicit none + integer :: i, j, k + + !$omp tile sizes(1) + do i =3D 1,100 + call dummy(i) + end do + + !$omp tile sizes(1) + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(2+3) + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(-21) ! { dg-error {tile size not constant positive inte= ger at \(1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(0) ! { dg-error {tile size not constant positive intege= r at \(1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(i) ! { dg-error {Constant expression required at \(1\)}= } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes ! { dg-error {Syntax error in 'tile sizes' list at \(1\= )} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes( ! { dg-error {Syntax error in 'tile sizes' list at \(1= \)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(2 ! { dg-error {Syntax error in 'tile sizes' list at \(= 1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes() ! { dg-error {Syntax error in 'tile sizes' list at \(= 1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(2,) ! { dg-error {Syntax error in 'tile sizes' list at = \(1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(,2) ! { dg-error {Syntax error in 'tile sizes' list at = \(1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(,i) ! { dg-error {Syntax error in 'tile sizes' list at = \(1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(i,) ! { dg-error {Constant expression required at \(1\)= } } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(1,2) + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + end do + end do + !$end omp tile + + !$omp tile sizes(1,2) ! { dg-error {not enough DO loops for \!\$OMP TILE= at \(1\)} } + do i =3D 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TI= LE at \(1\)} } + do i =3D 1,100 + do j =3D 1,100 + call dummy(i) + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) + do i =3D 1,100 + do j =3D 1,100 + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) + do i =3D 1,100 + do j =3D 1,100 + do k =3D 1,100 + call dummy(i) + end do + end do + call dummy(i) ! { dg-error {\!\$OMP TILE loops not perfectly nested a= t \(1\)} } + end do + !$end omp tile + + !$omp tile sizes(1,2,1) + do i =3D 1,100 + do j =3D 1,100 + do k =3D 1,100 + call dummy(i) + end do + call dummy(j) ! { dg-error {\!\$OMP TILE loops not perfectly neste= d at \(1\)} } + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TI= LE at \(1\)} } + do i =3D 1,100 + call dummy(i) + do j =3D 1,100 + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TI= LE at \(1\)} } + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile +end subroutine test diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 b/g= cc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 new file mode 100644 index 00000000000..29d7532bc37 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 @@ -0,0 +1,10 @@ + +subroutine test + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TI= LE at \(1\)} } + do i =3D 1,100 + do j =3D 1,100 + call dummy(i) + end do + end do + !$end omp tile +end subroutine test diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 b/gc= c/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 new file mode 100644 index 00000000000..8a5eae3a188 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 @@ -0,0 +1,80 @@ +subroutine test1 + implicit none + integer :: i, j, k + + !$omp tile sizes (1,2) + !$omp tile sizes (1,2) + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + + !$omp tile sizes (8) + !$omp tile sizes (1,2) + !$omp tile sizes (1,2,3) + do i =3D 1,100 + do j =3D 1,100 + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile +end subroutine test1 + +subroutine test2 + implicit none + integer :: i, j, k + + !$omp taskloop collapse(2) + !$omp tile sizes (3,4) + !$omp tile sizes (1,2) + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$omp end taskloop + + !$omp taskloop simd + !$omp tile sizes (8) + !$omp tile sizes (1,2) + !$omp tile sizes (1,2,3) + do i =3D 1,100 + do j =3D 1,100 + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$omp end taskloop simd +end subroutine test2 + +subroutine test3 + implicit none + integer :: i, j, k + + !$omp taskloop collapse(3) ! { dg-error {not enough DO loops for collaps= ed \!\$OMP TASKLOOP at \(1\)} } + !$omp tile sizes (1,2) ! { dg-error {loop nest depth after \!\$OMP TILE = at \(1\) is insufficient for outer \!\$OMP TASKLOOP} } + !$omp tile sizes (1,2) + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$omp end taskloop +end subroutine test3 diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 b/gc= c/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 new file mode 100644 index 00000000000..eaa7895eaa0 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 @@ -0,0 +1,18 @@ +subroutine test + implicit none + integer :: i, j, k + + !$omp parallel do collapse(2) ordered(2) + !$omp tile sizes (1,2) + do i =3D 1,100 ! { dg-error {'ordered' invalid in conjunction with 'omp = tile'} } + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$end omp target + +end subroutine test diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 b/gc= c/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 new file mode 100644 index 00000000000..b2dca0bbec6 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 @@ -0,0 +1,95 @@ + +subroutine test1 + implicit none + integer :: i, j, k + + !$omp tile sizes (1,2) + !$omp tile sizes (1) ! { dg-error {loop nest depth after \!\$OMP TILE a= t \(1\) is insufficient for outer \!\$OMP TILE} } + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test1 + +subroutine test2 + implicit none + integer :: i, j, k + + !$omp tile sizes (1,2) + !$omp tile sizes (1) ! { dg-error {loop nest depth after \!\$OMP TILE a= t \(1\) is insufficient for outer \!\$OMP TILE} } + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test2 + +subroutine test3 + implicit none + integer :: i, j, k + + !$omp target teams distribute + !$omp tile sizes (1,2) + !$omp tile sizes (1) ! { dg-error {loop nest depth after \!\$OMP TILE a= t \(1\) is insufficient for outer \!\$OMP TILE} } + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test3 + +subroutine test4 + implicit none + integer :: i, j, k + + !$omp target teams distribute collapse(2) + !$omp tile sizes (8) ! { dg-error {loop nest depth after \!\$OMP TILE a= t \(1\) is insufficient for outer \!\$OMP TARGET TEAMS DISTRIBUTE} } + !$omp tile sizes (1,2) + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test4 + +subroutine test5 + implicit none + integer :: i, j, k + + !$omp parallel do collapse(2) ordered(2) + !$omp tile sizes (8) ! { dg-error {loop nest depth after \!\$OMP TILE a= t \(1\) is insufficient for outer \!\$OMP PARALLEL DO} } + !$omp tile sizes (1,2) + do i =3D 1,100 + do j =3D 1,100 + call dummy(j) + do k =3D 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$end omp tile + !$end omp target + +end subroutine test5 diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f= 90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 new file mode 100644 index 00000000000..27920701b36 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 @@ -0,0 +1,57 @@ +function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + + !$omp parallel do collapse(2) + !$omp tile sizes (8,8) + !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNRO= LL at \(1\) is insufficient for outer \!\$OMP TILE} } + ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insuffici= ent for outer \!\$OMP PARALLEL DO} "" { target *-*-*} .-1 } + do i =3D 1,m + do j =3D 1,n + inner =3D 0 + do k =3D 1, n + inner =3D inner + a(k, i) * b(j, k) + end do + c(j, i) =3D inner + end do + end do + + !$omp tile sizes (8,8) + !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNRO= LL at \(1\) is insufficient for outer \!\$OMP TILE} } + do i =3D 1,m + do j =3D 1,n + inner =3D 0 + do k =3D 1, n + inner =3D inner + a(k, i) * b(j, k) + end do + c(j, i) =3D inner + end do + end do + + !$omp tile sizes (8) + !$omp unroll partial(1) + do i =3D 1,m + do j =3D 1,n + inner =3D 0 + do k =3D 1, n + inner =3D inner + a(k, i) * b(j, k) + end do + c(j, i) =3D inner + end do + end do + + !$omp parallel do collapse(2) ! { dg-error {missing canonical loop nest = after \!\$OMP PARALLEL DO at \(1\)} } + !$omp tile sizes (8,8) ! { dg-error {missing canonical loop nest after \= !\$OMP TILE at \(1\)} } + !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1= \) turns loop into a non-loop} } + do i =3D 1,m + do j =3D 1,n + inner =3D 0 + do k =3D 1, n + inner =3D inner + a(k, i) * b(j, k) + end do + c(j, i) =3D inner + end do + end do +end function mult diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f= 90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 new file mode 100644 index 00000000000..cda878f3037 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 @@ -0,0 +1,37 @@ +! { dg-additional-options "-fdump-tree-original" } +! { dg-additional-options "-fdump-tree-omp_transform_loops" } + +function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + + !$omp parallel do + !$omp unroll partial(1) + !$omp tile sizes (8,8) + do i =3D 1,m + do j =3D 1,n + inner =3D 0 + do k =3D 1, n + inner =3D inner + a(k, i) * b(j, k) + end do + c(j, i) =3D inner + end do + end do +end function mult + +! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial= \(1\) tile sizes\(8, 8\)} 1 "original" } } +! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_parti= al" "omp_transform_loops" } } + +! Tiling adds two floor and two tile loops. + +! Number of conditional statements after tiling: +! 5 +! =3D 2 (lowering of 2 tile loops) +! + 1 (partial tile handling in 2 tile loops) +! + 1 (lowering of non-associated floor loop) + +! The unrolling with unroll factor 1 currently gets executed (TODO could/s= hould be skipped?) + +! { dg-final { scan-tree-dump-times {if \([A-Za-z0-9_.]+ < } 5 "omp_transf= orm_loops" } } diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f= 90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 new file mode 100644 index 00000000000..00615011856 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 @@ -0,0 +1,41 @@ +! { dg-additional-options "-fdump-tree-original" } +! { dg-additional-options "-fdump-tree-omp_transform_loops" } + +function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + c =3D 0 + + !$omp target + !$omp parallel do + !$omp unroll partial(2) + !$omp tile sizes (8,8,4) + do i =3D 1,m + do j =3D 1,n + do k =3D 1, n + c(j,i) =3D c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + !$omp end target +end function mult + +! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial= \(2\) tile sizes\(8, 8, 4\)} 1 "original" } } +! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_parti= al" "omp_transform_loops" } } + +! Check the number of loops + +! Tiling adds three tile and three floor loops. +! The outermost floor loop is associated with the "!$omp parallel do" +! and hence it isn't lowered in the transformation pass. +! Number of conditional statements after tiling: +! 8 +! =3D 2 (inner floor loop lowering) +! + 3 (partial tile handling in 3 tile loops) +! + 3 (lowering of 3 tile loops) +! +! Unrolling creates 2 copies of the tiled loop nest. + +! { dg-final { scan-tree-dump-times {if \([A-Za-z0-9_.]+ < } 16 "omp_trans= form_loops" } } diff --git a/gcc/tree-core.h b/gcc/tree-core.h index f1429824158..b241e144515 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -534,6 +534,9 @@ enum omp_clause_code { /* Internal representation for an "omp unroll partial" directive. */ OMP_CLAUSE_UNROLL_PARTIAL, + + /* Represents a "tile" directive internally. */ + OMP_CLAUSE_TILE }; #undef DEFTREESTRUCT diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index cae81719e68..02c207d87a0 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -521,6 +521,14 @@ dump_omp_clause (pretty_printer *pp, tree clause, int = spc, dump_flags_t flags) pp_right_paren (pp); } break; + case OMP_CLAUSE_TILE: + pp_string (pp, "tile sizes"); + pp_left_paren (pp); + gcc_assert (OMP_CLAUSE_TILE_SIZES (clause)); + dump_generic_node (pp, OMP_CLAUSE_TILE_SIZES (clause), spc, flags, + false); + pp_right_paren (pp); + break; case OMP_CLAUSE__LOOPTEMP_: name =3D "_looptemp_"; goto print_remap; diff --git a/gcc/tree.cc b/gcc/tree.cc index fc7e22d352f..893f509fa3a 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -327,8 +327,10 @@ unsigned const char omp_clause_num_ops[] =3D 0, /* OMP_CLAUSE_FINALIZE */ 0, /* OMP_CLAUSE_NOHOST */ 0, /* OMP_CLAUSE_UNROLL_FULL */ + 0, /* OMP_CLAUSE_UNROLL_NONE */ - 1 /* OMP_CLAUSE_UNROLL_PARTIAL */ + 1, /* OMP_CLAUSE_UNROLL_PARTIAL */ + 1 /* OMP_CLAUSE_TILE */ }; const char * const omp_clause_code_name[] =3D @@ -422,7 +424,8 @@ const char * const omp_clause_code_name[] =3D "nohost", "unroll_full", "unroll_none", - "unroll_partial" + "unroll_partial", + "tile" }; /* Unless specific to OpenACC, we tend to internally maintain OpenMP-centr= ic diff --git a/gcc/tree.h b/gcc/tree.h index 6f7a6e7017a..8f4d2761d1a 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1790,6 +1790,9 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_UNROLL_PARTIAL_EXPR(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_UNROLL_PA= RTIAL), 0) +#define OMP_CLAUSE_TILE_SIZES(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0) + #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subco= de.proc_bind_kind) diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile= .C b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C new file mode 100644 index 00000000000..8970bfa7fd8 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C @@ -0,0 +1,84 @@ +#include +#include + +template +int sum () +{ + int sum =3D 0; +#pragma omp unroll full +#pragma omp tile sizes(dim0, dim1) + for (unsigned i =3D 0; i < 4; i++) + for (unsigned j =3D 0; j < 5; j++) + sum++; + + return sum; +} + +int main () +{ + if (sum <1,1> () !=3D 20) + __builtin_abort (); + if (sum <1,2> () !=3D 20) + __builtin_abort (); + if (sum <1,3> () !=3D 20) + __builtin_abort (); + if (sum <1,4> () !=3D 20) + __builtin_abort (); + if (sum <1,5> () !=3D 20) + __builtin_abort (); + + if (sum <2,1> () !=3D 20) + __builtin_abort (); + if (sum <2,2> () !=3D 20) + __builtin_abort (); + if (sum <2,3> () !=3D 20) + __builtin_abort (); + if (sum <2,4> () !=3D 20) + __builtin_abort (); + if (sum <2,5> () !=3D 20) + __builtin_abort (); + + if (sum <3,1> () !=3D 20) + __builtin_abort (); + if (sum <3,2> () !=3D 20) + __builtin_abort (); + if (sum <3,3> () !=3D 20) + __builtin_abort (); + if (sum <3,4> () !=3D 20) + __builtin_abort (); + if (sum <3,5> () !=3D 20) + __builtin_abort (); + + if (sum <4,1> () !=3D 20) + __builtin_abort (); + if (sum <4,2> () !=3D 20) + __builtin_abort (); + if (sum <4,3> () !=3D 20) + __builtin_abort (); + if (sum <4,4> () !=3D 20) + __builtin_abort (); + if (sum <4,5> () !=3D 20) + __builtin_abort (); + + if (sum <5,1> () !=3D 20) + __builtin_abort (); + if (sum <5,2> () !=3D 20) + __builtin_abort (); + if (sum <5,3> () !=3D 20) + __builtin_abort (); + if (sum <5,4> () !=3D 20) + __builtin_abort (); + if (sum <5,5> () !=3D 20) + __builtin_abort (); + + if (sum <6,1> () !=3D 20) + __builtin_abort (); + if (sum <6,2> () !=3D 20) + __builtin_abort (); + if (sum <6,3> () !=3D 20) + __builtin_abort (); + if (sum <6,4> () !=3D 20) + __builtin_abort (); + if (sum <6,5> () !=3D 20) + __builtin_abort (); +} diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 b= /libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 new file mode 100644 index 00000000000..bb48c31224e --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 @@ -0,0 +1,71 @@ +module matrix + implicit none + integer :: n =3D 10 + integer :: m =3D 10 + +contains + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + !$omp parallel do collapse(2) private(inner) + !$omp tile sizes (8, 1) + do i =3D 1,m + do j =3D 1,n + inner =3D 0 + do k =3D 1, n + inner =3D inner + a(k, i) * b(j, k) + end do + c(j, i) =3D inner + end do + end do + end function mult + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D merge(1,0, i.eq.j) + b(j,i) =3D j + end do + end do + + c =3D mult (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b= /libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 new file mode 100644 index 00000000000..6aedbf4724f --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 @@ -0,0 +1,117 @@ +! { dg-additional-options "-fdump-tree-original" } +! { dg-do run } + +module test_functions + contains + integer function compute_sum1() result(sum) + implicit none + + integer :: i,j + + sum =3D 0 + !$omp do + do i =3D 1,10,3 + !$omp tile sizes(2) + do j =3D 1,10,3 + sum =3D sum + 1 + end do + end do + end function + + integer function compute_sum2() result(sum) + implicit none + + integer :: i,j + + sum =3D 0 + !$omp do + do i =3D 1,10,3 + !$omp tile sizes(16) + do j =3D 1,10,3 + sum =3D sum + 1 + end do + end do + end function + + integer function compute_sum3() result(sum) + implicit none + + integer :: i,j + + sum =3D 0 + !$omp do + do i =3D 1,10,3 + !$omp tile sizes(100) + do j =3D 1,10,3 + sum =3D sum + 1 + end do + end do + end function + + integer function compute_sum4() result(sum) + implicit none + + integer :: i,j + + sum =3D 0 + !$omp do + !$omp tile sizes(6,10) + do i =3D 1,10,3 + do j =3D 1,10,3 + sum =3D sum + 1 + end do + end do + end function + + integer function compute_sum5() result(sum) + implicit none + + integer :: i,j + + sum =3D 0 + !$omp parallel do collapse(2) + !$omp tile sizes(6,10) + do i =3D 1,10,3 + do j =3D 1,10,3 + sum =3D sum + 1 + end do + end do + end function +end module test_functions + +program test + use test_functions + implicit none + + integer :: result + + result =3D compute_sum1 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result =3D compute_sum2 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result =3D compute_sum3 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result =3D compute_sum4 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result =3D compute_sum5 () + write (*,*) result + if (result .ne. 16) then + call abort + end if +end program diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-= 1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 new file mode 100644 index 00000000000..2f2f014ead9 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 @@ -0,0 +1,112 @@ +module matrix + implicit none + integer :: n =3D 10 + integer :: m =3D 10 + +contains + + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D 0 + end do + end do + + !$omp unroll partial(10) + !$omp tile sizes(1, 3) + do i =3D 1,10 + do j =3D 1,n + do k =3D 1, n + write (*,*) i, j, k + c(j,i) =3D c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult + + function mult2 (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes(1,2) + do i =3D 1,10 + do j =3D 1,n + do k =3D 1, n + write (*,*) i, j, k + c(j,i) =3D c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult2 + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D merge(1,0, i.eq.j) + b(j,i) =3D j + end do + end do + + ! c =3D mult (a, b) + + ! call print_matrix (a) + ! call print_matrix (b) + ! call print_matrix (c) + + ! do i =3D 1,n + ! do j =3D 1,m + ! if (b(i,j) .ne. c(i,j)) call abort () + ! end do + ! end do + + + c =3D mult2 (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-= 2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 new file mode 100644 index 00000000000..1b5b623b838 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 @@ -0,0 +1,71 @@ +module matrix + implicit none + integer :: n =3D 10 + integer :: m =3D 10 + +contains + + function copy (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes (1,5) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D c(j,i) + a(j, i) + end do + end do + end function copy + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D 1 + end do + end do + + c =3D copy (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (c(i,j) .ne. a(i,j)) call abort () + end do + end do + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-= 3.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 new file mode 100644 index 00000000000..518968f1335 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 @@ -0,0 +1,77 @@ +module matrix + implicit none + integer :: n =3D 4 + integer :: m =3D 4 + +contains + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + ! omp do private(inner) + do i =3D 1,m + !$omp unroll partial(4) + !$omp tile sizes (5) + do j =3D 1,n + do k =3D 1, n + write (*,*) "i", i, "j", j, "k", k + if (k =3D=3D 1) then + inner =3D 0 + endif + inner =3D inner + a(k, i) * b(j, k) + if (k =3D=3D n) then + c(j, i) =3D inner + endif + end do + end do + end do + end function mult + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D merge(1,0, i.eq.j) + b(j,i) =3D j + end do + end do + + c =3D mult (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-= 4.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 new file mode 100644 index 00000000000..807135df5e8 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 @@ -0,0 +1,75 @@ +module matrix + implicit none + integer :: n =3D 4 + integer :: m =3D 4 + +contains + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,m + do j =3D 1,n + c(j, i) =3D 0 + end do + end do + + !$omp parallel do + do i =3D 1,m + !$omp tile sizes (5,2) + do j =3D 1,n + do k =3D 1, n + c(j,i) =3D c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D merge(1,0, i.eq.j) + b(j,i) =3D j + end do + end do + + c =3D mult (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-= 1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 new file mode 100644 index 00000000000..2f2f014ead9 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 @@ -0,0 +1,112 @@ +module matrix + implicit none + integer :: n =3D 10 + integer :: m =3D 10 + +contains + + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D 0 + end do + end do + + !$omp unroll partial(10) + !$omp tile sizes(1, 3) + do i =3D 1,10 + do j =3D 1,n + do k =3D 1, n + write (*,*) i, j, k + c(j,i) =3D c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult + + function mult2 (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes(1,2) + do i =3D 1,10 + do j =3D 1,n + do k =3D 1, n + write (*,*) i, j, k + c(j,i) =3D c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult2 + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D merge(1,0, i.eq.j) + b(j,i) =3D j + end do + end do + + ! c =3D mult (a, b) + + ! call print_matrix (a) + ! call print_matrix (b) + ! call print_matrix (c) + + ! do i =3D 1,n + ! do j =3D 1,m + ! if (b(i,j) .ne. c(i,j)) call abort () + ! end do + ! end do + + + c =3D mult2 (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-= 2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 new file mode 100644 index 00000000000..1b5b623b838 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 @@ -0,0 +1,71 @@ +module matrix + implicit none + integer :: n =3D 10 + integer :: m =3D 10 + +contains + + function copy (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes (1,5) + do i =3D 1,10 + do j =3D 1,n + c(j,i) =3D c(j,i) + a(j, i) + end do + end do + end function copy + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n =3D size (m, 1) + do i =3D 1,n + do j =3D 1,n + write (*, fmt=3D"(i4)", advance=3D'no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i =3D 1,n + do j =3D 1,m + a(j,i) =3D 1 + end do + end do + + c =3D copy (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i =3D 1,n + do j =3D 1,m + if (c(i,j) .ne. a(i,j)) call abort () + end do + end do + +end program main -- 2.36.1 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=DFe 201, 8= 0634 M=FCnchen; Gesellschaft mit beschr=E4nkter Haftung; Gesch=E4ftsf=FChre= r: Thomas Heurung, Frank Th=FCrauf; Sitz der Gesellschaft: M=FCnchen; Regis= tergericht M=FCnchen, HRB 106955