From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id E7BF43858D32 for ; Thu, 25 May 2023 10:00:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E7BF43858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1685008828; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=H54+H2bIjboT/Pim0RgfkqrkY/bg1+GLMR2PcY4VpOI=; b=hQo4NzBfQTcaj6FKaKPce9mTSBuEM3i2GbE3NIbV684NBO2CbFWF3hxZ2DzsND0a7BCH5r 7JQJ2FRIZ09qVYrUi0y5G1ZuykaokOVBFxBYf2gvkkAjn0vMuMPNIXjujvJYhRpGPsZSpn 11MuD8Zgp40c5x8zhdFH7eaYtxd51M4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-526-Sn-MTkZqOeaURz7ZsXcEIA-1; Thu, 25 May 2023 06:00:27 -0400 X-MC-Unique: Sn-MTkZqOeaURz7ZsXcEIA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 817F185A5AA; Thu, 25 May 2023 10:00:26 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.17]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 28E641121314; Thu, 25 May 2023 10:00:25 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 34PA0NqQ2437558 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 25 May 2023 12:00:23 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 34PA0MVs2437557; Thu, 25 May 2023 12:00:22 +0200 Date: Thu, 25 May 2023 12:00:21 +0200 From: Jakub Jelinek To: Sandra Loosemore Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 1/3] OpenMP: C support for imperfectly-nested loops Message-ID: Reply-To: Jakub Jelinek References: <20230428232254.628185-1-sandra@codesourcery.com> <20230428232254.628185-2-sandra@codesourcery.com> MIME-Version: 1.0 In-Reply-To: <20230428232254.628185-2-sandra@codesourcery.com> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Apr 28, 2023 at 05:22:52PM -0600, Sandra Loosemore wrote: > OpenMP 5.0 removed the restriction that multiple collapsed loops must > be perfectly nested, allowing "intervening code" (including nested > BLOCKs) before or after each nested loop. In GCC this code is moved > into the inner loop body by the respective front ends. > > This patch changes the C front end to use recursive descent parsing > on nested loops within an "omp for" construct, rather than an iterative > approach, in order to preserve proper nesting of compound statements. > > gcc/c/ChangeLog > * c-parser.cc (struct c_parser): Add omp_for_parse_state field. > (struct omp_for_parse_data): New. > (c_parser_compound_statement_nostart): Special-case nested > OMP loops and blocks in intervening code. > (c_parser_while_statement): Reject in intervening code. > (c_parser_do_statement): Likewise. > (c_parser_for_statement): Likewise. > (c_parser_postfix_expression_after_primary): Reject calls to OMP > runtime routines in intervening code. > (c_parser_pragma): Reject OMP pragmas in intervening code. > (c_parser_omp_loop_nest): New, split from c_parser_omp_for_loop. > (c_parser_omp_for_loop): Rewrite to use recursive descent and > generalize handling for intervening code. > > gcc/ChangeLog > * omp-api.h: New file. Why? Just add those to omp-general.h. > * omp-general.cc (omp_runtime_api_procname): New. > (omp_runtime_api_call): Moved here from omp-low.cc, and make > non-static. > * omp-general.h: Include omp-api.h. > * omp-low.cc (omp_runtime_api_call): Delete this copy. > > gcc/testsuite/ChangeLog > * c-c++-common/goacc/collapse-1.c: Adjust expected error messages. > * c-c++-common/goacc/tile-2.c: Likewise. > * c-c++-common/gomp/imperfect1.c: New. > * c-c++-common/gomp/imperfect2.c: New. > * c-c++-common/gomp/imperfect3.c: New. > * c-c++-common/gomp/imperfect4.c: New. > * c-c++-common/gomp/imperfect5.c: New. > * gcc.dg/gomp/collapse-1.c: Adjust expected error messages. > > libgomp/ChangeLog > * testsuite/libgomp.c-c++-common/imperfect1.c: New. > * testsuite/libgomp.c-c++-common/imperfect2.c: New. > * testsuite/libgomp.c-c++-common/imperfect3.c: New. > * testsuite/libgomp.c-c++-common/imperfect4.c: New. > * testsuite/libgomp.c-c++-common/imperfect5.c: New. > * testsuite/libgomp.c-c++-common/imperfect6.c: New. > * testsuite/libgomp.c-c++-common/offload-imperfect1.c: New. > * testsuite/libgomp.c-c++-common/offload-imperfect2.c: New. > * testsuite/libgomp.c-c++-common/offload-imperfect3.c: New. > * testsuite/libgomp.c-c++-common/offload-imperfect4.c: New. If the 3 patches are going to be committed separately (which I think is a good idea), then the *c-c++-common* tests are a problem, because the tests will then fail after the C FE part is committed before the C++ FE part is committed. For the new tests there are 2 options, one is commit them in the C patch with /* { dg-do run { target c } } */ instead of just /* { dg-do run } */ etc. and then in the second patch remove those " { target c }" parts, or commit them in the second patch only. For the existing tests with adjustments, do the { target c } vs. { target c++ } games and tweak in the second patch. The offload-imperfect* tests should be called target-imperfect* I think, for consistency with other tests. In the gcc/testsuite/c-c++-common/gomp/ tests I miss some coverage for the boundary cases what is and isn't intervening code. Before your changes, we were allowing multiple levels of {}s, so #pragma omp for ordered(2) for (int i = 0; i < 64; i++) { { { for (int j = 0; j < 64; j++) ; } } } which is valid in 5.0 (but should be tested in the testsuite), but also empty statements, which when reading the 5.1/5.2 spec don't actually seem to be valid. #pragma omp for ordered(2) for (int i = 0; i < 64; i++) { ; ; ; for (int j = 0; j < 64; j++) ; ; } because even the empty statement is I think intervening code according to the grammar. Another thing I don't really see covered in the testsuite nor in the code is if some variable declared in intervening code is then used in the inner loop's init/cond/incr expressions. I mean something like: #pragma omp for collapse(2) for (int i = 0; i < 64; i++) { int v = (i + 4) * 2; for (int j = v; j < 64; j++) ; } That just ICEs with your patch, we should diagnose it as invalid. In the canonical loop form requirements the standard requires that the expressions are loop invariant with the exceptions of specific cases allowed for non-rectangular loops. So, above I'm sure that is violated. Another case would be #pragma omp for collapse(2) for (int i = 0; i < 64; i++) { int v = (i + 4); for (int j = v; j < 64; j++) ; } but here it isn't invariant either, although for (int j = i + 4; j < 64; j++) would be valid non-rectangular loop I think it requires the exact syntax. Yet another case is #pragma omp for collapse(2) for (int i = 0; i < 64; i++) { int v = 8; for (int j = v; j < 64; j++) ; } This actually is loop invariant (the value) and not really sure what would in the standard prevent it from being valid, but it doesn't feel right and can't be handled easily, because one should be able to compute the number of iterations before the loop where v isn't in scope or there could be a different v in scope. And/or for C++ #pragma omp for collapse(2) for (int i = 0; i < 64; i++) { const int v = 8; // or even constexpr for (int j = v; j < 64; j++) ; } where already during parsing we won't really know we are actually using v if it isn't ODR used. Or unless it is in a template and the const/constexpr variable is value or type dependent. I think we want to discuss this in omp-lang. > @@ -1525,6 +1529,23 @@ struct oacc_routine_data { > /* Used for parsing objc foreach statements. */ > static tree objc_foreach_break_label, objc_foreach_continue_label; > > +/* Used for parsing OMP for loops. See c_parser_omp_for_loop. */ > +struct omp_for_parse_data { > + tree declv, condv, incrv, initv; > + tree pre_body; > + tree bindings; > + int collapse; I think it is confusing to call this collapse when it isn't collapse but ordered ? ordered : collapse. Call it count like in c_parser_omp_for_loop? > @@ -6123,6 +6145,15 @@ c_parser_compound_statement_nostart (c_parser *parser) > bool last_label = false; > bool save_valid_for_pragma = valid_location_for_stdc_pragma_p (); > location_t label_loc = UNKNOWN_LOCATION; /* Quiet warning. */ > + struct omp_for_parse_data *omp_for_parse_state; > + > + if (parser->omp_for_parse_state > + && (parser->omp_for_parse_state->depth > + < parser->omp_for_parse_state->collapse - 1)) > + omp_for_parse_state = parser->omp_for_parse_state; > + else > + omp_for_parse_state = NULL; > + > if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE)) > { > location_t endloc = c_parser_peek_token (parser)->location; So, here we skip over various cases that need to be covered in the testcases and decided what to do about. The first one is: #pragma omp for collapse(2) for (int i = 0; i < 64; ++i) { __label__ a, b; goto a; a:; for (int j = 0; j < 64; ++j) ; goto b; b:; } This is a GNU extension, do we want to allow it? Conceptually it is like variable declarations which can appear in intervening code, but we need to make sure it is handled properly (the code registers those in the current scope using declare_label, so supposedly we want to move those later to the body scope when moving there other intervening code. And probably it should be rejected when intervening code is not allowed (+ test for that). Next are C2X standard attributes, case XX:/default: (I think these would violate the single entry single exit requirements of intervening code), label: (that can be fine as long as it is only jumped to from within the same intervening code), so again something for the testsuite: int k = 0; #pragma omp for collapse(2) for (int i = 0; i < 64; ++i) { a: if (k) goto a; for (int j = 0; j < 64; ++j) ; b: if (k) goto b; } Then __extension__, again something that should be considered whether it acts as intervening code, or is even allowed as if it wasn't intervening code + testsuite coverage. Then pragmas which you handle in c_parser_pragma already (but will need some tweaks for tile/unroll from Frederik). else is an error, so nothing needs to be done about it. > @@ -7138,6 +7223,16 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll, > gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE)); > token_indent_info while_tinfo > = get_token_indent_info (c_parser_peek_token (parser)); > + > + if (parser->omp_for_parse_state > + && (parser->omp_for_parse_state->depth > + < parser->omp_for_parse_state->collapse - 1)) > + { > + error_at (c_parser_peek_token (parser)->location, > + "loop not permitted in intervening code in OMP loop body"); Please use OpenMP in diagnostics rather than OMP (multiple times). Also, the while and do cases are not covered in the testsuite, they should be (there is just test for for in there). Would be nice to also test it nested in other compound statements, like #pragma omp for collapse(2) for (int i = 0; i < 64; ++i) { if (1) { for (int j = 0; j < 64; ++j) // { dg-error "..." } ; while (0) ; // { dg-error "..." } do { } while (0); // { dg-error "..." } } for (int k = 0; k < 64; ++k) ; } I guess especially the do { ... } while (0); limitation could be quite harmful, that is something used heavily in various macros. So perhaps for OpenMP 6.0 we should consider there rejecting just for statements for C/C++? And/or also allow for loops if inside of some substatement of say if/do/while/switch, those are also clearly distinguishable from the main loop. On the other side, I've never been really excited by this imperfectly nested loops mess making it into the standard. > + parser->omp_for_parse_state->fail = true; > + } > + > c_parser_consume_token (parser); > block = c_begin_compound_stmt (flag_isoc99); > loc = c_parser_peek_token (parser)->location; > @@ -11234,6 +11349,14 @@ c_parser_postfix_expression_after_primary (c_parser *parser, > && fndecl_built_in_p (expr.value, BUILT_IN_NORMAL) > && vec_safe_length (exprlist) == 1) > warn_for_abs (expr_loc, expr.value, (*exprlist)[0]); > + if (parser->omp_for_parse_state > + && parser->omp_for_parse_state->in_intervening_code > + && omp_runtime_api_call (expr.value)) > + { > + error_at (expr_loc, "Calls to the OpenMP runtime API are " s/Calls/calls/, C/C++ FE/middle end diagnostics never start with capital letter unless it is part of some shorthand/word always spelled with capital letters. > + "not permitted in intervening code"); > + parser->omp_for_parse_state->fail = true; > + } > } > > start = expr.get_start (); > + if (decl == NULL || decl == error_mark_node || init == error_mark_node) > + omp_for_parse_state->fail = true; > + else > + { > + TREE_VEC_ELT (omp_for_parse_state->declv, omp_for_parse_state->depth) > + = decl; > + TREE_VEC_ELT (omp_for_parse_state->initv, omp_for_parse_state->depth) > + = init; > + TREE_VEC_ELT (omp_for_parse_state->condv, omp_for_parse_state->depth) > + = cond; > + TREE_VEC_ELT (omp_for_parse_state->incrv, omp_for_parse_state->depth) > + = incr; This would be more readable using a temporary: int depth = omp_for_parse_state->depth; TREE_VEC_ELT (omp_for_parse_state->declv, depth) = decl; TREE_VEC_ELT (omp_for_parse_state->initv, depth) = init; TREE_VEC_ELT (omp_for_parse_state->condv, depth) = cond; TREE_VEC_ELT (omp_for_parse_state->incrv, depth) = incr; > + } > + > +parse_next: > + omp_for_parse_state->want_nested_loop = true; > + moreloops = omp_for_parse_state->depth < omp_for_parse_state->collapse - 1; Shouldn't omp_for_parse_state->want_nested_loop be initialized to moreloops here? > + if (moreloops && c_parser_next_token_is_keyword (parser, RID_FOR)) > + { > + omp_for_parse_state->depth++; > + body = c_parser_omp_loop_nest (parser, if_p); > + omp_for_parse_state->depth--; > + } > + else if (moreloops && c_parser_next_token_is (parser, CPP_OPEN_BRACE)) > + { > + /* This is the open brace in the loop-body grammar production. Rather > + than trying to special-case braces, just parse it as a compound > + statement and handle the nested loop-body case there. Note that > + when we see a further open brace inside the compound statement > + loop-body, we don't know whether it is the start of intervening > + code that is a compound statement, or a level of braces > + surrounding a nested loop-body. Use the WANT_NESTED_LOOP state > + bit to ensure we have only one nested loop at each level. */ > + omp_for_parse_state->in_intervening_code = true; > + body = c_parser_compound_statement (parser, NULL); > + omp_for_parse_state->in_intervening_code = false; > + if (omp_for_parse_state->want_nested_loop) > + { > + /* We have already parsed the whole loop body and not found a > + nested loop. */ > + error_at (omp_for_parse_state->for_loc, > + "not enough nested loops"); > + omp_for_parse_state->fail = true; > + } > + if_p = NULL; > + } > + else > + { > + /* This is the final-loop-body case in the grammar: we have > + something that is not a FOR and not an open brace. */ > + if (moreloops) > + { > + /* If we were expecting a nested loop, give an error and mark > + that parsing has failed, and try to recover by parsing the > + body as regular code without further collapsing. */ > + error_at (omp_for_parse_state->for_loc, > + "not enough nested loops"); > + omp_for_parse_state->fail = true; > + } > + in_statement = IN_OMP_FOR; And/or temporarily clear parser->omp_for_parse_state here and reset it back afterwards? Then if (parser->omp_for_parse_state && (parser->omp_for_parse_state->depth < parser->omp_for_parse_state->collapse - 1)) omp_for_parse_state = parser->omp_for_parse_state; else omp_for_parse_state = NULL; etc. wouldn't be really needed, if parser->omp_for_parse_state would be non-NULL, we'd want to use it without further checks. > + body = push_stmt_list (); > + if (omp_for_parse_state->inscan) > + c_parser_omp_scan_loop_body (parser, false); > + else > + add_stmt (c_parser_c99_block_statement (parser, if_p)); > + body = pop_stmt_list (body); > + } > + in_statement = save_in_statement; > + omp_for_parse_state->want_nested_loop = false; > + omp_for_parse_state->in_intervening_code = true; > + > + /* Pop and return the implicit scope surrounding this level of loop. > + Any iteration variable bound in loop_scope is pulled out and later > + will be added to the scope surrounding the entire OMP_FOR. That > + keeps the gimplifier happy later on, and meanwhile we have already > + resolved all references to the iteration variable in its true scope. */ > + add_stmt (body); > + body = c_end_compound_stmt (loc, loop_scope, true); > + if (decl && TREE_CODE (body) == BIND_EXPR) So, this moves just the iteration var if declared in the for loop and nothing else? > + { > + tree t = BIND_EXPR_VARS (body); > + tree prev = NULL_TREE, next = NULL_TREE; > + while (t) > + { > + next = DECL_CHAIN (t); > + if (t == decl) > + { > + if (prev) > + DECL_CHAIN (prev) = next; > + else > + { > + BIND_EXPR_VARS (body) = next; > + BLOCK_VARS (BIND_EXPR_BLOCK (body)) = next; > + } > + DECL_CHAIN (t) = omp_for_parse_state->bindings; > + omp_for_parse_state->bindings = t; > + break; > + } > + else > + { > + prev = t; > + t = next; > + } > + } > + if (BIND_EXPR_VARS (body) == NULL_TREE) > + body = BIND_EXPR_BODY (body); > + } > + > + return body; > +} > + > /* Parse the restricted form of loop statements allowed by OpenACC and OpenMP. > The real trick here is to determine the loop control variable early > so that we can push a new decl if necessary to make it private. > +#pragma omp for ordered(3) > + for (i = 0; i < a1; i++) /* { dg-error "inner loops must be perfectly nested" } */ > + { > + f1 (0, i); > + for (j = 0; j < a2; j++) > + { > + f1 (1, j); > + for (k = 0; k < a3; k++) > + { > + f1 (2, k); > + f2 (2, k); Would be good to stick here #pragma omp ordered doacross(source) and sink, just to make it valid except for the intervening code. > + } > + f2 (1, j); > + } > + f2 (0, i); > + } > +} > + > diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect4.c b/gcc/testsuite/c-c++-common/gomp/imperfect4.c > new file mode 100644 > index 00000000000..e5feff730a9 > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/gomp/imperfect4.c > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > + > +/* This test case is expected to fail due to errors. */ > + > +static int f1count[3], f2count[3]; Don't declare variables you don't use at all in the test (in multiple tests). > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/gomp/imperfect5.c > @@ -0,0 +1,59 @@ > +/* { dg-do compile } */ I think you'd better just copy some existing scan testcase and just add intervening code to it. > + > +/* This test case is expected to fail due to errors. */ > + > +static int f1count[3], f2count[3]; > + > +int f1 (int depth, int iter); > +int f2 (int depth, int iter); > +int ijk (int x, int y, int z); > +void f3 (int sum); > + > +/* This function isn't particularly meaningful, but it should compile without > + error. */ > +int s1 (int a1, int a2, int a3) > +{ > + int i, j, k; > + int r = 0; > + > +#pragma omp simd collapse(3) reduction (inscan, +:r) > + for (i = 0; i < a1; i++) > + { > + for (j = 0; j < a2; j++) > + { > + for (k = 0; k < a3; k++) > + { > + r = r + ijk (i, j, k); > +#pragma omp scan exclusive (r) > + f3 (r); > + } > + } > + } > + return r; > +} > + > +/* Adding intervening code should trigger an error. */ > +int s2 (int a1, int a2, int a3) > +{ > + int i, j, k; > + int r = 0; > + > +#pragma omp simd collapse(3) reduction (inscan, +:r) > + for (i = 0; i < a1; i++) /* { dg-error "inner loops must be perfectly nested" } */ > + { > + f1 (0, i); > + for (j = 0; j < a2; j++) > + { > + f1 (1, j); > + for (k = 0; k < a3; k++) > + { > + r = r + ijk (i, j, k); > +#pragma omp scan exclusive (r) > + f3 (r); > + } > + f2 (1, j); > + } > + f2 (0, i); > + } > + return r; > +} Jakub