From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 7B20A3858D20; Tue, 1 Mar 2022 16:46:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7B20A3858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.90,146,1643702400"; d="scan'208";a="75132591" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 01 Mar 2022 08:46:35 -0800 IronPort-SDR: DB2q1Zy5iW8JHEi2lRCklxRR+RwtnGuI61chdRpojp+cceO/jo5+TrCZFuMmgR/vrAZu6QZV3Y 5mSEeNw8WAtN2HkhzDPtu/vZsXQtlYBtwr738E57Tj2mwAWFLCwW7QVerbLrrCaTW3+whJIxoe Ouxe0tG6kkTCGB91zJZpn9EzC1e5A74qCpd1B8A55spZlAngMoWyle+bmZxnA3HZdJuvb1NhEi kKiCoOUT8EPbb4ZKs+0irLTINa+hzw8iwKhu9XhfpDxfPZbElZrEHyOQSyRB+BxKWdzI3pq5Nz mpw= From: Thomas Schwinge To: Jakub Jelinek , , CC: , Julian Brown Subject: Re: OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280] In-Reply-To: <87iluovu47.fsf@euler.schwinge.homeip.net> References: <20190508145157.08beb4df@squid.athome> <87iluovu47.fsf@euler.schwinge.homeip.net> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/27.1 (x86_64-pc-linux-gnu) Date: Tue, 1 Mar 2022 17:46:20 +0100 Message-ID: <87zgm9mxib.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-09.mgc.mentorg.com (139.181.222.9) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: fortran@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Fortran mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Mar 2022 16:46:38 -0000 Hi! Jakub, need your review/approval here, please: On 2022-01-13T10:54:16+0100, I wrote: > On 2019-05-08T14:51:57+0100, Julian Brown wrote= : >> - The "addressable" bit is set during the kernels conversion pass for >> variables that have "create" (alloc) clauses created for them in the >> synthesised outer data region (instead of in the front-end, etc., >> where it can't be done accurately). Such variables actually have >> their address taken during transformations made in a later pass >> (omp-low, I think), but there's a phase-ordering problem that means >> the flag should be set earlier. > > The actual issue is a bit different, but yes, there is a problem. > The related ICE has also been reported as > "ICE in lower_omp_target, at omp-low.c:12287". (And I'm confused why we > didn't run into that with the OpenACC 'kernels' decomposition > originally.) I've pushed to master branch > commit 9b32c1669aad5459dd053424f9967011348add83 > "OpenACC 'kernels' decomposition: Mark variables used in synthesized data > clauses as addressable [PR100280]" > ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a tempora= ry: > > 13073 else if (is_gimple_reg (var)) > 13074 { > 13075 gcc_assert (offloaded); > 13076 tree avar =3D create_tmp_var (TREE_TYPE= (var)); > 13077 mark_addressable (avar); > > ..., which (a) is only implemented for actualy *offloaded* regions (but n= ot > data regions), and (b) the subsequently synthesized code for writing to a= nd > later reading back from the temporary fundamentally conflicts with OpenAC= C > 'async' (as used by OpenACC 'kernels' decomposition). That's all not tri= vial > to make work, so let's just avoid this case. > --- a/gcc/omp-oacc-kernels-decompose.cc > +++ b/gcc/omp-oacc-kernels-decompose.cc > @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimpl= e *body) > > /* If INNER_BIND_VARS holds variables, build an OpenACC data region with > location LOC containing BODY and having 'create (var)' clauses for ea= ch > - variable. If INNER_CLEANUP is present, add a try-finally statement w= ith > + variable (as a side effect, such variables also get TREE_ADDRESSABLE = set). > + If INNER_CLEANUP is present, add a try-finally statement with > this cleanup code in the finally block. Return the new data region, = or > the original BODY if no data region was needed. */ > > @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple= *body, > inner_data_clauses =3D new_clause; > > prev_mapped_var =3D v; > + > + /* See . */ > + TREE_ADDRESSABLE (v) =3D 1; > } > } So, that's too simple. ;-) ... and gives rise to workaround patches like we have on the og11 development branch: - "Avoid introducing 'create' mapping clauses for loop index variables in= kernels regions", - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronou= sly", - "Fix for is_gimple_reg vars to 'data kernels'" We're after gimplification, and must not just set 'TREE_ADDRESSABLE', because that may easily violate GIMPLE invariants, leading to ICEs later. There are a few open PRs, which my following changes are addressing. To make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's 'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in discussion of . (PR102330 turned out to be unrelated from the "late" 'TREE_ADDRESSABLE' problem here; I have a different patch for it.) I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into 'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' that we then may use instead of the 'TREE_ADDRESSABLE (v) =3D 1;' quoted above (plus one or two additional ones to be introduced in later patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for 'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into 'make_addressable_vars' for later fix-up. (In reply to Jakub Jelinek from comment #9) > Whether you can use the same bitmap or need to add another bitmap next to > task_shared_vars is something hard to guess without diving into it deeply= . Per my understanding of the code, the only place where I had doubts is 'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that what this is doing is either a no-op in the 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the original 'task_shared_vars' handling has been. Either way: I couldn't come up with a way (test case) that we'd actually run into this case; you'd have to have the relevant OpenMP constructs inside an OpenACC 'kernels' region, which isn't permitted per 'gcc/omp-low.cc:check_omp_nesting_restrictions'. OK to proceed in this way? Gr=C3=BC=C3=9Fe Thomas --- gcc/omp-low.cc +++ gcc/omp-low.cc @@ -188,7 +188,7 @@ struct omp_context static splay_tree all_contexts; static int taskreg_nesting_level; static int target_nesting_level; -static bitmap task_shared_vars; +static bitmap make_addressable_vars; static bitmap global_nonaddressable_vars; static vec taskreg_contexts; static vec task_cpyfns; @@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_c= tx) /* Taking address of OUTER in lower_send_shared_vars might need regimplification of everything that uses the variable. */ - if (!task_shared_vars) - task_shared_vars =3D BITMAP_ALLOC (NULL); - bitmap_set_bit (task_shared_vars, DECL_UID (outer)); + if (!make_addressable_vars) + make_addressable_vars =3D BITMAP_ALLOC (NULL); + bitmap_set_bit (make_addressable_vars, DECL_UID (outer)); TREE_ADDRESSABLE (outer) =3D 1; } return true; @@ -601,13 +601,13 @@ omp_copy_decl_2 (tree var, tree name, tree type, omp_= context *ctx) else record_vars (copy); - /* If VAR is listed in task_shared_vars, it means it wasn't - originally addressable and is just because task needs to take - it's address. But we don't need to take address of privatizations + /* If VAR is listed in make_addressable_vars, it wasn't + originally addressable, but was only later made so. + We don't need to take address of privatizations from that var. */ if (TREE_ADDRESSABLE (var) - && ((task_shared_vars - && bitmap_bit_p (task_shared_vars, DECL_UID (var))) + && ((make_addressable_vars + && bitmap_bit_p (make_addressable_vars, DECL_UID (var))) || (global_nonaddressable_vars && bitmap_bit_p (global_nonaddressable_vars, DECL_UID (var)))= )) TREE_ADDRESSABLE (copy) =3D 0; @@ -1495,6 +1495,21 @@ scan_sharing_clauses (tree clauses, omp_context *ctx= ) if (ctx->outer) scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer); decl =3D OMP_CLAUSE_DECL (c); + /* If requested, make 'decl' addressable. */ + if (OMP_CLAUSE_CODE (c) =3D=3D OMP_CLAUSE_MAP + && OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c)) + { + gcc_checking_assert (DECL_P (decl)); + + gcc_checking_assert (!TREE_ADDRESSABLE (decl)); + if (!make_addressable_vars) + make_addressable_vars =3D BITMAP_ALLOC (NULL); + bitmap_set_bit (make_addressable_vars, DECL_UID (decl)); + TREE_ADDRESSABLE (decl) =3D 1; + + /* Done. */ + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c) =3D 0; + } /* Global variables with "omp declare target" attribute don't need to be copied, the receiver side will use them directly. However, global variables with "omp declare target = link" @@ -2371,11 +2405,11 @@ finish_taskreg_scan (omp_context *ctx) if (ctx->record_type =3D=3D NULL_TREE) return; - /* If any task_shared_vars were needed, verify all + /* If any make_addressable_vars were needed, verify all OMP_CLAUSE_SHARED clauses on GIMPLE_OMP_{PARALLEL,TASK,TEAMS} statements if use_pointer_for_field hasn't changed because of that. If it did, update field types now. */ - if (task_shared_vars) + if (make_addressable_vars) { tree c; @@ -2390,7 +2424,7 @@ finish_taskreg_scan (omp_context *ctx) the receiver side will use them directly. */ if (is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))) continue; - if (!bitmap_bit_p (task_shared_vars, DECL_UID (decl)) + if (!bitmap_bit_p (make_addressable_vars, DECL_UID (decl)) || !use_pointer_for_field (decl, ctx)) continue; tree field =3D lookup_field (decl, ctx); @@ -14040,7 +14074,7 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_c= ontext *ctx) /* Callback for lower_omp_1. Return non-NULL if *tp needs to be regimplified. If DATA is non-NULL, lower_omp_1 is outside - of OMP context, but with task_shared_vars set. */ + of OMP context, but with make_addressable_vars set. */ static tree lower_omp_regimplify_p (tree *tp, int *walk_subtrees, @@ -14054,9 +14088,9 @@ lower_omp_regimplify_p (tree *tp, int *walk_subtree= s, && DECL_HAS_VALUE_EXPR_P (t)) return t; - if (task_shared_vars + if (make_addressable_vars && DECL_P (t) - && bitmap_bit_p (task_shared_vars, DECL_UID (t))) + && bitmap_bit_p (make_addressable_vars, DECL_UID (t))) return t; /* If a global variable has been privatized, TREE_CONSTANT on @@ -14141,7 +14175,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_conte= xt *ctx) if (gimple_has_location (stmt)) input_location =3D gimple_location (stmt); - if (task_shared_vars) + if (make_addressable_vars) memset (&wi, '\0', sizeof (wi)); /* If we have issued syntax errors, avoid doing any heavy lifting. @@ -14158,7 +14192,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_conte= xt *ctx) case GIMPLE_COND: { gcond *cond_stmt =3D as_a (stmt); - if ((ctx || task_shared_vars) + if ((ctx || make_addressable_vars) && (walk_tree (gimple_cond_lhs_ptr (cond_stmt), lower_omp_regimplify_p, ctx ? NULL : &wi, NULL) @@ -14250,7 +14284,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_conte= xt *ctx) lower_omp_critical (gsi_p, ctx); break; case GIMPLE_OMP_ATOMIC_LOAD: - if ((ctx || task_shared_vars) + if ((ctx || make_addressable_vars) && walk_tree (gimple_omp_atomic_load_rhs_ptr ( as_a (stmt)), lower_omp_regimplify_p, ctx ? NULL : &wi, NULL)) @@ -14371,7 +14405,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_conte= xt *ctx) default: regimplify: - if ((ctx || task_shared_vars) + if ((ctx || make_addressable_vars) && walk_gimple_op (stmt, lower_omp_regimplify_p, ctx ? NULL : &wi)) { @@ -14435,10 +14469,10 @@ execute_lower_omp (void) if (all_contexts->root) { - if (task_shared_vars) + if (make_addressable_vars) push_gimplify_context (); lower_omp (&body, NULL); - if (task_shared_vars) + if (make_addressable_vars) pop_gimplify_context (NULL); } @@ -14447,7 +14481,7 @@ execute_lower_omp (void) splay_tree_delete (all_contexts); all_contexts =3D NULL; } - BITMAP_FREE (task_shared_vars); + BITMAP_FREE (make_addressable_vars); BITMAP_FREE (global_nonaddressable_vars); /* If current function is a method, remove artificial dummy VAR_DECL cre= ated --- gcc/omp-oacc-kernels-decompose.cc +++ gcc/omp-oacc-kernels-decompose.cc @@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple = *body, prev_mapped_var =3D v; /* See . */ - TREE_ADDRESSABLE (v) =3D 1; + if (!TREE_ADDRESSABLE (v)) + { + /* Request that OMP lowering make 'v' addressable. */ + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) =3D 1; + } } } --- gcc/tree-core.h +++ gcc/tree-core.h @@ -1145,6 +1145,9 @@ struct GTY(()) tree_base { PREDICT_EXPR_OUTCOME in PREDICT_EXPR + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE in + OMP_CLAUSE + static_flag: TREE_STATIC in --- gcc/tree.h +++ gcc/tree.h @@ -1695,6 +1695,11 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.deprecated_flag) +/* Flag that 'OMP_CLAUSE_DECL (NODE)' is to be made addressable during OMP + lowering. */ +#define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \ + (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag) + /* True on an OMP_CLAUSE_USE_DEVICE_PTR with an OpenACC 'if_present' clause. */ #define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \ ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955