From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 703153858D28 for ; Fri, 1 Jul 2022 16:31:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 703153858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,237,1650960000"; d="scan'208";a="78056400" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 01 Jul 2022 08:31:54 -0800 IronPort-SDR: ihsARNwcgn6cmmUo/1Bb0vht560ACYiw726otDp/sDGL6EV3I4n8gzLsVdjZyD7U8wdrFx0UwI UnkLc4lvdA70Shw8rykhIpWD22Yr/K85WDjLvAkHnqrvUPE5chVGhfovOOZ9MCLnVGfcBuN85G sjsfEc50MHaRxJcjP3zgKUue9rHwSyco6IxlTFL6hf4GzRDjsKRWNXD4RjI1smhbbnDP+o3OQ0 rFaXmjB7nAg5hwQTpY3eiXVXAs/6qqZag80JDy5b3KUetm8xp2pIXPd8q8nfYl/j/OnEjki3sp onk= Message-ID: Date: Fri, 1 Jul 2022 18:31:48 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [Patch][v5] OpenMP: Move omp requires checks to libgomp Content-Language: en-US To: Jakub Jelinek CC: gcc-patches References: <07fec82a-41cf-fdc5-6307-c068dd95ef1a@mentor.com> <7f9c91c1-a479-f94f-ac14-1d6827ce671b@codesourcery.com> <5576fa00-0ddd-8046-17c1-d1cea82bdcf5@codesourcery.com> <77331328-4961-9dab-db58-b5b03daf218c@codesourcery.com> <16ca2aa4-7e73-cf9d-9482-dd59f5b0cdae@codesourcery.com> From: Tobias Burnus In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: quoted-printable X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, NICE_REPLY_A, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jul 2022 16:31:58 -0000 On 01.07.22 16:34, Jakub Jelinek wrote: > On Fri, Jul 01, 2022 at 03:06:05PM +0200, Tobias Burnus wrote: > [...] > Will Fortran diagnose: > subroutine foo > !$omp requires unified_shared_memory > !$omp target > !$omp end target > end subroutine foo > subroutine bar > !$omp requires reverse_offload > !$omp target > !$omp end target > end subroutine bar > > or just merge it from the different namespaces? This is done in openmp.cc during parsing. The merging you quoted (in parse.= cc) happens after the whole input file has been parsed and resolved. For your test case= , the following error is shown: test.f90:1:15: 1 | subroutine foo | 1 Error: Program unit at (1) has OpenMP device constructs/routines but does n= ot set !$OMP REQUIRES REVERSE_OFFLOAD but other program units do test.f90:6:14: 6 | subroutine bar | 1 Error: Program unit at (1) has OpenMP device constructs/routines but does n= ot set !$OMP REQUIRES UNIFIED_SHARED_MEMORY but other program units do > @@ -1764,6 +1781,20 @@ input_symtab (void) >> } >> } >> >> +static void >> +omp_requires_to_name (char *buf, size_t size, unsigned int requires_mas= k) >> +{ >> + char *end =3D buf + size, *p =3D buf; >> + if (requires_mask & GOMP_REQUIRES_UNIFIED_ADDRESS) >> + p +=3D snprintf (p, end - p, "unified_address"); >> + if (requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY) >> + p +=3D snprintf (p, end - p, "%sunified_shared_memory", >> + (p =3D=3D buf ? "" : ", ")); >> + if (requires_mask & GOMP_REQUIRES_REVERSE_OFFLOAD) >> + p +=3D snprintf (p, end - p, "%sreverse_offload", >> + (p =3D=3D buf ? "" : ", ")); > So, what does this print if requires_mask is 0 (or just the target used b= it > set but not unified_address, unified_shared_memory nor reverse_offload)? Well, that's what libgomp/testsuite/libgomp.c-c++-common/requires-2.c (+ *-= 2-aux.c) tests: /* { dg-error "OpenMP 'requires' directive with non-identical clauses in mu= ltiple compilation units: 'unified_shared_memory' vs. ''" "" { target *-*-*= } 0 } */ I hope the '' vs. 'unified_shared_memory' is clear - but if you have a bett= er wording. Note that both: no 'omp requires' and 'omp requires' with other clauses (such as the atomic ones or dynamic_al= locators) will lead to 0. Thus, if the wording is changed, it should fit for both cas= es. >> @@ -1810,6 +1847,54 @@ input_offload_tables (bool do_force_output) >> may be no refs to var_decl in offload LTO mode. */ >> if (do_force_output) >> varpool_node::get (var_decl)->force_output =3D 1; >> + tmp_decl =3D var_decl; >> + } >> + else if (tag =3D=3D LTO_symtab_edge) >> + { >> + static bool error_emitted =3D false; >> + HOST_WIDE_INT val =3D streamer_read_hwi (ib); >> + >> + if (omp_requires_mask =3D=3D 0) >> + { >> + omp_requires_mask =3D (omp_requires) val; >> + requires_decl =3D tmp_decl; >> + requires_fn =3D file_data->file_name; > And similarly here, if some device construct is seen but requires > directive isn't, not sure if in this version val would be 0 or something > with the TARGET_USED bit set. In the latter case, only what is printed > for no requires or just atomic related requires is a problem, in the form= er > case due to the =3D=3D 0 check mixing of 0 with non-zero would be ignored > but mixing of non-zero with 0 wouldn't be. Here: 0 =3D "unset" in the sense that either TARGET_USE nor USM/UA/RO was specified. If any of those is set, we get !=3D 0. For mkoffload, the single results are merged - and TARGET_USE is stripped, such that it is either 0 or a combination of USM/UA/RO >> + } >> + else if (omp_requires_mask !=3D val && !error_emitted) >> + { >> + char buf[64], buf2[64]; > Perhaps cleaner would be to size the buffers as > sizeof ("unified_address,unified_shared_memory,reverse_offload") > 64 is more, but just a wild guess and if further clauses are added later, > it might be too small. I concur =E2=80=93 except that ',' should be ', '. (Likewise in libgomp/target.c) > @@ -1821,6 +1906,18 @@ input_offload_tables (bool do_force_output) >> lto_destroy_simple_input_block (file_data, LTO_section_offload_t= able, >> ib, data, len); >> } >> +#ifdef ACCEL_COMPILER >> + char *omp_requires_file =3D getenv ("GCC_OFFLOAD_OMP_REQUIRES_FILE"); >> + if (omp_requires_file =3D=3D NULL || omp_requires_file[0] =3D=3D '\0'= ) >> + fatal_error (input_location, "GCC_OFFLOAD_OMP_REQUIRES_FILE unset")= ; >> + FILE *f =3D fopen (omp_requires_file, "wb"); >> + if (!f) >> + fatal_error (input_location, "Cannot open omp_requires file %qs", >> + omp_requires_file); >> + uint32_t req_mask =3D omp_requires_mask & ~OMP_REQUIRES_TARGET_USED; > Perhaps it is better to also store the TARGET_USED bit and on the library > side completely ignore values of 0. For the compiler side, we need to distinguish no requires vs. some requires when checking multiple TU (to distinguish it from TU which do not use target constructs). But for libgomp only the result counts: no requires or some requires. Thus, passing 0 if there are no USM/UA/RO should be fine =E2=80=93 and the = code does so. This 0 is then passed on to the plugin to check against it. If we pass target_used to libgomp, we need to filter it out at some point. >> --- a/gcc/omp-low.cc >> +++ b/gcc/omp-low.cc >> @@ -12701,6 +12701,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, = omp_context *ctx) >> gcc_unreachable (); >> } >> >> + /* Ensure that requires map is written via output_offload_tables, eve= n if only >> + 'target (enter/exit) data' is used in the translation unit. */ >> + if (ENABLE_OFFLOADING && (omp_requires_mask & OMP_REQUIRES_TARGET_USE= D)) >> + g->have_offload =3D true; > Is > c.c: > #pragma omp requires unified_shared_memory > d.c: > void baz (void) { > #pragma omp target > ; > } > ok? This one is *already* streamed out as it creates a symbol and entry in in offload_functions (baz.omp_fn.0). The code is rather for '#pragma omp target enter data map(x)' as this only adds a library call and no symbol. > Pedantically reading current standard probably yes, but perhaps again > something to be discussed. The question is what the requires directive > in that case would do, nothing at all as there are no device constructs > etc.? Isn't there a device construct =E2=80=93 which happens to be empty? With 'omp target map(always, to: x)' it would be even observable that the code is run. > In that case omp_requires_mask & OMP_REQUIRES_TARGET_USED is right. > But if it should influence the behavior anyway, the restriction should be > Either all compilation units of a program that contain ... device > constructs ... should include also requires directive with one of the > unified_shared_memory, unified_address or reverse_offload clauses. > In that case the test would be > omp_requires_mask & (OMP_REQUIRES_TARGET_USED | OMP_REQUIRES_UNIFIED* | O= MP_REQUIRES_REV*) I think I am lost =E2=80=93 don't we effectively test this? We filter out everything else in output_offload_tables. Thus, in input_offload_tables, a single '=3D=3D' will do. (We additionally know that TARGET_USED is set - as otherwise there wouldn't be a symbol in the offload table.) Thus, it is unclear to me what you propose here. >> +static void >> +gomp_requires_to_name (char *buf, size_t size, int requires_mask) >> +{ >> + char *end =3D buf + size, *p =3D buf; >> + if (requires_mask & GOMP_REQUIRES_UNIFIED_ADDRESS) >> + p +=3D snprintf (p, end - p, "unified_address"); >> + if (requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY) >> + p +=3D snprintf (p, end - p, "%sunified_shared_memory", >> + (p =3D=3D buf ? "" : ", ")); >> + if (requires_mask & GOMP_REQUIRES_REVERSE_OFFLOAD) >> + p +=3D snprintf (p, end - p, "%sreverse_offload", >> + (p =3D=3D buf ? "" : ", ")); >> +} > Same question as earlier. Same answer, except that in libgomp, this code is effectively only reachable when omp_requires_mask !=3D 0 as it reaches this code only if either some additional flag was added (in register_ver) or when devices were available, but those do not support a flag. We just have to remember to update this, if we ever add additional flags. >> /* This function should be called from every offload image while loadi= ng. >> It gets the descriptor of the host func and var tables HOST_TABLE, = TYPE of >> the target, and TARGET_DATA needed by target plugin. */ >> @@ -2323,11 +2341,29 @@ GOMP_offload_register_ver (unsigned version, con= st void *host_table, >> int target_type, const void *target_data) >> { >> int i; >> + int omp_req =3D omp_requires_mask; >> >> if (GOMP_VERSION_LIB (version) > GOMP_VERSION) >> gomp_fatal ("Library too old for offload (version %u < %u)", >> GOMP_VERSION, GOMP_VERSION_LIB (version)); >> - >> + >> + if (GOMP_VERSION_LIB (version) > 1) >> + { >> + omp_req =3D (int) (size_t) ((void **) target_data)[0]; >> + target_data =3D &((void **) target_data)[1]; >> + if (num_devices && (omp_req & ~omp_requires_mask)) >> + { >> + char buf[64]; >> + gomp_requires_to_name (buf, sizeof (buf), >> + omp_req & ~omp_requires_mask); >> + gomp_error ("devices already initialized when registering additio= nal " >> + "offload images that use the additional OpenMP 'requi= res'" >> + " directive clauses %s. Therefore, the program might = not " >> + "run correctly", buf); >> + } >> + omp_requires_mask |=3D omp_req; >> + } > Both omp_requires_mask and num_devices are global vars that would be > modified concurrently in some other thread, so the above is racy. > > What I'd do is int omp_req =3D 0; early, just the omp_req + target_data i= n > if (GOMP_VERSION_LIB (version) > 1) otherwise. That computes > the local omp_req only. > >> + >> gomp_mutex_lock (®ister_lock); > Then under the lock, you can do the merging. > But, IMHO the runtime library should repeat what is done in the offloadin= g > lto1, diagnose if there are differences between the masks in between > different TUs, here at runtime on the program/shared library level. > And IMHO the error you emit above is unnecessary, because (at least > hopefully) the num_devices computation / device initialization should > only happen on behalf of some device construct or device related OpenMP A= PI > routine, so at that point the shared library or program that does that > should have its own mask and if something is dlopened later, it should > either have compatible mask (nothing is diagnosed) or incompatible, but t= hen > it should be diagnosed like any other incompatibilities. OK =E2=80=93 I will diagnose it always. Question: If it is not the same, should there just be a message to stderr (gomp_error) or should libgomp abort (gomp_fatal)? Downside is that I cannot really provide much data where it fails. But on the other hand, it will probably only rarely occur. > I thought I've mentioned earlier it would be nice to rename the > get_num_devices plugin hook because its API has changed, so that > if one mixes old plugin with new libgomp or vice versa it doesn't > break silently. As discussed off list, gomp_load_plugin_for_device calls if (device->version_func () !=3D GOMP_VERSION) and we did bump the GOMP_VERSIO= N. Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955