From: Tobias Burnus
To: Jakub Jelinek
CC: gcc-patches
Date: Thu, 30 Jun 2022 11:09:16 +0200
Subject: Re: [Patch] OpenMP: Prepare omp-* for ancestor:1 handling

Hi Jakub,

On 30.06.22 10:21, Jakub Jelinek wrote:
> So, what is the plan with reverse offload?

My idea was to just call omp_target_ext with 'device(omp_initial_device)'.
This then automatically works when called from a target region that runs
on omp_get_initial_device().

For the actual device part, this can be implemented incrementally by
supporting reverse_offload one device type at a time.

For getting it to work when the code enclosing the ancestor:1 target region
runs on an offloading device, my idea is the following. Comments are welcome!

My idea was to do the same as is done for I/O (which is supported for both
nvptx and gcn).

For GCN: libgomp/plugin/plugin-gcn.c has:

  struct kernargs {
    /* A pointer to struct output, below, for console output data.  */
    int64_t out_ptr;
    /* A pointer to struct heap, below.  */
    int64_t heap_ptr;
    /* A pointer to an ephemeral memory arena.
       Only needed for OpenMP.  */
    int64_t arena_ptr;
    /* to be added: */
    /* A pointer to reverse-offload.  */
    int64_t rev_ptr;
    /* Now come the actual structs.  */

    /* Output data.  */
    struct output {
      int return_value;
      unsigned int next_output;
      struct printf_data {
  ...
  };

This gets initialized on the host and then:

  while (hsa_fns.hsa_signal_wait_acquire_fn (s, HSA_SIGNAL_CONDITION_LT, 1,
                                             1000 * 1000,
                                             HSA_WAIT_STATE_BLOCKED) != 0)
    console_output (kernel, shadow->kernarg_address, false);

with:

  unsigned int from = __atomic_load_n (&kernargs->output_data.consumed,
                                       __ATOMIC_ACQUIRE);

The I/O itself is implemented in newlib,
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/sys/amdgcn/write.c

  register void **kernargs asm("s8");
  struct output *data = (struct output *)kernargs[2];

and then the data is filled.
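
To make the I/O scheme concrete, here is a minimal sketch of a queue in which one side publishes slots and the other side drains them with acquire/release atomics. It is not the plugin-gcn.c or newlib code: only the field names next_output and consumed come from the text above; SLOTS, struct slot, produce and drain are hypothetical, and the sketch assumes a single producer and a single consumer.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 4                 /* hypothetical queue size */

struct slot {
  int written;                  /* set by the producer once msg is complete */
  char msg[32];
};

struct output_queue {
  unsigned int next_output;     /* next slot the producer will fill */
  unsigned int consumed;        /* number of slots the consumer has drained */
  struct slot queue[SLOTS];
};

/* Producer ("device") side: fill the next slot and publish it.
   Returns 1 on success and 0 if the queue is full.  */
int
produce (struct output_queue *q, const char *text)
{
  unsigned int done = __atomic_load_n (&q->consumed, __ATOMIC_ACQUIRE);
  if (q->next_output - done >= SLOTS)
    return 0;                   /* full; the caller may retry later */

  struct slot *s = &q->queue[q->next_output % SLOTS];
  strncpy (s->msg, text, sizeof s->msg - 1);
  s->msg[sizeof s->msg - 1] = '\0';
  q->next_output++;
  /* Release: the message must be visible before the flag is.  */
  __atomic_store_n (&s->written, 1, __ATOMIC_RELEASE);
  return 1;
}

/* Consumer ("host") side: append every published message to OUT, which
   must already hold a NUL-terminated string.  Returns the number of
   slots drained.  */
unsigned int
drain (struct output_queue *q, char *out, size_t outsz)
{
  assert (outsz > 0);
  unsigned int from = __atomic_load_n (&q->consumed, __ATOMIC_ACQUIRE);
  unsigned int n = 0;

  for (;;)
    {
      struct slot *s = &q->queue[from % SLOTS];
      if (!__atomic_load_n (&s->written, __ATOMIC_ACQUIRE))
        break;                  /* nothing (more) to print */
      strncat (out, s->msg, outsz - strlen (out) - 1);
      __atomic_store_n (&s->written, 0, __ATOMIC_RELAXED);
      from++;
      n++;
      /* Release: hand the slot back to the producer.  */
      __atomic_store_n (&q->consumed, from, __ATOMIC_RELEASE);
    }
  return n;
}
```

In this sketch the host would call drain each time its signal wait times out, much like the console_output call in the loop above.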

For reverse offload, the idea is to fill the struct on the device side via
libgomp/config/gcn/target.c's GOMP_target_ext for
  device == GOMP_DEVICE_HOST_FALLBACK && fn != NULL
as:
  Try to obtain a lock (busy wait)
  Put addr/kinds/sizes into the struct
  Put the device's fn pointer in the struct
  Busy wait for completion ('while (fn != NULL) { }')
  Unlock

And on the host side:
  If fn == NULL (= no data there): return to the output/offload checking loop
  Otherwise: call a new function in target.c and pass the args to it.
  Once it has completed, set fn = NULL to indicate that the request has
  been processed.

And in target.c's new reverse-offload-handling function:
- Find the generated target function on the host, based on the device stub
  function's pointer address
- Handle the mapping
- Call the host function
- Handle the mapping
- Return

Additionally: If 'requires reverse_offload' is set, fill not only the normal
splay_tree for "host -> device" lookups but also another one for the
"device -> host" lookups.

Does this make sense?

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955
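
As an illustration of the device/host handshake outlined in the mail above, here is a minimal sketch, not the planned libgomp code. Every name in it (struct rev_offload, struct rev_entry, rev_lookup, dev_submit, dev_wait, host_poll, add_one) is hypothetical. A linear table stands in for the proposed "device -> host" splay tree, and host and device share one address space here, so the two "handle the mapping" steps are reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*host_fn_t) (void **addrs);

/* Hypothetical mailbox shared by device and host.  */
struct rev_offload {
  int lock;                     /* 0 = free, 1 = held by a device thread */
  uintptr_t fn;                 /* device stub address; 0 = no request */
  size_t mapnum;
  void **addrs;
  size_t *sizes;
  unsigned short *kinds;
};

/* Stand-in for the "device -> host" lookup structure.  */
struct rev_entry {
  uintptr_t dev_fn;
  host_fn_t host_fn;
};

host_fn_t
rev_lookup (const struct rev_entry *tab, size_t n, uintptr_t dev_fn)
{
  for (size_t i = 0; i < n; i++)
    if (tab[i].dev_fn == dev_fn)
      return tab[i].host_fn;
  return NULL;
}

/* Device side, part 1: take the lock, fill the struct, publish fn.  */
void
dev_submit (struct rev_offload *m, uintptr_t dev_fn, size_t mapnum,
            void **addrs, size_t *sizes, unsigned short *kinds)
{
  assert (dev_fn != 0);         /* 0 is reserved for "no request" */
  while (__atomic_exchange_n (&m->lock, 1, __ATOMIC_ACQUIRE))
    ;                           /* busy wait for the lock */
  m->mapnum = mapnum;
  m->addrs = addrs;
  m->sizes = sizes;
  m->kinds = kinds;
  /* Release: the arguments must be visible before fn is.  */
  __atomic_store_n (&m->fn, dev_fn, __ATOMIC_RELEASE);
}

/* Device side, part 2: wait until the host has cleared fn, then unlock.  */
void
dev_wait (struct rev_offload *m)
{
  while (__atomic_load_n (&m->fn, __ATOMIC_ACQUIRE) != 0)
    ;                           /* busy wait for completion */
  __atomic_store_n (&m->lock, 0, __ATOMIC_RELEASE);
}

/* Host side: one polling step.  Returns 1 if a request was handled.  */
int
host_poll (struct rev_offload *m, const struct rev_entry *tab, size_t n)
{
  uintptr_t dev_fn = __atomic_load_n (&m->fn, __ATOMIC_ACQUIRE);
  if (dev_fn == 0)
    return 0;                   /* nothing pending */
  host_fn_t f = rev_lookup (tab, n, dev_fn);
  if (f != NULL)
    f (m->addrs);               /* real code maps data before and after */
  __atomic_store_n (&m->fn, 0, __ATOMIC_RELEASE);
  return 1;
}

/* Example host function: increments the int that addrs[0] points to.  */
void
add_one (void **addrs)
{
  *(int *) addrs[0] += 1;
}
```

In a single-threaded walk-through the calls run in the order dev_submit, host_poll, dev_wait; in the scheme described above, dev_submit and dev_wait would run back to back on the device while the host polls concurrently.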