From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 6BACD3858D39 for ; Thu, 29 Sep 2022 16:24:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6BACD3858D39 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.93,355,1654588800"; d="scan'208";a="83742974" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 29 Sep 2022 08:24:20 -0800 IronPort-SDR: u22SCp2wdpJy0zczYiVMMYqC7yTzztaWw5cCIN1erkWIMwVtk9LotMnElPEOpVCBYxYwoCSb3q 28mQRJIlm0POlMyVkfwx04dCSnn42DneGYU21+bVxNz53rvZD40biTZRG82YwbEftsrP66kWSd jZwXjquE/JVuO3yhDcRRKi0NQsxLgHqNj9fGEzrA42tr1/2FTuosiOYx/g4ltthJj0h0ES6xnl ChQLcrOMFIft2ZdFctfdmO0lXuOidiyM5Pr0S57/wBNbwCBg1v0zhsbnbSmXA+t8FEWm9XMK7m dCM= Message-ID: <02ec5f29-953b-63dd-7d44-04f9af36a114@codesourcery.com> Date: Thu, 29 Sep 2022 17:24:14 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.3.0 Subject: Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling Content-Language: en-GB To: Tobias Burnus , gcc-patches CC: Jakub Jelinek References: <55dacdd3-4a82-8087-fdba-824d9910e186@codesourcery.com> From: Andrew Stubbs In-Reply-To: <55dacdd3-4a82-8087-fdba-824d9910e186@codesourcery.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,NICE_REPLY_A,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 27/09/2022 14:16, Tobias Burnus wrote: > @@ -422,6 +428,12 @@ struct agent_info > if it has been. */ > bool initialized; > > + /* Flag whether the HSA program that consists of all the modules has been > + finalized. */ > + bool prog_finalized; > + /* Flag whether the HSA OpenMP's requires_reverse_offload has been used. */ > + bool has_reverse_offload; > + > /* The instruction set architecture of the device. */ > gcn_isa device_isa; > /* Name of the agent. */ > @@ -456,9 +468,6 @@ struct agent_info > thread should have locked agent->module_rwlock for reading before > acquiring it. */ > pthread_mutex_t prog_mutex; > - /* Flag whether the HSA program that consists of all the modules has been > - finalized. */ > - bool prog_finalized; > /* HSA executable - the finalized program that is used to locate kernels. */ > hsa_executable_t executable; > }; Why has prog_finalized been moved? > Andrew did suggest a while back to piggyback on the console_output handling, > avoiding another atomic access. - If this is still wanted, I like to have some > guidance regarding how to actually implement it. The console output ring buffer has the following type: struct output { int return_value; unsigned int next_output; struct printf_data { int written; char msg[128]; int type; union { int64_t ivalue; double dvalue; char text[128]; }; } queue[1024]; unsigned int consumed; } output_data; That is, for each entry in the buffer there is a 128-byte message string, an integer argument-type identifier, and a 128-byte argument field. Before we had printf we had functions that could print string+int (gomp_print_integer, type==0), string+double (gomp_print_double, type==1) and string+string (gomp_print_string, type==2). The string conversion could then be done on the host to keep the target code simple. These would still be useful functions if you want to dump debug quickly without affecting performance so much, but I don't think they ever got upstreamed because somebody (who should have known better!) created an unrelated function upstream with the same name (gomp_print_string) and we already had working printf by then so the effort to fix it wasn't worth it. The current printf implementation (actually the write syscall), uses type==3 to print 256-bytes of output, per packet, with no implied newline. The point is that you can use the "msg" and "text" fields for whatever data you want, as long as you invent a new value for "type". The current loop has: switch (data->type) { case 0: printf ("%.128s%ld\n", data->msg, data->ivalue); break; case 1: printf ("%.128s%f\n", data->msg, data->dvalue); break; case 2: printf ("%.128s%.128s\n", data->msg, data->text); break; case 3: printf ("%.128s%.128s", data->msg, data->text); break; default: printf ("GCN print buffer error!\n"); break; } You can make "case 4" do whatever you want. There are enough bytes for 4 pointers, and you could use multiple packets (although it's not safe to assume they're contiguous or already arrived; maybe "case 4" for part 1, "case 5" for part 2). It's possible to change this structure, of course, but the target implementation is in newlib so versioning becomes a problem. Reusing this would remove the need for has_reverse_offload, since the console output is scanned anyway, and also eliminate rev_ptr, rev_data, and means that, hypothetically, the device can queue up reverse offload requests asynchronously in the ring buffer (you'd need to ensure multi-part packets don't get interleaved though). Andrew