Attached is the updated/rediffed version, which now uses the builtin instead of the 'asm("s8")'.

The code in principle works; that is, if no private stack variables are copied, it works. In other words: reverse-offload target regions that use neither firstprivate nor mapping work; the rest would crash. For now, that crash is avoided by not accepting reverse offload in GOMP_OFFLOAD_get_num_devices.

To get it working, the manual stack allocation patch plus a trivial update to that get_num_devices function are needed, but no change to the attached patch.

In order to reduce local patches, I would love to have it on mainline; otherwise, I at least have the current version in gcc-patches@.

Tobias

PS: Previous patch email quoted below. Note: there were two follow-up emails, one by Andrew and one by me; cf. your own mail archive (of this thread) or https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603383.html and the next two messages in the thread.

On 12.10.22 16:29, Tobias Burnus wrote:
> On 29.09.22 18:24, Andrew Stubbs wrote:
>> On 27/09/2022 14:16, Tobias Burnus wrote:
>>> Andrew did suggest a while back to piggyback on the console_output
>>> handling, avoiding another atomic access. If this is still wanted,
>>> I would like some guidance on how to actually implement it.
>> [...]
>> The point is that you can use the "msg" and "text" fields for
>> whatever data you want, as long as you invent a new value for "type".
>> [....]
>> You can make "case 4" do whatever you want. There are enough bytes
>> for 4 pointers, and you could use multiple packets (although it's not
>> safe to assume they're contiguous or already arrived; maybe "case 4"
>> for part 1, "case 5" for part 2). It's possible to change this
>> structure, of course, but the target implementation is in newlib so
>> versioning becomes a problem.
>
> I think (also looking at the Newlib write.c implementation) that
> the data is contiguous: there is an atomic add where, instead of
> passing '1' for a single slot, I could also pass '2' for two slots.
>
> Attached is one variant; for the decl of GOMP_OFFLOAD_target_rev,
> it needs the generic parts of the sister nvptx patch.*
>
> 2*128 bytes were not enough; I need 3*128 bytes (or rather 5*64 +
> 32). As target_ext is blocking, I decided to use a stack-local
> variable for the remaining arguments and pass it along. Alternatively,
> I could also use 2 slots and process them together. This would avoid
> one device->host memory copy but would make console_output less clear.
>
> OK for mainline?
>
> Tobias
>
> * https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html
>
> PS: Currently, device stack variables are private and cannot be
> accessed from the host; this will change in a separate patch. This
> affects not only the "rest" part as used in this patch but also the
> actual arrays behind addr, kinds, and sizes, and quite likely many of
> the map/firstprivate variables passed to addr.
>
> As num_devices() will return 0 or -1, this is for now a non-issue.

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; Limited liability company (GmbH); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Register court: München, HRB 106955