Hi all,

Jakub Jelinek wrote:
> On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote:
>> This patch implements the libgomp runtime support for the dynamic
>> target_device selector via the GOMP_evaluate_target_device function.
> […]
>
> Now for kind, isa and arch traits in the target_device set this patch
> decides based on compiler flags used to compile some routine in libgomp.so
> or libgomp.a.
>
> While this can work in the (very unfortunate) GCN state of things where
> only exact isa match is possible (I really hope we can one day generalize
> it by being able to compile for a set of isas by supporting lowest
> denominator and patching the EM_* in the ELF header or something similar,
> perhaps with runtime decisions on what to do for different CPUs),

I think that can only work to some extent. LLVM has "gfx11-generic",
which is compatible with gfx110{0,1,2,3} and gfx115{0,1,2}, which at
least helps a bit. For gfx10, it has gfx10-1-generic for gfx101{0,1,2,3}
and gfx10-3-generic for gfx103[0-6], and for gfx9 it has gfx9-generic
for gfx90{0,2,4,6,9,c}.

Thus, we could have versions which support a common subset, but we
still need multiple libraries. And it needs to be implemented …

This sounds like a task for the GCN maintainer …

* * *

> deciding what to do based on how libgomp.a or libgomp.so.1 has been
> compiled for the rest is IMHO wrong.

I wonder whether we should do something like the following.

[The following is a mix of compiler code and generated code, for
illustrative purposes.]

Inside the compiler do:

#ifndef ACCEL_COMPILER
  int r = 0;
  if (targetm.omp.device_kind_arch_isa != NULL)
    r = targetm.omp.device_kind_arch_isa (omp_device_{kind,arch,isa}, val);
  if (dev_num && TREE_CODE (dev_num) == INTEGER_CST)
    {
      if (dev_num < -1 /* INVALID_DEVICE or nonconforming */)
        → 0
      if (dev_num == initial_device)
        → r
    }
  /* The '? :' condition is a compile time condition. */
  d = ? : omp_get_default_device ();
  if (d < -1)
    → 0
  else if (d == -1 || d == omp_get_initial_device ())
    → r
  else
    → GOMP_get_device_kind_arch_isa (d, kind, arch, isa)
#else
  /* VARIANT 1: Assume that neither reverse offload nor nested target
     occurs. */
  → targetm.omp.device_kind_arch_isa (kind, arch, isa)

  /* VARIANT 2: */
  d = ? : omp_get_default_device ();
  if (d == omp_get_device_num ())
    → targetm.omp.device_kind_arch_isa (kind, arch, isa)
  else
    /* Cannot really do anything here - and as no nested target is
       permitted, use 'false'. */
    → 0
#endif

* * *

And on the libgomp side, GOMP_get_device_kind_arch_isa → plugin code.
And there:

(A) GCN: kind and arch are clear. For ISA: agent->device_isa + use the
    existing isa_hsa_name() function (or likewise).

(B) Nvptx: cuDeviceGetAttribute +
    CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75 and
    CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76.
    Example: sm_89 = (major) 8 and (minor) 9.

* * *

Does this sound sensible?

Tobias

PS: For the current host-offload GSoC task, we might eventually think of
using cpuid on x86-64, i.e. gcc/config/i386/cpuid.h.

PS: RFC remains: Should 'sm_80' be true if the hardware/compilation is
'sm_89' or not? Namely: Does 'sm_80' denote the capability or the
specific hardware? Regarding this topic, see also
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662059.html
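
To make the 'sm_80' question concrete, here is a minimal, self-contained
sketch of the two possible readings. It is not libgomp or plugin code:
parse_sm_name, format_sm_name, device_isa_matches and the
isa_match_policy enum are hypothetical names made up for illustration,
and the major/minor values are assumed to come from the two
cuDeviceGetAttribute queries mentioned under (B). It also assumes plain
"sm_XY" names where the last digit is the minor version; suffixed names
are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy switch for the open RFC question.  */
enum isa_match_policy
{
  ISA_MATCH_EXACT,      /* 'sm_80' names one specific hardware level.  */
  ISA_MATCH_CAPABILITY  /* 'sm_80' means "at least compute capability 8.0".  */
};

/* Hypothetical helper: split a name such as "sm_89" into major 8 and
   minor 9.  Assumes the last digit is the minor version.  Returns false
   for anything that is not "sm_" followed by two or three digits.  */
static bool
parse_sm_name (const char *name, int *major, int *minor)
{
  assert (name != NULL);
  if (strncmp (name, "sm_", 3) != 0)
    return false;

  int num = 0, digits = 0;
  const char *p = name + 3;
  while (*p >= '0' && *p <= '9')
    {
      if (digits >= 3)
        return false;
      num = num * 10 + (*p - '0');
      p++;
      digits++;
    }
  if (*p != '\0' || digits < 2)
    return false;

  *major = num / 10;
  *minor = num % 10;
  return true;
}

/* Hypothetical helper: build the ISA name from the two device
   attributes, e.g. (8, 9) gives "sm_89".  */
static void
format_sm_name (int major, int minor, char *buf, size_t len)
{
  snprintf (buf, len, "sm_%d%d", major, minor);
}

/* Return true if the ISA name REQUESTED in the selector matches a
   device with compute capability DEV_MAJOR.DEV_MINOR under POLICY.  */
static bool
device_isa_matches (const char *requested, int dev_major, int dev_minor,
                    enum isa_match_policy policy)
{
  int major, minor;
  if (!parse_sm_name (requested, &major, &minor))
    return false;

  if (policy == ISA_MATCH_EXACT)
    return major == dev_major && minor == dev_minor;

  /* Capability reading: the device must be at least as new.  */
  return dev_major > major || (dev_major == major && dev_minor >= minor);
}
```

With a device reporting major 8 and minor 9,
device_isa_matches ("sm_80", 8, 9, ISA_MATCH_EXACT) is false while the
same call with ISA_MATCH_CAPABILITY is true. That difference is exactly
what the RFC has to settle.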