From: Tobias Burnus
To: gcc-patches, Jakub Jelinek, Thomas Schwinge, Joseph Myers
Date: Mon, 21 Aug 2023 18:23:05 +0200
Subject: [OpenMP/offloading][RFC] How to handle target/device-specifics with C pre-processor (in general, inside 'omp declare variant')

RFC: any ideas on how best to handle this in GCC?
See the two examples below for what we would like to support.

* * *

In GCC, we handle OpenMP (and OpenACC) by parsing the input file once, producing an internal representation (in LTO format) for the offloading code, and processing it only at link time by passing it via the LTO wrapper to the offloading-device compilers (mkoffload / device lto1). See https://gcc.gnu.org/wiki/Offloading

This works okayish, even though it causes some issues, for example with metadirectives (they are implemented on the OG13 branch, however). The same goes for declare variant or a nohost version, where getting rid of the host version is not that easy: it has to stay in until omp-offload.cc's functions are run, which happens rather late.

There are already some issues, for example with -ffast-math and GLIBC's finite math functions, which are not available on the device side when using newlib's libm. (However, GLIBC has removed those.) Likewise, it would be nice to do what Clang+LLVM does: auto-enable some device-specific math functions. (Albeit that won't work well with Fortran.)

However, with OpenMP 5.1, there is a real issue. Appendix B of 5.1 reads:

  "For C/C++, the declare variant directive was extended to support elision of preprocessed code and to allow enclosed function definitions to be interpreted as variant functions (see Section 7.5)."

The problem is the "elision of preprocessed code", as it permits code like the following:

  #ifdef _OPENMP
  #pragma omp begin declare variant match(device={arch=NVPTX})
  #include "cuda/math.h"
  #pragma omp begin declare variant match(device={isa=sm70})
  #include "cuda/sm70/math.h"
  #pragma omp end declare variant
  #pragma omp end declare variant
  #pragma omp begin declare variant match(device={arch=AMD})
  #include "amdgpu/math.h"
  #pragma omp end declare variant
  #endif

And such code needs to keep working if there is a '#define ABC ...' in one file and an '#ifndef ABC / #define ABC ...' in the other file.

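To make the '#define ABC' / '#ifndef ABC' concern concrete, here is a minimal sketch of the semantics one would presumably want, written with plain '#if' guards instead of the OpenMP pragmas. The macros SKETCH_TARGET_NVPTX and SKETCH_TARGET_AMDGCN, the function sketch_abc_value, and the values 128/64/16 are all hypothetical and only stand in for "this preprocessing pass belongs to that offload target"; they are not GCC macros, and the inlined defines only stand in for whatever the two headers would contain.

```c
/* Sketch only: SKETCH_TARGET_NVPTX and SKETCH_TARGET_AMDGCN are hypothetical
   per-pass switches, not macros that GCC provides.  */

#if defined(SKETCH_TARGET_NVPTX)
/* Stands in for the header included inside the arch=NVPTX region.  */
#define ABC 128
#endif

#if defined(SKETCH_TARGET_AMDGCN)
/* Stands in for the header included inside the AMD region.  It must not
   see the ABC definition that belongs to the NVPTX region.  */
#ifndef ABC
#define ABC 64
#endif
#endif

/* Host fallback, used by every pass that did not define ABC above.  */
#ifndef ABC
#define ABC 16
#endif

/* Returns the value of ABC that this particular pass ended up with.  */
int sketch_abc_value(void)
{
  return ABC;
}
```

Built with neither sketch macro defined, sketch_abc_value() returns the host fallback; built with -DSKETCH_TARGET_AMDGCN, the '#ifndef ABC' branch is taken because the NVPTX '#define' was never seen. A single shared preprocessing pass cannot give each target its own answer, which is the difficulty described above.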
Additionally, it would be neat if this also handled target-specific defines like '#if __PTX_SM__ == 350' for the relevant parts (here: arch=nvptx).

(We already support context selectors via the gcc/config/*/t-omp-device files; see also https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html )

Thoughts?

* * *

The question is also what to support: "just" function declarations that are specific to a device, or some generic replacement of the kind:

  #pragma omp begin declare variant match(device={arch=NVPTX})
  #define NUM_THREADS 128
  #pragma omp end declare variant

  #pragma omp begin declare variant match(device={arch=AMDGCN})
  #define NUM_THREADS 64
  #pragma omp end declare variant

  #ifndef NUM_THREADS
  #define NUM_THREADS 16
  #endif
  ...
  printf ("Running with %d threads\n", NUM_THREADS);
  #pragma omp parallel for num_threads(NUM_THREADS)

* * *

If we only handle 'begin/end declare variant', the following works in principle:

- Parse the file once with only host-code parsing, but
- keep track of delimited 'omp begin declare variant' regions where the context selector matches one of the supported offload targets.
- Parse the file n times again, but this time set the target #defines (an extended version of gcc/config/*/t-omp-device to make them available?).
- When doing so, ignore all non-offloading bits (issue: implicit 'declare target' + having the data available for variant resolution).
- Store this in some way.

But it is not really clear to me how to do this in actual code. Any suggestions?

Tobias

PS: I would like to have some input before the Cauldron, but we might want to additionally discuss this in detail during the Cauldron, possibly with some brainstorming before the BoF and then surely also in the BoF.

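As a rough illustration of the first two bullets in the 'begin/end declare variant' list, the bookkeeping after the host-only parse could look something like the sketch below: it records each delimited region with its arch trait and works out which configured offload targets need an extra parse. Everything here (struct sketch_region, sketch_plan_reparses, the line-number fields) is hypothetical and not existing GCC code; a real implementation would have to handle full context selectors, not a single arch string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one delimited 'begin/end declare variant'
   region found during the host-only parse.  */
struct sketch_region {
  const char *arch;  /* arch trait taken from the context selector */
  int begin_line;    /* line of 'begin declare variant' */
  int end_line;      /* line of the matching 'end declare variant' */
};

/* For each of the N_TARGETS configured offload targets, set NEEDS[t] to 1
   if at least one recorded region selects that target's arch, else to 0.
   Returns how many additional per-target parses would be required.  */
size_t sketch_plan_reparses(const struct sketch_region *regions,
                            size_t n_regions,
                            const char *const *targets,
                            size_t n_targets,
                            int *needs)
{
  size_t count = 0;

  assert(needs != NULL);
  for (size_t t = 0; t < n_targets; t++) {
    needs[t] = 0;
    for (size_t r = 0; r < n_regions; r++) {
      if (strcmp(regions[r].arch, targets[t]) == 0) {
        needs[t] = 1;
        break;
      }
    }
    count += (size_t) needs[t];
  }
  return count;
}
```

For a file with one nvptx region and one region for an arch that is not a configured offload target, with "nvptx" and "amdgcn" configured, this plans exactly one extra parse (for nvptx). The harder parts of the list, namely re-parsing with the target #defines set and ignoring the non-offloading bits, are deliberately left out of the sketch.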
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Registry court: München, HRB 106955