From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
	by sourceware.org (Postfix) with ESMTPS id 6A89B3857BA3
	for ; Tue, 12 Jul 2022 14:16:41 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6A89B3857BA3
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,265,1650960000"; d="scan'208";a="81358736"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
	by esa1.mentor.iphmx.com with ESMTP; 12 Jul 2022 06:16:40 -0800
IronPort-SDR: P3OW6WaHae2JMwVvUdSkiwRIOx+gP7LCeE+CwIQyh35cNiG5DJwqMxMsj1wcI+l9wy0gxRURtD
 j7jzjWIyBdkGNI/8hAyAZHZGoBbaIsPGYm6oX9GTTkDC7hztGpFI5fEVlySpZykrBW+CYQ9r6h
 KQG1k3WXv2o/TBPb6jgcI7TWs2H30CnNEcKUbkHXUlftnsTTbnvuwDDuuYTuD1mi+ATkwNmfOV
 4DgdhivoaAAMZaJINiFv3ZicqxuDc/lFSNrtEI4JScLbQ4smLhiOO2zUWhgh+oinVlUK0pTvrj
 BE4=
Content-Type: multipart/mixed; boundary="------------H8g6FZNFFRkRTJiGZ9j4eqdh"
Message-ID: <0e1a740e-46d5-ebfa-36f4-9a069ddf8620@codesourcery.com>
Date: Tue, 12 Jul 2022 15:16:35 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0
Content-Language: en-GB
From: Andrew Stubbs
Subject: [PATCH] openmp: fix max_vf setting for amdgcn offloading
To: "gcc-patches@gcc.gnu.org"
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
X-List-Received-Date: Tue, 12 Jul 2022 14:16:43 -0000

--------------H8g6FZNFFRkRTJiGZ9j4eqdh
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit

This patch ensures that the maximum vectorization factor used to set the "safelen" attribute on "omp simd" constructs is suitable for all the configured offload devices. Right now the compiler makes the proper adjustment for NVPTX, but otherwise just uses a value suitable for the host system (always x86_64 in the case of amdgcn). This typically ends up being 16, whereas 64 is the minimum for vectorization to work properly on GCN.

There is a potential problem: a single "safelen" must be set for *all* offload devices, which means it can't be perfect for every device. However, I believe that too big is always OK (at least for powers of two?) whereas too small is not OK, so this code always selects the largest value of max_vf, regardless of where it comes from.

The existing target VF function, omp_max_simt_vf, is tangled up with the notion of whether SIMT is available or not, so I couldn't add amdgcn in there. It's tempting to have omp_max_vf autodetect which VF to choose, but the current implementation in omp-general.cc doesn't have convenient access to the context, and neither do all of its callers, so I couldn't easily do that. Instead, I have opted to add a new function, omp_max_simd_vf, which can check for the presence of amdgcn.

While reviewing the callers of omp_max_vf I found one other case that looks like it ought to be tuned for the device, not just the host. In that case it's not clear how to achieve that. In fact, at least on x86_64, the way it is coded means the actual value from omp_max_vf is always ignored in favour of a much larger "minimum", so I have added a comment for the next person to touch that spot and left it alone.
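For readers who don't want to decode the attachment, the selection logic described above can be sketched roughly as follows. This is a standalone illustration, not the patch itself: the function names sketch_max_simd_vf and sketch_select_max_vf, their parameters, and the colon-separated target list passed in as a string are all assumptions made for the example (the real code works on GCC's internal context and poly_uint64 values, and also checks the optimization level).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the new omp_max_simd_vf: scan a
   colon-separated list of offload target names and return 64 if any
   entry starts with "amdgcn", otherwise 0.  The list format is an
   assumption made for this sketch.  */
int
sketch_max_simd_vf (const char *names)
{
  for (const char *c = names; c != NULL;)
    {
      if (strncmp (c, "amdgcn", 6) == 0)
        return 64;
      /* Advance to the entry after the next ':' (or stop).  */
      c = strchr (c, ':');
      if (c != NULL)
        c++;
    }
  return 0;
}

/* Hypothetical stand-in for the safelen selection: start from the host
   value and, if the region may be offloaded, take the largest of the
   host, SIMT and SIMD candidates.  */
int
sketch_select_max_vf (int host_vf, int simt_vf, int is_simt,
                      int maybe_offloaded, const char *names)
{
  assert (host_vf >= 0 && simt_vf >= 0);

  int vf = host_vf;
  if (maybe_offloaded)
    {
      if (is_simt && simt_vf > vf)
        vf = simt_vf;
      int simd_vf = sketch_max_simd_vf (names);
      if (simd_vf > vf)
        vf = simd_vf;
    }
  return vf;
}
```

With a host value of 16 and an (illustrative) SIMT value of 32, a target list containing an amdgcn entry yields 64, a list with only an nvptx entry yields 32, and a region that cannot be offloaded keeps the host value of 16. Taking the maximum reflects the "too big is OK, too small is not" reasoning above.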
This change gives a 10x performance improvement on the BabelStream "dot" benchmark on amdgcn and is not harmful on nvptx. OK for mainline? I will commit a backport to OG12 shortly. Andrew --------------H8g6FZNFFRkRTJiGZ9j4eqdh Content-Type: text/plain; charset="UTF-8"; name="220712-max_vf.patch" Content-Disposition: attachment; filename="220712-max_vf.patch" Content-Transfer-Encoding: base64 b3Blbm1wOiBmaXggbWF4X3ZmIHNldHRpbmcgZm9yIGFtZGdjbiBvZmZsb2FkaW5nCgpFbnN1 cmUgdGhhdCB0aGUgIm1heF92ZiIgZmlndXJlIHVzZWQgZm9yIHRoZSAic2FmZWxlbiIgYXR0 cmlidXRlIGlzIGxhcmdlCmVub3VnaCBmb3IgdGhlIGxhcmdlc3QgY29uZmlndXJlZCBvZmZs b2FkIGRldmljZS4KClRoaXMgY2hhbmdlIGdpdmVzIH4xMHggc3BlZWQgaW1wcm92ZW1lbnQg b24gdGhlIEJhYmxlc3RyZWFtICJkb3QiIGJlbmNobWFyayBmb3IKQU1EIEdDTi4KCmdjYy9D aGFuZ2VMb2c6CgoJKiBnaW1wbGUtbG9vcC12ZXJzaW9uaW5nLmNjIChsb29wX3ZlcnNpb25p bmc6Omxvb3BfdmVyc2lvbmluZyk6IEFkZAoJY29tbWVudC4KCSogb21wLWdlbmVyYWwuY2Mg KG9tcF9tYXhfc2ltZF92Zik6IE5ldyBmdW5jdGlvbi4KCSogb21wLWdlbmVyYWwuaCAob21w X21heF9zaW1kX3ZmKTogTmV3IHByb3RvdHlwZS4KCSogb21wLWxvdy5jYyAobG93ZXJfcmVj X3NpbWRfaW5wdXRfY2xhdXNlcyk6IFNlbGVjdCBsYXJnZXN0IGZyb20KCSAgb21wX21heF92 Ziwgb21wX21heF9zaW10X3ZmLCBhbmQgb21wX21heF9zaW1kX3ZmLgoKZ2NjL3Rlc3RzdWl0 ZS9DaGFuZ2VMb2c6CgoJKiBsaWIvdGFyZ2V0LXN1cHBvcnRzLmV4cAoJKGNoZWNrX2VmZmVj dGl2ZV90YXJnZXRfYW1kZ2NuX29mZmxvYWRpbmdfZW5hYmxlZCk6IE5ldy4KCShjaGVja19l ZmZlY3RpdmVfdGFyZ2V0X252cHR4X29mZmxvYWRpbmdfZW5hYmxlZCk6IE5ldy4KCSogZ2Nj LmRnL2dvbXAvdGFyZ2V0LXZmLmM6IE5ldyB0ZXN0LgoKZGlmZiAtLWdpdCBhL2djYy9naW1w bGUtbG9vcC12ZXJzaW9uaW5nLmNjIGIvZ2NjL2dpbXBsZS1sb29wLXZlcnNpb25pbmcuY2MK aW5kZXggNmJjZjZlYmE2OTEuLmU5MDhjMjdmYzQ0IDEwMDY0NAotLS0gYS9nY2MvZ2ltcGxl LWxvb3AtdmVyc2lvbmluZy5jYworKysgYi9nY2MvZ2ltcGxlLWxvb3AtdmVyc2lvbmluZy5j YwpAQCAtNTU1LDcgKzU1NSwxMCBAQCBsb29wX3ZlcnNpb25pbmc6Omxvb3BfdmVyc2lvbmlu ZyAoZnVuY3Rpb24gKmZuKQogICAgICB1bnZlY3Rvcml6YWJsZSBjb2RlLCBzaW5jZSBpdCBp cyB0aGUgbGFyZ2VzdCBzaXplIHRoYXQgY2FuIGJlCiAgICAgIGhhbmRsZWQgZWZmaWNpZW50 
bHkgYnkgc2NhbGFyIGNvZGUuICBvbXBfbWF4X3ZmIGNhbGN1bGF0ZXMgdGhlCiAgICAgIG1h eGltdW0gbnVtYmVyIG9mIGJ5dGVzIGluIGEgdmVjdG9yLCB3aGVuIHN1Y2ggYSB2YWx1ZSBp cyByZWxldmFudAotICAgICB0byBsb29wIG9wdGltaXphdGlvbi4gICovCisgICAgIHRvIGxv b3Agb3B0aW1pemF0aW9uLgorICAgICBGSVhNRTogdGhpcyBwcm9iYWJseSBuZWVkcyB0byB1 c2Ugb21wX21heF9zaW1kX3ZmIHdoZW4gaW4gYSB0YXJnZXQKKyAgICAgcmVnaW9uLCBidXQg aG93IHRvIHRlbGw/IChBbmQgTUFYX0ZJWEVEX01PREVfU0laRSBpcyBsYXJnZSBlbm91Z2gg dGhhdAorICAgICBpdCBkb2Vzbid0IGFjdHVhbGx5IG1hdHRlci4pICAqLwogICBtX21heGlt dW1fc2NhbGUgPSBlc3RpbWF0ZWRfcG9seV92YWx1ZSAob21wX21heF92ZiAoKSk7CiAgIG1f bWF4aW11bV9zY2FsZSA9IE1BWCAobV9tYXhpbXVtX3NjYWxlLCBNQVhfRklYRURfTU9ERV9T SVpFKTsKIH0KZGlmZiAtLWdpdCBhL2djYy9vbXAtZ2VuZXJhbC5jYyBiL2djYy9vbXAtZ2Vu ZXJhbC5jYwppbmRleCBhNDA2YzU3OGYzMy4uOGM2ZmNlYmM0YjMgMTAwNjQ0Ci0tLSBhL2dj Yy9vbXAtZ2VuZXJhbC5jYworKysgYi9nY2Mvb21wLWdlbmVyYWwuY2MKQEAgLTk5NCw2ICs5 OTQsMjQgQEAgb21wX21heF9zaW10X3ZmICh2b2lkKQogICByZXR1cm4gMDsKIH0KIAorLyog UmV0dXJuIG1heGltdW0gU0lNRCB3aWR0aCBpZiBvZmZsb2FkaW5nIG1heSB0YXJnZXQgU0lN RCBoYXJkd2FyZS4gICovCisKK2ludAorb21wX21heF9zaW1kX3ZmICh2b2lkKQoreworICBp ZiAoIW9wdGltaXplKQorICAgIHJldHVybiAwOworICBpZiAoRU5BQkxFX09GRkxPQURJTkcp CisgICAgZm9yIChjb25zdCBjaGFyICpjID0gZ2V0ZW52ICgiT0ZGTE9BRF9UQVJHRVRfTkFN RVMiKTsgYzspCisgICAgICB7CisJaWYgKHN0YXJ0c3dpdGggKGMsICJhbWRnY24iKSkKKwkg IHJldHVybiA2NDsKKwllbHNlIGlmICgoYyA9IHN0cmNociAoYywgJzonKSkpCisJICBjKys7 CisgICAgICB9CisgIHJldHVybiAwOworfQorCiAvKiBTdG9yZSB0aGUgY29uc3RydWN0IHNl bGVjdG9ycyBhcyB0cmVlIGNvZGVzIGZyb20gbGFzdCB0byBmaXJzdCwKICAgIHJldHVybiB0 aGVpciBudW1iZXIuICAqLwogCmRpZmYgLS1naXQgYS9nY2Mvb21wLWdlbmVyYWwuaCBiL2dj Yy9vbXAtZ2VuZXJhbC5oCmluZGV4IDc0ZTkwZTFhNzFhLi40MTAzNDNlNDVmYSAxMDA2NDQK LS0tIGEvZ2NjL29tcC1nZW5lcmFsLmgKKysrIGIvZ2NjL29tcC1nZW5lcmFsLmgKQEAgLTEw NCw2ICsxMDQsNyBAQCBleHRlcm4gZ2ltcGxlICpvbXBfYnVpbGRfYmFycmllciAodHJlZSBs aHMpOwogZXh0ZXJuIHRyZWUgZmluZF9jb21iaW5lZF9vbXBfZm9yICh0cmVlICosIGludCAq LCB2b2lkICopOwogZXh0ZXJuIHBvbHlfdWludDY0IG9tcF9tYXhfdmYgKHZvaWQpOwogZXh0 
ZXJuIGludCBvbXBfbWF4X3NpbXRfdmYgKHZvaWQpOworZXh0ZXJuIGludCBvbXBfbWF4X3Np bWRfdmYgKHZvaWQpOwogZXh0ZXJuIGludCBvbXBfY29uc3RydWN0b3JfdHJhaXRzX3RvX2Nv ZGVzICh0cmVlLCBlbnVtIHRyZWVfY29kZSAqKTsKIGV4dGVybiB0cmVlIG9tcF9jaGVja19j b250ZXh0X3NlbGVjdG9yIChsb2NhdGlvbl90IGxvYywgdHJlZSBjdHgpOwogZXh0ZXJuIHZv aWQgb21wX21hcmtfZGVjbGFyZV92YXJpYW50IChsb2NhdGlvbl90IGxvYywgdHJlZSB2YXJp YW50LApkaWZmIC0tZ2l0IGEvZ2NjL29tcC1sb3cuY2MgYi9nY2Mvb21wLWxvdy5jYwppbmRl eCBkNzNjMTY1ZjAyOS4uMWE5YTUwOWFkYjkgMTAwNjQ0Ci0tLSBhL2djYy9vbXAtbG93LmNj CisrKyBiL2djYy9vbXAtbG93LmNjCkBAIC00NjQ2LDcgKzQ2NDYsMTQgQEAgbG93ZXJfcmVj X3NpbWRfaW5wdXRfY2xhdXNlcyAodHJlZSBuZXdfdmFyLCBvbXBfY29udGV4dCAqY3R4LAog ewogICBpZiAoa25vd25fZXEgKHNjdHgtPm1heF92ZiwgMFUpKQogICAgIHsKLSAgICAgIHNj dHgtPm1heF92ZiA9IHNjdHgtPmlzX3NpbXQgPyBvbXBfbWF4X3NpbXRfdmYgKCkgOiBvbXBf bWF4X3ZmICgpOworICAgICAgLyogSWYgd2UgYXJlIGNvbXBpbGluZyBmb3IgbXVsdGlwbGUg ZGV2aWNlcyBjaG9vc2UgdGhlIGxhcmdlc3QgVkYuICAqLworICAgICAgc2N0eC0+bWF4X3Zm ID0gb21wX21heF92ZiAoKTsKKyAgICAgIGlmIChvbXBfbWF5YmVfb2ZmbG9hZGVkX2N0eCAo Y3R4KSkKKwl7CisJICBpZiAoc2N0eC0+aXNfc2ltdCkKKwkgICAgc2N0eC0+bWF4X3ZmID0g b3JkZXJlZF9tYXggKHNjdHgtPm1heF92Ziwgb21wX21heF9zaW10X3ZmICgpKTsKKwkgIHNj dHgtPm1heF92ZiA9IG9yZGVyZWRfbWF4IChzY3R4LT5tYXhfdmYsIG9tcF9tYXhfc2ltZF92 ZiAoKSk7CisJfQogICAgICAgaWYgKG1heWJlX2d0IChzY3R4LT5tYXhfdmYsIDFVKSkKIAl7 CiAJICB0cmVlIGMgPSBvbXBfZmluZF9jbGF1c2UgKGdpbXBsZV9vbXBfZm9yX2NsYXVzZXMg KGN0eC0+c3RtdCksCmRpZmYgLS1naXQgYS9nY2MvdGVzdHN1aXRlL2djYy5kZy9nb21wL3Rh cmdldC12Zi5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MuZGcvZ29tcC90YXJnZXQtdmYuYwpuZXcg ZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwMDAwMC4uMTRjZWE0NWU1M2MKLS0tIC9k ZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy5kZy9nb21wL3RhcmdldC12Zi5jCkBA IC0wLDAgKzEsMjEgQEAKKy8qIHsgZGctZG8gY29tcGlsZSB9ICovCisvKiB7IGRnLW9wdGlv bnMgIi1mb3Blbm1wIC1PMiAtZmR1bXAtdHJlZS1vbXBsb3dlciIgfSAqLyAKKworLyogRW5z dXJlIHRoYXQgdGhlIG9tcF9tYXhfdmYsIG9tcF9tYXhfc2ltdF92ZiwgYW5kIG9tcF9tYXhf c2ltZF92ZiBhcmUgd29ya2luZworICAgcHJvcGVybHkgdG8gc2V0IHRoZSBPcGVuTVAgdmVj 
dG9yaXphdGlvbiBmYWN0b3IgZm9yIHRoZSBvZmZsb2FkIHRhcmdldCwgYW5kCisgICBub3Qg anVzdCBmb3IgdGhlIGhvc3QuICAqLworCitmbG9hdAorZm9vIChmbG9hdCAqIF9fcmVzdHJp Y3QgeCwgZmxvYXQgKiBfX3Jlc3RyaWN0IHkpCit7CisgIGZsb2F0IHN1bSA9IDAuMDsKKwor I3ByYWdtYSBvbXAgdGFyZ2V0IHRlYW1zIGRpc3RyaWJ1dGUgcGFyYWxsZWwgZm9yIHNpbWQg bWFwKHRvZnJvbTogc3VtKSByZWR1Y3Rpb24oKzpzdW0pCisgIGZvciAoaW50IGk9MDsgaTwx MDI0OyBpKyspCisgICAgc3VtICs9IHhbaV0gKiB5W2ldOworCisgIHJldHVybiBzdW07Cit9 CisKKy8qIHsgZGctZmluYWwgeyBzY2FuLXRyZWUtZHVtcCAgInNhZmVsZW5cXCg2NFxcKSIg Im9tcGxvd2VyIiB7IHRhcmdldCBhbWRnY25fb2ZmbG9hZGluZ19lbmFibGVkIH0gfSB9ICov CisvKiB7IGRnLWZpbmFsIHsgc2Nhbi10cmVlLWR1bXAgICJzYWZlbGVuXFwoMzJcXCkiICJv bXBsb3dlciIgeyB0YXJnZXQgeyB7IG52cHR4X29mZmxvYWRpbmdfZW5hYmxlZCB9ICYmIHsg ISBhbWRnY25fb2ZmbG9hZGluZ19lbmFibGVkIH0gfSB9IH0gfSAqLwpkaWZmIC0tZ2l0IGEv Z2NjL3Rlc3RzdWl0ZS9saWIvdGFyZ2V0LXN1cHBvcnRzLmV4cCBiL2djYy90ZXN0c3VpdGUv bGliL3RhcmdldC1zdXBwb3J0cy5leHAKaW5kZXggNGVkN2IyNWI5YTQuLjM2MzM1NGJlNDYx IDEwMDY0NAotLS0gYS9nY2MvdGVzdHN1aXRlL2xpYi90YXJnZXQtc3VwcG9ydHMuZXhwCisr KyBiL2djYy90ZXN0c3VpdGUvbGliL3RhcmdldC1zdXBwb3J0cy5leHAKQEAgLTEwMjUsNiAr MTAyNSwxNiBAQCBwcm9jIGNoZWNrX2VmZmVjdGl2ZV90YXJnZXRfb2ZmbG9hZGluZ19lbmFi bGVkIHt9IHsKICAgICByZXR1cm4gW2NoZWNrX2NvbmZpZ3VyZWRfd2l0aCAiLS1lbmFibGUt b2ZmbG9hZC10YXJnZXRzIl0KIH0KIAorIyBSZXR1cm4gMSBpZiBjb21waWxlZCB3aXRoIC0t ZW5hYmxlLW9mZmxvYWQtdGFyZ2V0cz1hbWRnY24KK3Byb2MgY2hlY2tfZWZmZWN0aXZlX3Rh cmdldF9hbWRnY25fb2ZmbG9hZGluZ19lbmFibGVkIHt9IHsKKyAgICByZXR1cm4gW2NoZWNr X2NvbmZpZ3VyZWRfd2l0aCB7LS1lbmFibGUtb2ZmbG9hZC10YXJnZXRzPVteIF0qYW1kZ2Nu fV0KK30KKworIyBSZXR1cm4gMSBpZiBjb21waWxlZCB3aXRoIC0tZW5hYmxlLW9mZmxvYWQt dGFyZ2V0cz1hbWRnY24KK3Byb2MgY2hlY2tfZWZmZWN0aXZlX3RhcmdldF9udnB0eF9vZmZs b2FkaW5nX2VuYWJsZWQge30geworICAgIHJldHVybiBbY2hlY2tfY29uZmlndXJlZF93aXRo IHstLWVuYWJsZS1vZmZsb2FkLXRhcmdldHM9W14gXSpudnB0eH1dCit9CisKICMgUmV0dXJu IDEgaWYgY29tcGlsYXRpb24gd2l0aCAtZm9wZW5hY2MgaXMgZXJyb3ItZnJlZSBmb3IgdHJp dmlhbAogIyBjb2RlLCAwIG90aGVyd2lzZS4KIAo= --------------H8g6FZNFFRkRTJiGZ9j4eqdh--