From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id E9DCA3858D39 for ; Wed, 28 Feb 2024 17:25:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E9DCA3858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E9DCA3858D39 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709141135; cv=none; b=nfJRFHKU6fAL7zr+aTVOxpg8MRsm3pEd3eDUCsfpjH7Olebmi58/uXdrVdECO8EWufGk8skDbKA7NbPPEaCdLMIi9d79GVk1Cn+ZELdLWcKttjTddy94kEcTbwzEiJLBeOkRdeKNjz3Au8rD/XrI1BVmevv2GaKzYWv0BdG+Dbc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709141135; c=relaxed/simple; bh=ZnntloP6V91DNr5WwmO+0ctE50jT0H/pEMbhiRC9q2U=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=OFwdV0sZpW2rmn7PZHYqr/iuup0UoWl//wzzLQ2pZ0i6UOLYawrC9IsUH4+YdV4zCvIG98zfvJFLUS6uSnfEhtsyj5w+1KL3kTd4UB9iGReZ/2DCWjks+qFYsdGypUxCtLAzmMjIkbBEJH25C8/RKa9Y+LRdvvxSMVS+gNwOg9g= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1952CDA7; Wed, 28 Feb 2024 09:26:11 -0800 (PST) Received: from [10.57.67.120] (unknown [10.57.67.120]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 860853F762; Wed, 28 Feb 2024 09:25:31 -0800 (PST) Content-Type: multipart/mixed; boundary="------------X1FUDPecHoqzEc8VT6E6AXpt" Message-ID: <0a69b67c-7fa3-4a79-86c4-c6311cc56ea7@arm.com> Date: Wed, 28 Feb 2024 17:25:30 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Content-Language: en-US To: Richard Biener Cc: gcc-patches@gcc.gnu.org, Richard.Sandiford@arm.com References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com> <20240130143132.9575-2-andre.simoesdiasvieira@arm.com> <47e1aeb2-94ac-4733-b49f-ea97932cc49f@arm.com> <545r8s73-675p-4o48-sr66-q6956nqp6r6p@fhfr.qr> <3rq8sn71-8188-o4rq-9spp-q9spn98163q5@fhfr.qr> <359e8112-65c9-40b1-9566-aa31165c05e8@arm.com> From: "Andre Vieira (lists)" In-Reply-To: X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This is a multi-part message in MIME format. --------------X1FUDPecHoqzEc8VT6E6AXpt Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 27/02/2024 08:47, Richard Biener wrote: > On Mon, 26 Feb 2024, Andre Vieira (lists) wrote: > >> >> >> On 05/02/2024 09:56, Richard Biener wrote: >>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote: >>> >>>> >>>> >>>> On 01/02/2024 07:19, Richard Biener wrote: >>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote: >>>>> >>>>> >>>>> The patch didn't come with a testcase so it's really hard to tell >>>>> what goes wrong now and how it is fixed ... >>>> >>>> My bad! I had a testcase locally but never added it... >>>> >>>> However... now I look at it and ran it past Richard S, the codegen isn't >>>> 'wrong', but it does have the potential to lead to some pretty slow >>>> codegen, >>>> especially for inbranch simdclones where it transforms the SVE predicate >>>> into >>>> an Advanced SIMD vector by inserting the elements one at a time... >>>> >>>> An example of which can be seen if you do: >>>> >>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S >>>> >>>> with the following t.c: >>>> #pragma omp declare simd simdlen(4) inbranch >>>> int __attribute__ ((const)) fn5(int); >>>> >>>> void fn4 (int *a, int *b, int n) >>>> { >>>> for (int i = 0; i < n; ++i) >>>> b[i] = fn5(a[i]); >>>> } >>>> >>>> Now I do have to say, for our main usecase of libmvec we won't have any >>>> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course >>>> that >>>> doesn't mean user-code will. >>> >>> It seems to use SVE masks with vector(4) and the >>> ABI says the mask is vector(4) int. You say that's because we choose >>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5). >>> >>> The vectorizer creates >>> >>> _44 = VEC_COND_EXPR ; >>> >>> and then vector lowering decomposes this. That means the vectorizer >>> lacks a check that the target handles this VEC_COND_EXPR. >>> >>> Of course I would expect that SVE with VLS vectors is able to >>> code generate this operation, so it's missing patterns in the end. >>> >>> Richard. >>> >> >> What should we do for GCC-14? Going forward I think the right thing to do is >> to add these patterns. But I am not even going to try to do that right now and >> even though we can codegen for this, the result doesn't feel like it would >> ever be profitable which means I'd rather not vectorize, or well pick a >> different vector mode if possible. >> >> This would be achieved with the change to the targethook. If I change the hook >> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now? > > Passing in a mode is OK. I'm still not fully understanding why the > clone isn't fully specifying 'mode' and if it does not why the > vectorizer itself can not disregard it. We could check that the modes of the parameters & return type are the same as the vector operands & result in the vectorizer. But then we'd also want to make sure we don't reject cases where we have simdclones with compatible modes, aka same element type, but a multiple element count. Which is where'd we get in trouble again I think, because we'd want to accept V8SI -> 2x V4SI, but not V8SI -> 2x VNx4SI (with VLS and aarch64_sve_vg = 2), not because it's invalid, but because right now the codegen is bad. And it's easier to do this in the targethook, which we can technically also use to 'rank' simdclones by setting a target_badness value, so in the future we could decide to assign some 'badness' to influence the rank a SVE simdclone for Advanced SIMD loops vs an Advanced SIMD clone for Advanced SIMD loops. This does touch another issue of simdclone costing, which is a larger issue in general and one we (arm) might want to approach in the future. It's a complex issue, because the vectorizer doesn't know the performance impact of a simdclone, we assume (as we should) that its faster than the original scalar, though we currently don't record costs for either, but we don't know by how much or how much impact it has, so the vectorizer can't reason whether it's beneficial to use a simdclone if it has to do a lot of operand preparation, we can merely tell it to use it, or not and all the other operations in the loop will determine costing. > From the past discussion I understood the existing situation isn't > as bad as initially thought and no bad things happen right now? Nope, I thought they compiler would fall apart, but it seems to be able to transform the operands from one mode into the other, so without the targethook it just generates slower loops in certain cases, which we'd rather avoid given the usecase for simdclones is to speed things up ;) Attached reworked patch. This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to make sure the target can reject a simd_clone based on the vector mode it is using. This is needed because for VLS SVE vectorization the vectorizer accepts Advanced SIMD simd clones when vectorizing using SVE types because the simdlens might match, this currently leads to suboptimal codegen. Other targets do not currently need to use this argument. gcc/ChangeLog: * target.def (TARGET_SIMD_CLONE_USABLE): Add argument. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector_mode to call TARGET_SIMD_CLONE_USABLE. * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument and use it to reject the use of SVE simd clones with Advanced SIMD modes. * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument. * config/i386/i386.cc (ix86_simd_clone_usable): Likewise. --------------X1FUDPecHoqzEc8VT6E6AXpt Content-Type: text/plain; charset=UTF-8; name="target_simd_clone_usable.patch" Content-Disposition: attachment; filename="target_simd_clone_usable.patch" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNjIGIvZ2NjL2NvbmZp Zy9hYXJjaDY0L2FhcmNoNjQuY2MKaW5kZXggMTYzMThiZjkyNTg4M2VjZWRmOTM0NWU1M2Zj MDgyNGE1NTNiMjc0Ny4uNmVlNzdmNjEyMzUyMTliNDc3ZDFmNjIyZmNlYjc1MmQ1NGM1OGI4 NyAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuY2MKKysrIGIvZ2Nj L2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuY2MKQEAgLTI4NzY5LDEyICsyODc2OSwxMiBAQCBh YXJjaDY0X3NpbWRfY2xvbmVfYWRqdXN0IChzdHJ1Y3QgY2dyYXBoX25vZGUgKm5vZGUpCiAv KiBJbXBsZW1lbnQgVEFSR0VUX1NJTURfQ0xPTkVfVVNBQkxFLiAgKi8KIAogc3RhdGljIGlu dAotYWFyY2g2NF9zaW1kX2Nsb25lX3VzYWJsZSAoc3RydWN0IGNncmFwaF9ub2RlICpub2Rl KQorYWFyY2g2NF9zaW1kX2Nsb25lX3VzYWJsZSAoc3RydWN0IGNncmFwaF9ub2RlICpub2Rl LCBtYWNoaW5lX21vZGUgdmVjdG9yX21vZGUpCiB7CiAgIHN3aXRjaCAobm9kZS0+c2ltZGNs b25lLT52ZWNzaXplX21hbmdsZSkKICAgICB7CiAgICAgY2FzZSAnbic6Ci0gICAgICBpZiAo IVRBUkdFVF9TSU1EKQorICAgICAgaWYgKCFUQVJHRVRfU0lNRCB8fCBhYXJjaDY0X3N2ZV9t b2RlX3AgKHZlY3Rvcl9tb2RlKSkKIAlyZXR1cm4gLTE7CiAgICAgICByZXR1cm4gMDsKICAg ICBkZWZhdWx0OgpkaWZmIC0tZ2l0IGEvZ2NjL2NvbmZpZy9nY24vZ2NuLmNjIGIvZ2NjL2Nv bmZpZy9nY24vZ2NuLmNjCmluZGV4IGJjMDc2ZDExMjBkOWU3ZDAzYzliZWQyM2I4ZGYyMTVh ZTM1ZTQ0MmMuLjk2MjRiN2MxYWFiMjk2NjVjNTJmN2I4MmQ4YjQzN2FmMmU4ZTFlYTEgMTAw NjQ0Ci0tLSBhL2djYy9jb25maWcvZ2NuL2djbi5jYworKysgYi9nY2MvY29uZmlnL2djbi9n Y24uY2MKQEAgLTU2NjcsNyArNTY2Nyw4IEBAIGdjbl9zaW1kX2Nsb25lX2FkanVzdCAoc3Ry dWN0IGNncmFwaF9ub2RlICpBUkdfVU5VU0VEIChub2RlKSkKIC8qIEltcGxlbWVudCBUQVJH RVRfU0lNRF9DTE9ORV9VU0FCTEUuICAqLwogCiBzdGF0aWMgaW50Ci1nY25fc2ltZF9jbG9u ZV91c2FibGUgKHN0cnVjdCBjZ3JhcGhfbm9kZSAqQVJHX1VOVVNFRCAobm9kZSkpCitnY25f c2ltZF9jbG9uZV91c2FibGUgKHN0cnVjdCBjZ3JhcGhfbm9kZSAqQVJHX1VOVVNFRCAobm9k ZSksCisJCSAgICAgICBtYWNoaW5lX21vZGUgQVJHX1VOVVNFRCAodmVjdG9yX21vZGUpKQog ewogICAvKiBXZSBkb24ndCBuZWVkIHRvIGRvIGFueXRoaW5nIGhlcmUgYmVjYXVzZQogICAg ICBnY25fc2ltZF9jbG9uZV9jb21wdXRlX3ZlY3NpemVfYW5kX3NpbWRsZW4gY3VycmVudGx5 IG9ubHkgcmV0dXJucyBvbmUKZGlmZiAtLWdpdCBhL2djYy9jb25maWcvaTM4Ni9pMzg2LmNj IGIvZ2NjL2NvbmZpZy9pMzg2L2kzODYuY2MKaW5kZXggZmM1MDY4NTM5YzExNzQ4YTVhZGY3 MGVjNzdiMmYxY2FlMWExZTIzMS4uYzU0ZjY2NTQzZmRkNDEwM2Q1OGMyZjkzOTBhM2M5MTA2 MDU5N2I5NCAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZpZy9pMzg2L2kzODYuY2MKKysrIGIvZ2Nj L2NvbmZpZy9pMzg2L2kzODYuY2MKQEAgLTI1MjQ5LDcgKzI1MjQ5LDggQEAgaXg4Nl9zaW1k X2Nsb25lX2NvbXB1dGVfdmVjc2l6ZV9hbmRfc2ltZGxlbiAoc3RydWN0IGNncmFwaF9ub2Rl ICpub2RlLAogICAgc2xpZ2h0bHkgbGVzcyBkZXNpcmFibGUsIGV0Yy4pLiAgKi8KIAogc3Rh dGljIGludAotaXg4Nl9zaW1kX2Nsb25lX3VzYWJsZSAoc3RydWN0IGNncmFwaF9ub2RlICpu b2RlKQoraXg4Nl9zaW1kX2Nsb25lX3VzYWJsZSAoc3RydWN0IGNncmFwaF9ub2RlICpub2Rl LAorCQkJbWFjaGluZV9tb2RlIEFSR19VTlVTRUQgKHZlY3Rvcl9tb2RlKSkKIHsKICAgc3dp dGNoIChub2RlLT5zaW1kY2xvbmUtPnZlY3NpemVfbWFuZ2xlKQogICAgIHsKZGlmZiAtLWdp dCBhL2djYy9kb2MvdG0udGV4aSBiL2djYy9kb2MvdG0udGV4aQppbmRleCBjOGI4YjEyNmIy NDI0YjY1NTJmODI0YmE0MmFjMzI5Y2ZhZjg0ZDg0Li4wM2Y3ZDcyYTQyOTIwNGE1ODQyNTNk YzVjNmU4ZmExYjMwNzQ3OTVkIDEwMDY0NAotLS0gYS9nY2MvZG9jL3RtLnRleGkKKysrIGIv Z2NjL2RvYy90bS50ZXhpCkBAIC02NDk4LDExICs2NDk4LDExIEBAIFRoaXMgaG9vayBzaG91 bGQgYWRkIGltcGxpY2l0IEBjb2Rle2F0dHJpYnV0ZSh0YXJnZXQoIi4uLiIpKX0gYXR0cmli dXRlCiB0byBTSU1EIGNsb25lIEB2YXJ7bm9kZX0gaWYgbmVlZGVkLgogQGVuZCBkZWZ0eXBl Zm4KIAotQGRlZnR5cGVmbiB7VGFyZ2V0IEhvb2t9IGludCBUQVJHRVRfU0lNRF9DTE9ORV9V U0FCTEUgKHN0cnVjdCBjZ3JhcGhfbm9kZSAqQHZhcnt9KQorQGRlZnR5cGVmbiB7VGFyZ2V0 IEhvb2t9IGludCBUQVJHRVRfU0lNRF9DTE9ORV9VU0FCTEUgKHN0cnVjdCBjZ3JhcGhfbm9k ZSAqQHZhcnt9LCBAdmFye21hY2hpbmVfbW9kZX0pCiBUaGlzIGhvb2sgc2hvdWxkIHJldHVy biAtMSBpZiBTSU1EIGNsb25lIEB2YXJ7bm9kZX0gc2hvdWxkbid0IGJlIHVzZWQKLWluIHZl Y3Rvcml6ZWQgbG9vcHMgaW4gY3VycmVudCBmdW5jdGlvbiwgb3Igbm9uLW5lZ2F0aXZlIG51 bWJlciBpZiBpdCBpcwotdXNhYmxlLiAgSW4gdGhhdCBjYXNlLCB0aGUgc21hbGxlciB0aGUg bnVtYmVyIGlzLCB0aGUgbW9yZSBkZXNpcmFibGUgaXQgaXMKLXRvIHVzZSBpdC4KK2luIHZl Y3Rvcml6ZWQgbG9vcHMgaW4gY3VycmVudCBmdW5jdGlvbiB3aXRoIEB2YXJ7dmVjdG9yX21v ZGV9LCBvcgorbm9uLW5lZ2F0aXZlIG51bWJlciBpZiBpdCBpcyB1c2FibGUuICBJbiB0aGF0 IGNhc2UsIHRoZSBzbWFsbGVyIHRoZSBudW1iZXIKK2lzLCB0aGUgbW9yZSBkZXNpcmFibGUg aXQgaXMgdG8gdXNlIGl0LgogQGVuZCBkZWZ0eXBlZm4KIAogQGRlZnR5cGVmbiB7VGFyZ2V0 IEhvb2t9IGludCBUQVJHRVRfU0lNVF9WRiAodm9pZCkKZGlmZiAtLWdpdCBhL2djYy90YXJn ZXQuZGVmIGIvZ2NjL3RhcmdldC5kZWYKaW5kZXggZmRhZDdiYmM5M2UyYWQ4YWVhMzAzMzZk NWNkNGFmNjc4MDFlOWM3NC4uN2U4OTIxYjZiZDQwNzg3NzAyNjg4MTlhMzg1OTVmZGNlNjEy YjU0OCAxMDA2NDQKLS0tIGEvZ2NjL3RhcmdldC5kZWYKKysrIGIvZ2NjL3RhcmdldC5kZWYK QEAgLTE2NDUsMTAgKzE2NDUsMTAgQEAgdm9pZCwgKHN0cnVjdCBjZ3JhcGhfbm9kZSAqKSwg TlVMTCkKIERFRkhPT0sKICh1c2FibGUsCiAiVGhpcyBob29rIHNob3VsZCByZXR1cm4gLTEg aWYgU0lNRCBjbG9uZSBAdmFye25vZGV9IHNob3VsZG4ndCBiZSB1c2VkXG5cCi1pbiB2ZWN0 b3JpemVkIGxvb3BzIGluIGN1cnJlbnQgZnVuY3Rpb24sIG9yIG5vbi1uZWdhdGl2ZSBudW1i ZXIgaWYgaXQgaXNcblwKLXVzYWJsZS4gIEluIHRoYXQgY2FzZSwgdGhlIHNtYWxsZXIgdGhl IG51bWJlciBpcywgdGhlIG1vcmUgZGVzaXJhYmxlIGl0IGlzXG5cCi10byB1c2UgaXQuIiwK LWludCwgKHN0cnVjdCBjZ3JhcGhfbm9kZSAqKSwgTlVMTCkKK2luIHZlY3Rvcml6ZWQgbG9v cHMgaW4gY3VycmVudCBmdW5jdGlvbiB3aXRoIEB2YXJ7dmVjdG9yX21vZGV9LCBvclxuXAor bm9uLW5lZ2F0aXZlIG51bWJlciBpZiBpdCBpcyB1c2FibGUuICBJbiB0aGF0IGNhc2UsIHRo ZSBzbWFsbGVyIHRoZSBudW1iZXJcblwKK2lzLCB0aGUgbW9yZSBkZXNpcmFibGUgaXQgaXMg dG8gdXNlIGl0LiIsCitpbnQsIChzdHJ1Y3QgY2dyYXBoX25vZGUgKiwgbWFjaGluZV9tb2Rl KSwgTlVMTCkKIAogSE9PS19WRUNUT1JfRU5EIChzaW1kX2Nsb25lKQogCmRpZmYgLS1naXQg YS9nY2MvdHJlZS12ZWN0LXN0bXRzLmNjIGIvZ2NjL3RyZWUtdmVjdC1zdG10cy5jYwppbmRl eCAxZGJlMTExNWRhNGQ3ZGQ0ZmM1OTBlNTgzMGE5YzdmMDViZTY5NDVhLi5mMDZhNTNkMzdl ZTA1NzM3ZTAwZTgwZDljMjY1MTkyYmVkZTZhYTE4IDEwMDY0NAotLS0gYS9nY2MvdHJlZS12 ZWN0LXN0bXRzLmNjCisrKyBiL2djYy90cmVlLXZlY3Qtc3RtdHMuY2MKQEAgLTQwNzQsNyAr NDA3NCwxNCBAQCB2ZWN0b3JpemFibGVfc2ltZF9jbG9uZV9jYWxsICh2ZWNfaW5mbyAqdmlu Zm8sIHN0bXRfdmVjX2luZm8gc3RtdF9pbmZvLAogCSAgdGhpc19iYWRuZXNzICs9IGZsb29y X2xvZzIgKG51bV9jYWxscykgKiA0MDk2OwogCWlmIChuLT5zaW1kY2xvbmUtPmluYnJhbmNo KQogCSAgdGhpc19iYWRuZXNzICs9IDgxOTI7Ci0JaW50IHRhcmdldF9iYWRuZXNzID0gdGFy Z2V0bS5zaW1kX2Nsb25lLnVzYWJsZSAobik7CisKKwkvKiBJZiBTVE1UX1ZJTkZPX1ZFQ1RZ UEUgaGFzIG5vdCBiZWVuIHNldCB5ZXQgcGFzcyB0aGUgZ2VuZXJhbCB2ZWN0b3IKKwkgICBt b2RlLCAgd2hpY2ggZm9yIHRhcmdldHMgdGhhdCB1c2UgaXQgd2lsbCBkZXRlcm1pbmUgd2hh dCBJU0Egd2UgY2FuCisJICAgdmVjdG9yaXplIHRoaXMgY29kZSB3aXRoLiAgKi8KKwltYWNo aW5lX21vZGUgdmVjdG9yX21vZGUgPSB2aW5mby0+dmVjdG9yX21vZGU7CisJaWYgKHZlY3R5 cGUpCisJICB2ZWN0b3JfbW9kZSA9IFRZUEVfTU9ERSAodmVjdHlwZSk7CisJaW50IHRhcmdl dF9iYWRuZXNzID0gdGFyZ2V0bS5zaW1kX2Nsb25lLnVzYWJsZSAobiwgdmVjdG9yX21vZGUp OwogCWlmICh0YXJnZXRfYmFkbmVzcyA8IDApCiAJICBjb250aW51ZTsKIAl0aGlzX2JhZG5l c3MgKz0gdGFyZ2V0X2JhZG5lc3MgKiA1MTI7Cg== --------------X1FUDPecHoqzEc8VT6E6AXpt--