From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 02E1A3858407
 for ; Mon, 30 Jan 2023 16:41:50 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 02E1A3858407
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A6E991A00;
 Mon, 30 Jan 2023 08:42:32 -0800 (PST)
Received: from [10.57.76.155] (unknown [10.57.76.155])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C4B653F882;
 Mon, 30 Jan 2023 08:41:49 -0800 (PST)
Content-Type: multipart/mixed; boundary="------------FvRAxhDUkiW4d42KuiqZ10It"
Message-ID: <3f60c6b8-1c42-f528-dc24-f804d9fee719@arm.com>
Date: Mon, 30 Jan 2023 16:41:44 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1
Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
Content-Language: en-US
To: Kyrylo Tkachov , "gcc-patches@gcc.gnu.org"
Cc: Richard Sandiford , Richard Earnshaw , Richard Biener
References: <13d03aef-f5d1-03fe-5281-31921d24dce0@arm.com> <22ba05fb-774e-62b8-64a2-90c5d73fcaba@arm.com> <433b8286-f54a-1a4a-e194-4ffbe0851a74@arm.com>
From: "Andre Vieira (lists)"
In-Reply-To:
X-Spam-Status: No, score=-16.4 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

This is a multi-part message in MIME format.
--------------FvRAxhDUkiW4d42KuiqZ10It
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Changed the testcase to be more robust (as per the discussion for the
first patch).
Still need the OK for the mid-end (simplify-rtx) part.

Kind regards,
Andre

On 27/01/2023 09:59, Kyrylo Tkachov wrote:
>
>
>> -----Original Message-----
>> From: Andre Vieira (lists)
>> Sent: Friday, January 27, 2023 9:58 AM
>> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
>> Cc: Richard Sandiford ; Richard Earnshaw
>> ; Richard Biener
>> Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
>> predicates before use [PR 107674]
>>
>>
>>
>> On 26/01/2023 15:06, Kyrylo Tkachov wrote:
>>> Hi Andre,
>>>
>>>> -----Original Message-----
>>>> From: Andre Vieira (lists)
>>>> Sent: Tuesday, January 24, 2023 1:54 PM
>>>> To: gcc-patches@gcc.gnu.org
>>>> Cc: Richard Sandiford ; Richard Earnshaw
>>>> ; Richard Biener ;
>>>> Kyrylo Tkachov
>>>> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
>>>> predicates before use [PR 107674]
>>>>
>>>> Hi,
>>>>
>>>> This patch teaches GCC that zero-extending a MVE predicate from 16-bits
>>>> to 32-bits and then only using 16-bits is a no-op.
>>>> It does so in two steps:
>>>> - it lets gcc know that it can access any MVE predicate mode using any
>>>> other MVE predicate mode without needing to copy it, using the
>>>> TARGET_MODES_TIEABLE_P hook,
>>>> - it teaches simplify_subreg to optimize a subreg with a vector
>>>> outermode, by replacing this outermode with a same-sized integer mode
>>>> and trying the available optimizations, then if successful it
>>>> surrounds the result with a subreg casting it back to the original
>>>> vector outermode.
>>>>
>>>> This removes the unnecessary zero-extending shown on PR 107674
>> (though
>>>> it's a sign-extend there), that was introduced in gcc 11.
>>>>
>>>> Bootstrapped on aarch64-none-linux-gnu and regression tested on
>>>> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.
>>>>
>>>> OK for trunk?
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> PR target/107674
>>>> * config/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
>>>> (arm_modes_tieable_p): Make MVE predicate modes tieable.
>>>> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define.
>>>> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach
>>>> simplify_subreg to simplify subregs where the outermode is not
>>>> scalar.
>>>
>>> The arm changes look ok to me. We'll want a midend maintainer to have a
>> look at simplify-rtx.cc
>>>
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/arm/mve/mve_vpt.c: Change to remove unnecessary
>>>> zero-extend.
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
>> b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
>>> index
>> 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5
>> acf9af0a2155b5c5 100644
>>> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
>>> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
>>> @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
>>> ** vldrb.8 q2, \[r0\]
>>> ** vldrb.8 q1, \[r1\]
>>> ** vcmp.i8 eq, q2, q1
>>> -** vmrs r3, p0 @ movhi
>>> -** uxth r3, r3
>>> -** vmsr p0, r3 @ movhi
>>> ** vpst
>>> ** vaddt.i8 q3, q2, q1
>>> ** vpst
>>>
>>> Ah I see, that's the testcase from patch 1/3 that I criticized :)
>>> Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more
>> robust?
>>> Thanks,
>>> Kyrill
>> I could, but I would rather not. I have a patch series waiting for GCC
>> 14 that does further improvements to this (and other VPST codegen)
>> sequences and if I do scan for 'absence' of an instruction I have to
>> break them up into single tests each. Also it wouldn't then fail if we
>> start spilling the predicate directly to memory for instance.
Like I >> mentioned in the previous patch, the sequence is unlikely to be able to >> change through scheduling (other than maybe the reordering of the loads >> through some bad luck, but I could make it robust to that). > > Ok, looks like it was thought through, so fine by me. > Thanks, > Kyrill --------------FvRAxhDUkiW4d42KuiqZ10It Content-Type: text/plain; charset=UTF-8; name="pr107674-2v2.patch" Content-Disposition: attachment; filename="pr107674-2v2.patch" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL2FybS5oIGIvZ2NjL2NvbmZpZy9hcm0vYXJt LmgKaW5kZXggNjMyNzI4MzcxZDVjZWYzNjRlNDdiZjMzYmZhMGZhYmE3MzhkYjg3MS4uODMy NWU3YTg3NmUyZTAzZjE0Y2JhMDczODVjYzVhMWRkZDc3MTY1NSAxMDA2NDQKLS0tIGEvZ2Nj L2NvbmZpZy9hcm0vYXJtLmgKKysrIGIvZ2NjL2NvbmZpZy9hcm0vYXJtLmgKQEAgLTExMDQs NiArMTEwNCwxMCBAQCBleHRlcm4gY29uc3QgaW50IGFybV9hcmNoX2NkZV9jb3Byb2NfYml0 c1tdOwogICAgfHwgKE1PREUpID09IFYxNlFJbW9kZSB8fCAoTU9ERSkgPT0gVjhIRm1vZGUg fHwgKE1PREUpID09IFY0U0Ztb2RlIFwKICAgIHx8IChNT0RFKSA9PSBWMkRGbW9kZSkKIAor I2RlZmluZSBWQUxJRF9NVkVfUFJFRF9NT0RFKE1PREUpIFwKKyAgKChNT0RFKSA9PSBISW1v ZGUJCQkJCQkJXAorICAgfHwgKE1PREUpID09IFYxNkJJbW9kZSB8fCAoTU9ERSkgPT0gVjhC SW1vZGUgfHwgKE1PREUpID09IFY0Qkltb2RlKQorCiAjZGVmaW5lIFZBTElEX01WRV9TSV9N T0RFKE1PREUpIFwKICAgKChNT0RFKSA9PSBWMkRJbW9kZSB8fChNT0RFKSA9PSBWNFNJbW9k ZSB8fCAoTU9ERSkgPT0gVjhISW1vZGUgXAogICAgfHwgKE1PREUpID09IFYxNlFJbW9kZSkK ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL2FybS5jYyBiL2djYy9jb25maWcvYXJtL2Fy bS5jYwppbmRleCBlZmM0ODM0OWRkMzUwOGU2NzkwYzFhOWYzYmJhNWRhNjg5YTk4NmJjLi40 ZDlkMjAyY2FkMWYzOWJhMzg2ZGY5ZDhlNDI3NzAwN2ZkOTYwMjYyIDEwMDY0NAotLS0gYS9n Y2MvY29uZmlnL2FybS9hcm0uY2MKKysrIGIvZ2NjL2NvbmZpZy9hcm0vYXJtLmNjCkBAIC0y NTY1NiwxMCArMjU2NTYsNyBAQCBhcm1faGFyZF9yZWdub19tb2RlX29rICh1bnNpZ25lZCBp bnQgcmVnbm8sIG1hY2hpbmVfbW9kZSBtb2RlKQogICAgIHJldHVybiBmYWxzZTsKIAogICBp ZiAoSVNfVlBSX1JFR05VTSAocmVnbm8pKQotICAgIHJldHVybiBtb2RlID09IEhJbW9kZQot ICAgICAgfHwgbW9kZSA9PSBWMTZCSW1vZGUKLSAgICAgIHx8IG1vZGUgPT0gVjhCSW1vZGUK 
LSAgICAgIHx8IG1vZGUgPT0gVjRCSW1vZGU7CisgICAgcmV0dXJuIFZBTElEX01WRV9QUkVE X01PREUgKG1vZGUpOwogCiAgIGlmIChUQVJHRVRfVEhVTUIxKQogICAgIC8qIEZvciB0aGUg VGh1bWIgd2Ugb25seSBhbGxvdyB2YWx1ZXMgYmlnZ2VyIHRoYW4gU0ltb2RlIGluCkBAIC0y NTczOCw2ICsyNTczNSwxMCBAQCBhcm1fbW9kZXNfdGllYWJsZV9wIChtYWNoaW5lX21vZGUg bW9kZTEsIG1hY2hpbmVfbW9kZSBtb2RlMikKICAgaWYgKEdFVF9NT0RFX0NMQVNTIChtb2Rl MSkgPT0gR0VUX01PREVfQ0xBU1MgKG1vZGUyKSkKICAgICByZXR1cm4gdHJ1ZTsKIAorICBp ZiAoVEFSR0VUX0hBVkVfTVZFCisgICAgICAmJiAoVkFMSURfTVZFX1BSRURfTU9ERSAobW9k ZTEpICYmIFZBTElEX01WRV9QUkVEX01PREUgKG1vZGUyKSkpCisgICAgcmV0dXJuIHRydWU7 CisKICAgLyogV2Ugc3BlY2lmaWNhbGx5IHdhbnQgdG8gYWxsb3cgZWxlbWVudHMgb2YgInN0 cnVjdHVyZSIgbW9kZXMgdG8KICAgICAgYmUgdGllYWJsZSB0byB0aGUgc3RydWN0dXJlLiAg VGhpcyBtb3JlIGdlbmVyYWwgY29uZGl0aW9uIGFsbG93cwogICAgICBvdGhlciByYXJlciBz aXR1YXRpb25zIHRvby4gICovCmRpZmYgLS1naXQgYS9nY2Mvc2ltcGxpZnktcnR4LmNjIGIv Z2NjL3NpbXBsaWZ5LXJ0eC5jYwppbmRleCA3ZmIxZTk3ZmJlYTRlN2I4YjA5MWY1NzI0ZWJl MGNiNjFlZWU3ZWMzLi5hOTUxMjcyMTg2NTg1YzBhNWNjM2UwMTU1Mjg1ZTdhNjM1ODY1ZjQy IDEwMDY0NAotLS0gYS9nY2Mvc2ltcGxpZnktcnR4LmNjCisrKyBiL2djYy9zaW1wbGlmeS1y dHguY2MKQEAgLTc2NTIsNiArNzY1MiwyMiBAQCBzaW1wbGlmeV9jb250ZXh0OjpzaW1wbGlm eV9zdWJyZWcgKG1hY2hpbmVfbW9kZSBvdXRlcm1vZGUsIHJ0eCBvcCwKIAl9CiAgICAgfQog CisgIC8qIFRyeSBzaW1wbGlmeWluZyBhIFNVQlJFRyBleHByZXNzaW9uIG9mIGEgbm9uLWlu dGVnZXIgT1VURVJNT0RFIGJ5IHVzaW5nIGEKKyAgICAgTkVXX09VVEVSTU9ERSBvZiB0aGUg c2FtZSBzaXplIGluc3RlYWQsIG90aGVyIHNpbXBsaWZpY2F0aW9ucyByZWx5IG9uCisgICAg IGludGVnZXIgdG8gaW50ZWdlciBzdWJyZWdzIGFuZCB3ZSdkIHBvdGVudGlhbGx5IG1pc3Mg b3V0IG9uIG9wdGltaXphdGlvbnMKKyAgICAgb3RoZXJ3aXNlLiAgKi8KKyAgaWYgKGtub3du X2d0IChHRVRfTU9ERV9TSVpFIChpbm5lcm1vZGUpLAorCQlHRVRfTU9ERV9TSVpFIChvdXRl cm1vZGUpKQorICAgICAgJiYgU0NBTEFSX0lOVF9NT0RFX1AgKGlubmVybW9kZSkKKyAgICAg ICYmICFTQ0FMQVJfSU5UX01PREVfUCAob3V0ZXJtb2RlKQorICAgICAgJiYgaW50X21vZGVf Zm9yX3NpemUgKEdFVF9NT0RFX0JJVFNJWkUgKG91dGVybW9kZSksCisJCQkgICAgMCkuZXhp c3RzICgmaW50X291dGVybW9kZSkpCisgICAgeworICAgICAgcnR4IHRlbSA9IHNpbXBsaWZ5 
X3N1YnJlZyAoaW50X291dGVybW9kZSwgb3AsIGlubmVybW9kZSwgYnl0ZSk7CisgICAgICBp ZiAodGVtKQorCXJldHVybiBzaW1wbGlmeV9nZW5fc3VicmVnIChvdXRlcm1vZGUsIHRlbSwg R0VUX01PREUgKHRlbSksIGJ5dGUpOworICAgIH0KKwogICAvKiBJZiBPUCBpcyBhIHZlY3Rv ciBjb21wYXJpc29uIGFuZCB0aGUgc3VicmVnIGlzIG5vdCBjaGFuZ2luZyB0aGUKICAgICAg bnVtYmVyIG9mIGVsZW1lbnRzIG9yIHRoZSBzaXplIG9mIHRoZSBlbGVtZW50cywgY2hhbmdl IHRoZSByZXN1bHQKICAgICAgb2YgdGhlIGNvbXBhcmlzb24gdG8gdGhlIG5ldyBtb2RlLiAg Ki8KZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hcm0vbXZlL212ZV92 cHQuYyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hcm0vbXZlL212ZV92cHQuYwppbmRl eCAyOGU0Njk3YzNjNWJjYzg5YjM3ZmNiMjk2ZjRiNDZjODYxYWVkMjdkLi40MWY0ZTM4MDVk NjJkMDM0M2M0MDM1YTMyODI1MGZiOGM3YjBjNDdmIDEwMDY0NAotLS0gYS9nY2MvdGVzdHN1 aXRlL2djYy50YXJnZXQvYXJtL212ZS9tdmVfdnB0LmMKKysrIGIvZ2NjL3Rlc3RzdWl0ZS9n Y2MudGFyZ2V0L2FybS9tdmUvbXZlX3ZwdC5jCkBAIC0xNiwxMiArMTYsOSBAQCB2b2lkIHRl c3QwICh1aW50OF90ICphLCB1aW50OF90ICpiLCB1aW50OF90ICpjKQogKioJdmxkcmIuOAlx WzAtOV0rLCBcW3JbMC05XStcXQogKioJdmxkcmIuOAlxWzAtOV0rLCBcW3JbMC05XStcXQog KioJdmNtcC5pOAllcSwgcVswLTldKywgcVswLTldKwotKioJdm1ycwkoclswLTldKyksIHAw CUAgbW92aGkKLSoqCXV4dGgJXDEsIFwxCi0qKgl2bXNyCXAwLCBcMQlAIG1vdmhpCiAqKgl2 cHN0CiAqKgl2YWRkdC5pOAkocVswLTldKyksIHFbMC05XSssIHFbMC05XSsKICoqCXZwc3QK LSoqCXZzdHJidC44CVwyLCBcW3JbMC05XStcXQorKioJdnN0cmJ0LjgJXDEsIFxbclswLTld K1xdCiAqKglieAlscgogKi8K --------------FvRAxhDUkiW4d42KuiqZ10It--