From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 9691E3858C00 for ; Fri, 27 Jan 2023 09:58:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9691E3858C00 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 29C1F2B; Fri, 27 Jan 2023 01:58:51 -0800 (PST) Received: from [10.57.76.247] (unknown [10.57.76.247]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 58C603F5A1; Fri, 27 Jan 2023 01:58:08 -0800 (PST) Message-ID: <433b8286-f54a-1a4a-e194-4ffbe0851a74@arm.com> Date: Fri, 27 Jan 2023 09:58:03 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1 Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674] Content-Language: en-US To: Kyrylo Tkachov , "gcc-patches@gcc.gnu.org" Cc: Richard Sandiford , Richard Earnshaw , Richard Biener References: <13d03aef-f5d1-03fe-5281-31921d24dce0@arm.com> <22ba05fb-774e-62b8-64a2-90c5d73fcaba@arm.com> From: "Andre Vieira (lists)" In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-17.2 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 26/01/2023 15:06, Kyrylo Tkachov wrote: > Hi Andre, > >> -----Original Message----- >> From: Andre Vieira (lists) >> Sent: Tuesday, January 24, 2023 1:54 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Richard Sandiford ; Richard Earnshaw >> ; Richard Biener ; >> Kyrylo Tkachov >> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE >> predicates before use [PR 107674] >> >> Hi, >> >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits >> to 32-bits and then only using 16-bits is a no-op. >> It does so in two steps: >> - it lets gcc know that it can access any MVE predicate mode using any >> other MVE predicate mode without needing to copy it, using the >> TARGET_MODES_TIEABLE_P hook, >> - it teaches simplify_subreg to optimize a subreg with a vector >> outermode, by replacing this outermode with a same-sized integer mode >> and trying the avalailable optimizations, then if successful it >> surrounds the result with a subreg casting it back to the original >> vector outermode. >> >> This removes the unnecessary zero-extending shown on PR 107674 (though >> it's a sign-extend there), that was introduced in gcc 11. >> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. >> >> OK for trunk? >> >> gcc/ChangeLog: >> >> PR target/107674 >> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. >> (arm_modes_tieable_p): Make MVE predicate modes tieable. >> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. >> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach >> simplify_subreg to simplify subregs where the outermode is not >> scalar. > > The arm changes look ok to me. We'll want a midend maintainer to have a look at simplify-rtx.cc > >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary >> zero-extend. > > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) > ** vldrb.8 q2, \[r0\] > ** vldrb.8 q1, \[r1\] > ** vcmp.i8 eq, q2, q1 > -** vmrs r3, p0 @ movhi > -** uxth r3, r3 > -** vmsr p0, r3 @ movhi > ** vpst > ** vaddt.i8 q3, q2, q1 > ** vpst > > Ah I see, that's the testcase from patch 1/3 that I criticized :) > Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more robust? > Thanks, > Kyrill I could, but I would rather not. I have a patch series waiting for GCC 14 that does further improvements to this (and other VPST codegen) sequences and if I do scan for 'absence' of an instruction I have to break them up into single tests each. Also it wouldn't then fail if we start spilling the predicate directly to memory for instance. Like I mentioned in the previous patch, the sequence is unlikely to be able to change through scheduling (other than maybe the reordering of the loads through some bad luck, but I could make it robust to that).