Changed the testcase to be more robust (as per the discussion for the first patch). Still need the OK for the mid-end (simplify-rtx) part. Kind regards, Andre On 27/01/2023 09:59, Kyrylo Tkachov wrote: > > >> -----Original Message----- >> From: Andre Vieira (lists) >> Sent: Friday, January 27, 2023 9:58 AM >> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org >> Cc: Richard Sandiford ; Richard Earnshaw >> ; Richard Biener >> Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE >> predicates before use [PR 107674] >> >> >> >> On 26/01/2023 15:06, Kyrylo Tkachov wrote: >>> Hi Andre, >>> >>>> -----Original Message----- >>>> From: Andre Vieira (lists) >>>> Sent: Tuesday, January 24, 2023 1:54 PM >>>> To: gcc-patches@gcc.gnu.org >>>> Cc: Richard Sandiford ; Richard Earnshaw >>>> ; Richard Biener ; >>>> Kyrylo Tkachov >>>> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE >>>> predicates before use [PR 107674] >>>> >>>> Hi, >>>> >>>> This patch teaches GCC that zero-extending a MVE predicate from 16-bits >>>> to 32-bits and then only using 16-bits is a no-op. >>>> It does so in two steps: >>>> - it lets gcc know that it can access any MVE predicate mode using any >>>> other MVE predicate mode without needing to copy it, using the >>>> TARGET_MODES_TIEABLE_P hook, >>>> - it teaches simplify_subreg to optimize a subreg with a vector >>>> outermode, by replacing this outermode with a same-sized integer mode >>>> and trying the avalailable optimizations, then if successful it >>>> surrounds the result with a subreg casting it back to the original >>>> vector outermode. >>>> >>>> This removes the unnecessary zero-extending shown on PR 107674 >> (though >>>> it's a sign-extend there), that was introduced in gcc 11. >>>> >>>> Bootstrapped on aarch64-none-linux-gnu and regression tested on >>>> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. >>>> >>>> OK for trunk? >>>> >>>> gcc/ChangeLog: >>>> >>>> PR target/107674 >>>> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. >>>> (arm_modes_tieable_p): Make MVE predicate modes tieable. >>>> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. >>>> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach >>>> simplify_subreg to simplify subregs where the outermode is not >>>> scalar. >>> >>> The arm changes look ok to me. We'll want a midend maintainer to have a >> look at simplify-rtx.cc >>> >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary >>>> zero-extend. >>> >>> diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >> b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >>> index >> 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5 >> acf9af0a2155b5c5 100644 >>> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >>> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >>> @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) >>> ** vldrb.8 q2, \[r0\] >>> ** vldrb.8 q1, \[r1\] >>> ** vcmp.i8 eq, q2, q1 >>> -** vmrs r3, p0 @ movhi >>> -** uxth r3, r3 >>> -** vmsr p0, r3 @ movhi >>> ** vpst >>> ** vaddt.i8 q3, q2, q1 >>> ** vpst >>> >>> Ah I see, that's the testcase from patch 1/3 that I criticized :) >>> Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more >> robust? >>> Thanks, >>> Kyrill >> I could, but I would rather not. I have a patch series waiting for GCC >> 14 that does further improvements to this (and other VPST codegen) >> sequences and if I do scan for 'absence' of an instruction I have to >> break them up into single tests each. Also it wouldn't then fail if we >> start spilling the predicate directly to memory for instance. Like I >> mentioned in the previous patch, the sequence is unlikely to be able to >> change through scheduling (other than maybe the reordering of the loads >> through some bad luck, but I could make it robust to that). > > Ok, looks like it was thought through, so fine by me. > Thanks, > Kyrill