Hi Uros,

As requested, here's an updated version of my patch that introduces the
new const_0_to_255_not_mul_8_operand predicate.  I think in this
instance, having mutually exclusive patterns that can appear in any
order, without imposing implicit ordering constraints, is slightly
preferable, especially as (thanks to STV) some related patterns may
appear in sse.md and others in i386.md (making ordering tricky).

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?

2022-08-12  Roger Sayle
	    Uroš Bizjak

gcc/ChangeLog
	* config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
	New predicate for values between 0/1 and 255, not multiples of 8.
	* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
	shifts by constant bit counts.
	(*ashlv1ti3_internal): New define_insn_and_split that lowers
	logical left shifts by constant bit counts, that aren't multiples
	of 8, before reload.
	(lshrv1ti3): Delay lowering of logical right shifts by constant.
	(*lshrv1ti3_internal): New define_insn_and_split that lowers
	logical right shifts by constant bit counts, that aren't multiples
	of 8, before reload.
	(ashrv1ti3): Delay lowering of arithmetic right shifts by
	constant bit counts.
	(*ashrv1ti3_internal): New define_insn_and_split that lowers
	arithmetic right shifts by constant bit counts before reload.
	(rotlv1ti3): Delay lowering of rotate left by constant.
	(*rotlv1ti3_internal): New define_insn_and_split that lowers
	rotate left by constant bit counts before reload.
	(rotrv1ti3): Delay lowering of rotate right by constant.
	(*rotrv1ti3_internal): New define_insn_and_split that lowers
	rotate right by constant bit counts before reload.
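For reference, the new predicate might be defined along these lines (a
sketch of the intended semantics; the exact text in predicates.md may
differ):

```lisp
;; Match const_int values in [0, 255] that are not multiples of 8.
;; Shifts by multiples of 8 bits are instead handled by the existing
;; byte-shift insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.
(define_predicate "const_0_to_255_not_mul_8_operand"
  (match_code "const_int")
{
  unsigned HOST_WIDE_INT val = UINTVAL (op);
  return val <= 255 && (val & 7) != 0;
})
```

Because the multiple-of-8 exclusion lives in the operand predicate, this
pattern and the byte-shift patterns are mutually exclusive regardless of
the order in which they appear in the machine description.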
Thanks again,
Roger

> -----Original Message-----
> From: Uros Bizjak
> Sent: 08 August 2022 08:48
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Move V1TI shift/rotate lowering from expand
> to pre-reload split.
>
> On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle wrote:
> >
> > This patch moves the lowering of 128-bit V1TImode shifts and
> > rotations by constant bit counts into sequences of SSE operations
> > from the RTL expansion pass to the pre-reload split pass.
> > Postponing this splitting of shifts and rotates enables (will
> > enable) the TImode equivalents of these operations/instructions to
> > be considered as candidates by the (TImode) STV pass.  Technically,
> > this patch changes the existing expanders to continue to lower
> > shifts by variable amounts, but shifts by constant operands now
> > become RTL instructions, specified by define_insn_and_split
> > patterns that are triggered by x86_pre_reload_split.  The one minor
> > complication is that logical shifts by multiples of eight don't get
> > split, but are handled by existing insn patterns, such as
> > sse2_ashlv1ti3 and sse2_lshrv1ti3.  There should be no changes in
> > generated code with this patch, which just adjusts the pass in
> > which the transformations get applied.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make
> > bootstrap and make -k check, both with and without
> > --target_board=unix{-m32}, with no new failures.  Ok for mainline?
> >
> >
> > 2022-08-05  Roger Sayle
> >
> > gcc/ChangeLog
> > 	* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> > 	shifts by constant bit counts.
> > 	(*ashlv1ti3_internal): New define_insn_and_split that lowers
> > 	logical left shifts by constant bit counts, that aren't multiples
> > 	of 8, before reload.
> > 	(lshrv1ti3): Delay lowering of logical right shifts by constant.
> > 	(*lshrv1ti3_internal): New define_insn_and_split that lowers
> > 	logical right shifts by constant bit counts, that aren't multiples
> > 	of 8, before reload.
> > 	(ashrv1ti3): Delay lowering of arithmetic right shifts by
> > 	constant bit counts.
> > 	(*ashrv1ti3_internal): New define_insn_and_split that lowers
> > 	arithmetic right shifts by constant bit counts before reload.
> > 	(rotlv1ti3): Delay lowering of rotate left by constant.
> > 	(*rotlv1ti3_internal): New define_insn_and_split that lowers
> > 	rotate left by constant bit counts before reload.
> > 	(rotrv1ti3): Delay lowering of rotate right by constant.
> > 	(*rotrv1ti3_internal): New define_insn_and_split that lowers
> > 	rotate right by constant bit counts before reload.
>
> +(define_insn_and_split "*ashlv1ti3_internal"
> +  [(set (match_operand:V1TI 0 "register_operand")
> 	(ashift:V1TI
> 	  (match_operand:V1TI 1 "register_operand")
> -	  (match_operand:QI 2 "general_operand")))]
> -  "TARGET_SSE2 && TARGET_64BIT"
> +	  (match_operand:SI 2 "const_0_to_255_operand")))]
> +  "TARGET_SSE2
> +   && TARGET_64BIT
> +   && (INTVAL (operands[2]) & 7) != 0
>
> Please introduce a const_0_to_255_not_mul_8_operand predicate.
> Alternatively, and preferably, you can use pattern shadowing, where
> the preceding, more constrained pattern will match before the
> following, broader pattern.
>
> Uros.
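With the new predicate, the range and multiple-of-8 checks in the insn
condition quoted above move into the operand predicate itself.  A sketch
of the resulting pattern (the "=v"/"v" constraints and the
ix86_expand_v1ti_shift splitter body are assumptions based on the
surrounding V1TI lowering helpers, not necessarily the committed text):

```lisp
(define_insn_and_split "*ashlv1ti3_internal"
  [(set (match_operand:V1TI 0 "register_operand" "=v")
	(ashift:V1TI
	  (match_operand:V1TI 1 "register_operand" "v")
	  (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
  "TARGET_SSE2 && TARGET_64BIT && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(const_int 0)]
{
  /* Lower the constant-count shift to a short sequence of SSE
     operations before register allocation.  */
  ix86_expand_v1ti_shift (ASHIFT, operands);
  DONE;
})
```

The "#" template plus the x86_pre_reload_split-style condition is what
delays the lowering: the insn survives through the STV pass as a single
RTL instruction and is only split into SSE operations just before
reload.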