Patch updated with comments to simplify pattern .from Richard Sandiford. Okay for trunk? 2017-09-14 Michael Collison * config/aarch64/aarch64.md (*aarch64_reg__minus3): New pattern. 2017-09-14 Michael Collison * gcc.target/aarch64/var_shift_mask_2.c: New test. -----Original Message----- From: Richard Sandiford [mailto:richard.sandiford@linaro.org] Sent: Wednesday, September 6, 2017 10:22 AM To: Michael Collison Cc: Richard Biener ; Richard Kenner ; GCC Patches ; nd ; Andrew Pinski Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts Richard Sandiford writes: > Richard Sandiford writes: >> Michael Collison writes: >>> Richard, >>> >>> The problem with this approach for Aarch64 is that >>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which >>> is normally 0 as it based on the TARGET_SIMD flag. >> >> Maybe I'm wrong, but that seems like a missed optimisation in itself. Sorry to follow up on myself yet again, but I'd forgotten this was because we allow the SIMD unit to do scalar shifts. So I guess we have no choice, even though it seems unfortunate. > +(define_insn_and_split "*aarch64_reg__minus3" > + [(set (match_operand:GPI 0 "register_operand" "=&r") > + (ASHIFT:GPI > + (match_operand:GPI 1 "register_operand" "r") > + (minus:QI (match_operand 2 "const_int_operand" "n") > + (match_operand:QI 3 "register_operand" "r"))))] > + "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)" > + "#" > + "&& true" > + [(const_int 0)] > + { > + /* Handle cases where operand 3 is a plain QI register, or > + a subreg with either a SImode or DImode register. */ > + > + rtx subreg_tmp = (REG_P (operands[3]) > + ? gen_lowpart_SUBREG (SImode, operands[3]) > + : SUBREG_REG (operands[3])); > + > + if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode) > + subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp); I think this all simplifies to: rtx subreg_tmp = gen_lowpart (SImode, operands[3]); (or it would be worth having a comment that explains why not). As well as being shorter, it will properly simplify hard REGs to new hard REGs. > + rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode) > + : operands[0]); > + > + if (mode == DImode && !can_create_pseudo_p ()) > + tmp = gen_lowpart_SUBREG (SImode, operands[0]); I think this too would be simpler with gen_lowpart: rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode) : gen_lowpart (SImode, operands[0])); > + > + emit_insn (gen_negsi2 (tmp, subreg_tmp)); > + > + rtx and_op = gen_rtx_AND (SImode, tmp, > + GEN_INT (GET_MODE_BITSIZE (mode) - 1)); > + > + rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op); > + > + emit_insn (gen_3 (operands[0], operands[1], subreg_tmp2)); > + DONE; > + } > +) The pattern should probably set the "length" attribute to 8. Looks good to me with those changes FWIW. Thanks, Richard