From: Richard Sandiford
To: Wilco Dijkstra
Cc: GCC Patches
Subject: Re: [PATCH][AArch64] Cleanup move immediate code
Date: Thu, 01 Dec 2022 10:59:23 +0000
In-Reply-To: (Wilco Dijkstra's message of "Tue, 29 Nov 2022 13:57:45 +0000")

Wilco Dijkstra writes:
> Hi Richard,
>
>> Just to make sure I understand: isn't it really just MOVN?  I would have
>> expected a 32-bit MOVZ to be equivalent to (and add no capabilities over)
>> a 64-bit MOVZ.
>
> The 32-bit MOVZ immediates are equivalent, MOVN never overlaps, and
> MOVI has some overlaps.  Since we allow all 3 variants, the 2 alternatives
> in the movdi pattern are overlapping for MOVZ and MOVI immediates.
>
>> I agree the ctz trick is more elegant than (and an improvement over)
>> the current approach to testing for movz.  But I think the overall logic
>> is harder to follow than it was in the original version.  Initially
>> canonicalising val2 based on the sign bit seems unintuitive since we
>> still need to handle all four combinations of (top bit set, top bit clear)
>> x (low 48 bits set, low 48 bits clear).  I preferred the original
>> approach of testing once with the original value (for MOVZ) and once
>> with the inverted value (for MOVN).
>
> Yes, the canonicalization on the sign ends up requiring 2 special cases.
> Handling the MOVZ case first and then MOVN does avoid that, and makes
> things simpler overall, so I've used that approach in v2.
>
>> Don't the new cases boil down to: if mode is DImode and the upper 32 bits
>> are clear, we can test based on SImode instead?  In other words, couldn't
>> the "(val >> 32) == 0" part of the final test be done first, with the
>> effect of changing the mode to SImode?  Something like:
>
> Yes, that works. I used masking of the top bits to avoid repeatedly testing the
> same condition. The new version removes most special cases and ends up
> both smaller and simpler:
>
>
> v2: Simplify the special cases in aarch64_move_imm, use aarch64_is_movz.
>
> Simplify, refactor and improve various move immediate functions.
> Allow 32-bit MOVZ/I/N as a valid 64-bit immediate, which removes special
> cases in aarch64_internal_mov_immediate. Add a new constraint so the movdi
> pattern only needs a single alternative for move immediate.
>
> Passes bootstrap and regress, OK for commit?
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.cc (aarch64_bitmask_imm): Use unsigned type.
>         (aarch64_zeroextended_move_imm): New function.
>         (aarch64_move_imm): Refactor, assert mode is SImode or DImode.
>         (aarch64_internal_mov_immediate): Assert mode is SImode or DImode.
>         Simplify special cases.
>         (aarch64_uimm12_shift): Simplify code.
>         (aarch64_clamp_to_uimm12_shift): Likewise.
>         (aarch64_movw_imm): Rename to aarch64_is_movz.
>         (aarch64_float_const_rtx_p): Pass either SImode or DImode to
>         aarch64_internal_mov_immediate.
>         (aarch64_rtx_costs): Likewise.
>         * config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
>         constraints into single 'O'.
>         (mov<mode>_aarch64): Likewise.
>         * config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
>         (aarch64_bitmask_imm): Likewise.
>         (aarch64_uimm12_shift): Likewise.
>         (aarch64_zeroextended_move_imm): New prototype.
>         * config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
>         limit 'N' to 64-bit only moves.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 4be93c93c26e091f878bc8e4cf06e90888405fb2..8bce6ec7599edcc2e6a1d8006450f35c0ce7f61f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -756,7 +756,7 @@ void aarch64_post_cfi_startproc (void);
>  poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_address_valid_for_prefetch_p (rtx, bool);
> -bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>  bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
> @@ -793,7 +793,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
>  					unsigned HOST_WIDE_INT,
>  					unsigned HOST_WIDE_INT);
>  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
> -bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
> +bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
>  machine_mode aarch64_sve_int_mode (machine_mode);
>  opt_machine_mode aarch64_sve_pred_mode (unsigned int);
>  machine_mode aarch64_sve_pred_mode (machine_mode);
> @@ -843,8 +843,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool);
>  bool aarch64_sve_float_mul_immediate_p (rtx);
>  bool aarch64_split_dimode_const_store (rtx, rtx);
>  bool aarch64_symbolic_address_p (rtx);
> -bool aarch64_uimm12_shift (HOST_WIDE_INT);
> +bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
>  int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
> +bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT);
>  bool aarch64_use_return_insn_p (void);
>  const char *aarch64_output_casesi (rtx *);
>  
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e97f3b32f7c7f43564d6a4207eae5a34b9e9bfe7..fab478ada8d72d163c8daa3713e5ff6945de409f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -5625,12 +5625,10 @@ aarch64_bitmask_imm (unsigned HOST_WIDE_INT val)
>  
>  /* Return true if VAL is a valid bitmask immediate for MODE.  */
>  bool
> -aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
> +aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
>  {
>    if (mode == DImode)
> -    return aarch64_bitmask_imm (val_in);
> -
> -  unsigned HOST_WIDE_INT val = val_in;
> +    return aarch64_bitmask_imm (val);
>  
>    if (mode == SImode)
>      return aarch64_bitmask_imm ((val & 0xffffffff) | (val << 32));
> @@ -5669,51 +5667,57 @@ aarch64_check_bitmask (unsigned HOST_WIDE_INT val,
>  }
>  
>  
> -/* Return true if val is an immediate that can be loaded into a
> -   register by a MOVZ instruction.  */
> -static bool
> -aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
> +/* Return true if immediate VAL can only be created by using a 32-bit
> +   zero-extended move immediate, not by a 64-bit move.  */
> +bool
> +aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT val)
>  {
> -  if (GET_MODE_SIZE (mode) > 4)
> -    {
> -      if ((val & (((HOST_WIDE_INT) 0xffff) << 32)) == val
> -	  || (val & (((HOST_WIDE_INT) 0xffff) << 48)) == val)
> -	return 1;
> -    }
> -  else
> -    {
> -      /* Ignore sign extension.  */
> -      val &= (HOST_WIDE_INT) 0xffffffff;
> -    }
> -  return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val
> -	  || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
> +  if (val < 65536 || (val >> 32) != 0 || (val & 0xffff) == 0)
> +    return false;
> +  return !aarch64_bitmask_imm (val);
>  }
>  
>  
> -/* Return true if VAL is an immediate that can be loaded into a
> -   register in a single instruction.  */
> +/* Return true if VAL is a valid MOVZ immediate.  */
> +static inline bool
> +aarch64_is_movz (unsigned HOST_WIDE_INT val)
> +{
> +  return (val >> (ctz_hwi (val) & 48)) < 65536;
> +}
> +
> +
> +/* Return true if VAL is an immediate that can be created by a single
> +   MOV instruction.  */
>  bool
> -aarch64_move_imm (HOST_WIDE_INT val, machine_mode mode)
> +aarch64_move_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
>  {
> -  scalar_int_mode int_mode;
> -  if (!is_a <scalar_int_mode> (mode, &int_mode))
> -    return false;
> +  gcc_assert (mode == SImode || mode == DImode);
>  
> -  if (aarch64_movw_imm (val, int_mode) || aarch64_movw_imm (~val, int_mode))
> -    return 1;
> -  return aarch64_bitmask_imm (val, int_mode);
> +  if (val < 65536)
> +    return true;
> +
> +  unsigned HOST_WIDE_INT mask =
> +    (val >> 32) == 0 || mode == SImode ? 0xffffffff : HOST_WIDE_INT_M1U;
> +
> +  if (aarch64_is_movz (val & mask) || aarch64_is_movz (~val & mask))
> +    return true;
> +
> +  val = (val & mask) | ((val << 32) & ~mask);
> +  return aarch64_bitmask_imm (val);
>  }
>  
>  
>  static int
>  aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
> -				scalar_int_mode mode)
> +				machine_mode mode)
>  {
>    int i;
>    unsigned HOST_WIDE_INT val, val2, mask;
>    int one_match, zero_match;
>    int num_insns;
>  
> +  gcc_assert (mode == SImode || mode == DImode);
> +
>    val = INTVAL (imm);
>  
>    if (aarch64_move_imm (val, mode))
> @@ -5723,31 +5727,6 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
>        return 1;
>      }
>  
> -  /* Check to see if the low 32 bits are either 0xffffXXXX or 0xXXXXffff
> -     (with XXXX non-zero). In that case check to see if the move can be done in
> -     a smaller mode.  */
> -  val2 = val & 0xffffffff;
> -  if (mode == DImode
> -      && aarch64_move_imm (val2, SImode)
> -      && (((val >> 32) & 0xffff) == 0 || (val >> 48) == 0))
> -    {
> -      if (generate)
> -	emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
> -
> -      /* Check if we have to emit a second instruction by checking to see
> -	 if any of the upper 32 bits of the original DI mode value is set.  */
> -      if (val == val2)
> -	return 1;
> -
> -      i = (val >> 48) ? 48 : 32;
> -
> -      if (generate)
> -	emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> -				   GEN_INT ((val >> i) & 0xffff)));
> -
> -      return 2;
> -    }
> -
>    if ((val >> 32) == 0 || mode == SImode)
>      {
>        if (generate)
> @@ -5771,24 +5750,31 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
>    one_match = ((~val & mask) == 0) + ((~val & (mask << 16)) == 0) +
>      ((~val & (mask << 32)) == 0) + ((~val & (mask << 48)) == 0);
>  
> +  /* Try a bitmask immediate and a movk to generate the immediate
> +     in 2 instructions.  */
> +
>    if (zero_match < 2 && one_match < 2)
>      {
> -      /* Try emitting a bitmask immediate with a movk replacing 16 bits.
> -	 For a 64-bit bitmask try whether changing 16 bits to all ones or
> -	 zeroes creates a valid bitmask. To check any repeated bitmask, try
> -	 using 16 bits from the other 32-bit half of val.  */
> -
>        for (i = 0; i < 64; i += 16)
> -	if (aarch64_check_bitmask (val, val2, mask << i))
> -	  {
> -	    if (generate)
> -	      {
> -		emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
> -		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> -					   GEN_INT ((val >> i) & 0xffff)));
> -	      }
> -	    return 2;
> -	  }
> +	{
> +	  if (aarch64_check_bitmask (val, val2, mask << i))
> +	    break;
> +
> +	  val2 = val & ~(mask << i);
> +	  if ((val2 >> 32) == 0 && aarch64_move_imm (val2, DImode))
> +	    break;
> +	}
> +
> +      if (i != 64)
> +	{
> +	  if (generate)
> +	    {
> +	      emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
> +	      emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> +					 GEN_INT ((val >> i) & 0xffff)));
> +	    }
> +	  return 2;
> +	}
>      }
>  
>    /* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */
> @@ -5857,26 +5843,24 @@ aarch64_mov128_immediate (rtx imm)
>  /* Return true if val can be encoded as a 12-bit unsigned immediate with
>     a left shift of 0 or 12 bits.  */
>  bool
> -aarch64_uimm12_shift (HOST_WIDE_INT val)
> +aarch64_uimm12_shift (unsigned HOST_WIDE_INT val)
>  {
> -  return ((val & (((HOST_WIDE_INT) 0xfff) << 0)) == val
> -	  || (val & (((HOST_WIDE_INT) 0xfff) << 12)) == val
> -	  );
> +  return val < 4096 || (val & 0xfff000) == val;
>  }
>  
>  /* Returns the nearest value to VAL that will fit as a 12-bit unsigned immediate
>     that can be created with a left shift of 0 or 12.  */
>  static HOST_WIDE_INT
> -aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val)
> +aarch64_clamp_to_uimm12_shift (unsigned HOST_WIDE_INT val)
>  {
>    /* Check to see if the value fits in 24 bits, as that is the maximum we can
>       handle correctly.  */
> -  gcc_assert ((val & 0xffffff) == val);
> +  gcc_assert (val < 0x1000000);
>  
> -  if (((val & 0xfff) << 0) == val)
> +  if (val < 4096)
>      return val;
>  
> -  return val & (0xfff << 12);
> +  return val & 0xfff000;
>  }
>  
>  
> @@ -7024,8 +7008,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
>        return;
>      }
>  
> -  aarch64_internal_mov_immediate (dest, imm, true,
> -				  as_a <scalar_int_mode> (mode));
> +  aarch64_internal_mov_immediate (dest, imm, true, mode);
>  }
>  
>  /* Return the MEM rtx that provides the canary value that should be used
> @@ -11197,9 +11180,7 @@ aarch64_float_const_rtx_p (rtx x)
>        && SCALAR_FLOAT_MODE_P (mode)
>        && aarch64_reinterpret_float_as_int (x, &ival))
>      {
> -      scalar_int_mode imode = (mode == HFmode
> -			       ? SImode
> -			       : int_mode_for_mode (mode).require ());
> +      machine_mode imode = (mode == DFmode) ? DImode : SImode;

It looks like this might mishandle DDmode, if not now then in the future.
Seems safer to use known_eq (GET_MODE_SIZE (mode), 8)

>        int num_instr = aarch64_internal_mov_immediate
>  			(NULL_RTX, gen_int_mode (ival, imode), false, imode);
>        return num_instr < 3;
> @@ -13857,10 +13838,9 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
>  	     proportionally expensive to the number of instructions
>  	     required to build that constant.  This is true whether we
>  	     are compiling for SPEED or otherwise.  */
> -	  if (!is_a <scalar_int_mode> (mode, &int_mode))
> -	    int_mode = word_mode;
> +	  machine_mode imode = (mode == SImode) ? SImode : DImode;

And here known_le (GET_MODE_SIZE (mode), 4), so that we handle QImode
and HImode as SImode rather than DImode.
>  	  *cost = COSTS_N_INSNS (aarch64_internal_mov_immediate
> -				 (NULL_RTX, x, false, int_mode));
> +				 (NULL_RTX, x, false, imode));
>  	}
>        return true;
>  
> @@ -13876,9 +13856,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
>  	  bool succeed = aarch64_reinterpret_float_as_int (x, &ival);
>  	  gcc_assert (succeed);
>  
> -	  scalar_int_mode imode = (mode == HFmode
> -				   ? SImode
> -				   : int_mode_for_mode (mode).require ());
> +	  machine_mode imode = (mode == DFmode) ? DImode : SImode;

Same GET_MODE_SIZE test here.

>  	  int ncost = aarch64_internal_mov_immediate
>  		(NULL_RTX, gen_int_mode (ival, imode), false, imode);
>  	  *cost += COSTS_N_INSNS (ncost);
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 76b6898ca048c559ff7f7fba78119161b3d382f6..5694a1efea1df8e532520475140c8748b54e3d89 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1309,16 +1309,15 @@ (define_insn_and_split "*movsi_aarch64"
>  )
>  
>  (define_insn_and_split "*movdi_aarch64"
> -  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m,  r,  r,  r, w,r,w, w")
> -	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]
> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r, r,w, m,m,  r,  r,  r, w,r,w, w")
> +	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,O,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]
>    "(register_operand (operands[0], DImode)
>      || aarch64_reg_or_zero (operands[1], DImode))"
>    "@
>     mov\\t%x0, %x1
>     mov\\t%0, %x1
>     mov\\t%x0, %1
> -   mov\\t%x0, %1
> -   mov\\t%w0, %1
> +   * return aarch64_zeroextended_move_imm (INTVAL (operands[1])) ? \"mov\\t%w0, %1\" : \"mov\\t%x0, %1\";
>     #
>     * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
>     ldr\\t%x0, %1
> @@ -1340,11 +1339,11 @@ (define_insn_and_split "*movdi_aarch64"
>     DONE;
>  }"
>    ;; The "mov_imm" type for CNTD is just a placeholder.
> -  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm,
> +  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,
>  		     load_8,load_8,store_8,store_8,load_8,adr,adr,f_mcr,f_mrc,
>  		     fmov,neon_move")
> -   (set_attr "arch" "*,*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
> -   (set_attr "length" "4,4,4,4,4,*,  4,4,  4,4,  4,8,4,4,  4,  4,  4,  4")]
> +   (set_attr "arch" "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
> +   (set_attr "length" "4,4,4,4,*,  4,4,  4,4,  4,8,4,4,  4,  4,  4,  4")]
>  )
>  
>  (define_insn "insv_imm"
> @@ -1508,7 +1507,7 @@ (define_insn "*mov<mode>_aarch64"
>  
>  (define_insn "*mov<mode>_aarch64"
>    [(set (match_operand:DFD 0 "nonimmediate_operand" "=w, w  ,?r,w,w  ,w  ,w,m,r,m ,r,r")
> -	(match_operand:DFD 1 "general_operand"	     "Y , ?rY, w,w,Ufc,Uvi,m,w,m,rY,r,N"))]
> +	(match_operand:DFD 1 "general_operand"	     "Y , ?rY, w,w,Ufc,Uvi,m,w,m,rY,r,O"))]
>    "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
>      || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
>    "@
> @@ -1523,7 +1522,7 @@ (define_insn "*mov<mode>_aarch64"
>     ldr\\t%x0, %1
>     str\\t%x1, %0
>     mov\\t%x0, %x1
> -   mov\\t%x0, %1"
> +   * return aarch64_zeroextended_move_imm (INTVAL (operands[1])) ? \"mov\\t%w0, %1\" : \"mov\\t%x0, %1\";"
>    [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,neon_move,\
>  		     f_loadd,f_stored,load_8,store_8,mov_reg,\
>  		     fconstd")
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 29efb6c0cff7574c9b239ef358acaca96dd75d03..6168295814c3ce4b6f65030cddfd08172b6febd8 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -106,6 +106,12 @@ (define_constraint "M"
>  
>  (define_constraint "N"
>    "A constant that can be used with a 64-bit MOV immediate operation."
> +  (and (match_code "const_int")
> +       (match_test "aarch64_move_imm (ival, DImode)")
> +       (match_test "!aarch64_zeroextended_move_imm (ival)")))

Sorry for not noticing last time, but: rather than have
aarch64_zeroextended_move_imm (which is quite a complicated test),
could we just add an extra (default off) parameter to aarch64_move_imm
that suppresses the (val >> 32) == 0 test?

Thanks,
Richard

> +
> +(define_constraint "O"
> +  "A constant that can be used with a 32 or 64-bit MOV immediate operation."
>    (and (match_code "const_int")
>         (match_test "aarch64_move_imm (ival, DImode)")))