Hi all,

Here's the re-spun patch.
Aside from the grouping of the split patterns, it now also uses the h register
for the fmov for HF when available; otherwise it forces a literal load.

Regression tested on aarch64-none-linux-gnu with no regressions.

OK for trunk?

Thanks,
Tamar

gcc/
2017-06-26  Tamar Christina
	    Richard Sandiford

	* config/aarch64/aarch64.md (mov): Generalize.
	(*movhf_aarch64, *movsf_aarch64, *movdf_aarch64):
	Add integer and movi cases.
	(movi-split-hf-df-sf split, fp16): New.
	(enabled): Added TARGET_FP_F16INST.
	* config/aarch64/iterators.md (GPF_HF): New.
________________________________________
From: Tamar Christina
Sent: Wednesday, June 21, 2017 11:48:33 AM
To: James Greenhalgh
Cc: GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode.

> >     movi\\t%0.4h, #0
> > -   mov\\t%0.h[0], %w1
> > +   fmov\\t%s0, %w1
>
> Should this not be %h0?

The problem is that H registers are only available in ARMv8.2+.
I'm not sure what to do about ARMv8.1, given your other feedback
pointing out that the bit pattern differs depending on whether the value
is stored in an s or an h register.

>
> >     umov\\t%w0, %1.h[0]
> >     mov\\t%0.h[0], %1.h[0]
> > +   fmov\\t%s0, %1
>
> Likewise, and much more important for correctness as it changes the way the
> bit pattern ends up in the register (see table C2-1 in release B.a of the ARM
> Architecture Reference Manual for ARMv8-A), here.
>
> > +   * return aarch64_output_scalar_simd_mov_immediate (operands[1],
> > +     SImode);
> >     ldr\\t%h0, %1
> >     str\\t%h1, %0
> >     ldrh\\t%w0, %1
> >     strh\\t%w1, %0
> >     mov\\t%w0, %w1"
> > -  [(set_attr "type" "neon_move,neon_from_gp,neon_to_gp,neon_move,\
> > -                     f_loads,f_stores,load1,store1,mov_reg")
> > -   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")]
> > +  "&& can_create_pseudo_p ()
> > +   && !aarch64_can_const_movi_rtx_p (operands[1], HFmode)
> > +   && !aarch64_float_const_representable_p (operands[1])
> > +   && aarch64_float_const_rtx_p (operands[1])"
> > +  [(const_int 0)]
> > +  "{
> > +    unsigned HOST_WIDE_INT ival;
> > +    if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
> > +      FAIL;
> > +
> > +    rtx tmp = gen_reg_rtx (SImode);
> > +    aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
> > +    tmp = simplify_gen_subreg (HImode, tmp, SImode, 0);
> > +    emit_move_insn (operands[0], gen_lowpart (HFmode, tmp));
> > +    DONE;
> > +  }"
> > +  [(set_attr "type" "neon_move,f_mcr,neon_to_gp,neon_move,fconsts,\
> > +                     neon_move,f_loads,f_stores,load1,store1,mov_reg")
> > +   (set_attr "simd" "yes,*,yes,yes,*,yes,*,*,*,*,*")]
> > )
>
> Thanks,
> James
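
For anyone following the split condition quoted above, here is a rough
standalone illustration of the idea: a float constant that does not fit the
8-bit FMOV immediate encoding is reinterpreted as its integer bit pattern so
it can be built in a general register first. This is only a sketch under
stated assumptions, not the GCC code: float_bits and fmov_imm_representable
are hypothetical names, they handle single precision only, and the separate
movi check from the pattern is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): return the raw IEEE 754
   binary32 bit pattern of F, conceptually similar to reinterpreting a
   float constant as an integer.  */
static uint32_t
float_bits (float f)
{
  uint32_t bits;
  assert (sizeof bits == sizeof f);
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Hypothetical helper (not a GCC function): true if F matches the 8-bit
   FMOV immediate form, i.e. +/- (16..31)/16 * 2^r with r in [-3, 4].  */
static bool
fmov_imm_representable (float f)
{
  uint32_t bits = float_bits (f);

  /* Only the top four fraction bits may be set.  */
  if (bits & 0x7ffffu)
    return false;

  /* Exponent bits [30:25] must be 100000 or 011111.  */
  uint32_t exp_hi = (bits >> 25) & 0x3fu;
  return exp_hi == 0x20u || exp_hi == 0x1fu;
}
```

Under these assumptions, a constant such as 1.0f passes the check and could
use an fmov immediate, while 0.1f fails it and would be a candidate for the
integer-register route taken by the split.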