On Thu, 25 May 2023 at 01:28, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> > On Wed, 24 May 2023 at 15:40, Richard Sandiford
> > wrote:
> >>
> >> Prathamesh Kulkarni writes:
> >> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
> >> > wrote:
> >> >>
> >> >> Prathamesh Kulkarni writes:
> >> >> > Hi Richard,
> >> >> > Thanks for the suggestions. Does the attached patch look OK ?
> >> >> > Bootstrap+test in progress on aarch64-linux-gnu.
> >> >>
> >> >> Like I say, please wait for the tests to complete before sending an RFA.
> >> >> It saves a review cycle if the tests don't in fact pass.
> >> > Right, sorry, will post patches after completion of testing henceforth.
> >> >>
> >> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> >> >> > index 29dbacfa917..e611a7cca25 100644
> >> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p)
> >> >> >    return gen_rtx_PARALLEL (new_mode, vec);
> >> >> >  }
> >> >> >
> >> >> > +/* Return true if INSN is a scalar move.  */
> >> >> > +
> >> >> > +static bool
> >> >> > +scalar_move_insn_p (const rtx_insn *insn)
> >> >> > +{
> >> >> > +  rtx set = single_set (insn);
> >> >> > +  if (!set)
> >> >> > +    return false;
> >> >> > +  rtx src = SET_SRC (set);
> >> >> > +  rtx dest = SET_DEST (set);
> >> >> > +  return is_a<scalar_mode>(GET_MODE (dest))
> >> >> > +         && aarch64_mov_operand_p (src, GET_MODE (src));
> >> >>
> >> >> Formatting:
> >> >>
> >> >>   return (is_a<scalar_mode>(GET_MODE (dest))
> >> >>           && aarch64_mov_operand_p (src, GET_MODE (src)));
> >> >>
> >> >> OK with that change if the tests pass, thanks.
> >> > Unfortunately, the patch regressed vec-init-21.c:
> >> >
> >> > int8x16_t f_s8(int8_t x, int8_t y)
> >> > {
> >> >   return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
> >> >                        7, 8, 9, 10, 11, 12, 13, 14 };
> >> > }
> >> >
> >> > -O3 code-gen trunk:
> >> > f_s8:
> >> >         adrp    x2, .LC0
> >> >         ldr     q0, [x2, #:lo12:.LC0]
> >> >         ins     v0.b[0], w0
> >> >         ins     v0.b[1], w1
> >> >         ret
> >> >
> >> > -O3 code-gen patch:
> >> > f_s8:
> >> >         adrp    x2, .LC0
> >> >         ldr     d31, [x2, #:lo12:.LC0]
> >> >         adrp    x2, .LC1
> >> >         ldr     d0, [x2, #:lo12:.LC1]
> >> >         ins     v31.b[0], w0
> >> >         ins     v0.b[0], w1
> >> >         zip1    v0.16b, v31.16b, v0.16b
> >> >         ret
> >> >
> >> > With trunk, it chooses the fallback sequence because both the fallback
> >> > and zip1 sequences had cost = 20; however, with the patch applied,
> >> > we end up with zip1 sequence cost = 24 and fallback sequence
> >> > cost = 28.
> >> >
> >> > This happens because of using insn_cost instead of
> >> > set_rtx_cost for the following expression:
> >> > (set (reg:QI 100)
> >> >     (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
> >> > set_rtx_cost returns 0 for the above expression, but insn_cost returns 4.
> >>
> >> Yeah, was wondering why you'd dropped the set_rtx_cost thing,
> >> but decided not to question it since using insn_cost seemed
> >> reasonable if it worked.
> > The attached patch uses set_rtx_cost for a single_set and insn_cost
> > otherwise for non-debug insns, similar to seq_cost.
>
> FWIW, I think with the aarch64_mov_operand fix, the old way of using
> insn_cost for everything would have worked too.  But either way is fine.
>
> >> > This expression template appears twice in the fallback sequence, which raises
> >> > its cost to 28 from 20, while it appears once in each half of the zip1 sequence,
> >> > which raises that cost to 24 from 20, so it now prefers the zip1 sequence
> >> > instead.
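
To make the arithmetic above concrete, the snippet below is a small standalone
model (not GCC code) of how the costing change flips the tie-break; the 20-unit
baselines and the 4-unit insn_cost of the QI subreg move are the figures quoted
above, everything else is purely illustrative.

#include <cstdio>

int main ()
{
  const int trunk_cost = 20;   /* quoted cost of each sequence on trunk */
  const int scalar_move = 4;   /* insn_cost of the QI subreg move; set_rtx_cost gives 0 */

  /* Trunk: the scalar moves are costed as free, so both sequences tie at 20
     and the fallback sequence is kept.  */
  int fallback_trunk = trunk_cost;
  int zip1_trunk = trunk_cost;

  /* With insn_cost: the move is counted twice in the fallback sequence,
     but raises the zip1 total only from 20 to 24 (it appears once in each
     half, per the discussion), so zip1 now wins.  */
  int fallback_patched = trunk_cost + 2 * scalar_move;  /* 28 */
  int zip1_patched = trunk_cost + scalar_move;          /* 24 */

  printf ("trunk:   fallback=%d zip1=%d\n", fallback_trunk, zip1_trunk);
  printf ("patched: fallback=%d zip1=%d\n", fallback_patched, zip1_patched);
  return 0;
}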
> >> >
> >> > I assumed this expression would be ignored because it looks like a scalar move,
> >> > but that doesn't seem to be the case ?
> >> > aarch64_classify_symbolic_expression returns
> >> > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)
> >> > and thus aarch64_mov_operand_p returns false.
> >>
> >> Ah, I guess it should be aarch64_mov_operand instead.  Confusing that
> >> they're so different...
> > Thanks, using aarch64_mov_operand worked.
> >>
> >> > Another issue with the zip1 sequence above is that it uses the same
> >> > register x2 for loading the other half of the constant in:
> >> >         adrp    x2, .LC1
> >> >
> >> > I guess this will create an output dependency from adrp x2, .LC0 ->
> >> > adrp x2, .LC1
> >> > and an anti-dependency from ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1,
> >> > essentially forcing almost the entire sequence (except the ins
> >> > instructions) to execute sequentially ?
> >>
> >> I'd expect modern cores to handle that via renaming.
> > Ah right, thanks for the clarification.
> >
> > For some reason, it seems git diff is not formatting the patch correctly :/
> > Or perhaps I am doing something wrong.
>
> No, I think it's fine.  It's just tabs vs. spaces.  A leading
> "+" followed by a tab is still only indented 8 columns, whereas
> "+" followed by 6 spaces is indented 7 columns.  So indentation
> can look a bit weird in the diff.
>
> I was accounting for that though. :)
>
> > For eg, it shows:
> > +  return is_a<scalar_mode>(GET_MODE (dest))
> > +         && aarch64_mov_operand (src, GET_MODE (src));
> > but after applying the patch, it's formatted correctly with "&&
> > aarch64..." right below is_a, both on column 10.
>
> Yeah, the indentation itself was OK.  But there's an “emacs rule”
> that says that parens should be used when splitting an expression
> over multiple lines like this.  So:
>
> -------
> Formatting:
>
>   return (is_a<scalar_mode>(GET_MODE (dest))
>           && aarch64_mov_operand_p (src, GET_MODE (src)));
> -------
>
> was about adding the parens.
>
> > +  for (; seq; seq = NEXT_INSN (seq))
> > +    if (NONDEBUG_INSN_P (seq)
> > +        && !scalar_move_insn_p (seq))
> > +      {
> > +        if (rtx set = single_set (seq))
> > +          cost += set_rtx_cost (set, speed);
> > +        else
> > +          {
> > +            int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
> > +            if (this_cost > 0)
> > +              cost += this_cost;
> > +            else
> > +              cost++;
> > +          }
> > +      }
>
> I think it'd be better to do the single_set first, and pass the set
> to scalar_move_insn_p.  I.e.
>
>   for (; seq; seq = NEXT_INSN (seq))
>     if (NONDEBUG_INSN_P (seq))
>       {
>         if (rtx set = single_set (seq))
>           {
>             if (!scalar_move_insn_p (set))
>               cost += set_rtx_cost (set, speed);
>           }
>         else
>           {
>             int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
>             if (this_cost > 0)
>               cost += this_cost;
>             else
>               cost++;
>           }
>       }
>
> Then scalar_move_insn_p can just be the last three statements,
> adjusted as follows:
>
>   rtx src = SET_SRC (set);
>   rtx dest = SET_DEST (set);
>   return (is_a<scalar_mode> (GET_MODE (dest))
>           && aarch64_mov_operand (src, GET_MODE (dest)));
>
> Note the space after ">", and that the mode passed to aarch64_mov_operand
> is the destination mode (since the source mode might be VOIDmode).
>
> OK with that change, thanks.

Hi Richard,
Thanks for the suggestions, and sorry for being a bit daft in my previous
replies. Does the attached patch look OK ?
Bootstrapped+tested on aarch64-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard
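
For reference, a rough sketch of how the two fragments settle out once the
review comments are applied is below.  It is assembled from the code quoted in
the thread; the enclosing function name (seq_cost_ignoring_scalar_moves) and
its SPEED parameter are assumptions here, not something stated above.

/* Return true if SET is a scalar move.  */

static bool
scalar_move_insn_p (rtx set)
{
  rtx src = SET_SRC (set);
  rtx dest = SET_DEST (set);
  return (is_a<scalar_mode> (GET_MODE (dest))
          && aarch64_mov_operand (src, GET_MODE (dest)));
}

/* Cost insn sequence SEQ, ignoring scalar moves (assumed name/signature).  */

static unsigned
seq_cost_ignoring_scalar_moves (const rtx_insn *seq, bool speed)
{
  unsigned cost = 0;
  for (; seq; seq = NEXT_INSN (seq))
    if (NONDEBUG_INSN_P (seq))
      {
        if (rtx set = single_set (seq))
          {
            /* Skip scalar moves; cost everything else via set_rtx_cost.  */
            if (!scalar_move_insn_p (set))
              cost += set_rtx_cost (set, speed);
          }
        else
          {
            int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
            if (this_cost > 0)
              cost += this_cost;
            else
              cost++;
          }
      }
  return cost;
}

The two points from the review are visible here: scalar_move_insn_p takes the
set rather than the insn, and the mode passed to aarch64_mov_operand is the
destination's, since the source mode might be VOIDmode.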