On Wed, 24 May 2023 at 15:40, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Mon, 22 May 2023 at 14:18, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > Hi Richard, > >> > Thanks for the suggestions. Does the attached patch look OK ? > >> > Boostrap+test in progress on aarch64-linux-gnu. > >> > >> Like I say, please wait for the tests to complete before sending an RFA. > >> It saves a review cycle if the tests don't in fact pass. > > Right, sorry, will post patches after completion of testing henceforth. > >> > >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > >> > index 29dbacfa917..e611a7cca25 100644 > >> > --- a/gcc/config/aarch64/aarch64.cc > >> > +++ b/gcc/config/aarch64/aarch64.cc > >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p) > >> > return gen_rtx_PARALLEL (new_mode, vec); > >> > } > >> > > >> > +/* Return true if INSN is a scalar move. */ > >> > + > >> > +static bool > >> > +scalar_move_insn_p (const rtx_insn *insn) > >> > +{ > >> > + rtx set = single_set (insn); > >> > + if (!set) > >> > + return false; > >> > + rtx src = SET_SRC (set); > >> > + rtx dest = SET_DEST (set); > >> > + return is_a(GET_MODE (dest)) > >> > + && aarch64_mov_operand_p (src, GET_MODE (src)); > >> > >> Formatting: > >> > >> return (is_a(GET_MODE (dest)) > >> && aarch64_mov_operand_p (src, GET_MODE (src))); > >> > >> OK with that change if the tests pass, thanks. > > Unfortunately, the patch regressed vec-init-21.c: > > > > int8x16_t f_s8(int8_t x, int8_t y) > > { > > return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6, > > 7, 8, 9, 10, 11, 12, 13, 14 }; > > } > > > > -O3 code-gen trunk: > > f_s8: > > adrp x2, .LC0 > > ldr q0, [x2, #:lo12:.LC0] > > ins v0.b[0], w0 > > ins v0.b[1], w1 > > ret > > > > -O3 code-gen patch: > > f_s8: > > adrp x2, .LC0 > > ldr d31, [x2, #:lo12:.LC0] > > adrp x2, .LC1 > > ldr d0, [x2, #:lo12:.LC1] > > ins v31.b[0], w0 > > ins v0.b[0], w1 > > zip1 v0.16b, v31.16b, v0.16b > > ret > > > > With trunk, it chooses the fallback sequence because both fallback > > and zip1 sequence had cost = 20, however with patch applied, > > we end up with zip1 sequence cost = 24 and fallback sequence > > cost = 28. > > > > This happens because of using insn_cost instead of > > set_rtx_cost for the following expression: > > (set (reg:QI 100) > > (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) > > set_rtx_cost returns 0 for above expression but insn_cost returns 4. > > Yeah, was wondering why you'd dropped the set_rtx_cost thing, > but decided not to question it since using insn_cost seemed > reasonable if it worked. [reposting because my reply got blocked for moderator approval] The attached patch uses set_rtx_cost for single_set and insn_cost otherwise for non debug insns similar to seq_cost. > > > This expression template appears twice in fallback sequence, which raises > > the cost to 28 from 20, while it appears once in each half of zip1 sequence, > > which raises the cost to 24 from 20, and so it now prefers zip1 sequence > > instead. > > > > I assumed this expression would be ignored because it looks like a scalar move, > > but that doesn't seem to be the case ? > > aarch64_classify_symbolic_expression returns > > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0) > > and thus aarch64_mov_operand_p returns false. > > Ah, I guess it should be aarch64_mov_operand instead. Confusing that > they're so different... Thanks, using aarch64_mov_operand worked. > > > Another issue with the zip1 sequence above is using same register x2 > > for loading another half of constant in: > > adrp x2, .LC1 > > > > I guess this will create an output dependency from adrp x2, .LC0 -> > > adrp x2, .LC1 > > and anti-dependency from ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1 > > essentially forcing almost the entire sequence (except ins > > instructions) to execute sequentially ? > > I'd expect modern cores to handle that via renaming. Ah right, thanks for the clarification. For some reason, it seems git diff is not formatting the patch correctly :/ Or perhaps I am doing something wrongly. For eg, it shows: + return is_a(GET_MODE (dest)) + && aarch64_mov_operand (src, GET_MODE (src)); but after applying the patch, it's formatted correctly with "&&aarch64..." right below is_a, both on column 10. Similarly, for following hunk in seq_cost_ignoring_scalar_moves: + if (NONDEBUG_INSN_P (seq) + && !scalar_move_insn_p (seq)) After applying patch, "&&" is below N, and not '('. Both N and "&&" are on col 9. And for the following just below: + { + if (rtx set = single_set (seq)) diff shows only one space difference between '{' and the following if, but after applying the patch it's formatted correctly, with two spaces after the curly brace. This is the entire file aarch64.cc with patch applied: https://people.linaro.org/~prathamesh.kulkarni/aarch64.cc Does the formatting look OK for scalar_move_insn_p and seq_cost_ignoring_scalar_moves in above aarch64.cc ? Patch passes bootstrap+test on aarch64-linux-gnu with no regressions. OK to commit ? Thanks, Prathamesh > > Thanks, > Richard