On Tue, 25 Apr 2023 at 16:29, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > Hi Richard, > > While digging thru aarch64_expand_vector_init, I noticed it gives > > priority to loading a constant first: > > /* Initialise a vector which is part-variable. We want to first try > > to build those lanes which are constant in the most efficient way we > > can. */ > > > > which results in suboptimal code-gen for following case: > > int16x8_t f_s16(int16_t x) > > { > > return (int16x8_t) { x, x, x, x, x, x, x, 1 }; > > } > > > > code-gen trunk: > > f_s16: > > movi v0.8h, 0x1 > > ins v0.h[0], w0 > > ins v0.h[1], w0 > > ins v0.h[2], w0 > > ins v0.h[3], w0 > > ins v0.h[4], w0 > > ins v0.h[5], w0 > > ins v0.h[6], w0 > > ret > > > > The attached patch tweaks the following condition: > > if (n_var == n_elts && n_elts <= 16) > > { > > ... > > } > > > > to pass if maxv >= 80% of n_elts, with 80% being an > > arbitrary "high enough" threshold. The intent is to dup > > the most repeating variable if it it's repetition > > is "high enough" and insert constants which should be "better" than > > loading constant first and inserting variables like in the above case. > > I'm not too keen on the 80%. Like you say, it seems a bit arbitrary. > > The case above can also be handled by relaxing n_var == n_elts to > n_var >= n_elts - 1, so that if there's just one constant element, > we look for duplicated variable elements. If there are none > (maxv == 1), but there is a constant element, we can duplicate > the constant element into a register. > > The case when there's more than one constant element needs more thought > (and testcases :-)). E.g. after a certain point, it would probably be > better to load the variable and constant parts separately and blend them > using TBL. It also matters whether the constants are equal or not. > > There are also cases that could be handled using EXT. > > Plus, if we're inserting many variable elements that are already > in GPRs, we can probably do better by coalescing them into bigger > GPR values and inserting them as wider elements. > > Because of things like that, I think we should stick to the > single-constant case for now. Hi Richard, Thanks for the suggestions. The attached patch only handles the single constant case. Bootstrap+test in progress on aarch64-linux-gnu. Does it look OK ? Thanks, Prathamesh > > Thanks, > Richard