From: Richard Sandiford
Date: Tuesday, May 16, 2023 at 5:36 PM
To: Tejas Belagod
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

Tejas Belagod writes:
>>> +      {
>>> +        b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
>>> +                    bitsize_int (step * BITS_PER_UNIT),
>>> +                    bitsize_int ((16 - step) * BITS_PER_UNIT));
>>> +
>>> +        return gimple_build_assign (f.lhs, b);
>>> +      }
>>> +
>>> +    /* If VECTOR_CST_NELTS_PER_PATTERN (pred) == 2 and every multiple of
>>> +       'step_1' in
>>> +       [VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1]
>>> +       is zero, then we can treat the vector as VECTOR_CST_NPATTERNS
>>> +       elements followed by all inactive elements.  */
>>> +    if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) == 2)
>>
>> Following on from the above, maybe use:
>>
>>   !VECTOR_CST_NELTS (pred).is_constant ()
>>
>> instead of !const_vl here.
>>
>> I have a horrible suspicion that I'm contradicting our earlier discussion
>> here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != 2.
>>
>>
>> IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the encoded
>> constant. This means the repetition occurs if NELTS_PER_PATTERN == 2, IOW the
>> base1 repeats in the encoding. This loop is checking this condition and looks
>> for a 1 in the repeated part of the NELTS_PER_PATTERN == 2 in a VL vector.
>> Please correct me if I’m misunderstanding here.
>
> NELTS_PER_PATTERN == 1 is also a repeating pattern: it means that the
> entire sequence is repeated to fill a vector. So if an NELTS_PER_PATTERN
> == 1 constant has elements {0, 1, 0, 0}, the vector is:
>
>   {0, 1, 0, 0, 0, 1, 0, 0, ...}
>
>
> Wouldn’t the vect_all_same(pred, step) cover this case for a given value of
> step?
>
>
> and the optimisation can't handle that. NELTS_PER_PATTERN == 3 isn't
> likely to occur for predicates, but in principle it has the same problem.
>
> OK, I had misunderstood the encoding to always make base1 the repeating value
> by adjusting the NPATTERNS accordingly – I didn’t know you could also have the
> base2 value and beyond encoding the repeat value. In this case could I just
> remove the NELTS_PER_PATTERN == 2 condition, so that the enclosed loop would
> check for a repeating ‘1’ in the repeated part of the encoded pattern?

But for NELTS_PER_PATTERN == 1, the whole encoded sequence repeats. So you
would have to start the check at element 0 rather than NPATTERNS. And then
(for NELTS_PER_PATTERN == 1) the loop would reject any constant that has a
nonzero element. But all valid zero-vector cases have been handled by this
point, so the effect wouldn't be useful.

It should never be the case that all elements from NPATTERNS onwards are
zero for NELTS_PER_PATTERN == 3; that case should be canonicalised to
NELTS_PER_PATTERN == 2 instead.

So in practice it's simpler and more obviously correct to punt when
NELTS_PER_PATTERN != 2.

Thanks for the clarification. I understand all the points about punting when
NELTS_PER_PATTERN != 2 except one. Am I correct to understand that we still
need to check for repeating non-zero elements in the case of
NELTS_PER_PATTERN == 2? E.g. { 0, 0, 1, 1, 1, 1, ... }, which should be
encoded as {0, 0, 1, 1} with NPATTERNS = 2?

Thanks,
Tejas.

Thanks,
Richard
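
To make the encoding rules discussed in this thread concrete, here is a minimal standalone C++ sketch. It is not GCC code: `EncodedVector`, `element_at` and `tail_is_all_zero` are hypothetical names invented for illustration, standing in for a VECTOR_CST and its NPATTERNS / NELTS_PER_PATTERN accessors. The check is also simplified to look at every encoded element from NPATTERNS onwards, rather than only at multiples of a step as the patch comment describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a VECTOR_CST encoding; not GCC's real type.
struct EncodedVector {
  std::size_t npatterns;          // number of interleaved patterns
  std::size_t nelts_per_pattern;  // 1, 2 or 3 encoded elements per pattern
  std::vector<int> encoded;       // npatterns * nelts_per_pattern values
};

// Value of element I of the conceptually unbounded (variable-length) vector.
int element_at(const EncodedVector &v, std::size_t i) {
  assert(v.nelts_per_pattern >= 1 && v.nelts_per_pattern <= 3);
  assert(v.encoded.size() == v.npatterns * v.nelts_per_pattern);

  if (i < v.encoded.size())
    return v.encoded[i];  // element is stored explicitly

  std::size_t pattern = i % v.npatterns;  // which interleaved pattern
  std::size_t index = i / v.npatterns;    // position within that pattern

  if (v.nelts_per_pattern == 1)
    return v.encoded[pattern];  // the whole encoded sequence repeats
  if (v.nelts_per_pattern == 2)
    return v.encoded[v.npatterns + pattern];  // base1 repeats forever

  // nelts_per_pattern == 3: base0, then a linear series through base1, base2.
  int base1 = v.encoded[v.npatterns + pattern];
  int base2 = v.encoded[2 * v.npatterns + pattern];
  return base2 + static_cast<int>(index - 2) * (base2 - base1);
}

// True only when every element from NPATTERNS onwards is known to be zero.
// Assumption: anything other than nelts_per_pattern == 2 is rejected (punt).
bool tail_is_all_zero(const EncodedVector &v) {
  if (v.nelts_per_pattern != 2)
    return false;
  for (std::size_t i = v.npatterns; i < v.encoded.size(); ++i)
    if (v.encoded[i] != 0)
      return false;
  return true;
}
```

In this model, {0, 1, 0, 0} with NPATTERNS = 4 and NELTS_PER_PATTERN = 1 expands to {0, 1, 0, 0, 0, 1, 0, 0, ...}, so the sketch punts on it. {0, 0, 1, 1} with NPATTERNS = 2 and NELTS_PER_PATTERN = 2 expands to {0, 0, 1, 1, 1, 1, ...}; its encoded tail is nonzero, so the sketch rejects that too. Only an encoding such as {1, 1, 0, 0} with NPATTERNS = 2 passes the check.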