On Fri, 7 Aug 2020, Richard Biener wrote: > On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse wrote: >> >> On Fri, 7 Aug 2020, Richard Biener wrote: >> >>> On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse wrote: >>>> >>>> On Thu, 6 Aug 2020, Christophe Lyon wrote: >>>> >>>>>> Was I on the right track configuring with >>>>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9 >>>>>> --with-fpu=neon-fp16 >>>>>> then compiling without any special option? >>>>> >>>>> Maybe you also need --with-float=hard, I don't remember if it's >>>>> implied by the 'hf' target suffix >>>> >>>> Thanks! That's what I was missing to reproduce the issue. Now I can >>>> reproduce it with just >>>> >>>> typedef unsigned int vec __attribute__((vector_size(16))); >>>> typedef int vi __attribute__((vector_size(16))); >>>> vi f(vec a,vec b){ >>>> return a==5 | b==7; >>>> } >>>> >>>> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1 >>>> >>>> _1 = a_5(D) == { 5, 5, 5, 5 }; >>>> _3 = b_6(D) == { 7, 7, 7, 7 }; >>>> _9 = _1 | _3; >>>> _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107); >>>> >>>> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns >>>> false), while with -fdisable-tree-forwprop4 we do manage to expand >>>> >>>> _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112); >>>> >>>> It doesn't make much sense to me that we can expand the more complicated >>>> form and not the simpler form of the same operation (both compare a to 5 >>>> and produce a vector of -1 or 0 of the same size), especially when the >>>> target has an instruction (vceq) that does just what we want. >>>> >>>> Introducing boolean vectors was fine, but I think they should be real >>>> types, that we can operate on, not be forced to appear only as the first >>>> argument of a vcond. >>>> >>>> I can think of 2 natural ways to improve things: either implement vector >>>> comparisons in the ARM backend (possibly by forwarding to their existing >>>> code for vcond), or in the generic expansion code try using vcond if the >>>> direct comparison opcode is not provided. >>>> >>>> We can temporarily revert my patch, but I would like it to be temporary. >>>> Since aarch64 seems to handle the same code just fine, maybe someone who >>>> knows arm could copy the relevant code over? >>>> >>>> Does my message make sense, do people have comments? >>> >>> So what complicates things now (and to some extent pre-existed when you >>> used AVX512 which _could_ operate on boolean vectors) is that we >>> have split out the condition from VEC_COND_EXPR to separate stmts >>> but we do not expect backends to be able to code-generate the separate >>> form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs >>> to .VCOND[U] "merging" the compares again. Now that process breaks >>> down once we have things like _9 = _1 | _3; - at some point I argued >>> that we should handle vector compares [and operations on boolean vectors] >>> as well in ISEL but then when it came up again for some reason I >>> disregarded that again. >>> >>> Thus - we don't want to go back to fixing up the generic expansion code >>> (which looks at one instruction at a time and is restricted by TER single-use >>> restrictions). >> >> Here, it would only be handling comparisons left over by ISEL because they >> do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would >> be more consistent, but then we may have to teach the vector lowering pass >> about this. >> >>> Instead we want to deal with this in ISEL which should >>> behave more intelligently. In the above case it might involve turning >>> the _1 and _3 defs into .VCOND [with different result type], doing >>> _9 in that type and then somehow dealing with _7 ... but this eventually >>> means undoing the match simplification that introduced the code? >> >> For targets that do not have compact boolean vectors, >> VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality. >> Removing it to allow more simplifications makes sense, reintroducing it >> for expansion can make sense as well, I think it is ok if the second one >> reverses the first, but very locally, without having to propagate a change >> of type to the uses. So on ARM we could turn _1 and _3 into .VCOND >> producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or >> does using bool vectors like that seem bad? (maybe they aren't guaranteed >> to be equivalent to signed integer vectors with values 0 and -1 and we >> need to query the target to know if that is the case, or introduce an >> extra .VCOND) >> >> Also, we only want to replace comparisons with .VCOND if the target >> doesn't handle the comparison directly. For AVX512, we do want to produce >> SImode bool vectors (for k* registers) and operate on them directly, >> that's the whole point of introducing the bool vectors (if bool vectors >> were only used to feed VEC_COND_EXPR and were all turned into .VCOND >> before expansion, I don't see the point of specifying different types for >> bool vectors for AVX512 vs non-AVX512, as it would make no difference on >> what is passed to the backend). >> >> I think targets should provide some number of operations, including for >> instance bit_and and bit_ior on bool vectors, and be a bit consistent >> about what they provide, it becomes unmanageable in the middle-end >> otherwise... > > It's currently somewhat of a mess I guess. It might help to > restrict the simplification patterns to passes before vector lowering > so maybe add something like (or re-use) > canonicalize_math[_after_vectorization]_p ()? That's indeed a nice way to gain time to sort out the mess :-) This patch solved the issue on ARM for at least 1 testcase. I have a bootstrap+regtest running on x86_64-linux-gnu, at least the tests added in the previous patch still work. 2020-08-05 Marc Glisse * generic-match-head.c (optimize_vectors_before_lowering_p): New function. * gimple-match-head.c (optimize_vectors_before_lowering_p): Likewise. * match.pd ((v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Use it. -- Marc Glisse