On Fri, 7 Aug 2020, Richard Biener wrote:

> On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> On Fri, 7 Aug 2020, Richard Biener wrote:
>>
>>> On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> On Thu, 6 Aug 2020, Christophe Lyon wrote:
>>>>
>>>>>> Was I on the right track configuring with
>>>>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
>>>>>> --with-fpu=neon-fp16
>>>>>> then compiling without any special option?
>>>>>
>>>>> Maybe you also need --with-float=hard, I don't remember if it's
>>>>> implied by the 'hf' target suffix
>>>>
>>>> Thanks! That's what I was missing to reproduce the issue. Now I can
>>>> reproduce it with just
>>>>
>>>> typedef unsigned int vec __attribute__((vector_size(16)));
>>>> typedef int vi __attribute__((vector_size(16)));
>>>> vi f(vec a,vec b){
>>>>      return a==5 | b==7;
>>>> }
>>>>
>>>> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1
>>>>
>>>>    _1 = a_5(D) == { 5, 5, 5, 5 };
>>>>    _3 = b_6(D) == { 7, 7, 7, 7 };
>>>>    _9 = _1 | _3;
>>>>    _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);
>>>>
>>>> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
>>>> false), while with -fdisable-tree-forwprop4 we do manage to expand
>>>>
>>>>    _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
>>>>
>>>> It doesn't make much sense to me that we can expand the more complicated
>>>> form and not the simpler form of the same operation (both compare a to 5
>>>> and produce a vector of -1 or 0 of the same size), especially when the
>>>> target has an instruction (vceq) that does just what we want.
>>>>
>>>> Introducing boolean vectors was fine, but I think they should be real
>>>> types, that we can operate on, not be forced to appear only as the first
>>>> argument of a vcond.
>>>>
>>>> I can think of 2 natural ways to improve things: either implement vector
>>>> comparisons in the ARM backend (possibly by forwarding to their existing
>>>> code for vcond), or in the generic expansion code try using vcond if the
>>>> direct comparison opcode is not provided.
>>>>
>>>> We can temporarily revert my patch, but I would like it to be temporary.
>>>> Since aarch64 seems to handle the same code just fine, maybe someone who
>>>> knows arm could copy the relevant code over?
>>>>
>>>> Does my message make sense, do people have comments?
>>>
>>> So what complicates things now (and to some extent pre-existed when you
>>> used AVX512 which _could_ operate on boolean vectors) is that we
>>> have split out the condition from VEC_COND_EXPR to separate stmts
>>> but we do not expect backends to be able to code-generate the separate
>>> form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs
>>> to .VCOND[U] "merging" the compares again.  Now that process breaks
>>> down once we have things like _9 = _1 | _3;  -  at some point I argued
>>> that we should handle vector compares [and operations on boolean vectors]
>>> as well in ISEL but then when it came up again for some reason I
>>> disregarded that again.
>>>
>>> Thus - we don't want to go back to fixing up the generic expansion code
>>> (which looks at one instruction at a time and is restricted by TER single-use
>>> restrictions).
>>
>> Here, it would only be handling comparisons left over by ISEL because they
>> do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would
>> be more consistent, but then we may have to teach the vector lowering pass
>> about this.
>>
>>> Instead we want to deal with this in ISEL which should
>>> behave more intelligently.  In the above case it might involve turning
>>> the _1 and _3 defs into .VCOND [with different result type], doing
>>> _9 in that type and then somehow dealing with _7 ... but this eventually
>>> means undoing the match simplification that introduced the code?
>>
>> For targets that do not have compact boolean vectors,
>> VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality.
>> Removing it to allow more simplifications makes sense, reintroducing it
>> for expansion can make sense as well, I think it is ok if the second one
>> reverses the first, but very locally, without having to propagate a change
>> of type to the uses. So on ARM we could turn _1 and _3 into .VCOND
>> producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or
>> does using bool vectors like that seem bad? (maybe they aren't guaranteed
>> to be equivalent to signed integer vectors with values 0 and -1 and we
>> need to query the target to know if that is the case, or introduce an
>> extra .VCOND)
>>
>> Also, we only want to replace comparisons with .VCOND if the target
>> doesn't handle the comparison directly. For AVX512, we do want to produce
>> SImode bool vectors (for k* registers) and operate on them directly,
>> that's the whole point of introducing the bool vectors (if bool vectors
>> were only used to feed VEC_COND_EXPR and were all turned into .VCOND
>> before expansion, I don't see the point of specifying different types for
>> bool vectors for AVX512 vs non-AVX512, as it would make no difference on
>> what is passed to the backend).
>>
>> I think targets should provide some number of operations, including for
>> instance bit_and and bit_ior on bool vectors, and be a bit consistent
>> about what they provide, it becomes unmanageable in the middle-end
>> otherwise...
>
> It's currently somewhat of a mess I guess.  It might help to
> restrict the simplification patterns to passes before vector lowering
> so maybe add something like (or re-use)
> canonicalize_math[_after_vectorization]_p ()?

That's indeed a nice way to gain time to sort out the mess :-)

This patch solved the issue on ARM for at least 1 testcase. I have a 
bootstrap+regtest running on x86_64-linux-gnu, at least the tests added in 
the previous patch still work.

2020-08-05  Marc Glisse  <marc.glisse@inria.fr>

 	* generic-match-head.c (optimize_vectors_before_lowering_p): New
 	function.
 	* gimple-match-head.c (optimize_vectors_before_lowering_p):
 	Likewise.
 	* match.pd ((v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Use it.

-- 
Marc Glisse