From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by sourceware.org (Postfix) with ESMTPS id 744463858D35 for ; Fri, 7 Aug 2020 08:47:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 744463858D35 Received: by mail-ej1-x62f.google.com with SMTP id jp10so1284276ejb.0 for ; Fri, 07 Aug 2020 01:47:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ygQzz4eJYBrMcCUHUEpqrWUWwtSM8y2KJlK1YikwWbE=; b=CunPsjr+4Ij9C6hgVncQ+O2j4Do6HNm0hbss+Terg/i9wyCGyZjfSvCt2ej910JBO2 WCNaQNkALBwz1AfCf1gTgLAUfHIm5b5VR71+/oQFeobW9omcLnANDGn6nWRnKol7fBH9 QjCUGnntI7uauWuBffTM9lB3EnJlqCuBvEPJRFdOyUrZ83UJYU3OjWkgnanLoUytAS9h QvSzUTQonEI7LIYze2CfdqrV9QvNfkrFzrbuzob1Qz8vJvrtHQ9d/Ur0KJx5BD+Oe90h eLo3N/BaTRTisudmjBWIYYKm8abGQw6pUptii7XufKo38xMPKpfjqGVtS6cEq8hx3Z1h s+bQ== X-Gm-Message-State: AOAM531iTtMqZ/Pv4NrNDVE7kSpOVY3PtuZhf0iHk+Ojo1Yvk0RWm0BQ GI0/ow/qHG1vBz8lTnJ6XYPPfQKaCzScHRzWI1Y= X-Google-Smtp-Source: ABdhPJwbT6ulLjqS5Cq9K3DhAPn+bXR37IZpJkSzuJV+ewtVzJfjkafqS2APxeCPK8OWz+rxerydGr1ja4u16PsHTEE= X-Received: by 2002:a17:906:68da:: with SMTP id y26mr8108460ejr.250.1596790059462; Fri, 07 Aug 2020 01:47:39 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Fri, 7 Aug 2020 10:47:28 +0200 Message-ID: Subject: Re: VEC_COND_EXPR optimizations v2 To: Marc Glisse Cc: Christophe Lyon , GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Aug 2020 08:47:42 -0000 On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse wrote: > > On Fri, 7 Aug 2020, Richard Biener wrote: > > > On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse wrote: > >> > >> On Thu, 6 Aug 2020, Christophe Lyon wrote: > >> > >>>> Was I on the right track configuring with > >>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9 > >>>> --with-fpu=neon-fp16 > >>>> then compiling without any special option? > >>> > >>> Maybe you also need --with-float=hard, I don't remember if it's > >>> implied by the 'hf' target suffix > >> > >> Thanks! That's what I was missing to reproduce the issue. Now I can > >> reproduce it with just > >> > >> typedef unsigned int vec __attribute__((vector_size(16))); > >> typedef int vi __attribute__((vector_size(16))); > >> vi f(vec a,vec b){ > >> return a==5 | b==7; > >> } > >> > >> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1 > >> > >> _1 = a_5(D) == { 5, 5, 5, 5 }; > >> _3 = b_6(D) == { 7, 7, 7, 7 }; > >> _9 = _1 | _3; > >> _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107); > >> > >> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns > >> false), while with -fdisable-tree-forwprop4 we do manage to expand > >> > >> _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112); > >> > >> It doesn't make much sense to me that we can expand the more complicated > >> form and not the simpler form of the same operation (both compare a to 5 > >> and produce a vector of -1 or 0 of the same size), especially when the > >> target has an instruction (vceq) that does just what we want. > >> > >> Introducing boolean vectors was fine, but I think they should be real > >> types, that we can operate on, not be forced to appear only as the first > >> argument of a vcond. > >> > >> I can think of 2 natural ways to improve things: either implement vector > >> comparisons in the ARM backend (possibly by forwarding to their existing > >> code for vcond), or in the generic expansion code try using vcond if the > >> direct comparison opcode is not provided. > >> > >> We can temporarily revert my patch, but I would like it to be temporary. > >> Since aarch64 seems to handle the same code just fine, maybe someone who > >> knows arm could copy the relevant code over? > >> > >> Does my message make sense, do people have comments? > > > > So what complicates things now (and to some extent pre-existed when you > > used AVX512 which _could_ operate on boolean vectors) is that we > > have split out the condition from VEC_COND_EXPR to separate stmts > > but we do not expect backends to be able to code-generate the separate > > form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs > > to .VCOND[U] "merging" the compares again. Now that process breaks > > down once we have things like _9 = _1 | _3; - at some point I argued > > that we should handle vector compares [and operations on boolean vectors] > > as well in ISEL but then when it came up again for some reason I > > disregarded that again. > > > > Thus - we don't want to go back to fixing up the generic expansion code > > (which looks at one instruction at a time and is restricted by TER single-use > > restrictions). > > Here, it would only be handling comparisons left over by ISEL because they > do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would > be more consistent, but then we may have to teach the vector lowering pass > about this. > > > Instead we want to deal with this in ISEL which should > > behave more intelligently. In the above case it might involve turning > > the _1 and _3 defs into .VCOND [with different result type], doing > > _9 in that type and then somehow dealing with _7 ... but this eventually > > means undoing the match simplification that introduced the code? > > For targets that do not have compact boolean vectors, > VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality. > Removing it to allow more simplifications makes sense, reintroducing it > for expansion can make sense as well, I think it is ok if the second one > reverses the first, but very locally, without having to propagate a change > of type to the uses. So on ARM we could turn _1 and _3 into .VCOND > producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or > does using bool vectors like that seem bad? (maybe they aren't guaranteed > to be equivalent to signed integer vectors with values 0 and -1 and we > need to query the target to know if that is the case, or introduce an > extra .VCOND) > > Also, we only want to replace comparisons with .VCOND if the target > doesn't handle the comparison directly. For AVX512, we do want to produce > SImode bool vectors (for k* registers) and operate on them directly, > that's the whole point of introducing the bool vectors (if bool vectors > were only used to feed VEC_COND_EXPR and were all turned into .VCOND > before expansion, I don't see the point of specifying different types for > bool vectors for AVX512 vs non-AVX512, as it would make no difference on > what is passed to the backend). > > I think targets should provide some number of operations, including for > instance bit_and and bit_ior on bool vectors, and be a bit consistent > about what they provide, it becomes unmanageable in the middle-end > otherwise... It's currently somewhat of a mess I guess. It might help to restrict the simplification patterns to passes before vector lowering so maybe add something like (or re-use) canonicalize_math[_after_vectorization]_p ()? Richard. > > -- > Marc Glisse