From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com
 [IPv6:2a00:1450:4864:20::62f])
 by sourceware.org (Postfix) with ESMTPS id 744463858D35
 for <gcc-patches@gcc.gnu.org>; Fri,  7 Aug 2020 08:47:40 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 744463858D35
Received: by mail-ej1-x62f.google.com with SMTP id jp10so1284276ejb.0
 for <gcc-patches@gcc.gnu.org>; Fri, 07 Aug 2020 01:47:40 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=ygQzz4eJYBrMcCUHUEpqrWUWwtSM8y2KJlK1YikwWbE=;
 b=CunPsjr+4Ij9C6hgVncQ+O2j4Do6HNm0hbss+Terg/i9wyCGyZjfSvCt2ej910JBO2
 WCNaQNkALBwz1AfCf1gTgLAUfHIm5b5VR71+/oQFeobW9omcLnANDGn6nWRnKol7fBH9
 QjCUGnntI7uauWuBffTM9lB3EnJlqCuBvEPJRFdOyUrZ83UJYU3OjWkgnanLoUytAS9h
 QvSzUTQonEI7LIYze2CfdqrV9QvNfkrFzrbuzob1Qz8vJvrtHQ9d/Ur0KJx5BD+Oe90h
 eLo3N/BaTRTisudmjBWIYYKm8abGQw6pUptii7XufKo38xMPKpfjqGVtS6cEq8hx3Z1h
 s+bQ==
X-Gm-Message-State: AOAM531iTtMqZ/Pv4NrNDVE7kSpOVY3PtuZhf0iHk+Ojo1Yvk0RWm0BQ
 GI0/ow/qHG1vBz8lTnJ6XYPPfQKaCzScHRzWI1Y=
X-Google-Smtp-Source: ABdhPJwbT6ulLjqS5Cq9K3DhAPn+bXR37IZpJkSzuJV+ewtVzJfjkafqS2APxeCPK8OWz+rxerydGr1ja4u16PsHTEE=
X-Received: by 2002:a17:906:68da:: with SMTP id
 y26mr8108460ejr.250.1596790059462; 
 Fri, 07 Aug 2020 01:47:39 -0700 (PDT)
MIME-Version: 1.0
References: <alpine.DEB.2.23.453.2007291859410.6927@stedding.saclay.inria.fr>
 <alpine.DEB.2.23.453.2008051443320.18411@stedding.saclay.inria.fr>
 <CAFiYyc3Cw=sXXd8099p0J3FskkHDA2orUR5Qczkqz0aZ9rky=g@mail.gmail.com>
 <CAKdteOYB5_WkzbvMMNyRiHAGbrTAW-zY_aQqi91ReTpc5VQ78Q@mail.gmail.com>
 <alpine.DEB.2.23.453.2008061105440.8021@stedding.saclay.inria.fr>
 <CAKdteOYxSh_8+LrziL_XhaXiB56UQOyFjtM5kgaoY7cK-A6F5g@mail.gmail.com>
 <alpine.DEB.2.23.453.2008061335420.8021@stedding.saclay.inria.fr>
 <CAKdteOYyK+QHhmiDMHbh-a-t-ix7U9ApWVabs2h3_gJz8zZ5Zg@mail.gmail.com>
 <alpine.DEB.2.23.453.2008061937220.8021@stedding.saclay.inria.fr>
 <CAFiYyc3EYMa3aG789kMfDOiZo5ZK+nULdKnx=zg+H4=xkvqBkg@mail.gmail.com>
 <alpine.DEB.2.23.453.2008070941090.6903@stedding.saclay.inria.fr>
In-Reply-To: <alpine.DEB.2.23.453.2008070941090.6903@stedding.saclay.inria.fr>
From: Richard Biener <richard.guenther@gmail.com>
Date: Fri, 7 Aug 2020 10:47:28 +0200
Message-ID: <CAFiYyc278rOj8tAXCMKuj00TW_8JxYoHwinO2deK82AkFZC47g@mail.gmail.com>
Subject: Re: VEC_COND_EXPR optimizations v2
To: Marc Glisse <marc.glisse@inria.fr>
Cc: Christophe Lyon <christophe.lyon@linaro.org>,
 GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_NUMSUBJECT,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Aug 2020 08:47:42 -0000

On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 7 Aug 2020, Richard Biener wrote:
>
> > On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> On Thu, 6 Aug 2020, Christophe Lyon wrote:
> >>
> >>>> Was I on the right track configuring with
> >>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
> >>>> --with-fpu=neon-fp16
> >>>> then compiling without any special option?
> >>>
> >>> Maybe you also need --with-float=hard, I don't remember if it's
> >>> implied by the 'hf' target suffix
> >>
> >> Thanks! That's what I was missing to reproduce the issue. Now I can
> >> reproduce it with just
> >>
> >> typedef unsigned int vec __attribute__((vector_size(16)));
> >> typedef int vi __attribute__((vector_size(16)));
> >> vi f(vec a,vec b){
> >>      return a==5 | b==7;
> >> }
> >>
> >> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1
> >>
> >>    _1 = a_5(D) == { 5, 5, 5, 5 };
> >>    _3 = b_6(D) == { 7, 7, 7, 7 };
> >>    _9 = _1 | _3;
> >>    _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);
> >>
> >> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
> >> false), while with -fdisable-tree-forwprop4 we do manage to expand
> >>
> >>    _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
> >>
> >> It doesn't make much sense to me that we can expand the more complicated
> >> form and not the simpler form of the same operation (both compare a to 5
> >> and produce a vector of -1 or 0 of the same size), especially when the
> >> target has an instruction (vceq) that does just what we want.
> >>
> >> Introducing boolean vectors was fine, but I think they should be real
> >> types, that we can operate on, not be forced to appear only as the first
> >> argument of a vcond.
> >>
> >> I can think of 2 natural ways to improve things: either implement vector
> >> comparisons in the ARM backend (possibly by forwarding to their existing
> >> code for vcond), or in the generic expansion code try using vcond if the
> >> direct comparison opcode is not provided.
> >>
> >> We can temporarily revert my patch, but I would like it to be temporary.
> >> Since aarch64 seems to handle the same code just fine, maybe someone who
> >> knows arm could copy the relevant code over?
> >>
> >> Does my message make sense, do people have comments?
> >
> > So what complicates things now (and to some extent pre-existed when you
> > used AVX512 which _could_ operate on boolean vectors) is that we
> > have split out the condition from VEC_COND_EXPR to separate stmts
> > but we do not expect backends to be able to code-generate the separate
> > form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs
> > to .VCOND[U] "merging" the compares again.  Now that process breaks
> > down once we have things like _9 = _1 | _3;  -  at some point I argued
> > that we should handle vector compares [and operations on boolean vectors]
> > as well in ISEL but then when it came up again for some reason I
> > disregarded that again.
> >
> > Thus - we don't want to go back to fixing up the generic expansion code
> > (which looks at one instruction at a time and is restricted by TER single-use
> > restrictions).
>
> Here, it would only be handling comparisons left over by ISEL because they
> do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would
> be more consistent, but then we may have to teach the vector lowering pass
> about this.
>
> > Instead we want to deal with this in ISEL which should
> > behave more intelligently.  In the above case it might involve turning
> > the _1 and _3 defs into .VCOND [with different result type], doing
> > _9 in that type and then somehow dealing with _7 ... but this eventually
> > means undoing the match simplification that introduced the code?
>
> For targets that do not have compact boolean vectors,
> VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality.
> Removing it to allow more simplifications makes sense, reintroducing it
> for expansion can make sense as well, I think it is ok if the second one
> reverses the first, but very locally, without having to propagate a change
> of type to the uses. So on ARM we could turn _1 and _3 into .VCOND
> producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or
> does using bool vectors like that seem bad? (maybe they aren't guaranteed
> to be equivalent to signed integer vectors with values 0 and -1 and we
> need to query the target to know if that is the case, or introduce an
> extra .VCOND)
>
> Also, we only want to replace comparisons with .VCOND if the target
> doesn't handle the comparison directly. For AVX512, we do want to produce
> SImode bool vectors (for k* registers) and operate on them directly,
> that's the whole point of introducing the bool vectors (if bool vectors
> were only used to feed VEC_COND_EXPR and were all turned into .VCOND
> before expansion, I don't see the point of specifying different types for
> bool vectors for AVX512 vs non-AVX512, as it would make no difference on
> what is passed to the backend).
>
> I think targets should provide some number of operations, including for
> instance bit_and and bit_ior on bool vectors, and be a bit consistent
> about what they provide, it becomes unmanageable in the middle-end
> otherwise...

It's currently somewhat of a mess I guess.  It might help to
restrict the simplification patterns to passes before vector lowering
so maybe add something like (or re-use)
canonicalize_math[_after_vectorization]_p ()?

Richard.

>
> --
> Marc Glisse