From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <marc.glisse@inria.fr>
Received: from mail3-relais-sop.national.inria.fr
 (mail3-relais-sop.national.inria.fr [192.134.164.104])
 by sourceware.org (Postfix) with ESMTPS id 831F63857C58
 for <gcc-patches@gcc.gnu.org>; Thu,  6 Aug 2020 18:07:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 831F63857C58
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=inria.fr
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=marc.glisse@inria.fr
X-IronPort-AV: E=Sophos;i="5.75,441,1589234400"; d="scan'208";a="356086938"
Received: from 85-171-191-139.rev.numericable.fr (HELO stedding)
 ([85.171.191.139])
 by mail3-relais-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 06 Aug 2020 20:07:58 +0200
Date: Thu, 6 Aug 2020 20:07:57 +0200 (CEST)
From: Marc Glisse <marc.glisse@inria.fr>
X-X-Sender: glisse@stedding.saclay.inria.fr
To: Christophe Lyon <christophe.lyon@linaro.org>
cc: Richard Biener <richard.guenther@gmail.com>, 
 GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: VEC_COND_EXPR optimizations v2
In-Reply-To: <CAKdteOYyK+QHhmiDMHbh-a-t-ix7U9ApWVabs2h3_gJz8zZ5Zg@mail.gmail.com>
Message-ID: <alpine.DEB.2.23.453.2008061937220.8021@stedding.saclay.inria.fr>
References: <alpine.DEB.2.23.453.2007291859410.6927@stedding.saclay.inria.fr>
 <alpine.DEB.2.23.453.2008051443320.18411@stedding.saclay.inria.fr>
 <CAFiYyc3Cw=sXXd8099p0J3FskkHDA2orUR5Qczkqz0aZ9rky=g@mail.gmail.com>
 <CAKdteOYB5_WkzbvMMNyRiHAGbrTAW-zY_aQqi91ReTpc5VQ78Q@mail.gmail.com>
 <alpine.DEB.2.23.453.2008061105440.8021@stedding.saclay.inria.fr>
 <CAKdteOYxSh_8+LrziL_XhaXiB56UQOyFjtM5kgaoY7cK-A6F5g@mail.gmail.com>
 <alpine.DEB.2.23.453.2008061335420.8021@stedding.saclay.inria.fr>
 <CAKdteOYyK+QHhmiDMHbh-a-t-ix7U9ApWVabs2h3_gJz8zZ5Zg@mail.gmail.com>
User-Agent: Alpine 2.23 (DEB 453 2020-06-18)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_NUMSUBJECT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Aug 2020 18:08:01 -0000

On Thu, 6 Aug 2020, Christophe Lyon wrote:

>> Was I on the right track configuring with
>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
>> --with-fpu=neon-fp16
>> then compiling without any special option?
>
> Maybe you also need --with-float=hard, I don't remember if it's
> implied by the 'hf' target suffix

Thanks! That's what I was missing to reproduce the issue. Now I can
reproduce it with just

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
     return a==5 | b==7;
}

with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1

   _1 = a_5(D) == { 5, 5, 5, 5 };
   _3 = b_6(D) == { 7, 7, 7, 7 };
   _9 = _1 | _3;
   _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);

Here we fail to expand the equality comparison (expand_vec_cmp_expr_p
returns false), while with -fdisable-tree-forwprop4 we do manage to expand

   _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);

It doesn't make much sense to me that we can expand the more complicated
form and not the simpler form of the same operation (both compare a to 5
and produce a vector of -1 or 0 of the same size), especially when the
target has an instruction (vceq) that does just what we want.
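
To make the "same operation" claim concrete, here is a minimal sketch in
GNU C, reusing the typedefs from the testcase above. The function names
(cmp_direct, cmp_select, forms_agree) are made up for illustration, and the
lane-by-lane loop only models what the .VCONDU form computes; it is not
how GCC expands it.

```c
typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));

/* Direct form: a plain vector comparison, like "_1 = a == { 5, 5, 5, 5 }".
   Each lane of the result is -1 (all bits set) if equal, 0 otherwise. */
vi cmp_direct(vec a) {
    vec five = { 5, 5, 5, 5 };
    return a == five;
}

/* Select form: per lane, pick -1 or 0 depending on the comparison.
   This models the semantics of the .VCONDU call, one lane at a time. */
vi cmp_select(vec a) {
    vi r = { 0, 0, 0, 0 };
    for (int i = 0; i < 4; i++)
        r[i] = (a[i] == 5u) ? -1 : 0;
    return r;
}

/* Returns 1 if both forms agree on every lane, 0 otherwise. */
int forms_agree(vec a) {
    vi d = cmp_direct(a);
    vi s = cmp_select(a);
    for (int i = 0; i < 4; i++)
        if (d[i] != s[i])
            return 0;
    return 1;
}
```

Calling forms_agree on any input should return 1, which is the point: the
two forms differ only in how they are spelled in the IL.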

Introducing boolean vectors was fine, but I think they should be real 
types that we can operate on, not values forced to appear only as the 
first argument of a vcond.

I can think of two natural ways to improve things: either implement vector 
comparisons in the ARM backend (possibly by forwarding to their existing 
code for vcond), or in the generic expansion code try using vcond if the 
direct comparison opcode is not provided.
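
The second option could be pictured roughly as follows. This is a toy model
only: struct toy_target, enum strategy and choose_expansion are hypothetical
names invented for this sketch, and none of them correspond to actual GCC
internals or optab queries.

```c
#include <stdbool.h>

/* Hypothetical description of what a target provides (not a GCC type). */
struct toy_target {
    bool has_vec_cmp;  /* direct vector comparison pattern available */
    bool has_vcond;    /* vcond-style compare-and-select available */
};

/* Hypothetical outcome of the expansion decision. */
enum strategy { USE_VEC_CMP, USE_VCOND_FALLBACK, CANNOT_EXPAND };

/* Prefer the direct comparison; if the target lacks it, fall back to
   vcond, selecting between { -1, ... } and { 0, ... }. */
enum strategy choose_expansion(const struct toy_target *t) {
    if (t->has_vec_cmp)
        return USE_VEC_CMP;
    if (t->has_vcond)
        return USE_VCOND_FALLBACK;
    return CANNOT_EXPAND;
}
```

In this model, a target that provides only vcond, as arm appears to in the
case above, would get USE_VCOND_FALLBACK instead of failing to expand.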

We can temporarily revert my patch, but I would like it to be temporary. 
Since aarch64 seems to handle the same code just fine, maybe someone who 
knows arm could copy the relevant code over?

Does this make sense? Do people have comments?

-- 
Marc Glisse