From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by sourceware.org (Postfix) with ESMTPS id 455E2383E83B for ; Mon, 30 Nov 2020 10:39:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 455E2383E83B Received: by mail-ot1-x341.google.com with SMTP id z24so10816498oto.6 for ; Mon, 30 Nov 2020 02:39:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Gg6R6f/KGlQ3J4+AJ7EYKdUKv/2QgMkcMYXuvNeLVjs=; b=UVKbXhTqKyEQH2bDDIk1a8XMNWYb/zKatZH4214jMQNnjIYHIFIHYAilual39INlVD WVaNmNKDKoIIwQLSh/KirYIDNm/nFipoPGAxJCg6Z1QKHCcETrVA5QFIIZoNyNHXxfoX 3g4AzI/5vvC+hYvd9OIB2t5Qh5S3HfdfqJSI5ZjenrMqN7DiVgP3X92ocG4XhmVIY0/h 2po1jcp2W3/rW7CfgIDjFrC9XwExfIex1r4HIrIMgxlUsMxoPtRsZTvTV1KdDDgV1WWg rUT//j105HWkWjnbtb7Ehzl728CT+2ddNuxL+anOohirxmwzIXaitLaaP/DKxQY2zW3h Kw+w== X-Gm-Message-State: AOAM532kDwKfRvSYv0b48khsHCg1KPJ3rURtXTCA48M76RhuQOuVso63 k1jABopM2SU8KmWb0sTDnuCPoeQJe5lpiiBkzHaTzg== X-Google-Smtp-Source: ABdhPJyw5asWmkIzVmVCp7tzU/GgSejg0FtEasuCkqmufQpnNgwTkGga93TZiMbiBBC3DwVnnL3JAB5tdU9tk4anrY0= X-Received: by 2002:a05:6830:1213:: with SMTP id r19mr4039188otp.269.1606732758534; Mon, 30 Nov 2020 02:39:18 -0800 (PST) MIME-Version: 1.0 References: <1606312469-1370-1-git-send-email-christophe.lyon@linaro.org> In-Reply-To: From: Christophe Lyon Date: Mon, 30 Nov 2020 11:39:07 +0100 Message-ID: Subject: Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand To: "Andre Vieira (lists)" Cc: gcc Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Nov 2020 10:39:22 -0000 On Fri, 27 Nov 2020 at 16:29, Christophe Lyon wrote: > > On Fri, 27 Nov 2020 at 15:13, Andre Vieira (lists) > wrote: > > > > Hi Christophe, > > > > On 26/11/2020 15:31, Christophe Lyon wrote: > > > Hi Andre, > > > > > > Thanks for the quick feedback. > > > > > > On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira > > > wrote: > > >> Hi Christophe, > > >> > > >> Thanks for these! See some inline comments. > > >> > > >> On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote: > > >>> This patch enables MVE vandq instructions for auto-vectorization. MVE > > >>> vandq insns in mve.md are modified to use 'and' instead of unspec > > >>> expression to support and3. The and3 expander is added to > > >>> vec-common.md > > >>> > > >>> 2020-11-12 Christophe Lyon > > >>> > > >>> gcc/ > > >>> * gcc/config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U. > > >>> (VANQ): Remove. > > >>> * config/arm/mve.md (mve_vandq_u): New entry for vand > > >>> instruction using expression and. > > >>> (mve_vandq_s): New expander. > > >>> * config/arm/neon.md (and3): Renamed into and3_neon. > > >>> * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove. > > >>> * config/arm/vec-common.md (and3): New expander. > > >>> > > >>> gcc/testsuite/ > > >>> * gcc.target/arm/simd/mve-vand.c: New test. > > >>> --- > > >>> gcc/config/arm/iterators.md | 4 +--- > > >>> gcc/config/arm/mve.md | 20 ++++++++++++---- > > >>> gcc/config/arm/neon.md | 2 +- > > >>> gcc/config/arm/unspecs.md | 2 -- > > >>> gcc/config/arm/vec-common.md | 15 ++++++++++++ > > >>> gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++ > > >>> 6 files changed, 66 insertions(+), 11 deletions(-) > > >>> create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c > > >>> > > >>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > > >>> index 592af35..72039e4 100644 > > >>> --- a/gcc/config/arm/iterators.md > > >>> +++ b/gcc/config/arm/iterators.md > > >>> @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s") > > >>> (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s") > > >>> (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s") > > >>> (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u") > > >>> - (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s") > > >>> - (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u") > > >>> + (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u") > > >>> (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s") > > >>> (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s") > > >>> (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u") > > >>> @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U]) > > >>> (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U]) > > >>> (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U]) > > >>> (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S]) > > >>> -(define_int_iterator VANDQ [VANDQ_U VANDQ_S]) > > >>> (define_int_iterator VBICQ [VBICQ_S VBICQ_U]) > > >>> (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S]) > > >>> (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U]) > > >>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > > >>> index ecbaaa9..975eb7d 100644 > > >>> --- a/gcc/config/arm/mve.md > > >>> +++ b/gcc/config/arm/mve.md > > >>> @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_" > > >>> ;; > > >>> ;; [vandq_u, vandq_s]) > > >>> ;; > > >>> -(define_insn "mve_vandq_" > > >>> +;; signed and unsigned versions are the same: define the unsigned > > >>> +;; insn, and use an expander for the signed one as we still reference > > >>> +;; both names from arm_mve.h. > > >>> +(define_insn "mve_vandq_u" > > >>> [ > > >>> (set (match_operand:MVE_2 0 "s_register_operand" "=w") > > >>> - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w") > > >>> - (match_operand:MVE_2 2 "s_register_operand" "w")] > > >>> - VANDQ)) > > >>> + (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w") > > >>> + (match_operand:MVE_2 2 "s_register_operand" "w"))) > > >> The predicate on the second operand is more restrictive than the one in > > >> expand 'neon_inv_logic_op2'. This should still work with immediates, or > > >> well I checked for integers, it generates a loop as such: > > >> > > > Right, thanks for catching it. > > > > > >> vldrw.32 q3, [r0] > > >> vldr.64 d4, .L8 > > >> vldr.64 d5, .L8+8 > > >> vand q3, q3, q2 > > >> vstrw.32 q3, [r2] > > >> > > >> MVE does support vand's with immediates, just like NEON, I suspect you > > >> could just copy the way Neon handles these, possibly worth the little > > >> extra effort. You can use dest[i] = a[i] & ~1 as a testcase. > > >> If you don't it might still be worth expanding the test to make sure > > >> other immediates-types combinations don't trigger an ICE? > > >> > > >> I'm not sure I understand why it loads it in two 64-bit chunks and not > > >> do a single load or not just do something like a vmov or vbic immediate. > > >> Anyhow that's a worry for another day I guess.. > > > Do you mean something like the attached (on top of this patch)? > > > I dislike the code duplication in mve_vandq_u which would > > > become a copy of and3_neon. > > Hi Christophe, > > > > Yeah that's what I meant. I agree with the code duplication. The reason > > we still use separate ones is because of the difference in supported > > modes. Maybe the right way around that would be to redefine VDQ as: > > > > (define_mode_iterator VDQ [(V8QI "TARGET_HAVE_NEON") V16QI > > (V4HI > > "TARGET_HAVE_NEON") V8HI > > (V2SI > > "TARGET_HAVE_NEON") V4SI > > (V4HF > > "TARGET_HAVE_NEON") V8HF > > (V2SF > > "TARGET_HAVE_NEON") V4SF V2DI]) > > > > And have a single define_expand and insn for both vector extensions. > > Indeed, I can try that. > I have also noticed the VNIM1 / VNINOTM1 pair. > > > Though we would also need to do something about the type attribut in > > case we want to have different scheduling types for both. Though right > > now we don't do any scheduling for MVE, I don't know whether these can > > be conditionalized on target features though, something to look at. > > > > > > > > The other concern is that it's not exercised by testcase: as you noted > > > the compiler uses a pair of loads to prepare the second operand. > > > > > > But indeed that's probably a separate problem. > > > > > > I guess your comments apply to patch 2 (vorr)? > > > > Yeah, forgot to reply to that one, but yes! I still need to have a look > > at 4-7. > > Ok thanks, I have a WIP fix for #7 (vmvn) anyway. And I never sent #7 because I knew it wasn't ready yet :-) > > > Kind regards, > > Andre > >