From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 91216 invoked by alias); 28 Nov 2018 17:16:03 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 91061 invoked by uid 89); 28 Nov 2018 17:15:53 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.2 spammy=plane, 3200, rotation, rot
X-HELO: foss.arm.com
Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Nov 2018 17:15:49 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 01EBA165C; Wed, 28 Nov 2018 09:15:48 -0800 (PST)
Received: from arm.com (usa-sjc-imap-foss1.foss.arm.com [10.72.51.249])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BE3FA3F59C; Wed, 28 Nov 2018 09:15:46 -0800 (PST)
Date: Wed, 28 Nov 2018 17:16:00 -0000
From: James Greenhalgh
To: Tamar Christina
Cc: Kyrill Tkachov , "gcc-patches@gcc.gnu.org" , nd , Richard Earnshaw , Marcus Shawcroft
Subject: Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions
Message-ID: <20181128171543.GE24922@arm.com>
References: <20181111102628.GA4529@arm.com> <5BE96F4D.50405@foss.arm.com> <20181112123143.GA26014@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20181112123143.GA26014@arm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-IsSubscribed: yes
X-SW-Source: 2018-11/txt/msg02332.txt.bz2

On Mon, Nov 12, 2018 at 06:31:45AM -0600, Tamar Christina wrote:
> Hi Kyrill,
>
> > Hi Tamar,
> >
> > On 11/11/18 10:26, Tamar Christina wrote:
> > > Hi All,
> > >
> > > This patch adds the expander
support for supporting autovectorization of complex number operations
> > > such as Complex addition with a rotation along the Argand plane. This also adds support for complex
> > > FMA.
> > >
> > > The instructions are described in the ArmARM [1] and are available from Armv8.3-a onwards.
> > >
> > > Concretely, this generates
> > >
> > > f90:
> > >         mov     x3, 0
> > >         .p2align 3,,7
> > > .L2:
> > >         ldr     q0, [x0, x3]
> > >         ldr     q1, [x1, x3]
> > >         fcadd   v0.2d, v0.2d, v1.2d, #90
> > >         str     q0, [x2, x3]
> > >         add     x3, x3, 16
> > >         cmp     x3, 3200
> > >         bne     .L2
> > >         ret
> > >
> > > now instead of
> > >
> > > f90:
> > >         mov     x4, x1
> > >         mov     x1, x2
> > >         add     x3, x4, 31
> > >         add     x2, x0, 31
> > >         sub     x3, x3, x1
> > >         sub     x2, x2, x1
> > >         cmp     x3, 62
> > >         mov     x3, 62
> > >         ccmp    x2, x3, 0, hi
> > >         bls     .L5
> > >         mov     x2, x4
> > >         add     x3, x0, 3200
> > >         .p2align 3,,7
> > > .L3:
> > >         ld2     {v2.2d - v3.2d}, [x0], 32
> > >         ld2     {v4.2d - v5.2d}, [x2], 32
> > >         cmp     x0, x3
> > >         fsub    v0.2d, v2.2d, v5.2d
> > >         fadd    v1.2d, v4.2d, v3.2d
> > >         st2     {v0.2d - v1.2d}, [x1], 32
> > >         bne     .L3
> > >         ret
> > > .L5:
> > >         add     x6, x0, 8
> > >         add     x5, x4, 8
> > >         add     x2, x1, 8
> > >         mov     x3, 0
> > >         .p2align 3,,7
> > > .L2:
> > >         ldr     d1, [x0, x3]
> > >         ldr     d3, [x5, x3]
> > >         ldr     d0, [x6, x3]
> > >         ldr     d2, [x4, x3]
> > >         fsub    d1, d1, d3
> > >         fadd    d0, d0, d2
> > >         str     d1, [x1, x3]
> > >         str     d0, [x2, x3]
> > >         add     x3, x3, 16
> > >         cmp     x3, 3200
> > >         bne     .L2
> > >         ret
> > >
> > > For complex additions with a 90* rotation along the Argand plane.
> > >
> > > [1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > >
> > > Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and x86_64-pc-linux-gnu
> > > are still on going but previous patch showed no regressions.
> > >
> > > The instructions have also been tested on aarch64-none-elf and arm-none-eabi on a Armv8.3-a model
> > > and -march=Armv8.3-a+fp16 and all tests pass.
> > >
> > > Ok for trunk?

OK with the comment typos fixed.

> > > gcc/ChangeLog:
> > >
> > > 2018-11-11 Tamar Christina
> > >
> > >         * config/aarch64/aarch64-simd.md (aarch64_fcadd,
> > >         fcadd3, aarch64_fcmla,
> > >         fcmla4): New.
> > >         * config/aarch64/aarch64.h (TARGET_COMPLEX): New.
> > >         * config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
> > >         UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
> > >         (FCADD, FCMLA): New.
> > >         (rot, rotsplit1, rotsplit2): New.
> > >         * config/arm/types.md (neon_fcadd, neon_fcmla): New.

Should we push these to an existing class for now, and split them later when
someone provides a scheduling model which makes use of them?

> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -419,6 +419,63 @@
> > }
> > )
> >
> > +;; The fcadd and fcmla patterns are made UNSPEC for the explicitly due to the

s/for the explicitly/explicitly

> > +;; fact that their usage need to guarantee that the source vectors are

s/need/needs

> > +;; contiguous. It would be wrong to describe the operation without being able
> > +;; to describe the permute that is also required, but even if that is done
> > +;; the permute would have been created as a LOAD_LANES which means the values
> > +;; in the registers are in the wrong order.
> > +(define_insn "aarch64_fcadd"
> > +  [(set (match_operand:VHSDF 0 "register_operand" "=w")
> > +        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
> > +                       (match_operand:VHSDF 2 "register_operand" "w")]
> > +                       FCADD))]
> > +  "TARGET_COMPLEX"
> > +  "fcadd\t%0., %1., %2., #"
> > +  [(set_attr "type" "neon_fcadd")]
> > +)
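
For readers following the thread: the #90 case discussed above adds the
second operand rotated by 90 degrees, i.e. c = a + i*b.  The sketch below
spells that out on interleaved (real, imaginary) doubles, which is what the
fsub/fadd pair in the pre-patch scalar loop quoted earlier computes.  The
function name rot90_add and the array layout are assumptions made for
illustration; this is not the test case from the patch.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the patch.
   Computes c = a + i*b for n complex elements stored as interleaved
   (re, im) pairs, so each array holds 2*n doubles.  */
void
rot90_add (const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* i * (b.re + i*b.im) = -b.im + i*b.re  */
      c[2 * i]     = a[2 * i]     - b[2 * i + 1]; /* re: a.re - b.im */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];     /* im: a.im + b.re */
    }
}
```

With the parts interleaved like this, one contiguous vector load already
holds each operand in adjacent (re, im) pairs, which appears to be the
layout the pattern comment refers to when it asks for contiguous source
vectors rather than the de-interleaved ld2 form.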