From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 30910 invoked by alias); 12 Apr 2012 15:48:27 -0000
Received: (qmail 30803 invoked by uid 22791); 12 Apr 2012 15:48:25 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,TW_FM
X-Spam-Check-By: sourceware.org
Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 12 Apr 2012 15:48:11 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Thu, 12 Apr 2012 16:48:08 +0100
Received: from [10.1.69.67] ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 12 Apr 2012 16:49:12 +0100
Message-ID: <4F86F932.60606@arm.com>
Date: Thu, 12 Apr 2012 15:48:00 -0000
From: Richard Earnshaw
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: Andrew Stubbs
CC: "gcc-patches@gcc.gnu.org", "patches@linaro.org"
Subject: Re: [PATCH][ARM] NEON DImode neg
References: <4F4D12C5.9070805@codesourcery.com> <4F704189.4010302@codesourcery.com>
In-Reply-To: <4F704189.4010302@codesourcery.com>
X-MC-Unique: 112041216480804301
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-04/txt/msg00777.txt.bz2

On 26/03/12 11:14, Andrew Stubbs wrote:
> On 28/02/12 17:45, Andrew Stubbs wrote:
>> Hi all,
>>
>> This patch adds a DImode negate pattern for NEON.
>>
>> Unfortunately, the NEON vneg instruction only supports vectors, not
>> singletons, so there's no direct way to do it in DImode, and the
>> compiler ends up moving the value back to core registers, negating it,
>> and returning to NEON afterwards:
>>
>>     fmrrd  r2, r3, d16  @ int
>>     negs   r2, r2
>>     sbc    r3, r3, r3, lsl #1
>>     fmdrr  d16, r2, r3  @ int
>>
>> The new patch does it entirely in NEON:
>>
>>     vmov.i32  d17, #0  @ di
>>     vsub.i64  d16, d17, d16
>>
>> (Note that this is the result when combined with my recent patch for
>> NEON DImode immediates. Without that you get a constant pool load.)
>
> This update fixes a bootstrap failure caused by an early clobber error.
> I've also got a native regression test running now.
>
> OK?
>
> Andrew
>
>
> neon-neg64.patch
>
>
> 2012-03-26  Andrew Stubbs
>
> 	gcc/
> 	* config/arm/arm.md (negdi2): Use gen_negdi2_neon.
> 	* config/arm/neon.md (negdi2_neon): New insn.
> 	Also add splitters for core and NEON registers.
>
> ---
>  gcc/config/arm/arm.md  |    8 +++++++-
>  gcc/config/arm/neon.md |   37 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 1 deletions(-)
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 751997f..f1dbbf7 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -4048,7 +4048,13 @@
>  	(neg:DI (match_operand:DI 1 "s_register_operand" "")))
>      (clobber (reg:CC CC_REGNUM))])]
>    "TARGET_EITHER"
> -  ""
> +  {
> +    if (TARGET_NEON)
> +      {
> +        emit_insn (gen_negdi2_neon (operands[0], operands[1]));
> +        DONE;
> +      }
> +  }
>  )
>
>  ;; The constraints here are to prevent a *partial* overlap (where %Q0 == %R1).
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 3c88568..bf229a7 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -922,6 +922,43 @@
>                      (const_string "neon_int_3")))]
>  )
>
> +(define_insn "negdi2_neon"
> +  [(set (match_operand:DI 0 "s_register_operand" "= w,?r,?&r,?w")
> +	(neg:DI (match_operand:DI 1 "s_register_operand" " w, 0, r, w")))
> +   (clobber (match_scratch:DI 2 "=&w, X, X,&w"))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "TARGET_NEON"
> +  "#"
> +  [(set_attr "length" "8")
> +   (set_attr "arch" "nota8,*,*,onlya8")]
> +)
> +

If negation in Neon needs a scratch register, it seems to me to be
somewhat odd that we're disparaging the ARM version.

Also, wouldn't it be sensible to support a variant that was
early-clobber on operand 0, but loaded immediate zero into that value
first:

	vmov	Dd, #0
	vsub	Dd, Dd, Dm

That way you'll never need more than two registers, whereas today you
want three.

R.
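
For anyone wanting to sanity-check the two sequences discussed above, here is a rough Python model. It is only a sketch: the function names are made up for illustration and are not part of the patch or of GCC, and it assumes the usual ARM convention that the carry flag after a subtraction means "no borrow". It compares the core-register negs/sbc pair against the two-register vmov/vsub variant on a few 64-bit values.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def neg_core(lo, hi):
    """Hypothetical model of `negs lo, lo` then `sbc hi, hi, hi, lsl #1`."""
    # negs computes 0 - lo. Assumption: carry set means no borrow, which
    # for 0 - lo only happens when lo is zero.
    carry = 1 if lo == 0 else 0
    new_lo = (0 - lo) & MASK32
    # sbc computes hi - (hi << 1) - (1 - carry), i.e. -hi minus the borrow.
    new_hi = (hi - ((hi << 1) & MASK32) - (1 - carry)) & MASK32
    return new_lo, new_hi


def neg_neon(d_m):
    """Hypothetical model of `vmov Dd, #0` then `vsub.i64 Dd, Dd, Dm`."""
    d_d = 0                     # vmov Dd, #0: destination doubles as the zero
    d_d = (d_d - d_m) & MASK64  # vsub.i64 Dd, Dd, Dm
    return d_d


def sequences_agree(value):
    """Return True if both models produce the same 64-bit negation."""
    lo, hi = value & MASK32, (value >> 32) & MASK32
    new_lo, new_hi = neg_core(lo, hi)
    return ((new_hi << 32) | new_lo) == neg_neon(value)


if __name__ == "__main__":
    for v in (0, 1, 1 << 32, 0x8000000000000000, MASK64):
        print(hex(v), sequences_agree(v))
```

The model says nothing about register allocation itself. It only illustrates that zeroing the destination first and subtracting the source from it yields the same value as the core-register sequence, which is why Dd has to be early-clobber and distinct from Dm in that variant.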