From: Ramana Radhakrishnan
To: gcc-patches@gcc.gnu.org
CC: Christophe Lyon
Date: Mon, 28 Apr 2014 10:44:00 -0000
Message-ID: <535E304D.7000800@arm.com>
Subject: [Patch ARM 0/3] Neon intrinsics TLC - Replace intrinsics with GNU C implementations where possible and remove dead code.

Hi,

I was investigating a performance issue with Neon intrinsics and realized that they needed to be reimplemented in GNU C wherever possible. Patch 1/3 does this.

I've special-cased -ffast-math for the _f32 intrinsics to prevent the auto-vectorizer from vectorizing addv2sf- and addv4sf-type operations, which we don't want to happen by default.
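A minimal sketch of the idea, assuming GCC's generic vector extension. The names my_int32x2_t, my_float32x2_t, my_vadd_s32 and my_vadd_f32 are hypothetical stand-ins for arm_neon.h's int32x2_t, float32x2_t, vadd_s32 and vadd_f32, chosen so the sketch builds on any GCC or Clang target; this is not the code from the patch. Integer intrinsics become plain GNU C operators, while the float add only uses the generic form when __FAST_MATH__ (which GCC predefines under -ffast-math) is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for arm_neon.h's int32x2_t and float32x2_t,
   built with GCC's generic vector extension so the sketch compiles on
   any GCC or Clang target, not just ARM.  */
typedef int32_t my_int32x2_t __attribute__ ((vector_size (8)));
typedef float my_float32x2_t __attribute__ ((vector_size (8)));

/* Integer add written in GNU C: the compiler sees an ordinary vector
   addition and can fold, combine and schedule it like any other.  */
static inline my_int32x2_t
my_vadd_s32 (my_int32x2_t a, my_int32x2_t b)
{
  return a + b;
}

/* Float add: use the generic form only under -ffast-math, for which
   GCC predefines __FAST_MATH__.  */
static inline my_float32x2_t
my_vadd_f32 (my_float32x2_t a, my_float32x2_t b)
{
#ifdef __FAST_MATH__
  return a + b;
#else
  /* In arm_neon.h this branch would call the target's Neon builtin.
     This element-wise loop is only a portable stand-in; unlike the
     real builtin it is NOT opaque to the optimizers.  */
  my_float32x2_t r = { 0.0f, 0.0f };
  for (int i = 0; i < 2; i++)
    r[i] = a[i] + b[i];
  return r;
#endif
}

/* Usage: a quick self-check of both helpers.  */
static void
my_vadd_selfcheck (void)
{
  my_int32x2_t a = { 1, 2 }, b = { 10, 20 };
  my_int32x2_t c = my_vadd_s32 (a, b);
  assert (c[0] == 11 && c[1] == 22);

  my_float32x2_t x = { 1.5f, 2.0f }, y = { 0.25f, -1.0f };
  my_float32x2_t z = my_vadd_f32 (x, y);
  assert (z[0] == 1.75f && z[1] == 1.0f);
}
```

The #ifdef keeps the default (non-fast-math) behaviour unchanged, so only users who already opted into -ffast-math see the generic vector operation.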
Patch 1/3 causes apparent "regressions" in the rather ineffective Neon intrinsics tests that we currently carry, which will hopefully soon be replaced by Christophe Lyon's rewrite that is being reviewed. On the whole I deem this patch stack safe to go in if necessary. These "regressions" are at -O0 for the vbic and vorn intrinsics, which now don't get combined into single vbic/vorn instructions; well, so be it.

This then left us in the happy position of being able to delete code, but I was worried about LTO streaming, as these "builtins" are essentially streamed out in the LTO object code format. However, since we make no promises about LTO compatibility across releases, that's safe; even so, I structured the dead code elimination as Patch 2/3. It will be committed separately in case folks want to backport Patch 1/3 on its own and want to assure their users of LTO compatibility within a release branch (if that even works :) ).

Patch 3/3 removes the ML that generates the Neon intrinsics and their documentation, and updates the comments in the files to show that these are now hand-crafted rather than auto-generated. We've had these generators for many years now and I think it's time we got rid of them. Not everyone groks ML, and it doesn't help that only one or two folks can actually do this properly every time. Given these bottlenecks and the fact that the intrinsics are pretty stable now, there's no point in retaining the generator interface; I'd rather get rid of it. The only bits left are neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we can safely remove neon-testgen.ml once Christophe's testsuite is done, and we'll probably just have to carry neon-schedgen.ml / neon.ml, as they still generate the Neon descriptions for both the Cortex-A8 and the Cortex-A9.

The patch stack was caught up in the C++ type info mess recently. I've tested it with a cross arm-linux-gnueabihf testsuite run, and it looks OK modulo the issues mentioned for Patch 1/3.
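To illustrate the -O0 vbic/vorn point for Patch 1/3: once these intrinsics are written in GNU C, they become a NOT followed by an AND or OR, and per the note above they are no longer combined into a single vbic/vorn at -O0. A minimal sketch, assuming GCC's generic vector extension; my_int32x2_t, my_vbic_s32 and my_vorn_s32 are hypothetical stand-ins for arm_neon.h's int32x2_t, vbic_s32 and vorn_s32, not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for arm_neon.h's int32x2_t (GCC vector extension). */
typedef int32_t my_int32x2_t __attribute__ ((vector_size (8)));

/* VBIC semantics: a AND NOT b.  Without optimization the NOT and the
   AND may be emitted as separate instructions rather than one vbic.  */
static inline my_int32x2_t
my_vbic_s32 (my_int32x2_t a, my_int32x2_t b)
{
  return a & ~b;
}

/* VORN semantics: a OR NOT b.  */
static inline my_int32x2_t
my_vorn_s32 (my_int32x2_t a, my_int32x2_t b)
{
  return a | ~b;
}

/* Usage: a quick self-check of both helpers.  */
static void
my_bitops_selfcheck (void)
{
  my_int32x2_t a = { 12, 5 }, b = { 10, 0 };
  my_int32x2_t c = my_vbic_s32 (a, b);
  assert (c[0] == 4 && c[1] == 5);      /* 12 & ~10 == 4, 5 & ~0 == 5 */
  my_int32x2_t d = my_vorn_s32 (a, b);
  assert (d[0] == -3 && d[1] == -1);    /* 12 | ~10 == -3, 5 | ~0 == -1 */
}
```

The results are the same at every optimization level; only the instruction selection (and hence scan-assembler style tests) is affected.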
I've deliberately resisted deleting the entire gcc.target/arm/neon directory and neon-testgen.ml, in the hope that Christophe's testsuite will do the honours at that point :). Given that we're in stage 1 and that I think we're getting somewhere with clyon's testsuite, I feel it is reasonably practical to just carry the noise from these extra failures. Christophe and I will test-drive his testsuite work in this space with these patches to see how the conversion process works and whether there are any issues with these patches.

If there are issues I'm happy to hear about them. I will apply this to trunk in a couple of days if there are no regressions with clyon's testsuite for these intrinsics.

regards
Ramana

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.