From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-366131-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 20263 invoked by alias); 28 Apr 2014 10:41:24 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 20246 invoked by uid 89); 28 Apr 2014 10:41:23 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: service87.mimecast.com
Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 28 Apr 2014 10:41:22 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Mon, 28 Apr 2014 11:41:20 +0100
Received: from [10.1.209.147] ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Mon, 28 Apr 2014 11:41:32 +0100
Message-ID: <535E304D.7000800@arm.com>
Date: Mon, 28 Apr 2014 10:44:00 -0000
From: Ramana Radhakrishnan <ramrad01@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:14.0) Gecko/20120713 Thunderbird/14.0
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org ;" <gcc-patches@gcc.gnu.org>
CC: Christophe Lyon <christophe.lyon@linaro.org>
Subject: [Patch ARM 0/3] Neon intrinsics TLC - Replace intrinsics with GNU C implementations where possible and remove dead code.
X-MC-Unique: 114042811412000801
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2014-04/txt/msg01836.txt.bz2

Hi,

	I was investigating a performance issue with Neon intrinsics and
realized that the intrinsics should be implemented in GNU C where
possible, and that the code this makes dead should be removed.

	Patch 1/3 does the replacement. I've special-cased -ffast-math for the
_f32 intrinsics to prevent the auto-vectorizer from coming along and
vectorizing addv2sf and addv4sf type operations, which we don't want to
happen by default. Patch 1/3 causes apparent "regressions" in the rather
ineffective neon intrinsics tests that we currently carry, which will
hopefully soon be replaced by Christophe Lyon's rewrite that is being
reviewed. On the whole I deem this patch stack to be safe to go in if
necessary. These "regressions" are at -O0 with the vbic and vorn
intrinsics, which no longer get combined, and well, so be it.
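
For readers unfamiliar with the approach, here is a minimal sketch of what
"GNU C implementations" means, using GCC's generic vector extensions. It
is not the code from Patch 1/3: every sketch_* name is hypothetical, the
types only stand in for int32x2_t and float32x2_t, and the lane-by-lane
branch is an assumption that stands in for the existing builtin call.

```c
#include <stdint.h>

/* Hypothetical stand-ins for int32x2_t and float32x2_t: two 32-bit
   lanes in a 64-bit GNU C vector.  */
typedef int32_t sketch_int32x2_t __attribute__ ((vector_size (8)));
typedef float sketch_float32x2_t __attribute__ ((vector_size (8)));

/* Integer add written as a plain vector expression instead of a call
   to an opaque builtin.  */
static inline sketch_int32x2_t
sketch_vadd_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a + b;
}

/* Bit clear: a AND (NOT b).  */
static inline sketch_int32x2_t
sketch_vbic_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a & ~b;
}

/* OR-NOT: a OR (NOT b).  */
static inline sketch_int32x2_t
sketch_vorn_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a | ~b;
}

/* Float add: the vector expression is used only under -ffast-math.
   The #else branch is an assumption made for this sketch; it stands in
   for whatever the intrinsic did before (a builtin call).  */
static inline sketch_float32x2_t
sketch_vadd_f32 (sketch_float32x2_t a, sketch_float32x2_t b)
{
#ifdef __FAST_MATH__
  return a + b;
#else
  sketch_float32x2_t r = { a[0] + b[0], a[1] + b[1] };
  return r;
#endif
}
```

Written this way, vbic and vorn become two generic operations (a NOT
followed by an AND or an OR) that the compiler has to combine back into
one instruction, which is consistent with the -O0 behaviour described
above.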

	This then left us in the happy position of being able to delete code,
but I was worried about LTO streaming, as these "builtins" are
essentially streamed out in LTO object code format. However, since we
make no promises about LTO compatibility across releases, that's safe.
Still, I structured the dead code elimination as Patch 2/3. This will be
committed separately in case folks want to backport Patch 1/3 on its own
and want to assure their users of LTO compatibility within a release
branch (if that even works :) ).

	Patch 3/3 removes the ML used to generate the Neon intrinsics and the
documentation, and updates the comments in the files to show that these
are now hand-crafted rather than auto-generated. We've had these for
many years now and I think it's time we got rid of them. Not everyone
groks ML, and it doesn't help that only one or two folks can actually do
this properly every time. Instead of having these bottlenecks, and given
the fact that the intrinsics are pretty stable now, there's no point in
retaining the generator interface. I'd rather get rid of it. The only
bits left are neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we
can safely remove neon-testgen.ml once Christophe's testsuite is done,
and we'll probably just have to carry neon-schedgen.ml / neon.ml as they
still generate the neon descriptions for both a8 and a9.

	The patch stack was caught up in the C++ type info mess recently. I've
tested this on a cross arm-linux-gnueabihf testsuite run and it looks ok
modulo the issues mentioned for Patch 1/3. I've deliberately resisted
deleting the entire gcc.target/arm/neon and neon-testgen.ml in the hope
that Christophe's testsuite will do the honours at that point :). Given
we're in stage 1 and that I think we're getting somewhere with clyon's
testsuite, I feel it is reasonably practical to just carry the noise of
these extra failures. Christophe and I will test-drive his testsuite
work in this space with these patches to see how the conversion process
works and whether there are any issues with these patches.

If there are issues I'm happy to hear about them.

Will apply to trunk in a couple of days if there are no regressions with
clyon's testsuite for these intrinsics.


regards
Ramana
-- 
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.