From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-402277-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 31352 invoked by alias); 7 Jul 2015 17:17:16 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 31339 invoked by uid 89); 7 Jul 2015 17:17:15 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 07 Jul 2015 17:17:14 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-3-A2_mezvuS1imZqbHqDrBuA-1; Tue, 07 Jul 2015 18:17:09 +0100
Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Tue, 7 Jul 2015 18:17:08 +0100
Message-ID: <559C0995.6010105@arm.com>
Date: Tue, 07 Jul 2015 17:17:00 -0000
From: Alan Lawrence <alan.lawrence@arm.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: Kyrill Tkachov <kyrylo.tkachov@arm.com>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>, Tejas Belagod <Tejas.Belagod@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
References: <559BC75A.1080606@arm.com> <559BCF8A.4090704@arm.com> <559BFCBD.4080806@arm.com> <559BFF83.20306@arm.com> <559C03D0.8010104@arm.com>
In-Reply-To: <559C03D0.8010104@arm.com>
X-MC-Unique: A2_mezvuS1imZqbHqDrBuA-1
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg00538.txt.bz2

Kyrill Tkachov wrote:
> On 07/07/15 17:34, Alan Lawrence wrote:
>> Kyrill Tkachov wrote:
>>> On 07/07/15 14:09, Kyrill Tkachov wrote:
>>>> Hi Alan,
>>>>
>>>> On 07/07/15 13:34, Alan Lawrence wrote:
>>>>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
>>>> For some context, the reference for these is at:
>>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>>>>
>>>> This patch is ok once you and Charles decide on how to proceed with the
>>>> two prerequisites.
>>> On second thought, the ACLE document at
>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
>>>
>>> says in 12.2.1:
>>> "float16 types are only available when the __fp16 type is defined, i.e.
>>> when supported by the hardware"
>> However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
>> -mfp16-format=alternative, regardless of whether we have hardware support or not.
>>
>> (Without hardware support, gcc generates calls to __gnu_f2h_ieee or
>> __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or
>> __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
>> support __fp16 just using those hardware instructions without caring about
>> which format is in use.)
>
> Hmmm... In my opinion intrinsics should aim to map to instructions rather
> than go away and call library functions, but this is the existing
> functionality that current users might depend on :(

Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee to
convert between single __fp16 and 'float' values, when there is no HW. General
operations on scalar __fp16 values are performed by converting to float,
performing operations on float, and converting back. The __fp16 type is
available and "usable" without HW support, but only when -mfp16-format is
specified.
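
To make the "convert to float, operate, convert back" scheme concrete, here is
a rough, portable sketch of what a software fallback for the IEEE format does.
This is not the libgcc implementation of __gnu_h2f_ieee / __gnu_f2h_ieee; the
function names are made up for illustration, and it assumes 'float' is IEEE
binary32 and that narrowing rounds to nearest-even.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen IEEE binary16 bits to a float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {              /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {              /* signed zero */
        bits = sign;
    } else {                            /* subnormal: normalise the mantissa */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: narrow a float to IEEE binary16 bits,
   rounding to nearest, ties to even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x, sign, exp, man, half, rem, shift, halfway;
    int32_t e;

    memcpy(&x, &f, sizeof x);
    sign = (x >> 16) & 0x8000u;
    exp  = (x >> 23) & 0xFFu;
    man  = x & 0x7FFFFFu;

    if (exp == 0xFFu)                   /* infinity or NaN (made quiet) */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x200u : 0u));

    e = (int32_t)exp - 127 + 15;        /* rebias exponent 127 -> 15 */
    if (e >= 31)                        /* too large: overflow to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (e < -10)                        /* below half the smallest subnormal */
        return (uint16_t)sign;

    if (e <= 0) {                       /* result is subnormal in binary16 */
        man |= 0x800000u;               /* make the implicit leading 1 explicit */
        shift = (uint32_t)(14 - e);     /* 14..24 */
        half = man >> shift;
        rem = man & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    } else {                            /* result is normal in binary16 */
        half = ((uint32_t)e << 10) | (man >> 13);
        rem = man & 0x1FFFu;
        halfway = 0x1000u;
    }
    /* A carry out of the mantissa bumps the exponent, which is the right result. */
    if (rem > halfway || (rem == halfway && (half & 1u)))
        half++;
    return (uint16_t)(sign | half);
}

/* Scalar add done the way described above: widen, add as float, narrow. */
static uint16_t half_add(uint16_t a, uint16_t b)
{
    return float_to_half_bits(half_bits_to_float(a) + half_bits_to_float(b));
}
```

For example, half_add(0x3E00, 0x4080) adds 1.5 and 2.25 and yields 0x4380,
i.e. 3.75. The alternative format differs in how the top exponent value is
treated, which is why a fallback like this has to know which format is in use.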

(The existing) intrinsics operating on float16x[48] vectors (converting to/from
float32x4) are *not* available without hardware support; these intrinsics *are*
available without specifying -mfp16-format.
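
The current behaviour, as I have described it, can be summarised as a small
truth table. The sketch below only restates the paragraphs above in C; the
enum and function names are hypothetical and do not correspond to anything in
the GCC sources. 'has_hw' stands for half-precision hardware support being
enabled, and 'has_format' for -mfp16-format=ieee or =alternative being given.

```c
#include <stdbool.h>

/* How scalar __fp16 <-> float conversions end up being implemented. */
enum fp16_scalar {
    FP16_SCALAR_UNAVAILABLE,   /* __fp16 type is not defined at all */
    FP16_SCALAR_LIBCALL,       /* calls to __gnu_f2h_* / __gnu_h2f_* */
    FP16_SCALAR_INSN           /* vcvtb.f16.f32 / vcvtb.f32.f16 */
};

/* Scalar __fp16 needs a format; HW only decides libcall vs instruction. */
static enum fp16_scalar scalar_fp16_lowering(bool has_hw, bool has_format)
{
    if (!has_format)
        return FP16_SCALAR_UNAVAILABLE;
    return has_hw ? FP16_SCALAR_INSN : FP16_SCALAR_LIBCALL;
}

/* The float16x[48] conversion intrinsics need HW and ignore the format. */
static bool vector_fp16_intrinsics_available(bool has_hw, bool has_format)
{
    (void)has_format;
    return has_hw;
}
```

So with HW but no -mfp16-format, the vector intrinsics exist while the scalar
type does not, and without HW but with a format it is the other way round.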

ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW,
even if this is not required.

> CC'ing the ARM maintainers and Tejas for an ACLE perspective.
> I think that we'd want to gate the definition of __fp16 on hardware
> availability as well (the -mfpu option) rather than just arm_fp16_format but
> I'm not sure of the impact this will have on existing users.

Sure... but do we require -mfpu *and* -mfp16-format? s/and/or/ ? Do we require
-mfp16-format for float16x[48] intrinsics, or allow format-agnostic code (as HW
support allows us to!)?

I don't have very strong opinions as to which way we should go; I merely tried
to be consistent with the existing codebase, and to support as much code as
possible, although I agree I ignored cases where defining functions
unexpectedly might cause problems.

Cheers, Alan