Date: Wed, 18 May 2016 15:21:00 -0000
From: Joseph Myers
To: Matthew Wahab
CC: gcc-patches
Subject: Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
On Wed, 18 May 2016, Matthew Wahab wrote:

> AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like
> flush-to-zero that could affect the outcome of a calculation.

The result of a float computation on two values immediately promoted
from fp16 cannot be within the subnormal range for float. Thus, only
one flush to zero can happen, on the final conversion back to fp16, and
that cannot make the result different from doing direct arithmetic in
fp16 (assuming flush to zero affects conversion from float to fp16 the
same way it affects direct fp16 arithmetic).

> > So I'd expect e.g.
> >
> > __fp16 a, b;
> > __fp16 c = a / b;
> >
> > to generate the new instructions, because direct binary16 arithmetic is a
> > correct implementation of (__fp16) ((float) a / (float) b).
>
> Something like
>
> __fp16 a, b, c;
> __fp16 d = (a / b) * c;
>
> would be done as the sequence of single precision operations:
>
> vcvtb.f32.f16 s0, s0
> vcvtb.f32.f16 s1, s1
> vcvtb.f32.f16 s2, s2
> vdiv.f32 s15, s0, s1
> vmul.f32 s0, s15, s2
> vcvtb.f16.f32 s0, s0
>
> Doing this with vdiv.f16 and vmul.f16 could change the calculated result
> because the flush-to-zero rule is related to operation precision so affects
> the value of a vdiv.f16 differently from the vdiv.f32.

Flush to zero is irrelevant here, since that sequence of three
operations also cannot produce anything in the subnormal range for
float. (It's true that double rounding is relevant for your example and
so converting it to direct fp16 arithmetic would not be safe for that
reason.) That example is also not relevant to my point.
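A minimal C sketch of the arithmetic behind the subnormal-range claim, assuming __fp16 is IEEE binary16 and float is IEEE binary32 with no excess precision: the smallest nonzero binary16 magnitude is 2^-24 and the largest finite one is 65504, so the smallest nonzero results of one float addition, multiplication or division on promoted operands, and of the (a / b) * c chain, all stay well above FLT_MIN (2^-126). The helper names are made up for illustration; they are not GCC or ACLE functions.

```c
#include <assert.h>
#include <float.h>

/* Extreme finite binary16 magnitudes, written as exact float constants. */
static const float fp16_min_subnormal = 0x1p-24f; /* 2^-24 */
static const float fp16_max = 65504.0f;           /* (2 - 2^-10) * 2^15 */

/* Addition/subtraction: every binary16 value is an integer multiple of
   2^-24, so a nonzero exact sum or difference is at least 2^-24, and
   rounding to float cannot take it below 2^-24 (which float represents). */
static float min_nonzero_sum(void)
{
    return fp16_min_subnormal;
}

/* Multiplication: smallest times smallest, exactly 2^-48. */
static float min_nonzero_product(void)
{
    return fp16_min_subnormal * fp16_min_subnormal;
}

/* Division: smallest over largest, slightly above 2^-40. */
static float min_nonzero_quotient(void)
{
    return fp16_min_subnormal / fp16_max;
}

/* The (a / b) * c chain: smallest quotient times smallest operand,
   roughly 2^-64. */
static float min_nonzero_div_mul(void)
{
    return min_nonzero_quotient() * fp16_min_subnormal;
}

/* FLT_MIN is the smallest normal float (2^-126), so none of these
   results is a float subnormal that flush-to-zero could affect. */
static void check_bounds(void)
{
    assert(min_nonzero_sum() >= FLT_MIN);
    assert(min_nonzero_product() >= FLT_MIN);
    assert(min_nonzero_quotient() >= FLT_MIN);
    assert(min_nonzero_div_mul() >= FLT_MIN);
}
```

Calling check_bounds() from any test program should return without tripping an assertion; the bounds are roughly 2^-24, 2^-48, 2^-40 and 2^-64, all far from 2^-126.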
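The double-rounding remark can be illustrated in a similar way. This sketch emulates binary16 rounding (round to nearest, ties to even) in software rather than using __fp16, so it runs on any host; it assumes finite inputs within binary16 range and float evaluation without excess precision (FLT_EVAL_METHOD == 0), and all function names are hypothetical. For a single operation it relies on float's 24-bit significand meeting the commonly cited sufficient condition p' >= 2p + 2 for binary16's p = 11 (24 = 2 * 11 + 2), so rounding the float result to binary16 gives the correctly rounded binary16 result. For (a / b) * c, rounding after each step can disagree with rounding once at the end; a = 1, b = 3, c = 5 is one input where it does.

```c
#include <assert.h>

/* Round a float to the nearest binary16 value (round to nearest, ties to
   even) and return it as a float.  Sketch only: assumes |x| is within the
   finite binary16 range, so overflow to infinity is not handled.  */
static float round_to_binary16(float x)
{
    double a = x < 0 ? -(double)x : (double)x;
    if (a == 0.0)
        return x;
    /* p = largest power of two <= a, clamped below at 2^-14, the smallest
       normal binary16 magnitude (subnormals share its quantum).  */
    double p = 1.0;
    while (p * 2.0 <= a)
        p *= 2.0;
    while (p > a && p > 0x1p-14)
        p *= 0.5;
    double q = p / 1024.0;      /* 10 fraction bits: quantum = p * 2^-10 */
    double y = a / q;           /* exact, since q is a power of two */
    long long n = (long long)y; /* truncate toward zero */
    double r = y - (double)n;   /* exact fractional part */
    if (r > 0.5 || (r == 0.5 && (n & 1)))
        n++;                    /* round to nearest, ties to even */
    double result = (double)n * q;
    return (float)(x < 0 ? -result : result);
}

/* (a / b) * c as the float sequence: one rounding to binary16 at the end. */
static float eval_via_float(float a, float b, float c)
{
    float t = a / b;
    float d = t * c;
    return round_to_binary16(d);
}

/* (a / b) * c with each operation rounded to binary16, as direct fp16
   division and multiplication would do.  Rounding the float quotient
   stands in for a correctly rounded binary16 division (the p' >= 2p + 2
   argument); the product of two binary16 values is exact in float.  */
static float eval_direct_fp16(float a, float b, float c)
{
    float t = round_to_binary16(a / b);
    return round_to_binary16(t * c);
}

static void check_example(void)
{
    /* Single operation: 1/3 rounds to 1365/4096 in binary16. */
    assert(round_to_binary16(1.0f / 3.0f) == 1365.0f / 4096.0f);
    /* (1 / 3) * 5: exact value 5/3; nearest binary16 is 1707/1024. */
    assert(eval_via_float(1.0f, 3.0f, 5.0f) == 1707.0f / 1024.0f);
    assert(eval_direct_fp16(1.0f, 3.0f, 5.0f) == 1706.0f / 1024.0f);
}
```

Here the float sequence yields 1707/1024, the binary16 value nearest the exact 5/3, while rounding after each step yields 1706/1024, which is why the conversions to float have to stay in that case.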
In my example

> > __fp16 a, b;
> > __fp16 c = a / b;

it's already the case that GCC will (a) promote to float, because the
target hooks say to do so, (b) notice that the result is immediately
converted back to fp16, and that this means fp16 arithmetic could be
used directly, and so adjust it back to fp16 arithmetic (see
convert_to_real_1, and the call therein to real_can_shorten_arithmetic
which knows conditions under which it's safe to change such promoted
arithmetic back to arithmetic on a narrower type). Then the expanders
(I think) notice the lack of direct HFmode arithmetic and so put the
widening / narrowing back again.

But in your example, *because* doing it with direct fp16 arithmetic
would not be equivalent, convert_to_real_1 would not eliminate the
conversions to float, the float operations would still be present at
expansion time, and so direct HFmode arithmetic patterns would not
match.

In short: instructions for direct HFmode arithmetic should be described
with patterns with the standard names. It's the job of the
architecture-independent compiler to ensure that fp16 arithmetic in the
user's source code only generates direct fp16 arithmetic in GIMPLE (and
thus ends up using those patterns) if that is a correct representation
of the source code's semantics according to ACLE.

The intrinsics you provide can then be written to use direct
arithmetic, and rely on convert_to_real_1 eliminating the promotions,
rather than needing built-in functions at all, just like many
arm_neon.h intrinsics make direct use of GNU C vector arithmetic.

-- 
Joseph S. Myers
joseph@codesourcery.com