From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-65110-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Date: Wed, 18 Nov 2015 23:53:00 -0000
From: Joseph Myers <joseph@codesourcery.com>
To: Steve Munroe <sjmunroe@us.ibm.com>
CC: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, Michael R
 Meissner <mrmeissn@us.ibm.com>, "Paul E. Murphy"
	<murphyp@linux.vnet.ibm.com>, Tulio Magno Quites Machado Filho
	<tuliom@linux.vnet.ibm.com>
Subject: Re: IEEE128 binary float to decimal float conversion routines
In-Reply-To: <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com>
Message-ID: <alpine.DEB.2.10.1511182322260.26547@digraph.polyomino.org.uk>
References: <564A16D5.3020105@linux.vnet.ibm.com> <alpine.DEB.2.10.1511161803500.30498@digraph.polyomino.org.uk> <564A6A90.40607@linux.vnet.ibm.com> <alpine.DEB.2.10.1511162356020.32387@digraph.polyomino.org.uk> <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com>
 <alpine.DEB.2.10.1511180144150.2302@digraph.polyomino.org.uk> <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
X-SW-Source: 2015-11/txt/msg00443.txt.bz2

On Wed, 18 Nov 2015, Steve Munroe wrote:

> > The problem I see is with the final "result = temp;" which converts
> > double to float.
> >
> > The earlier steps are probably accurate to within 1ulp.  But if temp (a
> > double) is half way between two representable float values - while the
> > original argument is very close to that half way value, but not exact -
> > then the final conversion will round to even, which may or may not be
> > correct depending on which side of that double value the original
> > _Decimal128 value was.  (Much the same applies in other rounding modes
> > when the double value equals a float value but the original value isn't
> > exactly that float value.)
> >
> Would changing the decimal to binary conversion to round to odd
> offset the following double to float rounding?
> 
> http://www.exploringbinary.com/gcc-avoids-double-rounding-errors-with-round-to-odd/

No, because it would just offload the problem onto getting a conversion 
from _Decimal128 to double that is correctly rounded to odd, which is no 
easier (indeed, requires more work, not less) than the original problem of 
converting to float.

The existing code loses some of the original precision when taking just 15 
digits of the mantissa for conversion to double (not OK when you want to 
determine the exact value rounded to odd after further operations - in the 
hard cases, the final decimal digit will affect the correct rounding).  
Then the multiplications / divisions by precomputed powers of 10 use a 
table of long double values - while that gives extra precision (though 
probably not enough extra precision), it's also incompatible with doing 
rounding to odd, since IBM long double doesn't give meaningful "inexact" 
exceptions or work in non-default rounding modes, while rounding to odd 
requires working in round-to-zero mode and then checking the "inexact" 
flag.

> We could look at this if it requires a few additional instructions. But I
> would be very reluctant to resort to heavy-handed (and extremely slow)
> solutions to get perfect rounding for a few corner cases.

It is of course possible to achieve IEEE-conforming results by first doing 
an approximate conversion with rigorous error bounds, then only doing the 
slower conversion if the result of the first conversion was, within the 
error bounds, very close to halfway between two representable values (or, 
in the directed rounding modes, very close to a representable value) 
(so only using the slow case rarely, as long as you avoid it in the cases 
where the conversion is exact).  Cf. the dbl-64 libm functions that do 
things like that (and get complaints for the slowness of the slow case, 
because they use far more precision than is actually needed for correct 
rounding - in the case of conversions it's much easier to determine how 
much precision is actually needed).  (Now most of those libm functions 
don't actually need to be correctly rounded at all - TS 18661-4 defines 
separate names such as crexp for correctly rounded functions - whereas 
conversions between binary and decimal are defined to be correctly rounded 
by both TS 18661-2 and the older TR 24732 specification of C bindings for 
decimal floating-point.)

Another issue I see with the implementation: the "Obvious underflow" case 
for exponents below -39 includes a substantial part of the subnormal 
range, so that decimal values in that range will be wrongly converted to 
zero instead of the appropriate subnormal floats (so being wildly 
inaccurate, rather than merely wrong in the last place as with the issue 
discussed above).  
Likewise for truncation to double (trunctddf.c).

-- 
Joseph S. Myers
joseph@codesourcery.com