Date: Wed, 18 Nov 2015 23:53:00 -0000
From: Joseph Myers
To: Steve Munroe
CC: "libc-alpha@sourceware.org", Michael R Meissner, "Paul E. Murphy",
    Tulio Magno Quites Machado Filho
Subject: Re: IEEE128 binary float to decimal float conversion routines
In-Reply-To: <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com>
References: <564A16D5.3020105@linux.vnet.ibm.com>
    <564A6A90.40607@linux.vnet.ibm.com>
    <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com>
    <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com>

On Wed, 18 Nov 2015, Steve Munroe wrote:

> > The problem I see is with the final "result = temp;" which converts
> > double to float.
> >
> > The earlier steps are probably accurate to within 1ulp. But if temp (a
> > double) is half way between two representable float values - while the
> > original argument is very close to that half way value, but not exact -
> > then the final conversion will round to even, which may or may not be
> > correct depending on which side of that double value the original
> > _Decimal128 value was. (Much the same applies in other rounding modes
> > when the double value equals a float value but the original value isn't
> > exactly that float value.)
> Would changing the decimal to binary conversion to be round to odd,
> offset the following round double to float?
>
> http://www.exploringbinary.com/gcc-avoids-double-rounding-errors-with-round-to-odd/

No, because it would just offload the problem onto getting a conversion
from _Decimal128 to double that is correctly rounded to odd, which is no
easier (indeed, requires more work, not less) than the original problem of
converting to float.

The existing code loses some of the original precision when taking just 15
digits of the mantissa for conversion to double (not OK when you want to
determine the exact value rounded to odd after further operations - in the
hard cases, the final decimal digit will affect the correct rounding).
Then the multiplications / divisions by precomputed powers of 10 use a
table of long double values - while that gives extra precision (though
probably not enough extra precision), it's also incompatible with doing
rounding to odd, since IBM long double doesn't give meaningful "inexact"
exceptions or work in non-default rounding modes, while rounding to odd
requires working in round-to-zero mode and then checking the "inexact"
flag.

> We could look at this if it requires a few additional instructions. But I
> would be very reluctant to resort to heavy handed (and extremely slow)
> solutions to get perfect rounding for a few corner cases.

It is of course possible to achieve IEEE-conforming results by first doing
an approximate conversion with rigorous error bounds, then only doing the
slower conversion if the result of the first conversion was very close to
half way / exact (depending on the rounding mode), within the error bounds
(so only using the slow case rarely, as long as you avoid it in the cases
where the conversion is exact). Cf.
the dbl-64 libm functions that do things like that (and get complaints for
the slowness of the slow case, because they use far more precision than is
actually needed for correct rounding - in the case of conversions it's
much easier to determine how much precision is actually needed).

(Now most of those libm functions don't actually need to be correctly
rounded at all - TS 18661-4 defines separate names such as crexp for
correctly rounded functions - whereas conversions between binary and
decimal are defined to be correctly rounded by both TS 18661-2 and the
older TR 24732 specification of C bindings for decimal floating-point.)

Another issue I see with the implementation: the "Obvious underflow" case
for exponents below -39 includes a substantial part of the subnormal
range, so that decimal values in that range will be wrongly converted to
zero instead of appropriate subnormal floats (so the result is wildly
inaccurate, rather than merely incorrect in the last place as in the issue
discussed above). Likewise for truncation to double (trunctddf.c).

-- 
Joseph S. Myers
joseph@codesourcery.com