From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Munroe <munroesj@linux.vnet.ibm.com>
Reply-To: munroesj@linux.vnet.ibm.com
To: Joseph Myers
Cc: Steve Munroe, "libc-alpha@sourceware.org",
	Michael R Meissner, "Paul E. Murphy",
	Tulio Magno Quites Machado Filho
Subject: Re: IEEE128 binary float to decimal float conversion routines
Date: Tue, 15 Dec 2015 21:18:00 -0000
Message-ID: <1450214326.9926.37.camel@oc7878010663>
References: <564A16D5.3020105@linux.vnet.ibm.com>
	 <564A6A90.40607@linux.vnet.ibm.com>
	 <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com>
	 <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com>
	 <1449594999.9274.45.camel@oc7878010663>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
X-SW-Source: 2015-12/txt/msg00356.txt.bz2

On Tue, 2015-12-08 at 18:25 +0000, Joseph Myers wrote:
> On Tue, 8 Dec 2015, Steven Munroe wrote:
> 
> > The PowerISA (2.05 and later) Decimal Floating-point "Round to Prepare
> > for Shorter Precision" mode would not address the Decimal128
> > convert/truncate to shorter binary floating-point (double or float).
> >
> > But it will address the Float128 convert/truncate to shorter decimal
> > floating-point (_Decimal64 and _Decimal32).
> 
> Yes, if you have a conversion from _Float128 to _Decimal128 that works for
> Round to Prepare for Shorter Precision then you could use that as an
> intermediate step in converting to _Decimal64 and _Decimal32 (it's not the
> most efficient approach, but it's certainly simpler than having multiple
> variants of the full conversion code).
> 
> The hardest part is converting from _Float128 to _Decimal128.  Once you
> can do that (for all rounding modes and with correct exceptions),
> converting to the narrower types is easy, whether you have multiple
> variants of the same code or use Round to Prepare for Shorter Precision.
> Likewise for conversions in the other direction - _Decimal128 to _Float128
> is the hardest part; if you can do that then converting to narrower types
> is straightforward.
> 
> > So in the case of TImode or KFmode conversion to _Decimal64/_Decimal32
> > we can save the current rounding mode (fe_dec_getround()), then use
> > fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
> > Shorter Precision" mode before the multiply that converts the mantissa
> > to the target radix.  Then, just before the instruction that rounds to
> > the final (_Decimal64 or _Decimal32) type, we restore the caller's
> > rounding mode and execute the final conversion in the correct rounding
> > mode.
> >
> > I believe that addresses your double rounding concern for these
> > conversions.
> 
> For TImode it's not hard to avoid double rounding this way, by splitting
> the TImode number into two numbers that are exactly convertible to
> _Decimal128, so the only inexact operation is a single addition, which can
> be done in the Round to Prepare for Shorter Precision mode (and then you
> can convert to _Decimal64 / _Decimal32 in the original mode).
> [In all
> cases, getting the preferred quantum for decimal results is a minor matter
> to deal with at the end.]
> 
> For _Float128, this only reduces the problem to doing a conversion of
> _Float128 to _Decimal128 in that mode.  Which is not simply a single
> multiply.  Not all mantissa values for _Float128 can be represented in
> _Decimal128 (2**113 > 10**34).  And nor can all powers of 2 that you need
> to multiply / divide by be represented in _Decimal128.  And when you have
> more than one inexact operation, the final result is generally not
> correctly rounded for any rounding mode.  And so the complexity goes
> massively up (compare the fmaf implementation with round-to-odd on double
> - a single inexact addition on double done in round-to-odd followed by
> converting back to float in the original rounding mode - with the
> sysdeps/ieee754/dbl-64/s_fma.c code, which also uses round-to-odd, but
> with far more complexity in order to achieve the precision extension
> required for intermediate computations).
> 
> You may well be able to use precision-extension techniques - so doing a
> conversion that produces a sum of two or three _Decimal128 values (the
> exact number needed being determined by a continued fraction analysis) and
> then adding up those values in the Round to Prepare for Shorter Precision
> mode.  But I'd be surprised if there is a simple and correct
> implementation of the conversion that doesn't involve extending
> intermediate precision to have about 128 extra bits, given the complexity
> and extra precision described in the papers on this subject such as the
> one referenced in this thread.
> 
> > My observation is that a common element of these conversions is a large
> > precision multiply (to convert the radix of the mantissa) and then a
> > possible truncation (with rounding) to the final precision in the new
> > radix.
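The quoted save/set/restore approach - do the inexact step in "Round to
Prepare for Shorter Precision", then round once in the caller's mode - can
be simulated with Python's decimal module, whose ROUND_05UP rounding is
that same mode. A minimal sketch (the 35-digit input is a constructed
worst case for illustration, not a value from the thread):

```python
from decimal import Context, ROUND_HALF_EVEN, ROUND_05UP

# A 35-digit integer (well inside the TImode range) built so that, after a
# first rounding to 34 digits, the 16-digit boundary lands exactly on a
# tie: its digits are 1, fifteen 0s, 5, seventeen 0s, 1.
v = 10 ** 34 + 5 * 10 ** 18 + 1

ctx34_even = Context(prec=34, rounding=ROUND_HALF_EVEN)  # naive intermediate mode
ctx34_05up = Context(prec=34, rounding=ROUND_05UP)       # "prepare for shorter precision"
ctx16      = Context(prec=16, rounding=ROUND_HALF_EVEN)  # caller's final mode (16 digits, as _Decimal64)

direct   = ctx16.create_decimal(v)                   # correctly rounded reference
doubled  = ctx16.plus(ctx34_even.create_decimal(v))  # double rounding: off by 1 ulp
prepared = ctx16.plus(ctx34_05up.create_decimal(v))  # 05UP intermediate: matches direct

print(direct, doubled, prepared)
```

The 05UP intermediate works because it never creates a new exact tie: any
inexact first rounding leaves the last kept digit odd (or non-zero), so the
second rounding still sees which side of the tie the true value was on.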
>
> Where large precision means about 256 bits (not simply 128 * 128 -> 256
> multiplication, but also having the powers of 2 or 10 to that precision,
> so more like 128 * 256 -> 384, which can be truncated to about 256).
> Again, exact precisions to be determined by continued fraction analysis.

Ok, let me try with the simpler case of _Decimal128 to _Float128, where
the significand conversion is exact (log2(10^34) ~= 112.9, so <= 113
bits).

So you mention "continued fraction analysis", which was not part of my
formal education (40+ years ago), but I will try.

The question is how many significant bits it takes to represent a power
of 10.  This is interesting because my implementation of trunctfkf
involves a multiply of the converted (to _Float128) mantissa by 10^N,
where N is the exponent of the original _Decimal128.

So what powers of 10 can be represented exactly as a _Float128?  The
required significant bits would seem to be log2(10^N), but the binary
form of an exact power of 10 gains a trailing zero bit for each factor of
10 (1000 has 3 trailing zero bits, 1000000 has 6, ...).  So the number of
significant bits is log2(10^N) - N.

A quick binary search shows that values up to 10^48 require fewer than
113 bits and so can be represented exactly in _Float128.  So any
_Decimal128 < 9999999999999999999999999999999999e48 (~1.0e82) can be
converted with one _Float128 multiply of 2 exact values, giving a result
rounded to within 1 ULP.  This does not require conversion to string and
back, or carrying more precision than is naturally available in the
_Float128.

Now, as the exponent of the _Decimal128 input exceeds 48, the table of
_Float128 powers of 10 will contain values that have been rounded.  I
assume that some additional exponent range can be covered by ensuring
that the table of _Float128 powers of 10 has been pre-rounded to odd?

Do you agree with this analysis?
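The binary-search claim above is easy to verify mechanically: since
10^N = 2^N * 5^N, the 2^N factor contributes only trailing zero bits, so
10^N fits the 113-bit binary128 significand exactly if and only if 5^N
does. A quick Python check (plain integer arithmetic, nothing assumed
beyond the 113-bit significand width):

```python
# 10**n = 2**n * 5**n: the 2**n factor only adds n trailing zero bits,
# so 10**n needs exactly (5**n).bit_length() significant bits.
SIG_BITS = 113          # binary128 (_Float128) significand width

def sig_bits(n: int) -> int:
    """Significant bits needed to represent 10**n exactly in binary."""
    return (5 ** n).bit_length()

# sanity-check the trailing-zero argument
for n in (3, 6, 48, 49):
    assert 10 ** n == (5 ** n) << n

exact = [n for n in range(200) if sig_bits(n) <= SIG_BITS]
print(max(exact), sig_bits(48), sig_bits(49))   # -> 48 112 114
```

So 10^48 (112 significant bits) is the largest exactly representable
power of 10, and 10^49 already needs 114 bits, confirming the cutoff at
N = 48.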
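On the pre-rounded-to-odd question: the way a round-to-odd intermediate
protects a later rounding can be seen in miniature with exact rationals
and toy precisions (a 40-bit intermediate and 24-bit final standing in
for the real widths). This only illustrates the mechanism, not the
trunctfkf code itself:

```python
from fractions import Fraction
import math

def rnd(x: Fraction, bits: int, mode: str) -> Fraction:
    """Round positive x to `bits` significant bits.
    mode 'even': round-to-nearest, ties-to-even; 'odd': round-to-odd."""
    assert x > 0
    e = math.floor(math.log2(x)) - (bits - 1)   # provisional ulp exponent
    ulp = Fraction(2) ** e
    while x >= ulp * 2 ** bits:                 # correct any float-log slop
        ulp *= 2
    while x < ulp * 2 ** (bits - 1):
        ulp /= 2
    q = x / ulp                                 # significand in [2**(bits-1), 2**bits)
    lo = math.floor(q)
    if q == lo:                                 # exactly representable
        return lo * ulp
    if mode == 'odd':                           # truncate, then force the low bit to 1
        return (lo | 1) * ulp
    frac = q - lo                               # ties-to-even
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and lo % 2):
        lo += 1
    return lo * ulp

# 1 + 2**-24 + 2**-60 sits just above a 24-bit tie: the classic
# double-rounding trap.
x = Fraction(1) + Fraction(1, 2 ** 24) + Fraction(1, 2 ** 60)

direct   = rnd(x, 24, 'even')                   # one correct rounding
via_even = rnd(rnd(x, 40, 'even'), 24, 'even')  # naive double rounding loses the sticky bit
via_odd  = rnd(rnd(x, 40, 'odd'),  24, 'even')  # odd intermediate preserves it

print(direct == via_odd, direct == via_even)    # -> True False
```

The odd low bit acts as a sticky bit recording that the intermediate was
inexact, which is why a pre-rounded-to-odd table entry can survive the
final rounding where a nearest-rounded entry cannot.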