From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Munroe <munroesj@linux.vnet.ibm.com>
Reply-To: munroesj@linux.vnet.ibm.com
To: Joseph Myers
Cc: Steve Munroe, "libc-alpha@sourceware.org",
	Michael R Meissner, "Paul E. Murphy",
	Tulio Magno Quites Machado Filho
Subject: Re: IEEE128 binary float to decimal float conversion routines
Date: Tue, 15 Dec 2015 21:18:00 -0000
Message-ID: <1450214326.9926.37.camel@oc7878010663>
References: <564A16D5.3020105@linux.vnet.ibm.com>
	 <564A6A90.40607@linux.vnet.ibm.com>
	 <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com>
	 <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com>
	 <1449594999.9274.45.camel@oc7878010663>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
X-SW-Source: 2015-12/txt/msg00356.txt.bz2

On Tue, 2015-12-08 at 18:25 +0000, Joseph Myers wrote:
> On Tue, 8 Dec 2015, Steven Munroe wrote:
> 
> > The PowerISA (2.05 and later) Decimal Floating-point "Round to Prepare
> > for Shorter Precision" mode would not address the Decimal128
> > convert/truncate to shorter binary floating-point (double or float).
> >
> > But it will address the Float128 convert/truncate to shorter decimal
> > floating-point (_Decimal64 and _Decimal32).
> 
> Yes, if you have a conversion from _Float128 to _Decimal128 that works for
> Round to Prepare for Shorter Precision then you could use that as an
> intermediate step in converting to _Decimal64 and _Decimal32 (it's not the
> most efficient approach, but it's certainly simpler than having multiple
> variants of the full conversion code).
> 
> The hardest part is converting from _Float128 to _Decimal128.  Once you
> can do that (for all rounding modes and with correct exceptions),
> converting to the narrower types is easy, whether you have multiple
> variants of the same code or use Round to Prepare for Shorter Precision.
> Likewise for conversions in the other direction - _Decimal128 to _Float128
> is the hardest part; if you can do that then converting to narrower types
> is straightforward.
> 
> > So in the case of TImode or KFmode conversion to _Decimal64/_Decimal32
> > we can save the current rounding mode (fe_dec_getround()), then use
> > fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
> > Shorter Precision" mode before the multiply that converts the mantissa
> > to the target radix.  Then, just before the instruction that rounds to
> > the final (_Decimal64 or _Decimal32) type, we restore the caller's
> > rounding mode and execute the final conversion in the correct rounding
> > mode.
> >
> > I believe that addresses your double rounding concern for these
> > conversions.
> 
> For TImode it's not hard to avoid double rounding this way, by splitting
> the TImode number into two numbers that are exactly convertible to
> _Decimal128, so the only inexact operation is a single addition, which can
> be done in the Round to Prepare for Shorter Precision mode (and then you
> can convert to _Decimal64 / _Decimal32 in the original mode).
> [In all
> cases, getting the preferred quantum for decimal results is a minor matter
> to deal with at the end.]
> 
> For _Float128, this only reduces the problem to doing a conversion of
> _Float128 to _Decimal128 in that mode.  Which is not simply a single
> multiply.  Not all mantissa values for _Float128 can be represented in
> _Decimal128 (2**113 > 10**34).  And nor can all powers of 2 that you need
> to multiply / divide by be represented in _Decimal128.  And when you have
> more than one inexact operation, the final result is generally not
> correctly rounded for any rounding mode.  And so the complexity goes
> massively up (compare the fmaf implementation with round-to-odd on double
> - a single inexact addition on double done in round-to-odd followed by
> converting back to float in the original rounding mode - with the
> sysdeps/ieee754/dbl-64/s_fma.c code, which also uses round-to-odd, but
> with far more complexity in order to achieve the precision extension
> required for intermediate computations).
> 
> You may well be able to use precision-extension techniques - so doing a
> conversion that produces a sum of two or three _Decimal128 values (the
> exact number needed being determined by a continued fraction analysis) and
> then adding up those values in the Round to Prepare for Shorter Precision
> mode.  But I'd be surprised if there is a simple and correct
> implementation of the conversion that doesn't involve extending
> intermediate precision to have about 128 extra bits, given the complexity
> and extra precision described in the papers on this subject such as the
> one referenced in this thread.
> 
> > My observation is that a common element of these conversions is a large
> > precision multiply (to convert the radix of the mantissa) and then a
> > possible truncation (with rounding) to the final precision in the new
> > radix.
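The quoted save/set/restore approach - do the inexact step in "Round to
Prepare for Shorter Precision", then round once in the caller's mode - can
be simulated with Python's decimal module, whose ROUND_05UP rounding is
that same mode. A minimal sketch (the 35-digit input is a constructed
worst case for illustration, not a value from the thread):

```python
from decimal import Context, ROUND_HALF_EVEN, ROUND_05UP

# A 35-digit integer (well inside the TImode range) built so that, after a
# first rounding to 34 digits, the 16-digit boundary lands exactly on a
# tie: its digits are 1, fifteen 0s, 5, seventeen 0s, 1.
v = 10 ** 34 + 5 * 10 ** 18 + 1

ctx34_even = Context(prec=34, rounding=ROUND_HALF_EVEN)  # naive intermediate mode
ctx34_05up = Context(prec=34, rounding=ROUND_05UP)       # "prepare for shorter precision"
ctx16      = Context(prec=16, rounding=ROUND_HALF_EVEN)  # caller's final mode (16 digits, as _Decimal64)

direct   = ctx16.create_decimal(v)                   # correctly rounded reference
doubled  = ctx16.plus(ctx34_even.create_decimal(v))  # double rounding: off by 1 ulp
prepared = ctx16.plus(ctx34_05up.create_decimal(v))  # 05UP intermediate: matches direct

print(direct, doubled, prepared)
```

The 05UP intermediate works because it never creates a new exact tie: any
inexact first rounding leaves the last kept digit odd (or non-zero), so the
second rounding still sees which side of the tie the true value was on.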
>
> Where large precision means about 256 bits (not simply 128 * 128 -> 256
> multiplication, but also having the powers of 2 or 10 to that precision,
> so more like 128 * 256 -> 384, which can be truncated to about 256).
> Again, exact precisions to be determined by continued fraction analysis.

Ok, let me try with the simpler case of _Decimal128 to _Float128, where
the significand conversion is exact (log2(10^34) ~= 112.9, so <= 113
bits).

So you mention "continued fraction analysis", which was not part of my
formal education (40+ years ago), but I will try.

The question is how many significant bits it takes to represent a power
of 10.  This is interesting because my implementation of trunctfkf
involves a multiply of the converted (to _Float128) mantissa by 10^N,
where N is the exponent of the original _Decimal128.

So what powers of 10 can be represented exactly as a _Float128?  The
required significant bits would seem to be log2(10^N), but the binary
form of an exact power of 10 gains a trailing zero bit for each factor of
10 (1000 has 3 trailing zero bits, 1000000 has 6, ...).  So the number of
significant bits is log2(10^N) - N.

A quick binary search shows that values up to 10^48 require fewer than
113 bits and so can be represented exactly in _Float128.  So any
_Decimal128 < 9999999999999999999999999999999999e48 (~1.0e82) can be
converted with one _Float128 multiply of 2 exact values, giving a result
rounded to within 1 ULP.  This does not require conversion to string and
back, or carrying more precision than is naturally available in the
_Float128.

Now, as the exponent of the _Decimal128 input exceeds 48, the table of
_Float128 powers of 10 will contain values that have been rounded.  I
assume that some additional exponent range can be covered by ensuring
that the table of _Float128 powers of 10 has been pre-rounded to odd?

Do you agree with this analysis?
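The binary-search claim above is easy to verify mechanically: since
10^N = 2^N * 5^N, the 2^N factor contributes only trailing zero bits, so
10^N fits the 113-bit binary128 significand exactly if and only if 5^N
does. A quick Python check (plain integer arithmetic, nothing assumed
beyond the 113-bit significand width):

```python
# 10**n = 2**n * 5**n: the 2**n factor only adds n trailing zero bits,
# so 10**n needs exactly (5**n).bit_length() significant bits.
SIG_BITS = 113          # binary128 (_Float128) significand width

def sig_bits(n: int) -> int:
    """Significant bits needed to represent 10**n exactly in binary."""
    return (5 ** n).bit_length()

# sanity-check the trailing-zero argument
for n in (3, 6, 48, 49):
    assert 10 ** n == (5 ** n) << n

exact = [n for n in range(200) if sig_bits(n) <= SIG_BITS]
print(max(exact), sig_bits(48), sig_bits(49))   # -> 48 112 114
```

So 10^48 (112 significant bits) is the largest exactly representable
power of 10, and 10^49 already needs 114 bits, confirming the cutoff at
N = 48.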
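On the pre-rounded-to-odd question: the way a round-to-odd intermediate
protects a later rounding can be seen in miniature with exact rationals
and toy precisions (a 40-bit intermediate and 24-bit final standing in
for the real widths). This only illustrates the mechanism, not the
trunctfkf code itself:

```python
from fractions import Fraction
import math

def rnd(x: Fraction, bits: int, mode: str) -> Fraction:
    """Round positive x to `bits` significant bits.
    mode 'even': round-to-nearest, ties-to-even; 'odd': round-to-odd."""
    assert x > 0
    e = math.floor(math.log2(x)) - (bits - 1)   # provisional ulp exponent
    ulp = Fraction(2) ** e
    while x >= ulp * 2 ** bits:                 # correct any float-log slop
        ulp *= 2
    while x < ulp * 2 ** (bits - 1):
        ulp /= 2
    q = x / ulp                                 # significand in [2**(bits-1), 2**bits)
    lo = math.floor(q)
    if q == lo:                                 # exactly representable
        return lo * ulp
    if mode == 'odd':                           # truncate, then force the low bit to 1
        return (lo | 1) * ulp
    frac = q - lo                               # ties-to-even
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and lo % 2):
        lo += 1
    return lo * ulp

# 1 + 2**-24 + 2**-60 sits just above a 24-bit tie: the classic
# double-rounding trap.
x = Fraction(1) + Fraction(1, 2 ** 24) + Fraction(1, 2 ** 60)

direct   = rnd(x, 24, 'even')                   # one correct rounding
via_even = rnd(rnd(x, 40, 'even'), 24, 'even')  # naive double rounding loses the sticky bit
via_odd  = rnd(rnd(x, 40, 'odd'),  24, 'even')  # odd intermediate preserves it

print(direct == via_odd, direct == via_even)    # -> True False
```

The odd low bit acts as a sticky bit recording that the intermediate was
inexact, which is why a pre-rounded-to-odd table entry can survive the
final rounding where a nearest-rounded entry cannot.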