Date: Fri, 28 Jul 2023 18:03:33 +0000
From: Joseph Myers <joseph@codesourcery.com>
To: Jakub Jelinek
CC: Richard Biener, Uros Bizjak
Subject: Re: [PATCH 0/5] GCC _BitInt support [PR102989]

On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> I had a brief look at libbid and am totally unimpressed.
> Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
> conversions at all (we emit calls to __bid_* functions which don't exist),

That's bug 65833.

> the library (or the way we configure it) doesn't care about exceptions nor
> rounding mode (see following testcase)

And this is related to the never-properly-resolved issue about the split
of responsibility between libgcc, libdfp and glibc.

Decimal floating point has its own rounding mode, set with fe_dec_setround
and read with fe_dec_getround (so this test is incorrect).  In some cases
(e.g. Power), that's a hardware rounding mode.  In others, it needs to be
implemented in software as a TLS variable.  In either case, it's part of
the floating-point environment, so should be included in the state
manipulated by functions using fenv_t or femode_t.  Exceptions are shared
with binary floating point.

libbid in libgcc has its own TLS rounding mode and exceptions state, but
the former isn't connected to the fe_dec_setround / fe_dec_getround
functions, while the latter isn't the right way to do things when there's
hardware exceptions state.
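
To make the rounding-mode split concrete, here is a minimal sketch of the
intended usage, assuming the TR 24732-style fe_dec_getround /
fe_dec_setround interface and the FE_DEC_* macros (as provided by e.g.
libdfp); the function name and the values used are mine, not anything from
the patch series:

#include <fenv.h>   /* FE_DOWNWARD; fe_dec_* and FE_DEC_* are assumed to
                       come from a DFP-aware <fenv.h> such as libdfp's.  */

int
dfp_rounding_demo (void)
{
  fesetround (FE_DOWNWARD);            /* affects binary FP only */

  int old = fe_dec_getround ();
  fe_dec_setround (FE_DEC_DOWNWARD);   /* affects DFP; a hardware field
                                          (FPSCR.DRN) on Power, a TLS
                                          variable on BID targets */

  volatile _Decimal64 big = 10000000000000000DD;  /* 1e16, exact */
  _Decimal64 d = big + 9DD;            /* 17 digits, so it must round */

  fe_dec_setround (old);
  /* 0 if the addition honored FE_DEC_DOWNWARD; 10 if it was done in the
     default round-to-nearest mode instead.  */
  return (int) (d - big);
}

With the current libbid code in libgcc the fe_dec_setround call presumably
makes no difference to the result, since libbid's own TLS rounding mode
isn't connected to it.
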
libdfp - https://github.com/libdfp/libdfp - is a separate library, not
part of libgcc or glibc (and with its own range of correctness bugs) -
maintained, but not very actively (maybe more so than the DFP support in
GCC - we haven't had a listed DFP maintainer since 2019).  It has various
standard DFP library functions - maybe not the full C23 set, though some
of the TS 18661-2 functions did get added, so it's not just the old TR
24732 set.  That includes its own version of the libgcc support, which I
think has some more support for using exceptions and rounding modes.  It
includes the fe_dec_getround and fe_dec_setround functions.  It doesn't do
anything to help with the issue of including the DFP rounding state in the
state manipulated by functions such as fegetenv.  Being a separate library
probably in turn means that it's less likely to be used (although any code
that uses DFP can probably readily enough choose to use a separate library
if it wishes).  And it introduces issues with linker command line
ordering, if the user intends to use libdfp's copy of the functions but
the linker processes -lgcc first.

For full correctness, at least some functionality (such as the rounding
modes and associated inclusion in fenv_t) would probably need to go in
glibc.  See
https://sourceware.org/pipermail/libc-alpha/2019-September/106579.html
for more discussion.  But if you do put some things in glibc, maybe you
still don't want the _BitInt conversions there?  Rather, if you keep the
_BitInt conversions in libgcc (even when the other support is in glibc),
you'd have some libc-provided interface for libgcc code to get the DFP
rounding mode from glibc in the case where it's handled in software, like
some interfaces already present in the soft-float powerpc case to provide
access to its floating-point state from libc (and something along the
lines of sfp-machine.h could tell libgcc how to use either that interface
or hardware instructions to access the rounding mode and exceptions as
needed).

> and for integral <-> _Decimal32
> conversions implement them as integral <-> _Decimal64 <-> _Decimal32
> conversions.  While in the _Decimal32 -> _Decimal64 -> integral
> direction that is probably ok, even if exceptions and rounding (other than
> to nearest) were supported, the other direction I'm sure can suffer from
> double rounding.

Yes, double rounding would be an issue for converting 64-bit integers to
_Decimal32 via _Decimal64 (it would be fine to convert 32-bit integers
like that since they can be exactly represented in _Decimal64; it would be
fine to convert 64-bit integers via _Decimal128).

> So, wonder if it wouldn't be better to implement these in the soft-fp
> infrastructure which at least has the exception and rounding mode support.
> Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
> below the sign bit and doing some shifts, so not something one needs a 10MB
> of a library for.  Now, sure, 5MB out of that are generated tables in

Note that representations with too-large significand are defined to be
noncanonical representations of zero, so you need to take care of that in
decoding BID.
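
As a rough illustration of both points (this is just my reading of the
IEEE 754 BID32 encoding, not code from libbid or from this patch series;
the type and function names are made up), the decode step including that
canonicalization could look like:

#include <stdint.h>

struct bid32_parts { int sign; int exponent; uint32_t significand; };

/* Sketch: unpack a BID-encoded _Decimal32 as
   (-1)^sign * significand * 10^exponent.  Returns 0 for finite values,
   1 for infinities, 2 for NaNs.  The two tests of the four bits below
   the sign bit select between the encodings; 101 is the decimal32
   exponent bias.  */
static int
bid32_unpack (uint32_t x, struct bid32_parts *p)
{
  p->sign = x >> 31;

  if (((x >> 27) & 0xf) == 0xf)        /* 1111x: infinity or NaN */
    return ((x >> 26) & 1) ? 2 : 1;

  if (((x >> 29) & 0x3) == 0x3)
    {
      /* "11" form: exponent in bits 28..21, significand is the low
         21 bits with an implicit 0b100 prefix.  */
      p->exponent = (int) ((x >> 21) & 0xff) - 101;
      p->significand = 0x800000 | (x & 0x1fffff);
    }
  else
    {
      /* Ordinary form: exponent in bits 30..23, significand in the
         low 23 bits.  */
      p->exponent = (int) ((x >> 23) & 0xff) - 101;
      p->significand = x & 0x7fffff;
    }

  /* Significands above 10^7 - 1 are noncanonical and mean zero.  */
  if (p->significand > 9999999)
    p->significand = 0;
  return 0;
}

The infinity/NaN case falls out of the same two tests because a finite
number never has all four of those bits set (the biased decimal32 exponent
only goes up to 191); _Decimal64 and _Decimal128 differ only in the field
widths and the bias.
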
> bid_binarydecimal.c, but unfortunately those are static and not in a form
> which could be directly fed into multiplication (unless we'd want to go
> through conversions to/from strings).
> So, it seems to be easier to guess needed power of 10 from number of binary
> digits or vice versa, have a small table of powers of 10 (say those which
> fit into a limb) and construct larger powers of 10 by multiplicating those
> several times, _Decimal128 has exponent up to 6144 which is ~ 2552 bytes
> or 319 64-bit limbs, but having a table with all the 6144 powers of ten
> would be just huge.  In 64-bit limb fit power of ten until 10^19, so we
> might need say < 32 multiplications to cover it all (but with the current
> 575 bits limitation far less).  Perhaps later on write a few selected powers
> of 10 as _BitInt to decrease that number.

You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N
etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied
by a 34-digit integer significand), so that only one multiplication is
needed to get the power of 10 and then a second multiplication by the
significand.  (Or split into three parts at the cost of an extra
multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000
as a multiplication within 128 bits and so only need to compute 10^k for
k a multiple of 5, or any number of variations on those themes.)

> > For conversion *from _BitInt to DFP*, the _BitInt value needs to be
> > expressed in decimal.  In the absence of optimized multiplication /
> > division for _BitInt, it seems reasonable enough to do this naively
> > (repeatedly dividing by a power of 10 that fits in one limb to determine
> > base 10^N digits from the least significant end, for example), modulo
> > detecting obvious overflow cases up front (if the absolute value is at
>
> Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
> and instead repeatedly multiply like in the other direction and then just
> divide once with remainder?

I don't know what's most efficient here, given that it's quadratic in the
absence of optimized multiplication / division (so a choice between
different approaches that take quadratic time).

--
Joseph S. Myers
joseph@codesourcery.com