Date: Fri, 28 Jul 2023 18:03:33 +0000
From: Joseph Myers <joseph@codesourcery.com>
To: Jakub Jelinek
CC: Richard Biener, Uros Bizjak
Subject: Re: [PATCH 0/5] GCC _BitInt support [PR102989]

On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> I had a brief look at libbid and am totally unimpressed.
> Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
> conversions at all (we emit calls to __bid_* functions which don't exist),

That's bug 65833.

> the library (or the way we configure it) doesn't care about exceptions nor
> rounding mode (see following testcase)

And this is related to the never-properly-resolved issue about the split
of responsibility between libgcc, libdfp and glibc.

Decimal floating point has its own rounding mode, set with fe_dec_setround
and read with fe_dec_getround (so this test is incorrect).  In some cases
(e.g. Power), that's a hardware rounding mode.  In others, it needs to be
implemented in software as a TLS variable.  In either case, it's part of
the floating-point environment, so should be included in the state
manipulated by functions using fenv_t or femode_t.  Exceptions are shared
with binary floating point.

libbid in libgcc has its own TLS rounding mode and exceptions state, but
the former isn't connected to the fe_dec_setround / fe_dec_getround
functions, while the latter isn't the right way to do things when there's
hardware exceptions state.
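
To make the rounding-mode split concrete, here is a minimal sketch of the
intended usage, assuming the TR 24732-style fe_dec_getround /
fe_dec_setround interface and the FE_DEC_* macros (as provided by e.g.
libdfp); the function name and the values used are mine, not anything from
the patch series:

#include <fenv.h>   /* FE_DOWNWARD; fe_dec_* and FE_DEC_* are assumed to
                       come from a DFP-aware <fenv.h> such as libdfp's.  */

int
dfp_rounding_demo (void)
{
  fesetround (FE_DOWNWARD);            /* affects binary FP only */

  int old = fe_dec_getround ();
  fe_dec_setround (FE_DEC_DOWNWARD);   /* affects DFP; a hardware field
                                          (FPSCR.DRN) on Power, a TLS
                                          variable on BID targets */

  volatile _Decimal64 big = 10000000000000000DD;  /* 1e16, exact */
  _Decimal64 d = big + 9DD;            /* 17 digits, so it must round */

  fe_dec_setround (old);
  /* 0 if the addition honored FE_DEC_DOWNWARD; 10 if it was done in the
     default round-to-nearest mode instead.  */
  return (int) (d - big);
}

With the current libbid code in libgcc the fe_dec_setround call presumably
makes no difference to the result, since libbid's own TLS rounding mode
isn't connected to it.
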
libdfp - https://github.com/libdfp/libdfp - is a separate library, not
part of libgcc or glibc (and with its own range of correctness bugs) -
maintained, but not very actively (maybe more so than the DFP support in
GCC - we haven't had a listed DFP maintainer since 2019).  It has various
standard DFP library functions - maybe not the full C23 set, though some
of the TS 18661-2 functions did get added, so it's not just the old TR
24732 set.  That includes its own version of the libgcc support, which I
think has some more support for using exceptions and rounding modes.  It
includes the fe_dec_getround and fe_dec_setround functions.  It doesn't do
anything to help with the issue of including the DFP rounding state in the
state manipulated by functions such as fegetenv.  Being a separate library
probably in turn means that it's less likely to be used (although any code
that uses DFP can probably readily enough choose to use a separate library
if it wishes).  And it introduces issues with linker command line
ordering, if the user intends to use libdfp's copy of the functions but
the linker processes -lgcc first.

For full correctness, at least some functionality (such as the rounding
modes and associated inclusion in fenv_t) would probably need to go in
glibc.  See
https://sourceware.org/pipermail/libc-alpha/2019-September/106579.html
for more discussion.  But if you do put some things in glibc, maybe you
still don't want the _BitInt conversions there?  Rather, if you keep the
_BitInt conversions in libgcc (even when the other support is in glibc),
you'd have some libc-provided interface for libgcc code to get the DFP
rounding mode from glibc in the case where it's handled in software, like
some interfaces already present in the soft-float powerpc case to provide
access to its floating-point state from libc (and something along the
lines of sfp-machine.h could tell libgcc how to use either that interface
or hardware instructions to access the rounding mode and exceptions as
needed).

> and for integral <-> _Decimal32
> conversions implement them as integral <-> _Decimal64 <-> _Decimal32
> conversions.  While in the _Decimal32 -> _Decimal64 -> integral
> direction that is probably ok, even if exceptions and rounding (other than
> to nearest) were supported, the other direction I'm sure can suffer from
> double rounding.

Yes, double rounding would be an issue for converting 64-bit integers to
_Decimal32 via _Decimal64 (it would be fine to convert 32-bit integers
like that since they can be exactly represented in _Decimal64; it would be
fine to convert 64-bit integers via _Decimal128).

> So, wonder if it wouldn't be better to implement these in the soft-fp
> infrastructure which at least has the exception and rounding mode support.
> Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
> below the sign bit and doing some shifts, so not something one needs a 10MB
> of a library for.  Now, sure, 5MB out of that are generated tables in

Note that representations with too-large significand are defined to be
noncanonical representations of zero, so you need to take care of that in
decoding BID.
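
As a rough illustration of both points (this is just my reading of the
IEEE 754 BID32 encoding, not code from libbid or from this patch series;
the type and function names are made up), the decode step including that
canonicalization could look like:

#include <stdint.h>

struct bid32_parts { int sign; int exponent; uint32_t significand; };

/* Sketch: unpack a BID-encoded _Decimal32 as
   (-1)^sign * significand * 10^exponent.  Returns 0 for finite values,
   1 for infinities, 2 for NaNs.  The two tests of the four bits below
   the sign bit select between the encodings; 101 is the decimal32
   exponent bias.  */
static int
bid32_unpack (uint32_t x, struct bid32_parts *p)
{
  p->sign = x >> 31;

  if (((x >> 27) & 0xf) == 0xf)        /* 1111x: infinity or NaN */
    return ((x >> 26) & 1) ? 2 : 1;

  if (((x >> 29) & 0x3) == 0x3)
    {
      /* "11" form: exponent in bits 28..21, significand is the low
         21 bits with an implicit 0b100 prefix.  */
      p->exponent = (int) ((x >> 21) & 0xff) - 101;
      p->significand = 0x800000 | (x & 0x1fffff);
    }
  else
    {
      /* Ordinary form: exponent in bits 30..23, significand in the
         low 23 bits.  */
      p->exponent = (int) ((x >> 23) & 0xff) - 101;
      p->significand = x & 0x7fffff;
    }

  /* Significands above 10^7 - 1 are noncanonical and mean zero.  */
  if (p->significand > 9999999)
    p->significand = 0;
  return 0;
}

The infinity/NaN case falls out of the same two tests because a finite
number never has all four of those bits set (the biased decimal32 exponent
only goes up to 191); _Decimal64 and _Decimal128 differ only in the field
widths and the bias.
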
> bid_binarydecimal.c, but unfortunately those are static and not in a form
> which could be directly fed into multiplication (unless we'd want to go
> through conversions to/from strings).
> So, it seems to be easier to guess needed power of 10 from number of binary
> digits or vice versa, have a small table of powers of 10 (say those which
> fit into a limb) and construct larger powers of 10 by multiplicating those
> several times, _Decimal128 has exponent up to 6144 which is ~ 2552 bytes
> or 319 64-bit limbs, but having a table with all the 6144 powers of ten
> would be just huge.  In 64-bit limb fit power of ten until 10^19, so we
> might need say < 32 multiplications to cover it all (but with the current
> 575 bits limitation far less).  Perhaps later on write a few selected powers
> of 10 as _BitInt to decrease that number.

You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N
etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied
by a 34-digit integer significand), so that only one multiplication is
needed to get the power of 10 and then a second multiplication by the
significand.  (Or split into three parts at the cost of an extra
multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000
as a multiplication within 128 bits and so only need to compute 10^k for
k a multiple of 5, or any number of variations on those themes.)

> > For conversion *from _BitInt to DFP*, the _BitInt value needs to be
> > expressed in decimal.  In the absence of optimized multiplication /
> > division for _BitInt, it seems reasonable enough to do this naively
> > (repeatedly dividing by a power of 10 that fits in one limb to determine
> > base 10^N digits from the least significant end, for example), modulo
> > detecting obvious overflow cases up front (if the absolute value is at
>
> Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
> and instead repeatedly multiply like in the other direction and then just
> divide once with remainder?

I don't know what's most efficient here, given that it's quadratic in the
absence of optimized multiplication / division (so a choice between
different approaches that take quadratic time).

--
Joseph S. Myers
joseph@codesourcery.com