From: Adhemerval Zanella
To: Stefan Kanthak, libc-help@sourceware.org
Subject: Re: Twiddling with 64-bit values as 2 ints;
Date: Mon, 23 Aug 2021 11:11:28 -0300
References: <4DD65B114A174A35AC6960DD2104BDE7@H270> <4c8ee26d-764e-736f-c3d6-5728e54c4c0f@linaro.org> <52E35AACEB174FDDAA3697DE66BB6ACA@H270>
In-Reply-To: <52E35AACEB174FDDAA3697DE66BB6ACA@H270>
List-Id: Libc-help mailing list

On 23/08/2021 10:18, Stefan Kanthak wrote:
> Adhemerval Zanella wrote:
>
>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>
>>> (Heretic.-) questions:
>>> - why does glibc still employ such ugly code?
>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>
>> Because no one cared to adjust the implementation.  Recently Wilco
>> has removed a lot of old code that still uses 32-bit instead of 64-bit
>> for bit twiddling in floating-point implementations (check caa884dda7
>> and 9e97f239eae1f2).
>
> That's good to hear.
>
>> I think we should move to use the simplest code assuming 64-bit CPU
>
> D'accord.
>
> And there's a second direction where you might move: almost all CPUs
> have separate general purpose registers and floating-point registers.
> Bit-twiddling generally needs extra (and sometimes slow) transfers
> between them.
> In 32-bit environment, where arguments are typically passed on the
> stack, at least loading an argument from the stack into a GPR or FPR
> makes no difference.
> In 64-bit environment, where arguments are passed in registers, they
> should be operated on in these registers.
>
> So: why not implement routines like nextafter() without bit-twiddling,
> using floating-point as far as possible for architectures where this
> gives better results?

Mainly because some math routines are not performance critical, in the
sense that they are usually not hotspots.  For these I would prefer the
simplest code that works with reasonable performance independently of
the underlying ABI or architecture (using integer operations might be
better for a soft-fp ABI, for instance).

For symbols that might be performance critical, we do have more
optimized versions.  Szabolcs and Wilco spent considerable time tuning
a lot of math functions and removing the slow code paths; also, for
some routines we have internal defines that map them to compiler
builtins when we know that the compiler and architecture allow us to do
so (check the rounding routines or sqrt, for instance).

Recently we have been aiming to avoid arch-specific code for complex
routines, and to prefer C implementations that leverage the compiler
support.  It makes for *much* more maintainable code, without the need
to keep re-evaluating the routines on each new architecture iteration
(as some arch-specific routines have proven to be slower than a
well-coded generic implementation).

>
> The simple implementation I showed in my initial post improved the
> throughput in my benchmark (on AMD64) by an order of magnitude.
> In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.

Your implementation triggered a lot of regressions; you will need to
sort these out before considering performance numbers.  Also, we will
need a proper benchmark to evaluate it, as Szabolcs and Wilco have done
for their math work.

>
> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
> used to select between the integer bit-twiddling code and FP-preferring
> code during compilation?

No, and I don't think this would be a good addition.  As before, I
would prefer to have a simple generic implementation that gives us good
performance on modern hardware instead of a configurable one with many
tunables.  The latter increases the maintenance cost (with testing and
performance evaluation).

>
>> and let the compiler optimize it (which unfortunately gcc is not that
>> smart in all the cases).
>
> I know, and I just learned that GCC does NOT perform quite some
> optimisations I expect from a mature compiler.
> Quoting Jakub Jelinek on gcc@gcc.gnu.org:
>
> | GCC doesn't do value range propagation of floating point values, not
> | even the special ones like NaNs, infinities, +/- zeros etc., and without
> | that the earlier ifs aren't taken into account for the earlier code.
>
> The code I used to demonstrate this deficiency is TOMS 722...
>
> Stefan
>
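
For illustration only, here is a minimal sketch of what "use one 64-bit
integer instead of two 32-bit ints" can look like for a nextafter()-style
routine.  It is not glibc's code and not the implementation from the
initial post; the names to_bits, from_bits and sketch_nextafter are
hypothetical, and the sketch assumes IEEE 754 binary64 doubles and
ignores errno and floating-point exception flags.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not glibc internals): reinterpret a double as
   a 64-bit integer and back.  memcpy avoids aliasing problems and is
   normally compiled to a single register move.  */
static inline uint64_t to_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static inline double from_bits(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Sketch of nextafter() on one 64-bit word, assuming IEEE 754 binary64.
   Does not set errno or raise overflow/underflow exceptions.  */
double sketch_nextafter(double x, double y)
{
    if (isnan(x) || isnan(y))
        return x + y;                 /* propagate the NaN */
    if (x == y)
        return y;                     /* also covers +0 versus -0 */

    if (x == 0.0)                     /* step off zero: smallest subnormal
                                         carrying the sign of y */
        return from_bits((to_bits(y) & 0x8000000000000000ULL) | 1);

    uint64_t ux = to_bits(x);
    /* The encoding is sign-magnitude, so moving away from zero
       increments the word and moving toward zero decrements it.  */
    if ((x < y) == (x > 0.0))
        ux++;
    else
        ux--;
    return from_bits(ux);
}
```

The single increment or decrement handles the carry from mantissa into
exponent, which is the part that needs explicit handling when the value
is split into a high and a low 32-bit half.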