From: "Stefan Kanthak"
To: "Adhemerval Zanella"
Subject: Re: Twiddling with 64-bit values as 2 ints;
Date: Mon, 23 Aug 2021 15:18:50 +0200
List-Id: Libc-help mailing list

Adhemerval Zanella wrote:

> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>
>> (Heretic.-) questions:
>> - why does glibc still employ such ugly code?
>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>
> Because no one cared to adjust the implementation. Recently Wilco
> has removed a lot of old code that still uses 32-bit instead of 64-bit
> bo bit twinddling in floating-pointer implementation (check caa884dda7
> and 9e97f239eae1f2).

That's good to hear.

> I think we should move to use a simplest code assuming 64-bit CPU

D'accord.
And there's a second direction you might take: almost all CPUs have
separate general-purpose registers (GPRs) and floating-point registers
(FPRs), and bit-twiddling generally needs extra (and sometimes slow)
transfers between them.
In a 32-bit environment, where arguments are typically passed on the
stack, it makes no difference whether an argument is loaded from the
stack into a GPR or into an FPR.
In a 64-bit environment, where arguments are passed in registers (on
AMD64, for example, double arguments arrive in XMM registers), they
should be operated on in those registers.

So: why not implement routines like nextafter() without bit-twiddling,
using floating-point operations as far as possible on architectures
where this gives better results?
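A minimal sketch of both directions, assuming IEEE 754 binary64 `double`, the default round-to-nearest mode, and no trapping FP exceptions. `nextafter_u64()`, `nextafter_fp()`, `nextafter_sel` and the selection macro `PREFER_FP_IMPLEMENTATION` are hypothetical names for illustration only; this is neither glibc's code nor the implementation from the initial post. The integer variant keeps the whole double in one `uint64_t` instead of two 32-bit halves; the FP variant never touches the bit pattern itself and computes the step size with `frexp()`/`ldexp()`.

```c
#include <float.h>   /* DBL_MAX */
#include <math.h>    /* copysign, fabs, frexp, ldexp, isinf, isnan */
#include <stdint.h>  /* uint64_t, UINT64_C */
#include <string.h>  /* memcpy */

/* Hypothetical build knob for this sketch only: 1 selects the FP
   variant, 0 the 64-bit integer variant. */
#ifndef PREFER_FP_IMPLEMENTATION
#define PREFER_FP_IMPLEMENTATION 1
#endif

#define SIGN_BIT (UINT64_C(1) << 63)

/* Integer variant: the whole double in ONE uint64_t instead of two
   32-bit halves. Adjacent doubles of the same sign (including +-inf)
   have adjacent bit patterns, so each step is +1 or -1. */
static double nextafter_u64(double x, double y)
{
    uint64_t ux, uy;

    if (isnan(x) || isnan(y))
        return x + y;                       /* propagate NaN */
    if (x == y)
        return y;                           /* also covers +0 vs -0 */

    memcpy(&ux, &x, sizeof ux);             /* well-defined type punning */
    memcpy(&uy, &y, sizeof uy);

    if ((ux & ~SIGN_BIT) == 0)              /* x is +0 or -0 */
        ux = (uy & SIGN_BIT) | 1;           /* smallest subnormal, sign of y */
    else if ((x < y) == (x > 0))
        ux += 1;                            /* step away from zero */
    else
        ux -= 1;                            /* step toward zero */

    memcpy(&x, &ux, sizeof x);
    return x;
}

/* FP variant: no access to the bit pattern in this code; the step is
   the spacing of doubles around x, built with frexp()/ldexp(). */
static double nextafter_fp(double x, double y)
{
    if (isnan(x) || isnan(y))
        return x + y;                       /* propagate NaN */
    if (x == y)
        return y;
    if (x == 0.0)
        return copysign(0x1p-1074, y);      /* smallest subnormal, sign of y */
    if (isinf(x))
        return copysign(DBL_MAX, x);        /* x != y, so step toward zero */

    int e;
    double m = frexp(x, &e);                /* x = m * 2^e, 0.5 <= |m| < 1 */
    int away = (x < y) == (x > 0);          /* stepping away from zero? */
    int k = e - 53;                         /* spacing in [2^(e-1), 2^e) is 2^(e-53) */

    if (!away && fabs(m) == 0.5)
        k -= 1;                             /* just below a power of 2 it halves */
    if (k < -1074)
        k = -1074;                          /* subnormal spacing is 2^-1074 */

    double ulp = copysign(ldexp(1.0, k), x);
    double r = away ? x + ulp : x - ulp;    /* exact, except overflow to inf */
    return r == 0.0 ? copysign(0.0, x) : r; /* keep the sign of x at zero */
}

#if PREFER_FP_IMPLEMENTATION
#define nextafter_sel nextafter_fp
#else
#define nextafter_sel nextafter_u64
#endif
```

Building with `-DPREFER_FP_IMPLEMENTATION=0` maps `nextafter_sel` to the integer variant. The FP variant relies on round-to-nearest: under round-toward-zero, `DBL_MAX + ulp` would not overflow to infinity. Neither variant tries to reproduce the floating-point exception flags that Annex F of the C standard specifies for nextafter(). Whether the FP variant really avoids GPR/FPR transfers depends on the compiler inlining `frexp()`/`ldexp()`, so its benefit would have to be measured.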

The simple implementation I showed in my initial post improved the
throughput in my benchmark (on AMD64) by an order of magnitude.
In Szabolcs Nagy's benchmark, which measures latency, it took
0.04 ns/call longer (5.72 ns vs. 5.68 ns), despite the POOR job GCC
does on FP.

Does glibc offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
used to select between the integer bit-twiddling code and the
FP-preferring code at compile time?

> and let the compiler optimize it (which unfortunately gcc is not that
> smart in all the cases).

I know, and I just learned that GCC does NOT perform quite a few
optimisations I expect from a mature compiler.
Quoting Jakub Jelinek on gcc@gcc.gnu.org:

| GCC doesn't do value range propagation of floating point values, not
| even the special ones like NaNs, infinities, +/- zeros etc., and without
| that the earlier ifs aren't taken into account for the earlier code.

The code I used to demonstrate this deficiency is TOMS 722...

Stefan