From: "Stefan Kanthak"
To: , "Adhemerval Zanella"
Subject: Re: Twiddling with 64-bit values as 2 ints;
Date: Mon, 23 Aug 2021 19:32:11 +0200
Organization: Me, myself & IT

Adhemerval Zanella wrote:

> On 23/08/2021 12:37, Stefan Kanthak wrote:
>> Adhemerval Zanella wrote:
>>
>>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>>> Adhemerval Zanella wrote:
>>>> The simple implementation I showed in my initial post improved the
>>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>>> In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
>>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>>
>>> Your implementation triggered a lot of regressions,
>>
>> The initial, FP-preferring code was a demonstration, not a patch.
>
> Right, but it does not make much sense to compare performance numbers
> with an implementation that adds a lot of regressions.

This argument also holds for a correct FP-preferring implementation, due
to the POOR code GCC currently generates: the 4 superfluous FP comparisons
plus conditional branches GCC generates cost more runtime than the code
that is still missing to handle fenv/underflow/overflow/errno would.

[...]

>> Having dedicated implementations for different architectures is even more
>> costly!
>> My intention/proposal is to have at most two different generic
>> implementations: one using integer bit-twiddling wherever possible, thus
>> supporting soft-fp well, the second using floating-point wherever
>> possible, thus supporting modern hardware well.
>
> The only reservation I have for such an approach is that it would add
> some more maintenance and testing.

Insert "wherever needed" before/after "wherever possible".

Stefan
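To make the two generic styles discussed above concrete, here is a minimal sketch of C's trunc() written both ways. trunc() is only an illustrative choice, not necessarily the function this thread is about, and trunc_int/trunc_fp are hypothetical names, not glibc functions. The integer variant works on a single uint64_t rather than on the two 32-bit ints of the subject line; the FP variant ignores floating-point environment side effects (the int64_t conversion may set the inexact flag), and neither quiets signaling NaNs.

```c
#include <math.h>    /* isless, isgreater, signbit (type-generic macros) */
#include <stdint.h>
#include <string.h>

/* Integer bit-twiddling variant (hypothetical name): no FP arithmetic,
   so it suits soft-fp targets. Assumes IEEE 754 binary64 doubles. */
double trunc_int(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);                  /* reinterpret bits without UB */
    int e = (int)((u >> 52) & 0x7ff) - 0x3ff;  /* unbiased exponent */

    if (e >= 52)        /* |x| >= 2^52, Inf or NaN: already integral */
        return x;
    if (e < 0)          /* |x| < 1 (incl. zeros, subnormals): keep only the sign */
        u &= UINT64_C(1) << 63;
    else                /* clear the 52 - e fractional mantissa bits */
        u &= ~((UINT64_C(1) << (52 - e)) - 1);

    memcpy(&x, &u, sizeof x);
    return x;
}

/* Floating-point variant (hypothetical name): relies on the hardware
   double -> int64_t conversion; fenv side effects are ignored. */
double trunc_fp(double x)
{
    /* Quiet comparisons: a NaN fails both and is returned unchanged,
       as is any |x| >= 2^52, which is already integral. */
    if (!(isless(x, 0x1p52) && isgreater(x, -0x1p52)))
        return x;

    double r = (double)(int64_t)x;  /* C conversion truncates toward zero */
    /* Preserve the sign of a zero result, e.g. trunc(-0.5) is -0.0. */
    return (r == 0.0 && signbit(x)) ? -0.0 : r;
}
```

For example, both return -2.0 for -2.75 and -0.0 for -0.5. The integer variant needs only shifts and masks, which is what makes it attractive for soft-fp; the FP variant is shorter but depends on a fast hardware conversion and on the compiler not adding extra comparisons and branches.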