Subject: Re: Twiddling with 64-bit values as 2 ints;
To: Stefan Kanthak, libc-help@sourceware.org
From: Adhemerval Zanella
Date: Mon, 23 Aug 2021 15:24:24 -0300

On 23/08/2021 14:32, Stefan Kanthak wrote:
> Adhemerval Zanella wrote:
>
>> On 23/08/2021 12:37, Stefan Kanthak wrote:
>>> Adhemerval Zanella wrote:
>>>
>>>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>>>> Adhemerval Zanella wrote:
>
>>>>> The simple implementation I showed in my initial post improved the
>>>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>>>> In Szabolcs Nagy benchmark measuring latency it took 0.04ns/call
>>>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>>>
>>>> Your implementation triggered a lot of regressions,
>>>
>>> The initial, FP-preferring code was a demonstration, not a patch.
>>
>> Right, but it does not make much sense to compare performance numbers with
>> an implementation that adds a lot of regressions.
>
> This argument also holds for a correct FP-preferring implementation due to
> the POOR code GCC currently generates: the 4 superfluous FP comparisons
> plus conditional branches GCC generates have worse runtime than the missing
> code to handle fenv/underflow/overflow/errno.

In any case, please come up with numbers only *after* you fix any regressions
in the test cases.

>
> [...]
>
>>> Having dedicated implementations for different architectures is even more
>>> costly!
>>> My intention/proposal is to have at most two different generic implementations,
>>> one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
>>> the second using floating-point wherever possible, thus supporting modern
>>> hardware well.
>>
>> The only reservation I have about such an approach is that it would add some
>> more maintenance and testing.
>
> Insert "wherever needed" before/after "wherever possible".
>
> Stefan
>