Subject: Re: Twiddling with 64-bit values as 2 ints;
To: Stefan Kanthak <stefan.kanthak@nexgo.de>, libc-help@sourceware.org
References: <4DD65B114A174A35AC6960DD2104BDE7@H270>
 <4c8ee26d-764e-736f-c3d6-5728e54c4c0f@linaro.org>
 <52E35AACEB174FDDAA3697DE66BB6ACA@H270>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Message-ID: <f8460d66-dec6-9852-3710-8e5d6627df54@linaro.org>
Date: Mon, 23 Aug 2021 11:11:28 -0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <52E35AACEB174FDDAA3697DE66BB6ACA@H270>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit


On 23/08/2021 10:18, Stefan Kanthak wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>
>>> (Heretic.-) questions:
>>> - why does glibc still employ such ugly code?
>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>
>> Because no one cared to adjust the implementation.  Recently Wilco
>> has removed a lot of old code that still used 32-bit instead of 64-bit
>> bit twiddling in floating-point implementations (check caa884dda7
>> and 9e97f239eae1f2).
> 
> That's good to hear.
> 
>> I think we should move to the simplest code assuming a 64-bit CPU
> 
> D'accord.
> And there's a second direction where you might move: almost all CPUs
> have separate general purpose registers and floating-point registers.
> Bit-twiddling generally needs extra (and sometimes slow) transfers
> between them.
> In a 32-bit environment, where arguments are typically passed on the
> stack, at least loading an argument from the stack into a GPR or FPR
> makes no difference.
> In a 64-bit environment, where arguments are passed in registers, they
> should be operated on in these registers.
> 
> So: why not implement routines like nextafter() without bit-twiddling,
> using floating-point as far as possible for architectures where this
> gives better results?

Mainly because some math routines are not performance critical, in the
sense that they are usually not hotspots, and for these I would prefer the
simplest code that works with reasonable performance independently of
the underlying ABI or architecture (using integer operations might be
better for a soft-fp ABI, for instance).
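
To illustrate what the simpler 64-bit integer approach can look like,
here is a minimal sketch of nextafter() for IEEE 754 binary64 that views
the double as a single uint64_t instead of two 32-bit words.  It is not
glibc's implementation: the name my_nextafter is hypothetical, and unlike
the real function it does not raise overflow/underflow exceptions or set
errno.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nextafter() for IEEE 754 binary64 using one
   64-bit integer view of the double.  Does not raise FP exceptions
   or set errno.  */
double
my_nextafter (double x, double y)
{
  /* If either argument is NaN, propagate a NaN.  */
  if (isnan (x) || isnan (y))
    return x + y;

  /* Equal arguments (including +0 vs -0): return y.  */
  if (x == y)
    return y;

  uint64_t ix, iy;
  memcpy (&ix, &x, sizeof ix);
  memcpy (&iy, &y, sizeof iy);

  /* x is +0 or -0: step to the smallest subnormal with y's sign.  */
  if ((ix << 1) == 0)
    {
      uint64_t r = (iy & UINT64_C (0x8000000000000000)) | 1;
      double res;
      memcpy (&res, &r, sizeof res);
      return res;
    }

  /* For nonzero x, moving away from zero increments the magnitude
     bits and moving toward zero decrements them.  (x > 0) == (y > x)
     holds exactly when y lies further from zero than x.  */
  if ((x > 0) == (y > x))
    ix += 1;
  else
    ix -= 1;

  double res;
  memcpy (&res, &ix, sizeof res);
  return res;
}
```

Because the magnitude of a double grows monotonically with its integer
representation (for a fixed sign), a single increment or decrement of
the 64-bit word lands on the adjacent representable value, including the
step from DBL_MAX to infinity, so no separate carry between high and low
halves is needed.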

For symbols that might be performance critical, we do have more optimized
versions.  Szabolcs and Wilco spent considerable time tuning a lot of
math functions and removing the slow code paths; also, for some routines
we have internal defines that map them to compiler builtins when we know
that the compiler and architecture allow us to do so (check the rounding
routines or sqrt, for instance).
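
As a rough illustration of that pattern (selecting a compiler builtin at
build time when the port allows it, otherwise falling back to generic C),
the sketch below uses __builtin_trunc, which GCC and Clang provide.  The
macro MY_USE_TRUNC_BUILTIN and the function my_trunc are hypothetical
names, not glibc's internal ones, and the fallback assumes IEEE 754
binary64 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical port configuration: define to 1 when the compiler is
   known to expand __builtin_trunc inline for this architecture.  */
#ifndef MY_USE_TRUNC_BUILTIN
# define MY_USE_TRUNC_BUILTIN 0
#endif

double
my_trunc (double x)
{
#if MY_USE_TRUNC_BUILTIN
  return __builtin_trunc (x);
#else
  /* Generic fallback: clear the fractional mantissa bits, treating the
     binary64 value as one 64-bit integer.  */
  uint64_t i;
  memcpy (&i, &x, sizeof i);
  int e = (int) ((i >> 52) & 0x7ff) - 0x3ff;   /* unbiased exponent */
  if (e >= 52)          /* already integral, or Inf/NaN */
    return x;
  if (e < 0)            /* |x| < 1: result is a zero with x's sign */
    i &= UINT64_C (0x8000000000000000);
  else                  /* clear the 52 - e fractional bits */
    i &= ~(UINT64_C (0x000fffffffffffff) >> e);
  memcpy (&x, &i, sizeof x);
  return x;
#endif
}
```

Keeping a single generic fallback and a compile-time switch to the
builtin means only the port configuration changes per architecture; the
C code itself stays the same everywhere.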

Recently we have been aiming to avoid arch-specific code for complex
routines and to prefer C implementations that leverage compiler support.
This makes the code *much* more maintainable, without the need to keep
re-evaluating the routines on each new architecture iteration (as some
routines proved to be slower than a well-coded generic implementation).

> 
> The simple implementation I showed in my initial post improved the
> throughput in my benchmark (on AMD64) by an order of magnitude.
> In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.

Your implementation triggered a lot of regressions; you will need to sort
these out before considering performance numbers.  Also, we will need
a proper benchmark to evaluate it, as Szabolcs and Wilco have done for
their math work.
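
A benchmark along these lines should report both throughput (independent
calls that the CPU can overlap) and latency (each call depends on the
previous result), since the two can rank implementations differently.
The sketch below assumes POSIX clock_gettime and uses a hypothetical
placeholder, step_up, as the routine under test; a real evaluation would
substitute the function being measured and control for inlining,
frequency scaling and input distribution.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical placeholder for the routine under test: steps a positive
   finite double up by one ulp.  Replace with the function measured.  */
static double
step_up (double x)
{
  uint64_t i;
  memcpy (&i, &x, sizeof i);
  i += 1;
  memcpy (&x, &i, sizeof x);
  return x;
}

/* Monotonic time in nanoseconds.  */
static double
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec * 1e9 + (double) ts.tv_nsec;
}

/* Throughput: every call gets an independent input, so consecutive
   calls may execute in parallel.  Returns average ns per call.  */
double
bench_throughput (const double *in, double *out, size_t n)
{
  double t0 = now_ns ();
  for (size_t i = 0; i < n; i++)
    out[i] = step_up (in[i]);
  double t1 = now_ns ();
  return (t1 - t0) / (double) n;
}

/* Latency: each call consumes the previous result, forming a dependency
   chain.  Stores the final value in *result; returns ns per call.  */
double
bench_latency (double x, size_t n, double *result)
{
  double t0 = now_ns ();
  for (size_t i = 0; i < n; i++)
    x = step_up (x);
  double t1 = now_ns ();
  *result = x;
  return (t1 - t0) / (double) n;
}
```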

> 
> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
> used to select between the integer bit-twiddling code and FP-preferring
> code during compilation?

No, and I don't think this would be a good addition.  As before, I would
prefer a simple generic implementation that gives us good performance on
modern hardware over a configurable one with many tunables.  The latter
increases the maintenance cost (testing and performance evaluation).

> 
>> and let the compiler optimize it (unfortunately gcc is not that
>> smart in all cases).
> 
> I know, and I just learned that GCC does NOT perform quite a few
> optimisations I expect from a mature compiler.
> Quoting Jakub Jelinek on gcc@gcc.gnu.org:
> 
> | GCC doesn't do value range propagation of floating point values, not
> | even the special ones like NaNs, infinities, +/- zeros etc., and without
> | that the earlier ifs aren't taken into account for the earlier code.
> 
> The code I used to demonstrate this deficiency is TOMS 722...
> 
> Stefan
>