From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 82849 invoked by alias); 14 Sep 2018 13:17:17 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 82831 invoked by uid 89); 14 Sep 2018 13:17:17 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-2.0 required=5.0
 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham
 version=3.3.2 spammy=dealt, Hx-languages-length:1321
X-HELO: relay1.mentorg.com
Date: Fri, 14 Sep 2018 13:17:00 -0000
From: Joseph Myers
To: Wilco Dijkstra
CC: nd, "libc-alpha@sourceware.org"
Subject: Re: Use floor functions not __floor functions in glibc libm
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
X-SW-Source: 2018-09/txt/msg00209.txt.bz2

On Fri, 14 Sep 2018, Wilco Dijkstra wrote:

> Going via the PLT is expensive and it would be stupid to not inline simple
> functions like floor, lrint etc. I did a quick experiment on floorf:
> On AArch64 a tight loop calling floorf is at least twice as fast than a library
> call. On x64 the PLT overhead is at least 2.5 times.
>
> The SSE2 floor instruction is twice as slow as the SSE4 version, however
> due to the high PLT call overhead, inlining the SSE2 version is still 25%
> faster than calling floorf using the SSE4 instruction. So inlining these
> functions is always better.

Thanks for the benchmarking - I've now committed the floor and rint
patches. (Three more will be needed to deal with round / trunc / ceil
where the __* variants have macros in the powerpc math_private.h.
Once that's been done and copysign has been dealt with as well, it will be
possible to look for and remove unused math_private.h includes based on an
actual set of APIs provided by math_private.h - and with the expectation
that wrongly removing an include will result in a build failure - rather
than needing to allow for those __* macros where removing an include could
result in less efficient code but no build failure.)

-- 
Joseph S. Myers
joseph@codesourcery.com