From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 78325 invoked by alias); 2 Sep 2019 11:32:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 78310 invoked by uid 89); 2 Sep 2019 11:32:30 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-16.4 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=sk:__from_, __val, sorry!, limits X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 02 Sep 2019 11:32:28 +0000 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3A1B518C4279; Mon, 2 Sep 2019 11:32:27 +0000 (UTC) Received: from localhost (unknown [10.33.36.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id DB9C810016EB; Mon, 2 Sep 2019 11:32:26 +0000 (UTC) Date: Mon, 02 Sep 2019 11:32:00 -0000 From: Jonathan Wakely To: Antony Polukhin Cc: libstdc++ , gcc-patches List Subject: Re: [PATCH] Optimize to_chars Message-ID: <20190902113226.GW9487@redhat.com> References: <20190830160126.GO9487@redhat.com> <20190830160857.GP9487@redhat.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="uwB7x3tnyrZQfZJI" Content-Disposition: inline In-Reply-To: <20190830160857.GP9487@redhat.com> X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.12.0 (2019-05-25) X-SW-Source: 2019-09/txt/msg00044.txt.bz2 --uwB7x3tnyrZQfZJI Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline Content-length: 3780 On 30/08/19 17:08 +0100, Jonathan Wakely wrote: >On 30/08/19 17:01 +0100, Jonathan Wakely wrote: >>On 30/08/19 17:27 +0300, Antony Polukhin wrote: >>>Bunch of micro optimizations for std::to_chars: >>>* For base == 8 replacing the lookup in __digits table with arithmetic >>>computations leads to a same CPU cycles for a loop (exchanges two >>>movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this >>>saves 129 bytes of data and totally avoids a chance of cache misses on >>>__digits. >>>* For base == 16 replacing the lookup in __digits table with >>>arithmetic computations leads to a few additional instructions, but >>>totally avoids a chance of cache misses on __digits (- ~9 cache misses >>>for worst case) and saves 513 bytes of const data. >>>* Replacing __first[pos] and __first[pos - 1] with __first[1] and >>>__first[0] on final iterations saves ~2% of code size. >>>* Removing trailing '\0' from arrays of digits allows the linker to >>>merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and >>>"0123456789abcdef" could share the same address). This improves data >>>locality and reduces binary sizes. >>>* Using __detail::__to_chars_len_2 instead of a generic >>>__detail::__to_chars_len makes the operation O(1) instead of O(N). It >>>also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) >>>. >>> >>>In sum: this significantly reduces the size of a binary (for about >>>4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals >>>with latency (CPU cache misses) without changing the iterations count >>>and without adding costly instructions into the loops. >> >>This is great, thanks. >> >>Have you tried comparing the improved code to libc++'s implementation? >>I believe they use precomputed arrays of digits, but they use larger >>arrays that allow 4 bytes to be written at once, which is considerably >>faster (and those precomputed arrays libe in libc++.so not in the >>header). Would we be better off keeping the precomputed arrays and >>expanding them to do 4-byte writes? >> >>Since we don't have a patch to do that, I think I'll commit yours. We >>can always go back to precomputed arrays later if somebody does that >>work. >> >>My only comments are on the changelog: >> >>>Changelog: >>> * include/std/charconv (__detail::__to_chars_8, >>> __detail::__to_chars_16): Replace array of precomputed digits >> >>When the list of changed functions is split across lines it should be >>like this: >> >> * include/std/charconv (__detail::__to_chars_8) >> (__detail::__to_chars_16): Replace array of precomputed digits >> >>i.e close the parentheses before the line break, and reopen on the >>next line. >> >>> with arithmetic operations to avoid CPU cache misses. Remove >>> zero termination from array of digits to allow symbol merge with >>> generic implementation of __detail::__to_chars. Replace final >>> offsets with constants. Use __detail::__to_chars_len_2 instead >>> of a generic __detail::__to_chars_len. >>> * include/std/charconv (__detail::__to_chars): Remove >> >>Don't repeat the asterisk and filename for later changes in the same >>file, i.e. >> >> (__detail::__to_chars): Remove zero termination from array of digits. >> (__detail::__to_chars_2): Leading digit is always '1'. >> >>There's no changelog entry for the changes to __to_chars_len_8 and >>__to_chars_len_16. > >Oh, there's no separate __to_chars_len_16 function anyway, and you did >mention it as "Use __detail::__to_chars_len_2 instead ..." - sorry! > >I think we might as well inline __to_chars_len_8 into __to_chars_8, >there's not much benefit to having a separate function. I'll do that >as a follow up patch. I've committed this patch to simplify the code a bit. Tested x86_64-linux, committed to trunk. --uwB7x3tnyrZQfZJI Content-Type: text/x-patch; charset=us-ascii Content-Disposition: attachment; filename="patch.txt" Content-length: 3344 commit 8ea5798914e9cf5b823d245d85db4d76e0631d0e Author: Jonathan Wakely Date: Fri Aug 30 17:27:48 2019 +0100 Minor simplifications for std::to_chars implementation * include/std/charconv (__detail::__to_chars_2_len): Use std::log2p1. (__detail::__to_chars_8_len): Remove. (__detail::__to_chars_8): Inline length calculation here. (__detail::__from_chars_binary): Use numeric_limits instead of CHAR_BIT. diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index 4e94c39656d..ceefa3b6778 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -35,8 +35,9 @@ #include #include -#include -#include // for __to_chars_len, __to_chars_10_impl +#include // for __log2p1 +#include // for isdigit +#include // for __to_chars_len, __to_chars_10_impl #include // for std::errc // Define when floating point is supported: #define __cpp_lib_to_chars 201611L @@ -96,43 +97,7 @@ namespace __detail template constexpr unsigned __to_chars_len_2(_Tp __value) noexcept - { - static_assert(is_integral<_Tp>::value, "implementation bug"); - static_assert(is_unsigned<_Tp>::value, "implementation bug"); - - constexpr size_t __nbits = __CHAR_BIT__ * sizeof(_Tp); - - // N.B. __builtin_clzll is undefined if __value == 0, but std::to_chars - // handles zero values directly. - - // For sizeof(_Tp) > 1 this is an order of magnitude faster than - // the generic __to_chars_len. - return __nbits - - (__builtin_clzll(__value) - - ((__CHAR_BIT__ * sizeof(long long)) - __nbits)); - } - - template - constexpr unsigned - __to_chars_len_8(_Tp __value) noexcept - { - static_assert(is_integral<_Tp>::value, "implementation bug"); - static_assert(is_unsigned<_Tp>::value, "implementation bug"); - - constexpr size_t __nbits = __CHAR_BIT__ * sizeof(_Tp); - - if _GLIBCXX17_CONSTEXPR (__nbits <= 16) - { - return __value > 077777u ? 6u - : __value > 07777u ? 5u - : __value > 0777u ? 4u - : __value > 077u ? 3u - : __value > 07u ? 2u - : 1u; - } - else - return (__to_chars_len_2(__value) + 2) / 3; - } + { return std::__log2p1(__value); } // Generic implementation for arbitrary bases. template @@ -255,8 +220,19 @@ namespace __detail static_assert(is_unsigned<_Tp>::value, "implementation bug"); to_chars_result __res; + unsigned __len; - const unsigned __len = __to_chars_len_8(__val); + if _GLIBCXX17_CONSTEXPR (numeric_limits<_Tp>::digits <= 16) + { + __len = __val > 077777u ? 6u + : __val > 07777u ? 5u + : __val > 0777u ? 4u + : __val > 077u ? 3u + : __val > 07u ? 2u + : 1u; + } + else + __len = (__to_chars_len_2(__val) + 2) / 3; if (__builtin_expect((__last - __first) < __len, 0)) { @@ -397,7 +373,7 @@ namespace __detail __i++; } __first += __i; - return __i <= (sizeof(_Tp) * __CHAR_BIT__); + return __i <= numeric_limits<_Tp>::digits; } /// std::from_chars implementation for integers in bases 3 to 10. --uwB7x3tnyrZQfZJI--