public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely@redhat.com>
To: Antony Polukhin <antoshkka@gmail.com>
Cc: libstdc++ <libstdc++@gcc.gnu.org>,
	gcc-patches List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Optimize to_chars
Date: Mon, 02 Sep 2019 11:32:00 -0000	[thread overview]
Message-ID: <20190902113226.GW9487@redhat.com> (raw)
In-Reply-To: <20190830160857.GP9487@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3780 bytes --]

On 30/08/19 17:08 +0100, Jonathan Wakely wrote:
>On 30/08/19 17:01 +0100, Jonathan Wakely wrote:
>>On 30/08/19 17:27 +0300, Antony Polukhin wrote:
>>>Bunch of micro optimizations for std::to_chars:
>>>* For base == 8 replacing the lookup in __digits table with arithmetic
>>>computations leads to a same CPU cycles for a loop (exchanges two
>>>movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this
>>>saves 129 bytes of data and totally avoids a chance of cache misses on
>>>__digits.
>>>* For base == 16 replacing the lookup in __digits table with
>>>arithmetic computations leads to a few additional instructions, but
>>>totally avoids a chance of cache misses on __digits (- ~9 cache misses
>>>for worst case) and saves 513 bytes of const data.
>>>* Replacing __first[pos] and __first[pos - 1] with __first[1] and
>>>__first[0] on final iterations saves ~2% of code size.
>>>* Removing trailing '\0' from arrays of digits allows the linker to
>>>merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and
>>>"0123456789abcdef" could share the same address). This improves data
>>>locality and reduces binary sizes.
>>>* Using __detail::__to_chars_len_2 instead of a generic
>>>__detail::__to_chars_len makes the operation O(1) instead of O(N). It
>>>also makes the code two times shorter ( https://godbolt.org/z/Peq_PG)
>>>.
>>>
>>>In sum: this significantly reduces the size of a binary (for about
>>>4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals
>>>with latency (CPU cache misses) without changing the iterations count
>>>and without adding costly instructions into the loops.
>>
>>This is great, thanks.
>>
>>Have you tried comparing the improved code to libc++'s implementation?
>>I believe they use precomputed arrays of digits, but they use larger
>>arrays that allow 4 bytes to be written at once, which is considerably
>>faster (and those precomputed arrays libe in libc++.so not in the
>>header). Would we be better off keeping the precomputed arrays and
>>expanding them to do 4-byte writes?
>>
>>Since we don't have a patch to do that, I think I'll commit yours. We
>>can always go back to precomputed arrays later if somebody does that
>>work.
>>
>>My only comments are on the changelog:
>>
>>>Changelog:
>>>  * include/std/charconv (__detail::__to_chars_8,
>>>  __detail::__to_chars_16): Replace array of precomputed digits
>>
>>When the list of changed functions is split across lines it should be
>>like this:
>>
>>  * include/std/charconv (__detail::__to_chars_8)
>>  (__detail::__to_chars_16): Replace array of precomputed digits
>>
>>i.e close the parentheses before the line break, and reopen on the
>>next line.
>>
>>>  with arithmetic operations to avoid CPU cache misses. Remove
>>>  zero termination from array of digits to allow symbol merge with
>>>  generic implementation of __detail::__to_chars. Replace final
>>>  offsets with constants. Use __detail::__to_chars_len_2 instead
>>>  of a generic __detail::__to_chars_len.
>>>  * include/std/charconv (__detail::__to_chars): Remove
>>
>>Don't repeat the asterisk and filename for later changes in the same
>>file, i.e.
>>
>>  (__detail::__to_chars): Remove zero termination from array of digits.
>>  (__detail::__to_chars_2): Leading digit is always '1'.
>>
>>There's no changelog entry for the changes to __to_chars_len_8 and
>>__to_chars_len_16.
>
>Oh, there's no separate __to_chars_len_16 function anyway, and you did
>mention it as "Use __detail::__to_chars_len_2 instead ..." - sorry!
>
>I think we might as well inline __to_chars_len_8 into __to_chars_8,
>there's not much benefit to having a separate function. I'll do that
>as a follow up patch.

I've committed this patch to simplify the code a bit.

Tested x86_64-linux, committed to trunk.


[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 3344 bytes --]

commit 8ea5798914e9cf5b823d245d85db4d76e0631d0e
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Aug 30 17:27:48 2019 +0100

    Minor simplifications for std::to_chars implementation
    
            * include/std/charconv (__detail::__to_chars_2_len): Use std::log2p1.
            (__detail::__to_chars_8_len): Remove.
            (__detail::__to_chars_8): Inline length calculation here.
            (__detail::__from_chars_binary): Use numeric_limits instead of
            CHAR_BIT.

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 4e94c39656d..ceefa3b6778 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -35,8 +35,9 @@
 
 #include <type_traits>
 #include <limits>
-#include <cctype>
-#include <bits/charconv.h> // for __to_chars_len, __to_chars_10_impl
+#include <bit>			// for __log2p1
+#include <cctype>		// for isdigit
+#include <bits/charconv.h>	// for __to_chars_len, __to_chars_10_impl
 #include <bits/error_constants.h> // for std::errc
 
 // Define when floating point is supported: #define __cpp_lib_to_chars 201611L
@@ -96,43 +97,7 @@ namespace __detail
   template<typename _Tp>
     constexpr unsigned
     __to_chars_len_2(_Tp __value) noexcept
-    {
-      static_assert(is_integral<_Tp>::value, "implementation bug");
-      static_assert(is_unsigned<_Tp>::value, "implementation bug");
-
-      constexpr size_t __nbits = __CHAR_BIT__ * sizeof(_Tp);
-
-      // N.B. __builtin_clzll is undefined if __value == 0, but std::to_chars
-      // handles zero values directly.
-
-      // For sizeof(_Tp) > 1 this is an order of magnitude faster than
-      // the generic __to_chars_len.
-      return __nbits
-	- (__builtin_clzll(__value)
-	    - ((__CHAR_BIT__ * sizeof(long long)) - __nbits));
-    }
-
-  template<typename _Tp>
-    constexpr unsigned
-    __to_chars_len_8(_Tp __value) noexcept
-    {
-      static_assert(is_integral<_Tp>::value, "implementation bug");
-      static_assert(is_unsigned<_Tp>::value, "implementation bug");
-
-      constexpr size_t __nbits = __CHAR_BIT__ * sizeof(_Tp);
-
-      if _GLIBCXX17_CONSTEXPR (__nbits <= 16)
-	{
-	  return __value > 077777u ? 6u
-	    : __value > 07777u ? 5u
-	    : __value > 0777u ? 4u
-	    : __value > 077u ? 3u
-	    : __value > 07u ? 2u
-	    : 1u;
-	}
-      else
-	return (__to_chars_len_2(__value) + 2) / 3;
-    }
+    { return std::__log2p1(__value); }
 
   // Generic implementation for arbitrary bases.
   template<typename _Tp>
@@ -255,8 +220,19 @@ namespace __detail
       static_assert(is_unsigned<_Tp>::value, "implementation bug");
 
       to_chars_result __res;
+      unsigned __len;
 
-      const unsigned __len = __to_chars_len_8(__val);
+      if _GLIBCXX17_CONSTEXPR (numeric_limits<_Tp>::digits <= 16)
+	{
+	  __len = __val > 077777u ? 6u
+	    : __val > 07777u ? 5u
+	    : __val > 0777u ? 4u
+	    : __val > 077u ? 3u
+	    : __val > 07u ? 2u
+	    : 1u;
+	}
+      else
+	__len = (__to_chars_len_2(__val) + 2) / 3;
 
       if (__builtin_expect((__last - __first) < __len, 0))
 	{
@@ -397,7 +373,7 @@ namespace __detail
 	  __i++;
 	}
       __first += __i;
-      return __i <= (sizeof(_Tp) * __CHAR_BIT__);
+      return __i <= numeric_limits<_Tp>::digits;
     }
 
   /// std::from_chars implementation for integers in bases 3 to 10.

  reply	other threads:[~2019-09-02 11:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 14:27 Antony Polukhin
2019-08-30 16:01 ` Jonathan Wakely
2019-08-30 16:09   ` Jonathan Wakely
2019-09-02 11:32     ` Jonathan Wakely [this message]
2019-09-08 13:45       ` Antony Polukhin
2019-09-09 11:13         ` Jonathan Wakely
2019-08-30 19:46   ` Antony Polukhin
2019-08-30 20:05     ` Jonathan Wakely
2019-08-30 17:03 ` Martin Sebor
2019-08-30 19:24   ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190902113226.GW9487@redhat.com \
    --to=jwakely@redhat.com \
    --cc=antoshkka@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).