public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Antony Polukhin <antoshkka@gmail.com>
To: "libstdc++" <libstdc++@gcc.gnu.org>,
	gcc-patches List <gcc-patches@gcc.gnu.org>
Subject: [PATCH] Optimize to_chars
Date: Fri, 30 Aug 2019 14:39:00 -0000	[thread overview]
Message-ID: <CAKqmYPZh+KXP3J_5X3xmz=-D0h013_w201Z=E5e56sfvCtAr1A@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

Bunch of micro optimizations for std::to_chars:
* For base == 8 replacing the lookup in __digits table with arithmetic
computations leads to a same CPU cycles for a loop (exchanges two
movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this
saves 129 bytes of data and totally avoids a chance of cache misses on
__digits.
* For base == 16 replacing the lookup in __digits table with
arithmetic computations leads to a few additional instructions, but
totally avoids a chance of cache misses on __digits (- ~9 cache misses
for worst case) and saves 513 bytes of const data.
* Replacing __first[pos] and __first[pos - 1] with __first[1] and
__first[0] on final iterations saves ~2% of code size.
* Removing trailing '\0' from arrays of digits allows the linker to
merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and
"0123456789abcdef" could share the same address). This improves data
locality and reduces binary sizes.
* Using __detail::__to_chars_len_2 instead of a generic
__detail::__to_chars_len makes the operation O(1) instead of O(N). It
also makes the code two times shorter ( https://godbolt.org/z/Peq_PG)
.

In sum: this significantly reduces the size of a binary (for about
4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals
with latency (CPU cache misses) without changing the iterations count
and without adding costly instructions into the loops.

Changelog:
    * include/std/charconv (__detail::__to_chars_8,
    __detail::__to_chars_16): Replace array of precomputed digits
    with arithmetic operations to avoid CPU cache misses. Remove
    zero termination from array of digits to allow symbol merge with
    generic implementation of __detail::__to_chars. Replace final
    offsets with constants. Use __detail::__to_chars_len_2 instead
    of a generic __detail::__to_chars_len.
    * include/std/charconv (__detail::__to_chars): Remove
    zero termination from array of digits.
    * include/std/charconv (__detail::__to_chars_2): Leading digit
    is always '1'.

-- 
Best regards,
Antony Polukhin

[-- Attachment #2: charconv_patch.txt --]
[-- Type: text/plain, Size: 5042 bytes --]

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index eaa6f74..35706d0 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,17 @@
+2019-08-30  Antony Polukhin  <antoshkka@gmail.com>
+
+	* include/std/charconv (__detail::__to_chars_8,
+	__detail::__to_chars_16): Replace array of precomputed digits
+	with arithmetic operations to avoid CPU cache misses. Remove
+	zero termination from array of digits to allow symbol merge with
+	generic implementation of __detail::__to_chars. Replace final
+	offsets with constants. Use __detail::__to_chars_len_2 instead
+	of a generic __detail::__to_chars_len.
+	* include/std/charconv (__detail::__to_chars): Remove
+	zero termination from array of digits.
+	* include/std/charconv (__detail::__to_chars_2): Leading digit
+	is always '1'.
+
 2019-08-29  Jonathan Wakely  <jwakely@redhat.com>
 
 	PR libstdc++/91067
diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 53aa63e..4e94c39 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -131,7 +131,7 @@ namespace __detail
 	    : 1u;
 	}
       else
-	return __to_chars_len(__value, 8);
+	return (__to_chars_len_2(__value) + 2) / 3;
     }
 
   // Generic implementation for arbitrary bases.
@@ -155,8 +155,12 @@ namespace __detail
 
       unsigned __pos = __len - 1;
 
-      static constexpr char __digits[]
-	= "0123456789abcdefghijklmnopqrstuvwxyz";
+      static constexpr char __digits[] = {
+	'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
+	'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
+	'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
+	'u', 'v', 'w', 'x', 'y', 'z'
+      };
 
       while (__val >= __base)
 	{
@@ -181,7 +185,7 @@ namespace __detail
 
       to_chars_result __res;
 
-      const unsigned __len = __to_chars_len(__val, 0x10);
+      const unsigned __len = (__to_chars_len_2(__val) + 3) / 4;
 
       if (__builtin_expect((__last - __first) < __len, 0))
 	{
@@ -190,32 +194,30 @@ namespace __detail
 	  return __res;
 	}
 
-      static constexpr char __digits[513] =
-	"000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"
-	"202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f"
-	"404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f"
-	"606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f"
-	"808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f"
-	"a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf"
-	"c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf"
-	"e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";
+      static constexpr char __digits[] = {
+	'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
+	'a', 'b', 'c', 'd', 'e', 'f'
+      };
       unsigned __pos = __len - 1;
       while (__val >= 0x100)
 	{
-	  auto const __num = (__val % 0x100) * 2;
-	  __val /= 0x100;
-	  __first[__pos] = __digits[__num + 1];
+	  auto __num = __val & 0xF;
+	  __val >>= 4;
+	  __first[__pos] = __digits[__num];
+	  __num = __val & 0xF;
+	  __val >>= 4;
 	  __first[__pos - 1] = __digits[__num];
 	  __pos -= 2;
 	}
       if (__val >= 0x10)
 	{
-	  auto const __num = __val * 2;
-	  __first[__pos] = __digits[__num + 1];
-	  __first[__pos - 1] = __digits[__num];
+	  const auto __num = __val & 0xF;
+	  __val >>= 4;
+	  __first[1] = __digits[__num];
+	  __first[0] = __digits[__val];
 	}
       else
-	__first[__pos] = "0123456789abcdef"[__val];
+	__first[0] = __digits[__val];
       __res.ptr = __first + __len;
       __res.ec = {};
       return __res;
@@ -263,28 +265,26 @@ namespace __detail
 	  return __res;
 	}
 
-      static constexpr char __digits[129] =
-	"00010203040506071011121314151617"
-	"20212223242526273031323334353637"
-	"40414243444546475051525354555657"
-	"60616263646566677071727374757677";
       unsigned __pos = __len - 1;
       while (__val >= 0100)
 	{
-	  auto const __num = (__val % 0100) * 2;
-	  __val /= 0100;
-	  __first[__pos] = __digits[__num + 1];
-	  __first[__pos - 1] = __digits[__num];
+	  auto __num = __val & 7;
+	  __val >>= 3;
+	  __first[__pos] = '0' + __num;
+	  __num = __val & 7;
+	  __val >>= 3;
+	  __first[__pos - 1] = '0' + __num;
 	  __pos -= 2;
 	}
       if (__val >= 010)
 	{
-	  auto const __num = __val * 2;
-	  __first[__pos] = __digits[__num + 1];
-	  __first[__pos - 1] = __digits[__num];
+	  auto const __num = __val & 7;
+	  __val >>= 3;
+	  __first[1] = '0' + __num;
+	  __first[0] = '0' + __val;
 	}
       else
-	__first[__pos] = '0' + __val;
+	__first[0] = '0' + __val;
       __res.ptr = __first + __len;
       __res.ec = {};
       return __res;
@@ -315,7 +315,10 @@ namespace __detail
 	  __first[__pos--] = '0' + (__val & 1);
 	  __val >>= 1;
 	}
-      *__first = '0' + (__val & 1);
+      // First digit is always '1' because __to_chars_len_2 skips
+      // leading zero bits and std::to_chars handles zero values
+      // directly.
+      __first[0] = '1';
 
       __res.ptr = __first + __len;
       __res.ec = {};

             reply	other threads:[~2019-08-30 14:27 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 14:39 Antony Polukhin [this message]
2019-08-30 16:35 ` Jonathan Wakely
2019-08-30 16:44   ` Jonathan Wakely
2019-09-02 11:32     ` Jonathan Wakely
2019-09-08 13:45       ` Antony Polukhin
2019-09-09 11:13         ` Jonathan Wakely
2019-08-30 21:50   ` Antony Polukhin
2019-08-30 23:33     ` Jonathan Wakely
2019-08-30 19:41 ` Martin Sebor
2019-08-30 21:11   ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKqmYPZh+KXP3J_5X3xmz=-D0h013_w201Z=E5e56sfvCtAr1A@mail.gmail.com' \
    --to=antoshkka@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).