* Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );
[not found] <0b5ec858-ebcf-44c2-b66b-0f7c2f278261.ref@yahoo.de>
@ 2024-07-29 8:35 ` Ehrnsperger, Markus
2024-07-29 9:45 ` Jonathan Wakely
0 siblings, 1 reply; 6+ messages in thread
From: Ehrnsperger, Markus @ 2024-07-29 8:35 UTC (permalink / raw)
To: gcc-patches, libstdc++
[-- Attachment #1.1: Type: text/plain, Size: 2597 bytes --]
Hi,
I'm attaching two files:
1.: *to_chars10.h*:
This is intended to be included in libstdc++ / gcc to achieve
performance improvements. It is an implementation of
to_chars10(char* first, char* last, /* integer-type */ value);
Parameters are identical to std::to_chars(char* first, char* last, /*
integer-type */ value, int base = 10 ); . It only works for base == 10.
If it is included in libstdc++, to_chars10(...) could be renamed to
std::to_chars(char* first, char* last, /* integer-type */ value) to
provide an overload for the default base = 10
2.: t*o_chars10.cpp*:
This is a test program for to_chars10 verifying the correctness of the
results, and measuring the performance. The actual performance
improvement is system dependent, so please test on your own system.
On my system the performance improvement is about factor two, my results
are:
Test int8_t verifying to_chars10 = std::to_chars ... OK
Test uint8_t verifying to_chars10 = std::to_chars ... OK
Test int16_t verifying to_chars10 = std::to_chars ... OK
Test uint16_t verifying to_chars10 = std::to_chars ... OK
Test int32_t verifying to_chars10 = std::to_chars ... OK
Test uint32_t verifying to_chars10 = std::to_chars ... OK
Test int64_t verifying to_chars10 = std::to_chars ... OK
Test uint64_t verifying to_chars10 = std::to_chars ... OK
Benchmarking test case tested method ... time (lower is
better)
Benchmarking random unsigned 64 bit to_chars10 ... 0.00957
Benchmarking random unsigned 64 bit std::to_chars ... 0.01854
Benchmarking random signed 64 bit to_chars10 ... 0.01018
Benchmarking random signed 64 bit std::to_chars ... 0.02297
Benchmarking random unsigned 32 bit to_chars10 ... 0.00620
Benchmarking random unsigned 32 bit std::to_chars ... 0.01275
Benchmarking random signed 32 bit to_chars10 ... 0.00783
Benchmarking random signed 32 bit std::to_chars ... 0.01606
Benchmarking random unsigned 16 bit to_chars10 ... 0.00536
Benchmarking random unsigned 16 bit std::to_chars ... 0.00871
Benchmarking random signed 16 bit to_chars10 ... 0.00664
Benchmarking random signed 16 bit std::to_chars ... 0.01154
Benchmarking random unsigned 08 bit to_chars10 ... 0.00393
Benchmarking random unsigned 08 bit std::to_chars ... 0.00626
Benchmarking random signed 08 bit to_chars10 ... 0.00465
Benchmarking random signed 08 bit std::to_chars ... 0.01089
Thanks, Markus
[-- Attachment #2: to_chars10.cpp --]
[-- Type: text/plain, Size: 8805 bytes --]
// g++ -std=c++17 -O3 -g to_chars10.cpp
/*
Copyright (C) Markus Ehrnsperger. All rights reserved.
Licence: GNU General Public License version 3
This program does:
- check correctness of to_chars10
- compare performance of to_chars10 with std::to_chars
Note: Part of the code is copied / modifies from https://github.com/miloyip/itoa-benchmark
*/
#include "to_chars10.h"
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <cstring>
#include <chrono>
#include <random>
#include <charconv>
#include <algorithm>
const unsigned kIterationPerDigit = 100000;
const unsigned kIterationForRandom = 100;
const unsigned kTrial = 10;
template <typename T>
struct Traits { };
template <>
struct Traits<uint8_t> {
enum { kBufferSize = 3 };
enum { kMaxDigit = 3 };
static uint8_t Negate(uint8_t x) { return x; };
};
template <>
struct Traits<int8_t> {
enum { kBufferSize = 4 };
enum { kMaxDigit = 3 };
static int8_t Negate(int8_t x) { return -x; };
};
template <>
struct Traits<uint16_t> {
enum { kBufferSize = 5 };
enum { kMaxDigit = 5 };
static uint16_t Negate(uint16_t x) { return x; };
};
template <>
struct Traits<int16_t> {
enum { kBufferSize = 6 };
enum { kMaxDigit = 5 };
static int16_t Negate(int16_t x) { return -x; };
};
template <>
struct Traits<uint32_t> {
enum { kBufferSize = 10 };
enum { kMaxDigit = 10 };
static uint32_t Negate(uint32_t x) { return x; };
};
template <>
struct Traits<int32_t> {
enum { kBufferSize = 11 };
enum { kMaxDigit = 10 };
static int32_t Negate(int32_t x) { return -x; };
};
template <>
struct Traits<uint64_t> {
enum { kBufferSize = 20 };
enum { kMaxDigit = 20 };
static uint64_t Negate(uint64_t x) { return x; };
};
template <>
struct Traits<int64_t> {
enum { kBufferSize = 20 };
enum { kMaxDigit = 19 };
static int64_t Negate(int64_t x) { return -x; };
};
template <typename T>
void VerifyValue(T value, std::to_chars_result(*f)(char*, char*, T, int), std::to_chars_result(*g)(char*, char*, T, int), const char* test, const char* fname, const char* gname) {
char buffer_f[Traits<T>::kBufferSize];
char buffer_g[Traits<T>::kBufferSize];
std::to_chars_result r_f = f(buffer_f, buffer_f+sizeof(buffer_f), value, 10);
std::to_chars_result r_g = g(buffer_g, buffer_g+sizeof(buffer_g), value, 10);
if (r_f.ec != r_g.ec) {
std::cout << "\nError: " << fname << " -> " << std::make_error_code(r_f.ec).message();
std::cout << ", " << gname << " -> " << std::make_error_code(r_g.ec).message() << "\n";
std::cout << "Value " << +value << ", sizeof(buffer) " << sizeof(buffer_f) << "\n";
std::cout << "Test " << test << "\n";
exit(0);
}
if (r_f.ec == std::errc()) {
if (std::string_view(buffer_f, r_f.ptr-buffer_f) != std::string_view(buffer_g, r_g.ptr-buffer_g)) {
std::cout << "\nError: " << fname << " -> " << std::string_view(buffer_f, r_f.ptr-buffer_f);
std::cout << ", " << gname << " -> " << std::string_view(buffer_g, r_g.ptr-buffer_g) << "\n";
std::cout << "Value " << +value << ", sizeof(buffer) " << sizeof(buffer_f) << "\n";
std::cout << "Test " << test << "\n";
exit(0);
}
}
}
template <typename T>
void Verify(std::to_chars_result(*f)(char*, char*, T, int), std::to_chars_result(*g)(char*, char*, T, int), const char* test, const char* fname, const char* gname) {
std::cout << "Test " << test << " verifying " << fname << " = " << gname << " ... ";
// Boundary cases
VerifyValue<T>(0, f, g, test, fname, gname);
VerifyValue<T>(std::numeric_limits<T>::min(), f, g, test, fname, gname);
VerifyValue<T>(std::numeric_limits<T>::max(), f, g, test, fname, gname);
// 2^n - 1, 2^n, 10^n - 1, 10^n until overflow
for (uint32_t power = 2; power <= 10; power += 8) {
T i = 1, last;
do {
VerifyValue<T>(i - 1, f, g, test, fname, gname);
VerifyValue<T>(i, f, g, test, fname, gname);
if (std::numeric_limits<T>::min() < 0) {
VerifyValue<T>(Traits<T>::Negate(i), f, g, test, fname, gname);
VerifyValue<T>(Traits<T>::Negate(i + 1), f, g, test, fname, gname);
}
last = i;
i *= power;
} while (last < i);
}
std::cout << "OK\n";
}
template <class T>
class RandomData {
public:
static T* GetData() {
static RandomData singleton;
return singleton.mData;
}
static const size_t kCountPerDigit = 1000;
static const size_t kCount = kCountPerDigit * Traits<T>::kMaxDigit;
private:
RandomData() :
mData(new T[kCount])
{
T* p = mData;
T start = 1;
for (int digit = 1; digit <= Traits<T>::kMaxDigit; digit++) {
T end = (digit == Traits<T>::kMaxDigit) ? std::numeric_limits<T>::max() : start * 10;
T v = start;
T sign = 1;
for (size_t i = 0; i < kCountPerDigit; i++) {
*p++ = v * sign;
sign = Traits<T>::Negate(sign);
if (++v == end)
v = start;
}
start = end;
}
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(mData, mData + kCount, g);
}
~RandomData() {
delete[] mData;
}
T* mData;
};
template <typename T>
void BenchRandom(std::to_chars_result(*f)(char*, char*, T, int), const char* type) {
printf("Benchmarking random %-20s ... ", type);
char buffer[Traits<T>::kBufferSize];
T* data = RandomData<T>::GetData();
size_t n = RandomData<T>::kCount;
double duration = std::numeric_limits<double>::max();
for (unsigned trial = 0; trial < kTrial; trial++) {
std::chrono::time_point<std::chrono::high_resolution_clock> begin;
std::chrono::duration<double> timeNeeded;
begin = std::chrono::high_resolution_clock::now();
for (unsigned iteration = 0; iteration < kIterationForRandom; iteration++)
for (size_t i = 0; i < n; i++)
f(buffer, buffer+20, data[i], 10);
timeNeeded = std::chrono::high_resolution_clock::now() - begin;
duration = std::min(duration, timeNeeded.count() );
}
duration *= 1e6 / (kIterationForRandom * n); // convert to nano second per operation
printf("%8.5f\n", duration);
}
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline std::to_chars_result to_chars10(char* first, char* last, T value, int base) {
return to_chars10(first, last, value);
}
int main()
{
Verify<int8_t>(to_chars10, std::to_chars, " int8_t", "to_chars10", "std::to_chars");
Verify<uint8_t>(to_chars10, std::to_chars, " uint8_t", "to_chars10", "std::to_chars");
Verify<int16_t>(to_chars10, std::to_chars, " int16_t", "to_chars10", "std::to_chars");
Verify<uint16_t>(to_chars10, std::to_chars, "uint16_t", "to_chars10", "std::to_chars");
Verify<int32_t>(to_chars10, std::to_chars, " int32_t", "to_chars10", "std::to_chars");
Verify<uint32_t>(to_chars10, std::to_chars, "uint32_t", "to_chars10", "std::to_chars");
Verify<int64_t>(to_chars10, std::to_chars, " int64_t", "to_chars10", "std::to_chars");
Verify<uint64_t>(to_chars10, std::to_chars, "uint64_t", "to_chars10", "std::to_chars");
std::cout << "\n";
std::cout << "Benchmarking test case tested method ... time (lower is better)\n";
BenchRandom<uint64_t>(to_chars10 , "unsigned 64 bit to_chars10 ");
BenchRandom<uint64_t>(std::to_chars , "unsigned 64 bit std::to_chars ");
BenchRandom<int64_t>(to_chars10 , " signed 64 bit to_chars10 ");
BenchRandom<int64_t>(std::to_chars , " signed 64 bit std::to_chars ");
BenchRandom<uint32_t>(to_chars10 , "unsigned 32 bit to_chars10 ");
BenchRandom<uint32_t>(std::to_chars , "unsigned 32 bit std::to_chars ");
BenchRandom<int32_t>(to_chars10 , " signed 32 bit to_chars10 ");
BenchRandom<int32_t>(std::to_chars , " signed 32 bit std::to_chars ");
BenchRandom<uint16_t>(to_chars10 , "unsigned 16 bit to_chars10 ");
BenchRandom<uint16_t>(std::to_chars , "unsigned 16 bit std::to_chars ");
BenchRandom<int16_t>(to_chars10 , " signed 16 bit to_chars10 ");
BenchRandom<int16_t>(std::to_chars , " signed 16 bit std::to_chars ");
BenchRandom<uint8_t>(to_chars10 , "unsigned 08 bit to_chars10 ");
BenchRandom<uint8_t>(std::to_chars , "unsigned 08 bit std::to_chars ");
BenchRandom<int8_t>(to_chars10 , " signed 08 bit to_chars10 ");
BenchRandom<int8_t>(std::to_chars , " signed 08 bit std::to_chars ");
}
[-- Attachment #3: to_chars10.h --]
[-- Type: text/plain, Size: 13356 bytes --]
/*
Copyright (C) Markus Ehrnsperger. All rights reserved.
Licence: GNU General Public License version 3, with the GCC RUNTIME LIBRARY EXCEPTION
Usage:
use
res = to_chars10(first, last, value);
instead of
res = std::to_chars(first, last, value, 10);
same parameters, return values, ... . So, identical but faster
*/
#ifndef TO_CHARS10_H
#define TO_CHARS10_H
#include <cstdint>
#include <cstring>
#include <charconv>
#include <endian.h>
namespace to_chars10_internal {
/*
Naming schema
char *itoaN_M(char *b, uintXX_t i): ==================================
- write between N and M digits to b.
- if less than N digits are needed for i, left fill with '0'
- return the one-past-the-end pointer of the digits written
- even if b is returned (because i == 0 && N == 0), b[0] might change.
- i < 10^M must be valid. This is not checked
char *itoaN_M_ow(char *b, uintXX_t i): ============================
- same as itoaN_M, but data might be written to [b, b+M),
even if not the complete range is needed.
*/
alignas(2) static const char digits_100[200] = {
'0','0','0','1','0','2','0','3','0','4','0','5','0','6','0','7','0','8','0','9',
'1','0','1','1','1','2','1','3','1','4','1','5','1','6','1','7','1','8','1','9',
'2','0','2','1','2','2','2','3','2','4','2','5','2','6','2','7','2','8','2','9',
'3','0','3','1','3','2','3','3','3','4','3','5','3','6','3','7','3','8','3','9',
'4','0','4','1','4','2','4','3','4','4','4','5','4','6','4','7','4','8','4','9',
'5','0','5','1','5','2','5','3','5','4','5','5','5','6','5','7','5','8','5','9',
'6','0','6','1','6','2','6','3','6','4','6','5','6','6','6','7','6','8','6','9',
'7','0','7','1','7','2','7','3','7','4','7','5','7','6','7','7','7','8','7','9',
'8','0','8','1','8','2','8','3','8','4','8','5','8','6','8','7','8','8','8','9',
'9','0','9','1','9','2','9','3','9','4','9','5','9','6','9','7','9','8','9','9'
};
static const uint16_t *digits_100_2 = reinterpret_cast<const uint16_t*>(digits_100);
// ======== 0-2 digits ========================================================
// max uint16_t 65535 -> large enough for 1-2 digits
#if __BYTE_ORDER == __LITTLE_ENDIAN
inline char *itoa0_2(char *b, uint16_t i, bool write_ten) {
// write 0 or 2 chars to b. 0 <= i < 100
// if write_ten == true, writte the i/10 digit
// return b+write_ten
// note: if the i%10 digit is needed, the caller has to increase b.
uint16_t res = digits_100_2[i];
*b = res;
b += write_ten;
*b = res >> 8;
return b;
}
inline char *itoa1_2_ow(char *b, uint16_t i) {
// write 2 chars to b and return the new position, 0 <= i < 10^2
// the new position is (b + (1..2))
bool shift = i < 10;
*reinterpret_cast<uint16_t*>(b) = digits_100_2[i] >> shift*8;
return b+2-shift;
}
#else
inline char *itoa0_2(char *b, uint16_t i, bool write_ten) {
// see comment to __LITTLE_ENDIAN method
const char *src = digits_100 + 2*i;
*b = *src;
b += write_ten;
*b = *++src;
return b;
}
inline char *itoa1_2(char *b, uint16_t i);
inline char *itoa1_2_ow(char *b, uint16_t i) {
return itoa1_2(b, i);
}
#endif
inline char *itoa0_2(char *b, uint16_t i) {
// write 0 or 2 chars to b and return the new position, 0 <= i < 100
return itoa0_2(b, i, i>9) + (i != 0);
}
inline char *itoa1_2(char *b, uint16_t i) {
// write 1 or 2 chars to b and return the new position, 0 <= i < 100
return itoa0_2(b, i, i>9) + 1;
}
// ======== 1-4 digits ========================================================
// max uint16_t 65535 -> large enough for 1-4 digits
inline char *itoa1_4(char *b, uint16_t i) {
// write 1 to 4 chars to b and return the new position, 0 <= i < 10^4
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100;
b = itoa0_2(b, q);
return itoa0_2(b, i - q*100, i>9) + 1;
}
inline char *itoa3_4(char *b, uint16_t i) {
// write 3 to 4 chars to b and return the new position, 0 <= i < 10^4
// Left-fill with 0.
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100;
b = itoa1_2_ow(b, q);
*reinterpret_cast<uint16_t*>(b) = digits_100_2[i - q*100];
return b+2;
}
#if __BYTE_ORDER == __LITTLE_ENDIAN && defined(__aarch64__)
inline uint32_t itoa4_int(uint16_t i) {
// write each char to one byte of result. i < 10^4
// Left-fill with 0.
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100, i < 43699
return digits_100_2[q] | ((uint32_t)(digits_100_2[i - q*100]) << 16);
}
inline char *itoa1_4_ow(char *b, uint16_t i) {
// write 4 chars to b and return the new position, 0 <= i < 10^4
// the new position is (b + (1..4))
unsigned char shift = (i<10) + (i<100) + (i<1000);
*reinterpret_cast<uint32_t*>(b) = itoa4_int(i) >> shift*8;
return b + 4-shift;
}
inline char *itoa4_4(char *b, uint16_t i) {
*reinterpret_cast<uint32_t*>(b) = itoa4_int(i);
return b+4;
}
#else
inline char *itoa4_4(char *b, uint16_t i) {
// write exactly 4 chars to b. Left-fill with 0. i < 10^4
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100; i < 43699
reinterpret_cast<uint16_t*>(b)[0] = digits_100_2[q];
reinterpret_cast<uint16_t*>(b)[1] = digits_100_2[i - q*100];
return b+4;
}
inline char *itoa1_4_ow(char *b, uint16_t i) {
return itoa1_4(b, i);
return b+4;
}
#endif
// ======== 1-8 digits ========================================================
// max uint32_t 4294967295 (10 digits) -> large enough for 1-8 digits
inline char *itoa5_8(char *b, uint32_t i) {
// write 5 to 8 chars to b and return the new position, 0 <= i < 10^8
// Left-fill with 0.
uint32_t q = i/10000;
b = itoa1_4_ow(b, q);
return itoa4_4(b, i - q*10000);
}
inline char *itoa1_8(char *b, uint32_t i) {
// write 1 to 8 chars to b and return the new position, 0 <= i < 10^8
uint32_t q = i/1000000;
uint32_t j = i - q*1000000;
b = itoa0_2(b, q);
q = j/10000;
j = j - q*10000;
b = itoa0_2(b, q, i>99999) + (i>9999);
q = (j * 5243U) >> 19; // q = j/100;
b = itoa0_2(b, q, i>999) + (i>99);
return itoa0_2(b, j - q*100, i>9) + 1;
}
#if __BYTE_ORDER == __LITTLE_ENDIAN
uint64_t itoa8_int(uint32_t i) {
// write each char to one byte of result. i < 10^8
uint64_t result = 0;
for (int j = 0; j < 3; j ++) {
uint32_t q = i/100;
result |= digits_100_2[i - q*100];
result <<= 16;
i = q;
}
return result | digits_100_2[i];
}
#endif
#if __BYTE_ORDER == __LITTLE_ENDIAN
inline char *itoa1_8_ow(char *b, uint32_t i) {
// write 8 chars to b and return the new position, 0 <= i < 10^8
// the new position is (b + (1..8))
int digits = (i>9)+(i>99)+(i>999)+(i>9999)+(i>99999)+(i>999999)+(i>9999999) + 1;
*reinterpret_cast<uint64_t*>(b) = itoa8_int(i) >> (8-digits)*8;
return b+digits;
}
#else
inline char *itoa1_8_ow(char *b, uint32_t i) {
return itoa1_8(b, i);
}
#endif
#if __BYTE_ORDER == __LITTLE_ENDIAN && defined(__aarch64__)
inline char *itoa8_8(char *b, uint32_t i) {
*reinterpret_cast<uint64_t*>(b) = itoa8_int(i);
return b+8;
}
#else
inline char *itoa8_8(char *b, uint32_t i) {
// write exactly 8 chars to b. Left-fill with 0. i < 10^8 (required, but not checked)
for (int j = 3; j > 0; --j) {
uint32_t q = i/100;
reinterpret_cast<uint16_t*>(b)[j] = digits_100_2[i - q*100];
i = q;
}
reinterpret_cast<uint16_t*>(b)[0] = digits_100_2[i];
return b+8;
}
#endif
// ======== 8-16 digits ========================================================
template<typename T> inline char *itoa9_16(char *b, T i) {
// write 9 to 16 chars to b and return the new position, 10^8 <= i < 10^16
T q = i/100000000;
b = itoa1_8_ow(b, q);
return itoa8_8(b, i - q*100000000);
}
// ======== implementation of itoa(char *b, T i), depending on type T =========
// max uint8_t 256
template<typename T, std::enable_if_t<sizeof(T) == 1, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 3 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
T q = i/100; // 256/100 = 2
*b++ = q + '0';
reinterpret_cast<uint16_t*>(b)[0] = digits_100_2[i - q*100];
return b+2;
}
// max uint16_t 65535 5 digits
template<typename T, std::enable_if_t<sizeof(T) == 2, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 5 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
if (i < 10000) return itoa3_4(b, i);
T q = i/10000; // 65535/10000 = 6
*b++ = q + '0';
return itoa4_4(b, i - q*10000);
}
// max uint32_t 4294967295 (10 digits)
template<typename T, std::enable_if_t<sizeof(T) <= 4, bool> = true, std::enable_if_t<sizeof(T) >= 3, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 10 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
if (i < 10000) return itoa3_4(b, i);
if (i < 100000000) return itoa5_8(b, i);
T q = i/100000000; // 4294967295/100000000 = 42;
b = itoa1_2_ow(b, q);
return itoa8_8(b, i - q*100000000);
}
// max uint64_t 18446744073709551615 (20 digits)
template<typename T, std::enable_if_t<sizeof(T) >= 5, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 20 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
if (i < 10000) return itoa3_4(b, i);
if (i < 100000000) return itoa5_8(b, i);
if (i < 10000000000000000) return itoa9_16(b, i);
T q = i/10000000000000000; // 18446744073709551615/10000000000000000 = 1844
b = itoa1_4_ow(b, q);
i = i - q*10000000000000000; // now i up to 16 digits
q = i/100000000;
itoa8_8(b , q);
itoa8_8(b + 8, i - q*100000000);
return b+16;
}
template<typename T, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 20 chars to b and return the new position
typedef std::make_unsigned_t<T> TU;
bool minus = i < 0;
TU u = (TU)(i) - 2 * (TU)(i) * minus;
*b = '-';
return itoa(b + minus, u);
}
// ======== verify if the provided range is large enough =========
// max_int[0] = 0
// max_int[N>0] = largest integer number that can be presented with N digits
static const uint64_t max_int[20] = {
0,
9,
99,
999,
9999,
99999,
999999,
9999999,
99999999,
999999999,
9999999999,
99999999999,
999999999999,
9999999999999,
99999999999999,
999999999999999,
9999999999999999,
99999999999999999,
999999999999999999,
9999999999999999999u };
// ===== max_chars_for_to_chars10(T value) ===============================
// return the maximum number of charaters that will be used by to_chars10
// for any value of data type T
template<typename T, std::enable_if_t<sizeof(T) == 1, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 3; // 256
}
template<typename T, std::enable_if_t<sizeof(T) == 1, bool> = true, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 4; // -128
}
template<typename T, std::enable_if_t<sizeof(T) == 2, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 5; // 65535
}
template<typename T, std::enable_if_t<sizeof(T) == 2, bool> = true, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 6; // -32768
}
template<typename T, std::enable_if_t<sizeof(T) == 4, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 10; // 4294967295
}
template<typename T, std::enable_if_t<sizeof(T) == 4, bool> = true, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 11; // -2147483648
}
template<typename T, std::enable_if_t<sizeof(T) == 8, bool> = true>
inline size_t max_chars_for_to_chars10(T value) {
return 20;
}
inline std::to_chars_result to_chars_error(char* last) {
std::to_chars_result res;
res.ec = std::errc::value_too_large;
res.ptr = last;
return res;
}
inline std::to_chars_result to_chars_success(char* last) {
std::to_chars_result res;
res.ec = std::errc();
res.ptr = last;
return res;
}
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline bool to_chars10_range_check(char* first, char* last, T value) {
if (__builtin_expect(last <= first, false)) return false;
if (__builtin_expect(last - first < max_chars_for_to_chars10(value), false)) {
if (value >= 0) {
// last - first > 0
if (value > max_int[last - first]) return false;
} else {
// value != 0, && last - first - 1 < 19
if (value < -(int64_t)max_int[last - first - 1] ) return false;
}
}
return true;
}
} // end of namespace to_chars10_internal
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline std::to_chars_result to_chars10(char* first, char* last, T value) {
// same as std::to_chars(char* first, char* last, T value, 10)
if (__builtin_expect(to_chars10_internal::to_chars10_range_check(first, last, value), true))
return to_chars10_internal::to_chars_success(to_chars10_internal::itoa(first, value));
return to_chars10_internal::to_chars_error(last);
}
#endif // TO_CHARS10_H
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );
2024-07-29 8:35 ` Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 ); Ehrnsperger, Markus
@ 2024-07-29 9:45 ` Jonathan Wakely
2024-07-29 10:16 ` Jonathan Wakely
0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Wakely @ 2024-07-29 9:45 UTC (permalink / raw)
To: Ehrnsperger, Markus; +Cc: gcc-patches, libstdc++
On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
<markus_ehrnsperger@yahoo.de> wrote:
>
> Hi,
>
>
> I'm attaching two files:
>
> 1.: to_chars10.h:
>
> This is intended to be included in libstdc++ / gcc to achieve performance improvements. It is an implementation of
>
> to_chars10(char* first, char* last, /* integer-type */ value);
>
> Parameters are identical to std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 ); . It only works for base == 10.
>
> If it is included in libstdc++, to_chars10(...) could be renamed to std::to_chars(char* first, char* last, /* integer-type */ value) to provide an overload for the default base = 10
Thanks for the email. This isn't in the form of a patch that we can
accept as-is, although I see that the license is compatible with
libstdc++, so if you are looking to contribute it then that could be
done either by assigning copyright to the FSF or under the DCO terms.
See https://gcc.gnu.org/contribute.html#legal for more details.
I haven't looked at the code in detail, but is it a similar approach
to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
How does it compare to the performance of that algorithm?
I have an incomplete implementation of that algorithm for libstdc++
somewhere, but I haven't looked at it for a while.
>
> 2.: to_chars10.cpp:
>
> This is a test program for to_chars10 verifying the correctness of the results, and measuring the performance. The actual performance improvement is system dependent, so please test on your own system.
>
> On my system the performance improvement is about factor two, my results are:
>
>
> Test int8_t verifying to_chars10 = std::to_chars ... OK
> Test uint8_t verifying to_chars10 = std::to_chars ... OK
> Test int16_t verifying to_chars10 = std::to_chars ... OK
> Test uint16_t verifying to_chars10 = std::to_chars ... OK
> Test int32_t verifying to_chars10 = std::to_chars ... OK
> Test uint32_t verifying to_chars10 = std::to_chars ... OK
> Test int64_t verifying to_chars10 = std::to_chars ... OK
> Test uint64_t verifying to_chars10 = std::to_chars ... OK
>
> Benchmarking test case tested method ... time (lower is better)
> Benchmarking random unsigned 64 bit to_chars10 ... 0.00957
> Benchmarking random unsigned 64 bit std::to_chars ... 0.01854
> Benchmarking random signed 64 bit to_chars10 ... 0.01018
> Benchmarking random signed 64 bit std::to_chars ... 0.02297
> Benchmarking random unsigned 32 bit to_chars10 ... 0.00620
> Benchmarking random unsigned 32 bit std::to_chars ... 0.01275
> Benchmarking random signed 32 bit to_chars10 ... 0.00783
> Benchmarking random signed 32 bit std::to_chars ... 0.01606
> Benchmarking random unsigned 16 bit to_chars10 ... 0.00536
> Benchmarking random unsigned 16 bit std::to_chars ... 0.00871
> Benchmarking random signed 16 bit to_chars10 ... 0.00664
> Benchmarking random signed 16 bit std::to_chars ... 0.01154
> Benchmarking random unsigned 08 bit to_chars10 ... 0.00393
> Benchmarking random unsigned 08 bit std::to_chars ... 0.00626
> Benchmarking random signed 08 bit to_chars10 ... 0.00465
> Benchmarking random signed 08 bit std::to_chars ... 0.01089
>
>
> Thanks, Markus
>
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );
2024-07-29 9:45 ` Jonathan Wakely
@ 2024-07-29 10:16 ` Jonathan Wakely
2024-07-30 5:21 ` Ehrnsperger, Markus
0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Wakely @ 2024-07-29 10:16 UTC (permalink / raw)
To: Ehrnsperger, Markus; +Cc: gcc-patches, libstdc++
On Mon, 29 Jul 2024 at 10:45, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
> <markus_ehrnsperger@yahoo.de> wrote:
> >
> > Hi,
> >
> >
> > I'm attaching two files:
> >
> > 1.: to_chars10.h:
> >
> > This is intended to be included in libstdc++ / gcc to achieve performance improvements. It is an implementation of
> >
> > to_chars10(char* first, char* last, /* integer-type */ value);
> >
> > Parameters are identical to std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 ); . It only works for base == 10.
> >
> > If it is included in libstdc++, to_chars10(...) could be renamed to std::to_chars(char* first, char* last, /* integer-type */ value) to provide an overload for the default base = 10
>
> Thanks for the email. This isn't in the form of a patch that we can
> accept as-is, although I see that the license is compatible with
> libstdc++, so if you are looking to contribute it then that could be
> done either by assigning copyright to the FSF or under the DCO terms.
> See https://gcc.gnu.org/contribute.html#legal for more details.
>
> I haven't looked at the code in detail, but is it a similar approach
> to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
> How does it compare to the performance of that algorithm?
>
> I have an incomplete implementation of that algorithm for libstdc++
> somewhere, but I haven't looked at it for a while.
I took a closer look and the reinterpret_casts worried me, so I tried
your test code with UBsan. There are a number of errors that would
need to be fixed before we would consider using this code.
>
>
> >
> > 2.: to_chars10.cpp:
> >
> > This is a test program for to_chars10 verifying the correctness of the results, and measuring the performance. The actual performance improvement is system dependent, so please test on your own system.
> >
> > On my system the performance improvement is about factor two, my results are:
> >
> >
> > Test int8_t verifying to_chars10 = std::to_chars ... OK
> > Test uint8_t verifying to_chars10 = std::to_chars ... OK
> > Test int16_t verifying to_chars10 = std::to_chars ... OK
> > Test uint16_t verifying to_chars10 = std::to_chars ... OK
> > Test int32_t verifying to_chars10 = std::to_chars ... OK
> > Test uint32_t verifying to_chars10 = std::to_chars ... OK
> > Test int64_t verifying to_chars10 = std::to_chars ... OK
> > Test uint64_t verifying to_chars10 = std::to_chars ... OK
> >
> > Benchmarking test case tested method ... time (lower is better)
> > Benchmarking random unsigned 64 bit to_chars10 ... 0.00957
> > Benchmarking random unsigned 64 bit std::to_chars ... 0.01854
> > Benchmarking random signed 64 bit to_chars10 ... 0.01018
> > Benchmarking random signed 64 bit std::to_chars ... 0.02297
> > Benchmarking random unsigned 32 bit to_chars10 ... 0.00620
> > Benchmarking random unsigned 32 bit std::to_chars ... 0.01275
> > Benchmarking random signed 32 bit to_chars10 ... 0.00783
> > Benchmarking random signed 32 bit std::to_chars ... 0.01606
> > Benchmarking random unsigned 16 bit to_chars10 ... 0.00536
> > Benchmarking random unsigned 16 bit std::to_chars ... 0.00871
> > Benchmarking random signed 16 bit to_chars10 ... 0.00664
> > Benchmarking random signed 16 bit std::to_chars ... 0.01154
> > Benchmarking random unsigned 08 bit to_chars10 ... 0.00393
> > Benchmarking random unsigned 08 bit std::to_chars ... 0.00626
> > Benchmarking random signed 08 bit to_chars10 ... 0.00465
> > Benchmarking random signed 08 bit std::to_chars ... 0.01089
> >
> >
> > Thanks, Markus
> >
> >
> >
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );
2024-07-29 10:16 ` Jonathan Wakely
@ 2024-07-30 5:21 ` Ehrnsperger, Markus
2024-07-30 7:56 ` Jonathan Wakely
0 siblings, 1 reply; 6+ messages in thread
From: Ehrnsperger, Markus @ 2024-07-30 5:21 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: gcc-patches, libstdc++
[-- Attachment #1: Type: text/plain, Size: 4371 bytes --]
On 2024-07-29 12:16, Jonathan Wakely wrote:
> On Mon, 29 Jul 2024 at 10:45, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>> On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
>> <markus_ehrnsperger@yahoo.de> wrote:
>>> Hi,
>>>
>>>
>>> I'm attaching two files:
>>>
>>> 1.: to_chars10.h:
>>>
>>> This is intended to be included in libstdc++ / gcc to achieve performance improvements. It is an implementation of
>>>
>>> to_chars10(char* first, char* last, /* integer-type */ value);
>>>
>>> Parameters are identical to std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 ); . It only works for base == 10.
>>>
>>> If it is included in libstdc++, to_chars10(...) could be renamed to std::to_chars(char* first, char* last, /* integer-type */ value) to provide an overload for the default base = 10
>> Thanks for the email. This isn't in the form of a patch that we can
>> accept as-is, although I see that the license is compatible with
>> libstdc++, so if you are looking to contribute it then that could be
>> done either by assigning copyright to the FSF or under the DCO terms.
>> See https://gcc.gnu.org/contribute.html#legal for more details.
>>
>> I haven't looked at the code in detail, but is it a similar approach
>> to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
>> How does it compare to the performance of that algorithm?
>>
>> I have an incomplete implementation of that algorithm for libstdc++
>> somewhere, but I haven't looked at it for a while.
> I took a closer look and the reinterpret_casts worried me, so I tried
> your test code with UBsan. There are a number of errors that would
> need to be fixed before we would consider using this code.
Attached are new versions of to_chars10.cpp, to_chars10.h and the new
file itoa_better_y.h
Changes:
- I removed all reinterpret_casts, and tested with -fsanitize=undefined
- I added itoa_better_y.h from
https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ to the
performance test.
Note: There is only one line in the benchmark test for itoa_better_y due
to limited features of itoa_better_y:
Benchmarking random unsigned 32 bit itoa_better_y ...
to_chars10.h: Signed-off-by: Markus Ehrnsperger
<Markus_Ehrnsperger@yahoo.de>
The other files are only for performance tests.
>
>
>>
>>> 2.: to_chars10.cpp:
>>>
>>> This is a test program for to_chars10 verifying the correctness of the results, and measuring the performance. The actual performance improvement is system dependent, so please test on your own system.
>>>
>>> On my system the performance improvement is about factor two, my results are:
>>>
>>>
>>> Test int8_t verifying to_chars10 = std::to_chars ... OK
>>> Test uint8_t verifying to_chars10 = std::to_chars ... OK
>>> Test int16_t verifying to_chars10 = std::to_chars ... OK
>>> Test uint16_t verifying to_chars10 = std::to_chars ... OK
>>> Test int32_t verifying to_chars10 = std::to_chars ... OK
>>> Test uint32_t verifying to_chars10 = std::to_chars ... OK
>>> Test int64_t verifying to_chars10 = std::to_chars ... OK
>>> Test uint64_t verifying to_chars10 = std::to_chars ... OK
>>>
>>> Benchmarking test case tested method ... time (lower is better)
>>> Benchmarking random unsigned 64 bit to_chars10 ... 0.00957
>>> Benchmarking random unsigned 64 bit std::to_chars ... 0.01854
>>> Benchmarking random signed 64 bit to_chars10 ... 0.01018
>>> Benchmarking random signed 64 bit std::to_chars ... 0.02297
>>> Benchmarking random unsigned 32 bit to_chars10 ... 0.00620
>>> Benchmarking random unsigned 32 bit std::to_chars ... 0.01275
>>> Benchmarking random signed 32 bit to_chars10 ... 0.00783
>>> Benchmarking random signed 32 bit std::to_chars ... 0.01606
>>> Benchmarking random unsigned 16 bit to_chars10 ... 0.00536
>>> Benchmarking random unsigned 16 bit std::to_chars ... 0.00871
>>> Benchmarking random signed 16 bit to_chars10 ... 0.00664
>>> Benchmarking random signed 16 bit std::to_chars ... 0.01154
>>> Benchmarking random unsigned 08 bit to_chars10 ... 0.00393
>>> Benchmarking random unsigned 08 bit std::to_chars ... 0.00626
>>> Benchmarking random signed 08 bit to_chars10 ... 0.00465
>>> Benchmarking random signed 08 bit std::to_chars ... 0.01089
>>>
>>>
>>> Thanks, Markus
>>>
>>>
>>>
[-- Attachment #2: itoa_better_y.h --]
[-- Type: text/plain, Size: 3570 bytes --]
// code from https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/
static constexpr char radix_100_table[] = {
'0', '0', '0', '1', '0', '2', '0', '3', '0', '4',
'0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
'1', '0', '1', '1', '1', '2', '1', '3', '1', '4',
'1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
'2', '0', '2', '1', '2', '2', '2', '3', '2', '4',
'2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
'3', '0', '3', '1', '3', '2', '3', '3', '3', '4',
'3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
'4', '0', '4', '1', '4', '2', '4', '3', '4', '4',
'4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
'5', '0', '5', '1', '5', '2', '5', '3', '5', '4',
'5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
'6', '0', '6', '1', '6', '2', '6', '3', '6', '4',
'6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
'7', '0', '7', '1', '7', '2', '7', '3', '7', '4',
'7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
'8', '0', '8', '1', '8', '2', '8', '3', '8', '4',
'8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
'9', '0', '9', '1', '9', '2', '9', '3', '9', '4',
'9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
};
char* itoa_better_y(std::uint32_t n, char* buffer) {
std::uint64_t prod;
auto get_next_two_digits = [&]() {
prod = std::uint32_t(prod) * std::uint64_t(100);
return int(prod >> 32);
};
auto print_1 = [&](int digit) {
buffer[0] = char(digit + '0');
buffer += 1;
};
auto print_2 = [&] (int two_digits) {
std::memcpy(buffer, radix_100_table + two_digits * 2, 2);
buffer += 2;
};
auto print = [&](std::uint64_t magic_number, int extra_shift, auto remaining_count) {
prod = n * magic_number;
prod >>= extra_shift;
auto two_digits = int(prod >> 32);
if (two_digits < 10) {
print_1(two_digits);
for (int i = 0; i < remaining_count; ++i) {
print_2(get_next_two_digits());
}
}
else {
print_2(two_digits);
for (int i = 0; i < remaining_count; ++i) {
print_2(get_next_two_digits());
}
}
};
if (n < 100) {
if (n < 10) {
// 1 digit.
print_1(n);
}
else {
// 2 digit.
print_2(n);
}
}
else {
if (n < 100'0000) {
if (n < 1'0000) {
// 3 or 4 digits.
// 42949673 = ceil(2^32 / 10^2)
print(42949673, 0, std::integral_constant<int, 1>{});
}
else {
// 5 or 6 digits.
// 429497 = ceil(2^32 / 10^4)
print(429497, 0, std::integral_constant<int, 2>{});
}
}
else {
if (n < 1'0000'0000) {
// 7 or 8 digits.
// 281474978 = ceil(2^48 / 10^6) + 1
print(281474978, 16, std::integral_constant<int, 3>{});
}
else {
if (n < 10'0000'0000) {
// 9 digits.
// 1441151882 = ceil(2^57 / 10^8) + 1
prod = n * std::uint64_t(1441151882);
prod >>= 25;
print_1(int(prod >> 32));
print_2(get_next_two_digits());
print_2(get_next_two_digits());
print_2(get_next_two_digits());
print_2(get_next_two_digits());
}
else {
// 10 digits.
// 1441151881 = ceil(2^57 / 10^8)
prod = n * std::uint64_t(1441151881);
prod >>= 25;
print_2(int(prod >> 32));
print_2(get_next_two_digits());
print_2(get_next_two_digits());
print_2(get_next_two_digits());
print_2(get_next_two_digits());
}
}
}
}
return buffer;
}
[-- Attachment #3: to_chars10.cpp --]
[-- Type: text/plain, Size: 9804 bytes --]
// g++ -std=c++17 -O3 -g -Wall to_chars10.cpp (performance test)
// g++ -std=c++17 -O3 -g -Wall -fsanitize=undefined to_chars10.cpp (test of code correctness)
/*
Copyright (C) Markus Ehrnsperger. All rights reserved.
Licence: GNU General Public License version 3
This program does:
- check correctness of to_chars10
- compare performance of to_chars10 with std::to_chars
Note: Part of the code is copied / modifies from https://github.com/miloyip/itoa-benchmark
*/
#include "to_chars10.h"
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <cstring>
#include <chrono>
#include <random>
#include <charconv>
#include <algorithm>
#include "itoa_better_y.h"
const unsigned kIterationPerDigit = 100000;
const unsigned kIterationForRandom = 100;
const unsigned kTrial = 10;
template <typename T>
struct Traits { };
template <>
struct Traits<uint8_t> {
enum { kBufferSize = 3 };
enum { kMaxDigit = 3 };
static uint8_t Negate(uint8_t x) { return x; };
};
template <>
struct Traits<int8_t> {
enum { kBufferSize = 4 };
enum { kMaxDigit = 3 };
static int8_t Negate(int8_t x) { return -x; };
};
template <>
struct Traits<uint16_t> {
enum { kBufferSize = 5 };
enum { kMaxDigit = 5 };
static uint16_t Negate(uint16_t x) { return x; };
};
template <>
struct Traits<int16_t> {
enum { kBufferSize = 6 };
enum { kMaxDigit = 5 };
static int16_t Negate(int16_t x) { return -x; };
};
template <>
struct Traits<uint32_t> {
enum { kBufferSize = 10 };
enum { kMaxDigit = 10 };
static uint32_t Negate(uint32_t x) { return x; };
};
template <>
struct Traits<int32_t> {
enum { kBufferSize = 11 };
enum { kMaxDigit = 10 };
static int32_t Negate(int32_t x) { return -x; };
};
template <>
struct Traits<uint64_t> {
enum { kBufferSize = 20 };
enum { kMaxDigit = 20 };
static uint64_t Negate(uint64_t x) { return x; };
};
template <>
struct Traits<int64_t> {
enum { kBufferSize = 20 };
enum { kMaxDigit = 19 };
static int64_t Negate(int64_t x) { return -x; };
};
template <typename T>
void VerifyValue(T value, std::to_chars_result(*f)(char*, char*, T, int), std::to_chars_result(*g)(char*, char*, T, int), const char* test, const char* fname, const char* gname) {
char buffer_f[Traits<T>::kBufferSize];
char buffer_g[Traits<T>::kBufferSize];
std::to_chars_result r_f = f(buffer_f, buffer_f+sizeof(buffer_f), value, 10);
std::to_chars_result r_g = g(buffer_g, buffer_g+sizeof(buffer_g), value, 10);
if (r_f.ec != r_g.ec) {
std::cout << "\nError: " << fname << " -> " << std::make_error_code(r_f.ec).message();
std::cout << ", " << gname << " -> " << std::make_error_code(r_g.ec).message() << "\n";
std::cout << "Value " << +value << ", sizeof(buffer) " << sizeof(buffer_f) << "\n";
std::cout << "Test " << test << "\n";
exit(0);
}
if (r_f.ec == std::errc()) {
if (std::string_view(buffer_f, r_f.ptr-buffer_f) != std::string_view(buffer_g, r_g.ptr-buffer_g)) {
std::cout << "\nError: " << fname << " -> " << std::string_view(buffer_f, r_f.ptr-buffer_f);
std::cout << ", " << gname << " -> " << std::string_view(buffer_g, r_g.ptr-buffer_g) << "\n";
std::cout << "Value " << +value << ", sizeof(buffer) " << sizeof(buffer_f) << "\n";
std::cout << "Test " << test << "\n";
exit(0);
}
}
}
template <typename T>
void Verify(std::to_chars_result(*f)(char*, char*, T, int), std::to_chars_result(*g)(char*, char*, T, int), const char* test, const char* fname, const char* gname) {
std::cout << "Test " << test << " verifying " << fname << " = " << gname << " ... ";
// Boundary cases
VerifyValue<T>(0, f, g, test, fname, gname);
VerifyValue<T>(std::numeric_limits<T>::min(), f, g, test, fname, gname);
VerifyValue<T>(std::numeric_limits<T>::max(), f, g, test, fname, gname);
// 2^n - 1, 2^n, 10^n - 1, 10^n until overflow
for (uint32_t power = 2; power <= 10; power += 8) {
T i = 1, last;
do {
VerifyValue<T>(i - 1, f, g, test, fname, gname);
VerifyValue<T>(i, f, g, test, fname, gname);
if (std::numeric_limits<T>::min() < 0) {
VerifyValue<T>(Traits<T>::Negate(i), f, g, test, fname, gname);
VerifyValue<T>(Traits<T>::Negate(i + 1), f, g, test, fname, gname);
}
if (i >= std::numeric_limits<T>::max() / (T)power) break;
last = i;
i *= power;
} while (last < i);
}
std::cout << "OK\n";
}
template <class T>
class RandomData {
public:
static T* GetData() {
static RandomData singleton;
return singleton.mData;
}
static const size_t kCountPerDigit = 1000;
static const size_t kCount = kCountPerDigit * Traits<T>::kMaxDigit;
private:
RandomData() :
mData(new T[kCount])
{
T* p = mData;
T start = 1;
for (int digit = 1; digit <= Traits<T>::kMaxDigit; digit++) {
T end = (digit == Traits<T>::kMaxDigit) ? std::numeric_limits<T>::max() : start * 10;
T v = start;
T sign = 1;
for (size_t i = 0; i < kCountPerDigit; i++) {
*p++ = v * sign;
sign = Traits<T>::Negate(sign);
if (++v == end)
v = start;
}
start = end;
}
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(mData, mData + kCount, g);
}
~RandomData() {
delete[] mData;
}
T* mData;
};
template <typename T>
void BenchRandom(std::to_chars_result(*f)(char*, char*, T, int), const char* type) {
printf("Benchmarking random %-20s ... ", type);
char buffer[Traits<T>::kBufferSize];
T* data = RandomData<T>::GetData();
size_t n = RandomData<T>::kCount;
double duration = std::numeric_limits<double>::max();
for (unsigned trial = 0; trial < kTrial; trial++) {
std::chrono::time_point<std::chrono::high_resolution_clock> begin;
std::chrono::duration<double> timeNeeded;
begin = std::chrono::high_resolution_clock::now();
for (unsigned iteration = 0; iteration < kIterationForRandom; iteration++)
for (size_t i = 0; i < n; i++)
f(buffer, buffer+Traits<T>::kBufferSize, data[i], 10);
timeNeeded = std::chrono::high_resolution_clock::now() - begin;
duration = std::min(duration, timeNeeded.count() );
}
duration *= 1e6 / (kIterationForRandom * n); // convert to nano second per operation
printf("%8.5f\n", duration);
}
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline std::to_chars_result to_chars10(char* first, char* last, T value, int base) {
return to_chars10(first, last, value);
}
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline std::to_chars_result itoa_better_y(char* first, char* last, T value, int base) {
if (__builtin_expect(to_chars10_internal::to_chars10_range_check(first, last, value), true))
return to_chars10_internal::to_chars_success(itoa_better_y(value, first));
return to_chars10_internal::to_chars_error(last);
}
int main()
{
Verify<int8_t>(to_chars10, std::to_chars, " int8_t", "to_chars10", "std::to_chars");
Verify<uint8_t>(to_chars10, std::to_chars, " uint8_t", "to_chars10", "std::to_chars");
Verify<uint8_t>(itoa_better_y, std::to_chars, " uint8_t", "itoa_better_y", "std::to_chars");
Verify<int16_t>(to_chars10, std::to_chars, " int16_t", "to_chars10", "std::to_chars");
Verify<uint16_t>(to_chars10, std::to_chars, "uint16_t", "to_chars10", "std::to_chars");
Verify<uint16_t>(itoa_better_y, std::to_chars, "uint16_t", "itoa_better_y", "std::to_chars");
Verify<int32_t>(to_chars10, std::to_chars, " int32_t", "to_chars10", "std::to_chars");
Verify<uint32_t>(to_chars10, std::to_chars, "uint32_t", "to_chars10", "std::to_chars");
Verify<uint32_t>(itoa_better_y, std::to_chars, "uint32_t", "itoa_better_y", "std::to_chars");
Verify<int64_t>(to_chars10, std::to_chars, " int64_t", "to_chars10", "std::to_chars");
Verify<uint64_t>(to_chars10, std::to_chars, "uint64_t", "to_chars10", "std::to_chars");
std::cout << "\n";
std::cout << "Benchmarking test case tested method ... time (lower is better)\n";
BenchRandom<uint64_t>(to_chars10 , "unsigned 64 bit to_chars10 ");
BenchRandom<uint64_t>(std::to_chars , "unsigned 64 bit std::to_chars ");
BenchRandom<int64_t>(to_chars10 , " signed 64 bit to_chars10 ");
BenchRandom<int64_t>(std::to_chars , " signed 64 bit std::to_chars ");
BenchRandom<uint32_t>(to_chars10 , "unsigned 32 bit to_chars10 ");
BenchRandom<uint32_t>(itoa_better_y , "unsigned 32 bit itoa_better_y ");
BenchRandom<uint32_t>(std::to_chars , "unsigned 32 bit std::to_chars ");
BenchRandom<int32_t>(to_chars10 , " signed 32 bit to_chars10 ");
BenchRandom<int32_t>(std::to_chars , " signed 32 bit std::to_chars ");
BenchRandom<uint16_t>(to_chars10 , "unsigned 16 bit to_chars10 ");
BenchRandom<uint16_t>(std::to_chars , "unsigned 16 bit std::to_chars ");
BenchRandom<int16_t>(to_chars10 , " signed 16 bit to_chars10 ");
BenchRandom<int16_t>(std::to_chars , " signed 16 bit std::to_chars ");
BenchRandom<uint8_t>(to_chars10 , "unsigned 08 bit to_chars10 ");
BenchRandom<uint8_t>(std::to_chars , "unsigned 08 bit std::to_chars ");
BenchRandom<int8_t>(to_chars10 , " signed 08 bit to_chars10 ");
BenchRandom<int8_t>(std::to_chars , " signed 08 bit std::to_chars ");
}
[-- Attachment #4: to_chars10.h --]
[-- Type: text/plain, Size: 10731 bytes --]
/*
Copyright (C) Markus Ehrnsperger. All rights reserved.
Licence: GNU General Public License version 3, with the GCC RUNTIME LIBRARY EXCEPTION
Usage:
use
res = to_chars10(first, last, value);
instead of
res = std::to_chars(first, last, value, 10);
same parameters, return values, ... . So, identical but faster
*/
#ifndef TO_CHARS10_H
#define TO_CHARS10_H
#include <cstdint>
#include <cstring>
#include <charconv>
#include <endian.h>
namespace to_chars10_internal {
/*
Naming schema
char *itoaN_M(char *b, uintXX_t i): ==================================
- write between N and M digits to b.
- if less than N digits are needed for i, left fill with '0'
- return the one-past-the-end pointer of the digits written
- even if b is returned (because i == 0 && N == 0), b[0] might change.
- i < 10^M must be valid. This is not checked
*/
alignas(2) static const char digits_100[200] = {
'0','0','0','1','0','2','0','3','0','4','0','5','0','6','0','7','0','8','0','9',
'1','0','1','1','1','2','1','3','1','4','1','5','1','6','1','7','1','8','1','9',
'2','0','2','1','2','2','2','3','2','4','2','5','2','6','2','7','2','8','2','9',
'3','0','3','1','3','2','3','3','3','4','3','5','3','6','3','7','3','8','3','9',
'4','0','4','1','4','2','4','3','4','4','4','5','4','6','4','7','4','8','4','9',
'5','0','5','1','5','2','5','3','5','4','5','5','5','6','5','7','5','8','5','9',
'6','0','6','1','6','2','6','3','6','4','6','5','6','6','6','7','6','8','6','9',
'7','0','7','1','7','2','7','3','7','4','7','5','7','6','7','7','7','8','7','9',
'8','0','8','1','8','2','8','3','8','4','8','5','8','6','8','7','8','8','8','9',
'9','0','9','1','9','2','9','3','9','4','9','5','9','6','9','7','9','8','9','9'
};
// ======== 0-2 digits ========================================================
// max uint16_t 65535 -> large enough for 1-2 digits
inline char *itoa0_2(char *b, uint16_t i, bool write_ten) {
// write 0 to 2 chars to b. 0 <= i < 100
// if write_ten == true, write the higher(i/10) digit
// return b+write_ten
// note: if the lower(i%10) digit is needed, the caller has to increase b.
const char *src = digits_100 + 2*i;
*b = *src;
b += write_ten;
*b = *++src;
return b;
}
inline char *itoa0_2(char *b, uint16_t i) {
// write 0 to 2 chars to b and return the new position, 0 <= i < 100
return itoa0_2(b, i, i>9) + (i != 0);
}
inline char *itoa1_2(char *b, uint16_t i) {
// write 1 or 2 chars to b and return the new position, 0 <= i < 100
return itoa0_2(b, i, i>9) + 1;
}
// ======== 1-4 digits ========================================================
// max uint16_t 65535 -> large enough for 1-4 digits
inline char *itoa1_4(char *b, uint16_t i) {
// write 1 to 4 chars to b and return the new position, 0 <= i < 10^4
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100;
b = itoa0_2(b, q);
return itoa0_2(b, i - q*100, i>9) + 1;
}
inline char *itoa3_4(char *b, uint16_t i) {
// write 3 to 4 chars to b and return the new position, 0 <= i < 10^4
// Left-fill with 0.
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100;
b = itoa1_2(b, q);
memcpy(b, digits_100 + ((i - q*100) << 1), 2);
return b+2;
}
inline char *itoa4_4(char *b, uint16_t i) {
// write exactly 4 chars to b. Left-fill with 0. i < 10^4
uint16_t q = ((uint32_t)i * 5243U) >> 19; // q = i/100; i < 43699
memcpy(b, digits_100 + (q << 1), 2);
b += 2;
memcpy(b, digits_100 + ((i - q*100) << 1), 2);
return b+2;
}
// ======== 1-8 digits ========================================================
// max uint32_t 4294967295 (10 digits) -> large enough for 1-8 digits
inline char *itoa5_8(char *b, uint32_t i) {
// write 5 to 8 chars to b and return the new position, 0 <= i < 10^8
// Left-fill with 0.
uint32_t q = i/10000;
b = itoa1_4(b, q);
return itoa4_4(b, i - q*10000);
}
inline char *itoa1_8(char *b, uint32_t i) {
// write 1 to 8 chars to b and return the new position, 0 <= i < 10^8
uint32_t q = i/1000000;
uint32_t j = i - q*1000000;
b = itoa0_2(b, q);
q = j/10000;
j = j - q*10000;
b = itoa0_2(b, q, i>99999) + (i>9999);
q = (j * 5243U) >> 19; // q = j/100;
b = itoa0_2(b, q, i>999) + (i>99);
return itoa0_2(b, j - q*100, i>9) + 1;
}
inline char *itoa8_8(char *b, uint32_t i) {
// write exactly 8 chars to b. Left-fill with 0. i < 10^8 (required, but not checked)
for (int j = 6; j > 0; j-=2) {
uint32_t q = i/100;
memcpy(b+j, digits_100 + ((i - q*100) << 1), 2);
i = q;
}
memcpy(b, digits_100 + (i<< 1), 2);
return b+8;
}
// ======== 8-16 digits ========================================================
template<typename T> inline char *itoa9_16(char *b, T i) {
// write 9 to 16 chars to b and return the new position, 10^8 <= i < 10^16
T q = i/100000000;
b = itoa1_8(b, q);
return itoa8_8(b, i - q*100000000);
}
// ======== implementation of itoa(char *b, T i), depending on type T =========
// max uint8_t 256
template<typename T, std::enable_if_t<sizeof(T) == 1, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 3 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
T q = i/100; // 256/100 = 2
*b++ = q + '0';
memcpy(b, digits_100 + ((i - q*100) << 1), 2);
return b+2;
}
// max uint16_t 65535 5 digits
template<typename T, std::enable_if_t<sizeof(T) == 2, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 5 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
if (i < 10000) return itoa3_4(b, i);
T q = i/10000; // 65535/10000 = 6
*b++ = q + '0';
return itoa4_4(b, i - q*10000);
}
// max uint32_t 4294967295 (10 digits)
template<typename T, std::enable_if_t<sizeof(T) <= 4, bool> = true, std::enable_if_t<sizeof(T) >= 3, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 10 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
if (i < 10000) return itoa3_4(b, i);
if (i < 100000000) return itoa5_8(b, i);
T q = i/100000000; // 4294967295/100000000 = 42;
b = itoa1_2(b, q);
return itoa8_8(b, i - q*100000000);
}
// max uint64_t 18446744073709551615 (20 digits)
template<typename T, std::enable_if_t<sizeof(T) >= 5, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 20 digits to b and return the new position
if (i < 100) return itoa1_2(b, i);
if (i < 10000) return itoa3_4(b, i);
if (i < 100000000) return itoa5_8(b, i);
if (i < 10000000000000000) return itoa9_16(b, i);
T q = i/10000000000000000; // 18446744073709551615/10000000000000000 = 1844
b = itoa1_4(b, q);
i = i - q*10000000000000000; // now i up to 16 digits
q = i/100000000;
itoa8_8(b , q);
itoa8_8(b + 8, i - q*100000000);
return b+16;
}
template<typename T, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline char *itoa(char *b, T i) {
// write 1 to 20 chars to b and return the new position
typedef std::make_unsigned_t<T> TU;
bool minus = i < 0;
TU u = (TU)(i) - 2 * (TU)(i) * minus;
*b = '-';
return itoa(b + minus, u);
}
// ======== verify if the provided range is large enough =========
// max_int[0] = 0
// max_int[N>0] = largest integer number that can be presented with N digits
static const uint64_t max_int[20] = {
0,
9,
99,
999,
9999,
99999,
999999,
9999999,
99999999,
999999999,
9999999999,
99999999999,
999999999999,
9999999999999,
99999999999999,
999999999999999,
9999999999999999,
99999999999999999,
999999999999999999,
9999999999999999999u };
// ===== max_chars_for_to_chars10(T value) ===============================
// return the maximum number of charaters that will be used by to_chars10
// for any value of data type T
template<typename T, std::enable_if_t<sizeof(T) == 1, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 3; // 256
}
template<typename T, std::enable_if_t<sizeof(T) == 1, bool> = true, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 4; // -128
}
template<typename T, std::enable_if_t<sizeof(T) == 2, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 5; // 65535
}
template<typename T, std::enable_if_t<sizeof(T) == 2, bool> = true, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 6; // -32768
}
template<typename T, std::enable_if_t<sizeof(T) == 4, bool> = true, std::enable_if_t<std::is_unsigned_v<T>, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 10; // 4294967295
}
template<typename T, std::enable_if_t<sizeof(T) == 4, bool> = true, std::enable_if_t<std::is_signed_v<T>, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 11; // -2147483648
}
template<typename T, std::enable_if_t<sizeof(T) == 8, bool> = true>
inline std::ptrdiff_t max_chars_for_to_chars10(T value) {
return 20;
}
inline std::to_chars_result to_chars_error(char* last) {
std::to_chars_result res;
res.ec = std::errc::value_too_large;
res.ptr = last;
return res;
}
inline std::to_chars_result to_chars_success(char* last) {
std::to_chars_result res;
res.ec = std::errc();
res.ptr = last;
return res;
}
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline bool to_chars10_range_check(char* first, char* last, T value) {
// return true if [first, last) is large enough for value
if (__builtin_expect(last - first >= max_chars_for_to_chars10(value), true)) return true;
if (__builtin_expect(last <= first, false)) return false;
if (value >= 0) {
// last - first > 0
return (uint64_t)value <= max_int[last - first];
} else {
// value != 0, && last - first - 1 < 19
return (int64_t)value >= -(int64_t)max_int[last - first - 1];
}
}
} // end of namespace to_chars10_internal
template<typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
inline std::to_chars_result to_chars10(char* first, char* last, T value) {
// same as std::to_chars(char* first, char* last, T value, 10)
if (__builtin_expect(to_chars10_internal::to_chars10_range_check(first, last, value), true))
return to_chars10_internal::to_chars_success(to_chars10_internal::itoa(first, value));
return to_chars10_internal::to_chars_error(last);
}
#endif // TO_CHARS10_H
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );
2024-07-30 5:21 ` Ehrnsperger, Markus
@ 2024-07-30 7:56 ` Jonathan Wakely
2024-09-28 6:41 ` Markus Ehrnsperger
0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Wakely @ 2024-07-30 7:56 UTC (permalink / raw)
To: Ehrnsperger, Markus; +Cc: gcc-patches, libstdc++
[-- Attachment #1: Type: text/plain, Size: 4690 bytes --]
On Tue, 30 Jul 2024, 06:21 Ehrnsperger, Markus, <markus_ehrnsperger@yahoo.de>
wrote:
> On 2024-07-29 12:16, Jonathan Wakely wrote:
>
> > On Mon, 29 Jul 2024 at 10:45, Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
> >> On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
> >> <markus_ehrnsperger@yahoo.de> wrote:
> >>> Hi,
> >>>
> >>>
> >>> I'm attaching two files:
> >>>
> >>> 1.: to_chars10.h:
> >>>
> >>> This is intended to be included in libstdc++ / gcc to achieve
> performance improvements. It is an implementation of
> >>>
> >>> to_chars10(char* first, char* last, /* integer-type */ value);
> >>>
> >>> Parameters are identical to std::to_chars(char* first, char* last, /*
> integer-type */ value, int base = 10 ); . It only works for base == 10.
> >>>
> >>> If it is included in libstdc++, to_chars10(...) could be renamed to
> std::to_chars(char* first, char* last, /* integer-type */ value) to
> provide an overload for the default base = 10
> >> Thanks for the email. This isn't in the form of a patch that we can
> >> accept as-is, although I see that the license is compatible with
> >> libstdc++, so if you are looking to contribute it then that could be
> >> done either by assigning copyright to the FSF or under the DCO terms.
> >> See https://gcc.gnu.org/contribute.html#legal for more details.
> >>
> >> I haven't looked at the code in detail, but is it a similar approach
> >> to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
> >> How does it compare to the performance of that algorithm?
> >>
> >> I have an incomplete implementation of that algorithm for libstdc++
> >> somewhere, but I haven't looked at it for a while.
> > I took a closer look and the reinterpret_casts worried me, so I tried
> > your test code with UBsan. There are a number of errors that would
> > need to be fixed before we would consider using this code.
>
> Attached are new versions of to_chars10.cpp, to_chars10.h and the new
> file itoa_better_y.h
>
Thanks! I'll take another look.
> Changes:
>
> - I removed all reinterpret_casts, and tested with -fsanitize=undefined
>
> - I added itoa_better_y.h from
> https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ to the
> performance test.
>
> Note: There is only one line in the benchmark test for itoa_better_y due
> to limited features of itoa_better_y:
>
> Benchmarking random unsigned 32 bit itoa_better_y ...
>
>
> to_chars10.h: Signed-off-by: Markus Ehrnsperger
> <Markus_Ehrnsperger@yahoo.de>
>
> The other files are only for performance tests.
>
> >
> >
> >>
> >>> 2.: to_chars10.cpp:
> >>>
> >>> This is a test program for to_chars10 verifying the correctness of the
> results, and measuring the performance. The actual performance improvement
> is system dependent, so please test on your own system.
> >>>
> >>> On my system the performance improvement is about factor two, my
> results are:
> >>>
> >>>
> >>> Test int8_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint8_t verifying to_chars10 = std::to_chars ... OK
> >>> Test int16_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint16_t verifying to_chars10 = std::to_chars ... OK
> >>> Test int32_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint32_t verifying to_chars10 = std::to_chars ... OK
> >>> Test int64_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint64_t verifying to_chars10 = std::to_chars ... OK
> >>>
> >>> Benchmarking test case tested method ... time (lower
> is better)
> >>> Benchmarking random unsigned 64 bit to_chars10 ... 0.00957
> >>> Benchmarking random unsigned 64 bit std::to_chars ... 0.01854
> >>> Benchmarking random signed 64 bit to_chars10 ... 0.01018
> >>> Benchmarking random signed 64 bit std::to_chars ... 0.02297
> >>> Benchmarking random unsigned 32 bit to_chars10 ... 0.00620
> >>> Benchmarking random unsigned 32 bit std::to_chars ... 0.01275
> >>> Benchmarking random signed 32 bit to_chars10 ... 0.00783
> >>> Benchmarking random signed 32 bit std::to_chars ... 0.01606
> >>> Benchmarking random unsigned 16 bit to_chars10 ... 0.00536
> >>> Benchmarking random unsigned 16 bit std::to_chars ... 0.00871
> >>> Benchmarking random signed 16 bit to_chars10 ... 0.00664
> >>> Benchmarking random signed 16 bit std::to_chars ... 0.01154
> >>> Benchmarking random unsigned 08 bit to_chars10 ... 0.00393
> >>> Benchmarking random unsigned 08 bit std::to_chars ... 0.00626
> >>> Benchmarking random signed 08 bit to_chars10 ... 0.00465
> >>> Benchmarking random signed 08 bit std::to_chars ... 0.01089
> >>>
> >>>
> >>> Thanks, Markus
> >>>
> >>>
> >>>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );
2024-07-30 7:56 ` Jonathan Wakely
@ 2024-09-28 6:41 ` Markus Ehrnsperger
0 siblings, 0 replies; 6+ messages in thread
From: Markus Ehrnsperger @ 2024-09-28 6:41 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: gcc-patches, libstdc++
[-- Attachment #1: Type: text/plain, Size: 5307 bytes --]
Any update?
Thanks and Regards, Markus
Am 30.07.24 um 09:56 schrieb Jonathan Wakely:
>
>
> On Tue, 30 Jul 2024, 06:21 Ehrnsperger, Markus,
> <markus_ehrnsperger@yahoo.de> wrote:
>
> On 2024-07-29 12:16, Jonathan Wakely wrote:
>
> > On Mon, 29 Jul 2024 at 10:45, Jonathan Wakely
> <jwakely.gcc@gmail.com> wrote:
> >> On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
> >> <markus_ehrnsperger@yahoo.de> wrote:
> >>> Hi,
> >>>
> >>>
> >>> I'm attaching two files:
> >>>
> >>> 1.: to_chars10.h:
> >>>
> >>> This is intended to be included in libstdc++ / gcc to achieve
> performance improvements. It is an implementation of
> >>>
> >>> to_chars10(char* first, char* last, /* integer-type */ value);
> >>>
> >>> Parameters are identical to std::to_chars(char* first, char*
> last, /* integer-type */ value, int base = 10 ); . It only works
> for base == 10.
> >>>
> >>> If it is included in libstdc++, to_chars10(...) could be
> renamed to std::to_chars(char* first, char* last, /* integer-type
> */ value) to provide an overload for the default base = 10
> >> Thanks for the email. This isn't in the form of a patch that we can
> >> accept as-is, although I see that the license is compatible with
> >> libstdc++, so if you are looking to contribute it then that
> could be
> >> done either by assigning copyright to the FSF or under the DCO
> terms.
> >> See https://gcc.gnu.org/contribute.html#legal for more details.
> >>
> >> I haven't looked at the code in detail, but is it a similar
> approach
> >> to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
> >> How does it compare to the performance of that algorithm?
> >>
> >> I have an incomplete implementation of that algorithm for libstdc++
> >> somewhere, but I haven't looked at it for a while.
> > I took a closer look and the reinterpret_casts worried me, so I
> tried
> > your test code with UBsan. There are a number of errors that would
> > need to be fixed before we would consider using this code.
>
> Attached are new versions of to_chars10.cpp, to_chars10.h and the new
> file itoa_better_y.h
>
>
> Thanks! I'll take another look.
>
>
>
> Changes:
>
> - I removed all reinterpret_casts, and tested with
> -fsanitize=undefined
>
> - I added itoa_better_y.h from
> https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ to the
> performance test.
>
> Note: There is only one line in the benchmark test
> for itoa_better_y due
> to limited features of itoa_better_y:
>
> Benchmarking random unsigned 32 bit itoa_better_y ...
>
>
> to_chars10.h: Signed-off-by: Markus Ehrnsperger
> <Markus_Ehrnsperger@yahoo.de>
>
> The other files are only for performance tests.
>
> >
> >
> >>
> >>> 2.: to_chars10.cpp:
> >>>
> >>> This is a test program for to_chars10 verifying the
> correctness of the results, and measuring the performance. The
> actual performance improvement is system dependent, so please test
> on your own system.
> >>>
> >>> On my system the performance improvement is about factor two,
> my results are:
> >>>
> >>>
> >>> Test int8_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint8_t verifying to_chars10 = std::to_chars ... OK
> >>> Test int16_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint16_t verifying to_chars10 = std::to_chars ... OK
> >>> Test int32_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint32_t verifying to_chars10 = std::to_chars ... OK
> >>> Test int64_t verifying to_chars10 = std::to_chars ... OK
> >>> Test uint64_t verifying to_chars10 = std::to_chars ... OK
> >>>
> >>> Benchmarking test case tested method ... time
> (lower is better)
> >>> Benchmarking random unsigned 64 bit to_chars10 ... 0.00957
> >>> Benchmarking random unsigned 64 bit std::to_chars ... 0.01854
> >>> Benchmarking random signed 64 bit to_chars10 ... 0.01018
> >>> Benchmarking random signed 64 bit std::to_chars ... 0.02297
> >>> Benchmarking random unsigned 32 bit to_chars10 ... 0.00620
> >>> Benchmarking random unsigned 32 bit std::to_chars ... 0.01275
> >>> Benchmarking random signed 32 bit to_chars10 ... 0.00783
> >>> Benchmarking random signed 32 bit std::to_chars ... 0.01606
> >>> Benchmarking random unsigned 16 bit to_chars10 ... 0.00536
> >>> Benchmarking random unsigned 16 bit std::to_chars ... 0.00871
> >>> Benchmarking random signed 16 bit to_chars10 ... 0.00664
> >>> Benchmarking random signed 16 bit std::to_chars ... 0.01154
> >>> Benchmarking random unsigned 08 bit to_chars10 ... 0.00393
> >>> Benchmarking random unsigned 08 bit std::to_chars ... 0.00626
> >>> Benchmarking random signed 08 bit to_chars10 ... 0.00465
> >>> Benchmarking random signed 08 bit std::to_chars ... 0.01089
> >>>
> >>>
> >>> Thanks, Markus
> >>>
> >>>
> >>>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-09-28 6:41 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <0b5ec858-ebcf-44c2-b66b-0f7c2f278261.ref@yahoo.de>
2024-07-29 8:35 ` Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 ); Ehrnsperger, Markus
2024-07-29 9:45 ` Jonathan Wakely
2024-07-29 10:16 ` Jonathan Wakely
2024-07-30 5:21 ` Ehrnsperger, Markus
2024-07-30 7:56 ` Jonathan Wakely
2024-09-28 6:41 ` Markus Ehrnsperger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).