* Re: [PATCH 1/1] string: Add stpecpy(3) @ 2022-12-23 23:24 Wilco Dijkstra 2022-12-24 0:05 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Wilco Dijkstra @ 2022-12-23 23:24 UTC (permalink / raw) To: Alejandro Colomar; +Cc: 'GNU C Library' Hi Alex, > For that, we'd first need to discuss what is a typical scenario. Like copying/concatenating strings that fit within the buffer. > And also, it depends a lot on what the compiler can optimize. If I call > strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude > faster. If you're trying to say that the 'strcat' variant is bad then yes absolutely - it's better to inline in the compiler or avoid the 'strcat' versions altogether (that's also why I would strongly suggest never to add more 'cat' variants). But that doesn't say anything about whether stpecpy is better than strlcpy. > If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that > might be something to benchmark, but we'd also need to discuss what is a good > input for the benchmark. The typical case would be copying or concatenating smallish strings to a buffer. > In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is > inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed > stpecpy(3). If that really is the OpenBSD implementation then this proves my point that non-standard string functions are often totally unoptimized. A basic implementation of strlcpy would use strlen and memcpy so it is fast on every system without requiring any optimization: size_t strlcpy (char *dst, const char *src, size_t size) { size_t len = strlen (src); if (size == 0) return len; size = len >= size ? size - 1 : len; dst[size] = 0; memcpy (dst, src, size); return len; } > Well, with the current memccpy(3) I already suspect it's going to be faster than > strlcpy(3). If you optimize it, it would increase the chances that it's faster :) I don't see why it would be any faster given memccpy might also not be optimized. > I find it _way_ more readable than the strlcpy(3)/cat(3) code. Oh, and did I > say it has less branches? :) I'm not so sure about that - you've got 3 call/returns plus at least 4 branches for each stpecpy (besides whatever memcpy/memchr do). strlcpy has 2 calls/ returns plus one branch. So needing an extra branch in case you need to do something special for the buffer full case doesn't seem like a major problem. >> In contrast we can be pretty sure that the standard strlen, memcpy etc are both >> correct and efficient on all targets/libc's. > > Sure, but memcpy(3) is not usable in code that needs to truncate. We need to > compare against stpncpy(3) (ughhh) and strlcpy(3). The idea is that if we add new string functions, their implementation should use other string functions that are known to be well optimized for most targets. Cheers, Wilco ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-23 23:24 [PATCH 1/1] string: Add stpecpy(3) Wilco Dijkstra @ 2022-12-24 0:05 ` Alejandro Colomar 2022-12-24 0:26 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-24 0:05 UTC (permalink / raw) To: Wilco Dijkstra; +Cc: 'GNU C Library' [-- Attachment #1.1: Type: text/plain, Size: 6084 bytes --] On 12/24/22 00:24, Wilco Dijkstra wrote: > Hi Alex, > >> For that, we'd first need to discuss what is a typical scenario. > > Like copying/concatenating strings that fit within the buffer. > >> And also, it depends a lot on what the compiler can optimize. If I call >> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude >> faster. > > If you're trying to say that the 'strcat' variant is bad then yes absolutely - I must admit that it has good things, if you have a compiler that does the magic for you. GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be optimized, it's not so bad, and the source code is cleaner to the eye. > it's better to inline in the compiler or avoid the 'strcat' versions altogether > (that's also why I would strongly suggest never to add more 'cat' variants). inlining in the compiler is a good solution. And yes, I agree on not adding cat variants, but at the same time, str functions are problematic in that they can't be chained, as opposed to stp functions. That problem shows itself in the snprintf(3) bugs I mentioned. Users need a way to catenate easily, even if not with a 'cat' function. > But that doesn't say anything about whether stpecpy is better than strlcpy. > >> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that >> might be something to benchmark, but we'd also need to discuss what is a good >> input for the benchmark. > > The typical case would be copying or concatenating smallish strings to a buffer. Okay, I'll try to prepare a benchmark. > >> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is >> inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed >> stpecpy(3). > > If that really is the OpenBSD implementation then this proves my point that > non-standard string functions are often totally unoptimized. And not only that, but I find your version much more readable. I don't understand how the OpenBSD version was written that way, and hasn't been fixed so far. > > A basic implementation of strlcpy would use strlen and memcpy so it is fast > on every system without requiring any optimization: > > size_t > strlcpy (char *dst, const char *src, size_t size) > { > size_t len = strlen (src); > > if (size == 0) > return len; > size = len >= size ? size - 1 : len; I'd use a separate variable dlen, to differentiate it from size. Otherwise, it looks like an off-by-one bug just below, since writing at a [size] usually means writing past the array. > dst[size] = 0; > memcpy (dst, src, size); > return len; > } Then, to compare oranges to oranges, I'll provide the equivalently optimized stpecpy(3): char * stpecpy (char *dst, char *end, const char *restrict src) { size_t dsize; size_t dlen; size_t slen; slen = strlen(src); if (dst == end) return NULL; if (unlikely(dst == NULL)) return NULL; if (dst > end) unreachable(); dsize = end - dst; dlen = slen >= dsize ? dsize - 1 : slen; dst[dlen] = 0; return mempcpy(dst, src, dlen); } Now we can really compare them. (unlikely() is the obvious wrapper over __builtin_expect(), and unreachable() is C23's equivalent of __builtin_unreachable(); they're just extra optimizations, but can be ignored.) There are various decissions to take here: - We could leave NULL as UB, but I want to handle it for being able to combine with stpeprintf(). Although, we could implement stpeprintf() so that it never fails (we would need to implement it without the INT_MAX limitation of snprintf(3)). - We could call strnlen(3) instead, but strlen(3) is probably faster in the average use case, and has the benefit of crashing on invalid input. The differences with your strlcpy(3) implementation are: - NULL check. - dsize = end - dst; calculation Considering that strlcpy(3) chained calls need extra boilerplate at call site (remember): n = strlcpy(buf, "Hello ", sizeof(buf)); if (n >= sizeof(buf)) goto toolong; n += strlcpy(buf + n, "world", sizeof(buf) - n); if (n >= sizeof(buf)) goto toolong; n += strlcpy(buf + n, "!", sizeof(buf) - n); if (n >= sizeof(buf)) goto toolong; puts(buf); we see that there are in reality more calculations in the case of strlcpy(3); I see 2 '+'s and 1 '-' at strlcpy(3) call site, while we only had an extra '-' in the stpecpy(3) internals. The number of conditionals seems to be the same after all, except for one single conditional after all the chained stpecpy(3) calls. So, for equally optimized code, stpecpy(3) seems to win. It's not hard to believe, since they perform the same operation, with the difference that strlcpy(3) forces the user to recalculate the buffer size (or really, the compiler on behalf of the user, since the intention is that users don't write this code, and instead write calls to strlcat(3)). While it seems very obvious, since the source code is so similar that we can see the differences, I'll still try to write a benchmark. And remember that performance is only second to usability, which is the main selling point of stpe*() functions. [...] >>> In contrast we can be pretty sure that the standard strlen, memcpy etc are both >>> correct and efficient on all targets/libc's. >> >> Sure, but memcpy(3) is not usable in code that needs to truncate. We need to >> compare against stpncpy(3) (ughhh) and strlcpy(3). > > The idea is that if we add new string functions, their implementation should use > other string functions that are known to be well optimized for most targets. Fair. Above is my optimized version of stpecpy(3). > > Cheers, > Wilco Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-24 0:05 ` Alejandro Colomar @ 2022-12-24 0:26 ` Alejandro Colomar 2022-12-24 2:30 ` stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) Alejandro Colomar 2022-12-25 1:52 ` [PATCH 1/1] string: Add stpecpy(3) Noah Goldstein 0 siblings, 2 replies; 16+ messages in thread From: Alejandro Colomar @ 2022-12-24 0:26 UTC (permalink / raw) To: Wilco Dijkstra; +Cc: 'GNU C Library' [-- Attachment #1.1: Type: text/plain, Size: 4175 bytes --] On 12/24/22 01:05, Alejandro Colomar wrote: > > > On 12/24/22 00:24, Wilco Dijkstra wrote: >> Hi Alex, >> >>> For that, we'd first need to discuss what is a typical scenario. >> >> Like copying/concatenating strings that fit within the buffer. > >>> And also, it depends a lot on what the compiler can optimize. If I call >>> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude >>> faster. >> >> If you're trying to say that the 'strcat' variant is bad then yes absolutely - > > I must admit that it has good things, if you have a compiler that does the magic > for you. GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be > optimized, it's not so bad, and the source code is cleaner to the eye. > >> it's better to inline in the compiler or avoid the 'strcat' versions altogether >> (that's also why I would strongly suggest never to add more 'cat' variants). > > inlining in the compiler is a good solution. And yes, I agree on not adding cat > variants, but at the same time, str functions are problematic in that they can't > be chained, as opposed to stp functions. That problem shows itself in the > snprintf(3) bugs I mentioned. Users need a way to catenate easily, even if not > with a 'cat' function. > >> But that doesn't say anything about whether stpecpy is better than strlcpy. >> >>> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that >>> might be something to benchmark, but we'd also need to discuss what is a good >>> input for the benchmark. >> >> The typical case would be copying or concatenating smallish strings to a buffer. > > Okay, I'll try to prepare a benchmark. > >> >>> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is >>> inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed >>> stpecpy(3). >> >> If that really is the OpenBSD implementation then this proves my point that >> non-standard string functions are often totally unoptimized. > > And not only that, but I find your version much more readable. I don't > understand how the OpenBSD version was written that way, and hasn't been fixed > so far. > >> >> A basic implementation of strlcpy would use strlen and memcpy so it is fast >> on every system without requiring any optimization: >> >> size_t >> strlcpy (char *dst, const char *src, size_t size) >> { >> size_t len = strlen (src); >> >> if (size == 0) >> return len; >> size = len >= size ? size - 1 : len; > > I'd use a separate variable dlen, to differentiate it from size. Otherwise, it > looks like an off-by-one bug just below, since writing at a [size] usually means > writing past the array. > >> dst[size] = 0; >> memcpy (dst, src, size); >> return len; >> } > > Then, to compare oranges to oranges, I'll provide the equivalently optimized > stpecpy(3): > > char * > stpecpy (char *dst, char *end, const char *restrict src) > { > size_t dsize; > size_t dlen; > size_t slen; > > slen = strlen(src); > > if (dst == end) > return NULL; > if (unlikely(dst == NULL)) > return NULL; > if (dst > end) > unreachable(); > dsize = end - dst; > dlen = slen >= dsize ? dsize - 1 : slen; > dst[dlen] = 0; > return mempcpy(dst, src, dlen); > } Sorry, I wrote a bug while optimizing: I forgot about the sentinel 'end' return. Now I think it should be fine (anyway, I'll test it soon): char * stpecpy (char *dst, char *end, const char *restrict src) { size_t dsize; size_t dlen; size_t slen = strlen (src); bool trunc = false; if (dst == end) return NULL; if (dst == NULL) return NULL; dsize = end - dst; trunc = (slen >= dsize); dlen = trunc ? dsize - 1 : slen; dst[dlen] = 0; return mempcpy(dst, src, dlen) + trunc; } This adds a '+' operation, so the difference compared to your strlcpy(3) is smaller, but stpecpy() still wins, I think. -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) 2022-12-24 0:26 ` Alejandro Colomar @ 2022-12-24 2:30 ` Alejandro Colomar 2022-12-24 10:28 ` Alejandro Colomar 2022-12-25 1:52 ` [PATCH 1/1] string: Add stpecpy(3) Noah Goldstein 1 sibling, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-24 2:30 UTC (permalink / raw) To: Wilco Dijkstra; +Cc: 'GNU C Library' [-- Attachment #1.1: Type: text/plain, Size: 3331 bytes --] Hi Wilco, Here goes benchmark code for comparing performance of stpecpy(3) against strlcpy(3). I used my_strlcpy(3), with your definition, because I suspect libbsd's version won't be optimized. stpecpy(3) was also updated to be as optimized as I can write it. I won't be showing exact numbers, as that's probably very system- and version-dependent. Test yourself for exact numbers. However, I'll share the general results. At first I was a little bit surprised (and disappointed) because strlcpy(3) seemed faster under GCC. When forming the very short string "Hello, " "worlds" "!", in just 3 consecutive calls, my_strlcpy(3) outperforms (~10%) stpecpy() in GCC. In Clang they had both similar times. When I added a long string of 'x's, the results more or less were the same. When I added a few calls by repeating "worlds" "!", that changed the balance very significantly for stpecpy(3). When chaining more than just a couple calls, strlcpy(3) starts showing a performance decrease. With the code below, I saw stpecpy(3) consistently outperform strlcpy(3) by around 3%. I guess the advantage of strlcpy(3) over stpecpy(3) may be due to the overhead of setting 'end = buf + BUFSIZ;' for stpecpy(3), and to the fact that the first strlcpy(3) call doesn't have to do the usual '+=', ' + l', and ' - l', which can be significant given how negligible the internal difference between the two is. Cheers, Alex --- alx@debian:~/tmp$ cat foo.c #include <err.h> #include <stdio.h> #include <stp/stpe/stpecpy.h> void foo(char *buf, size_t *len) { char *p, *end; end = buf + BUFSIZ; p = buf; p = stpecpy(p, end, "Heylo, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"); p = stpecpy(p, end, "worlds"); p = stpecpy(p, end, "!"); p = stpecpy(p, end, "worlds"); p = stpecpy(p, end, "!"); p = stpecpy(p, end, "worlds"); p = stpecpy(p, end, "!"); p = stpecpy(p, end, "worlds"); p = stpecpy(p, end, "!"); if (p == end) { p--; warnx("Truncated"); } *len = p - buf; } void bar(char *buf, size_t *len) { size_t l; l = my_strlcpy(buf, "Hello, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", BUFSIZ); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "!", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "!", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "!", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); if (l >= BUFSIZ) goto toolong; l += my_strlcpy(buf + l, "!", BUFSIZ - l); if (l >= BUFSIZ) { toolong: l = BUFSIZ - 1; warnx("Truncated"); } *len = l; } alx@debian:~/tmp$ cat bench.c #include <stdio.h> int foo(char *buf, size_t *len); int bar(char *buf, size_t *len); int main(void) { char buf[BUFSIZ]; size_t len; for (size_t i = 0; i < 10000000; i++) #if 1 foo(buf, &len); #else bar(buf, &len); #endif printf("%zu: %s\n", len, buf); } -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) 2022-12-24 2:30 ` stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) Alejandro Colomar @ 2022-12-24 10:28 ` Alejandro Colomar 0 siblings, 0 replies; 16+ messages in thread From: Alejandro Colomar @ 2022-12-24 10:28 UTC (permalink / raw) To: Wilco Dijkstra; +Cc: 'GNU C Library' [-- Attachment #1.1: Type: text/plain, Size: 4388 bytes --] Hi Wilco, On 12/24/22 03:30, Alejandro Colomar wrote: > Here goes benchmark code for comparing performance of stpecpy(3) against > strlcpy(3). I used my_strlcpy(3), with your definition, because I suspect > libbsd's version won't be optimized. stpecpy(3) was also updated to be as > optimized as I can write it. > > I won't be showing exact numbers, as that's probably very system- and > version-dependent. Test yourself for exact numbers. However, I'll share the > general results. > > At first I was a little bit surprised (and disappointed) because strlcpy(3) > seemed faster under GCC. When forming the very short string "Hello, " "worlds" > "!", in just 3 consecutive calls, my_strlcpy(3) outperforms (~10%) stpecpy() in > GCC. In Clang they had both similar times. Small self correction: While I compiled the benchmarked functions with -O3, I had forgotten to add -O3 to the compilation of foo() and bar(). With that, stpecpy() is faster in every single case. :) Cheers, Alex > > When I added a long string of 'x's, the results more or less were the same. > > When I added a few calls by repeating "worlds" "!", that changed the balance > very significantly for stpecpy(3). When chaining more than just a couple calls, > strlcpy(3) starts showing a performance decrease. With the code below, I saw > stpecpy(3) consistently outperform strlcpy(3) by around 3%. > > I guess the advantage of strlcpy(3) over stpecpy(3) may be due to the overhead > of setting 'end = buf + BUFSIZ;' for stpecpy(3), and to the fact that the first > strlcpy(3) call doesn't have to do the usual '+=', ' + l', and ' - l', which can > be significant given how negligible the internal difference between the two is. > > Cheers, > Alex > > --- > > alx@debian:~/tmp$ cat foo.c > #include <err.h> > #include <stdio.h> > > #include <stp/stpe/stpecpy.h> > > void > foo(char *buf, size_t *len) > { > char *p, *end; > > end = buf + BUFSIZ; > p = buf; > p = stpecpy(p, end, "Heylo, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"); > p = stpecpy(p, end, "worlds"); > p = stpecpy(p, end, "!"); > p = stpecpy(p, end, "worlds"); > p = stpecpy(p, end, "!"); > p = stpecpy(p, end, "worlds"); > p = stpecpy(p, end, "!"); > p = stpecpy(p, end, "worlds"); > p = stpecpy(p, end, "!"); > if (p == end) { > p--; > warnx("Truncated"); > } > *len = p - buf; > } > > void > bar(char *buf, size_t *len) > { > size_t l; > > l = my_strlcpy(buf, "Hello, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", > BUFSIZ); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "!", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "!", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "!", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "worlds", BUFSIZ - l); > if (l >= BUFSIZ) > goto toolong; > l += my_strlcpy(buf + l, "!", BUFSIZ - l); > if (l >= BUFSIZ) { > toolong: > l = BUFSIZ - 1; > warnx("Truncated"); > } > > *len = l; > } > > alx@debian:~/tmp$ cat bench.c > #include <stdio.h> > > int foo(char *buf, size_t *len); > int bar(char *buf, size_t *len); > > int > main(void) > { > char buf[BUFSIZ]; > size_t len; > > for (size_t i = 0; i < 10000000; i++) > #if 1 > foo(buf, &len); > #else > bar(buf, &len); > #endif > > printf("%zu: %s\n", len, buf); > } > > -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-24 0:26 ` Alejandro Colomar 2022-12-24 2:30 ` stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) Alejandro Colomar @ 2022-12-25 1:52 ` Noah Goldstein 2022-12-25 14:37 ` Alejandro Colomar 1 sibling, 1 reply; 16+ messages in thread From: Noah Goldstein @ 2022-12-25 1:52 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library On Fri, Dec 23, 2022 at 4:26 PM Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote: > > > > On 12/24/22 01:05, Alejandro Colomar wrote: > > > > > > On 12/24/22 00:24, Wilco Dijkstra wrote: > >> Hi Alex, > >> > >>> For that, we'd first need to discuss what is a typical scenario. > >> > >> Like copying/concatenating strings that fit within the buffer. > > >>> And also, it depends a lot on what the compiler can optimize. If I call > >>> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude > >>> faster. > >> > >> If you're trying to say that the 'strcat' variant is bad then yes absolutely - > > > > I must admit that it has good things, if you have a compiler that does the magic > > for you. GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be > > optimized, it's not so bad, and the source code is cleaner to the eye. > > > >> it's better to inline in the compiler or avoid the 'strcat' versions altogether > >> (that's also why I would strongly suggest never to add more 'cat' variants). > > > > inlining in the compiler is a good solution. And yes, I agree on not adding cat > > variants, but at the same time, str functions are problematic in that they can't > > be chained, as opposed to stp functions. That problem shows itself in the > > snprintf(3) bugs I mentioned. Users need a way to catenate easily, even if not > > with a 'cat' function. > > > >> But that doesn't say anything about whether stpecpy is better than strlcpy. > >> > >>> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that > >>> might be something to benchmark, but we'd also need to discuss what is a good > >>> input for the benchmark. > >> > >> The typical case would be copying or concatenating smallish strings to a buffer. > > > > Okay, I'll try to prepare a benchmark. > > > >> > >>> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is > >>> inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed > >>> stpecpy(3). > >> > >> If that really is the OpenBSD implementation then this proves my point that > >> non-standard string functions are often totally unoptimized. > > > > And not only that, but I find your version much more readable. I don't > > understand how the OpenBSD version was written that way, and hasn't been fixed > > so far. > > > >> > >> A basic implementation of strlcpy would use strlen and memcpy so it is fast > >> on every system without requiring any optimization: > >> > >> size_t > >> strlcpy (char *dst, const char *src, size_t size) > >> { > >> size_t len = strlen (src); > >> > >> if (size == 0) > >> return len; > >> size = len >= size ? size - 1 : len; > > > > I'd use a separate variable dlen, to differentiate it from size. Otherwise, it > > looks like an off-by-one bug just below, since writing at a [size] usually means > > writing past the array. > > > >> dst[size] = 0; > >> memcpy (dst, src, size); > >> return len; > >> } > > > > Then, to compare oranges to oranges, I'll provide the equivalently optimized > > stpecpy(3): > > > > char * > > stpecpy (char *dst, char *end, const char *restrict src) > > { > > size_t dsize; > > size_t dlen; > > size_t slen; > > > > slen = strlen(src); > > > > if (dst == end) > > return NULL; > > if (unlikely(dst == NULL)) > > return NULL; > > if (dst > end) > > unreachable(); > > dsize = end - dst; > > dlen = slen >= dsize ? dsize - 1 : slen; > > dst[dlen] = 0; > > return mempcpy(dst, src, dlen); > > } > > Sorry, I wrote a bug while optimizing: I forgot about the sentinel 'end' return. > Now I think it should be fine (anyway, I'll test it soon): > > char * > stpecpy (char *dst, char *end, const char *restrict src) > { > size_t dsize; > size_t dlen; > size_t slen = strlen (src); Imo move `dst == end` and `dst == NULL` check before strlen and change strlen to `strnlen(src, dsize)`. > bool trunc = false; > > if (dst == end) Out of curiosity what if `end < dst`? > return NULL; > if (dst == NULL) > return NULL; > dsize = end - dst; > trunc = (slen >= dsize); > dlen = trunc ? dsize - 1 : slen; > dst[dlen] = 0; > return mempcpy(dst, src, dlen) + trunc; > } > > This adds a '+' operation, so the difference compared to your strlcpy(3) is > smaller, but stpecpy() still wins, I think. > > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-25 1:52 ` [PATCH 1/1] string: Add stpecpy(3) Noah Goldstein @ 2022-12-25 14:37 ` Alejandro Colomar 2022-12-25 22:31 ` Noah Goldstein 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-25 14:37 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 3160 bytes --] Hi Noah, On 12/25/22 02:52, Noah Goldstein wrote: >> char * >> stpecpy (char *dst, char *end, const char *restrict src) >> { >> size_t dsize; >> size_t dlen; >> size_t slen = strlen (src); > > Imo move `dst == end` and `dst == NULL` check before strlen That's a valid optimization. Having strlen(3) before has the advantage that you make sure that your strings are strings, as strlcpy(3) does. But since we're inventing stpecpy(3), we can choose to be faster. If anyone wants to instrument their code, they can add a temporary wrapper that does that. > and change strlen to `strnlen(src, dsize)`. About strnlen(3), I have doubts. Isn't strlen(3) faster for the common case of no truncation or little truncation? strnlen(3) would optimize for the case where you truncate by a large difference. >> bool trunc = false; >> >> if (dst == end) > > Out of curiosity what if `end < dst`? The behavior is undefined. That's by design. In the definition of stpecpy(3) I have currently in libstp, I even tell the compiler to optimize on that condition: <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33> alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy ./include/stp/stpe/stpecpy.h:21: inline char *stp_nullable stpecpy(char *stp_nullable dst, char *end, const char *restrict src) { bool trunc; size_t dsize, dlen, slen; slen = strlen(src); if (dst == end) return end; if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf(). return NULL; stp_impossible(dst > end); dsize = end - dst; trunc = (slen >= dsize); dlen = trunc ? dsize - 1 : slen; dst[dlen] = '\0'; return mempcpy(dst, src, dlen) + trunc; } alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible ./include/stp/_compiler.h:14: #define stp_impossible(e) do \ { \ if (e) \ stp_unreachable(); \ } while (0) alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable include/stp/_compiler.h-28-#if defined(unreachable) include/stp/_compiler.h:29:# define stp_unreachable() unreachable() include/stp/_compiler.h-30-#else include/stp/_compiler.h:31:# define stp_unreachable() __builtin_unreachable() include/stp/_compiler.h-32-#endif I'd do that for glibc, but I don't see any facility. Maybe we should add an __impossible() macro to document UB, and help the compiler. Cheers, Alex >> return NULL; >> if (dst == NULL) >> return NULL; >> dsize = end - dst; >> trunc = (slen >= dsize); >> dlen = trunc ? dsize - 1 : slen; >> dst[dlen] = 0; >> return mempcpy(dst, src, dlen) + trunc; >> } >> >> This adds a '+' operation, so the difference compared to your strlcpy(3) is >> smaller, but stpecpy() still wins, I think. >> >> >> -- >> <http://www.alejandro-colomar.es/> -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-25 14:37 ` Alejandro Colomar @ 2022-12-25 22:31 ` Noah Goldstein 2022-12-26 0:26 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Noah Goldstein @ 2022-12-25 22:31 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library On Sun, Dec 25, 2022 at 6:37 AM Alejandro Colomar <alx.manpages@gmail.com> wrote: > > Hi Noah, > > On 12/25/22 02:52, Noah Goldstein wrote: > > >> char * > >> stpecpy (char *dst, char *end, const char *restrict src) > >> { > >> size_t dsize; > >> size_t dlen; > >> size_t slen = strlen (src); > > > > Imo move `dst == end` and `dst == NULL` check before strlen > > > That's a valid optimization. Having strlen(3) before has the advantage that you > make sure that your strings are strings, as strlcpy(3) does. But since we're > inventing stpecpy(3), we can choose to be faster. If anyone wants to instrument > their code, they can add a temporary wrapper that does that. > > > and change strlen to `strnlen(src, dsize)`. > > About strnlen(3), I have doubts. Isn't strlen(3) faster for the common case of > no truncation or little truncation? strnlen(3) would optimize for the case > where you truncate by a large difference. It's faster if strlen(s) <= strnlen(s, N) (maybe up to N + 32). But generally I think of it like `qsort`. Most data gets n * log(n) behavior but still it's worth preventing the worst case for minor constant cost. > > >> bool trunc = false; > >> > >> if (dst == end) > > > > Out of curiosity what if `end < dst`? > > The behavior is undefined. That's by design. In the definition of stpecpy(3) I > have currently in libstp, I even tell the compiler to optimize on that condition: > <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33> > You could probably optimize out one of the branches along the line of: if((dst - 1UL) >= (end - 1UL)) { // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value. // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true. return NULL; } > > alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy > ./include/stp/stpe/stpecpy.h:21: > inline char *stp_nullable > stpecpy(char *stp_nullable dst, char *end, const char *restrict src) > { > bool trunc; > size_t dsize, dlen, slen; > > slen = strlen(src); > > if (dst == end) > return end; > if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf(). > return NULL; > stp_impossible(dst > end); > > dsize = end - dst; > trunc = (slen >= dsize); > dlen = trunc ? dsize - 1 : slen; > dst[dlen] = '\0'; > > return mempcpy(dst, src, dlen) + trunc; > } > alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible > ./include/stp/_compiler.h:14: > #define stp_impossible(e) do \ > { \ > if (e) \ > stp_unreachable(); \ > } while (0) > alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable > include/stp/_compiler.h-28-#if defined(unreachable) > include/stp/_compiler.h:29:# define stp_unreachable() unreachable() > include/stp/_compiler.h-30-#else > include/stp/_compiler.h:31:# define stp_unreachable() __builtin_unreachable() > include/stp/_compiler.h-32-#endif > > > I'd do that for glibc, but I don't see any facility. Maybe we should add an > __impossible() macro to document UB, and help the compiler. Does it result in any improved codegen? If not seems like making it fail more noisily is always a win. > > Cheers, > > Alex > > >> return NULL; > >> if (dst == NULL) > >> return NULL; > >> dsize = end - dst; > >> trunc = (slen >= dsize); > >> dlen = trunc ? dsize - 1 : slen; > >> dst[dlen] = 0; > >> return mempcpy(dst, src, dlen) + trunc; > >> } > >> > >> This adds a '+' operation, so the difference compared to your strlcpy(3) is > >> smaller, but stpecpy() still wins, I think. > >> > >> > >> -- > >> <http://www.alejandro-colomar.es/> > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-25 22:31 ` Noah Goldstein @ 2022-12-26 0:26 ` Alejandro Colomar 2022-12-26 0:32 ` Noah Goldstein 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-26 0:26 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 6088 bytes --] On 12/25/22 23:31, Noah Goldstein wrote: > On Sun, Dec 25, 2022 at 6:37 AM Alejandro Colomar > <alx.manpages@gmail.com> wrote: >> >> Hi Noah, >> >> On 12/25/22 02:52, Noah Goldstein wrote: >> >>>> char * >>>> stpecpy (char *dst, char *end, const char *restrict src) >>>> { >>>> size_t dsize; >>>> size_t dlen; >>>> size_t slen = strlen (src); >>> >>> Imo move `dst == end` and `dst == NULL` check before strlen >> >> >> That's a valid optimization. Having strlen(3) before has the advantage that you >> make sure that your strings are strings, as strlcpy(3) does. But since we're >> inventing stpecpy(3), we can choose to be faster. If anyone wants to instrument >> their code, they can add a temporary wrapper that does that. >> >>> and change strlen to `strnlen(src, dsize)`. >> >> About strnlen(3), I have doubts. Isn't strlen(3) faster for the common case of >> no truncation or little truncation? strnlen(3) would optimize for the case >> where you truncate by a large difference. > > It's faster if strlen(s) <= strnlen(s, N) (maybe up to N + 32). > > But generally I think of it like `qsort`. Most data gets n * log(n) behavior > but still it's worth preventing the worst case for minor constant cost. > I. Maybe it's a good thing. Since it's a truncating API, I guess optimizing for truncation is reasonable. For common strings, which will be short (size <= 64), I guess the constant will really be negligible. > >> >>>> bool trunc = false; >>>> >>>> if (dst == end) >>> >>> Out of curiosity what if `end < dst`? >> >> The behavior is undefined. That's by design. In the definition of stpecpy(3) I >> have currently in libstp, I even tell the compiler to optimize on that condition: >> <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33> >> > > You could probably optimize out one of the branches along the line of: > if((dst - 1UL) >= (end - 1UL)) { > // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value. You would need a cast, wouldn't you? Otherwise, you'll get pointer arithmetic. Pointer arithmetic with NULL is UB. > // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true. > return NULL; Returning NULL on truncation would be a possibility, but then we'd need to use errno to tell the user if the error was truncation or an input NULL (which reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf(). Using errno would probably counter any optimization, since you'd still need one more branch for setting errno, so I guess it's simpler to just use end for truncation. Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never fail, then we could add a [v]stpeprintf(3) that never fails, and then this function would only bail out on truncation. Would it be possible to make __vsnprintf_internal() never fail? What are the current failing conditions; only a size greater than INT_MAX, or are there more errors? > } >> >> alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy >> ./include/stp/stpe/stpecpy.h:21: >> inline char *stp_nullable >> stpecpy(char *stp_nullable dst, char *end, const char *restrict src) >> { >> bool trunc; >> size_t dsize, dlen, slen; >> >> slen = strlen(src); >> >> if (dst == end) >> return end; >> if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf(). >> return NULL; >> stp_impossible(dst > end); >> >> dsize = end - dst; >> trunc = (slen >= dsize); >> dlen = trunc ? dsize - 1 : slen; >> dst[dlen] = '\0'; >> >> return mempcpy(dst, src, dlen) + trunc; >> } >> alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible >> ./include/stp/_compiler.h:14: >> #define stp_impossible(e) do \ >> { \ >> if (e) \ >> stp_unreachable(); \ >> } while (0) >> alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable >> include/stp/_compiler.h-28-#if defined(unreachable) >> include/stp/_compiler.h:29:# define stp_unreachable() unreachable() >> include/stp/_compiler.h-30-#else >> include/stp/_compiler.h:31:# define stp_unreachable() __builtin_unreachable() >> include/stp/_compiler.h-32-#endif >> >> >> I'd do that for glibc, but I don't see any facility. Maybe we should add an >> __impossible() macro to document UB, and help the compiler. > > Does it result in any improved codegen? If not seems like > making it fail more noisily is always a win. Both Clang and GCC generate the same code with or without the hint that it's impossible. Anyway, I'll keep it in my source code because it also helps tell the programmer that dst>end was taken into consideration and explicitly outlawed. The 'end' pointer is expected to be always generated as 'buf + sizeof(buf)'. Doing something different is not what this API is designed for, and should be warned by compilers. 'end' should be a pointer to one after the last byte in an array. Thus, no valid pointer can be greater than end. If you use this API as expected, which is, only chain it with itself and with stpeprintf(3), then it is impossible to have dst>end. But as always, GIGO. As for the expected result, it would be akin calling strlcpy(3) with a negative size. It would wrap around size_t, and give something close to 2^64. Both would result in a buffer overrun, so writing at random memory, and later a crash, but I don't expect that libc should try to detect if the input to strlcpy(3) (or actually, any mem*() function) is huge, and neither if input to stpecpy(3) is similarly broken. -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 0:26 ` Alejandro Colomar @ 2022-12-26 0:32 ` Noah Goldstein 2022-12-26 0:37 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Noah Goldstein @ 2022-12-26 0:32 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library On Sun, Dec 25, 2022 at 4:26 PM Alejandro Colomar <alx.manpages@gmail.com> wrote: > > > > On 12/25/22 23:31, Noah Goldstein wrote: > > On Sun, Dec 25, 2022 at 6:37 AM Alejandro Colomar > > <alx.manpages@gmail.com> wrote: > >> > >> Hi Noah, > >> > >> On 12/25/22 02:52, Noah Goldstein wrote: > >> > >>>> char * > >>>> stpecpy (char *dst, char *end, const char *restrict src) > >>>> { > >>>> size_t dsize; > >>>> size_t dlen; > >>>> size_t slen = strlen (src); > >>> > >>> Imo move `dst == end` and `dst == NULL` check before strlen > >> > >> > >> That's a valid optimization. Having strlen(3) before has the advantage that you > >> make sure that your strings are strings, as strlcpy(3) does. But since we're > >> inventing stpecpy(3), we can choose to be faster. If anyone wants to instrument > >> their code, they can add a temporary wrapper that does that. > >> > >>> and change strlen to `strnlen(src, dsize)`. > >> > >> About strnlen(3), I have doubts. Isn't strlen(3) faster for the common case of > >> no truncation or little truncation? strnlen(3) would optimize for the case > >> where you truncate by a large difference. > > > > It's faster if strlen(s) <= strnlen(s, N) (maybe up to N + 32). > > > > But generally I think of it like `qsort`. Most data gets n * log(n) behavior > > but still it's worth preventing the worst case for minor constant cost. > > > > I. Maybe it's a good thing. Since it's a truncating API, I guess optimizing > for truncation is reasonable. For common strings, which will be short (size <= > 64), I guess the constant will really be negligible. > > > > >> > >>>> bool trunc = false; > >>>> > >>>> if (dst == end) > >>> > >>> Out of curiosity what if `end < dst`? > >> > >> The behavior is undefined. That's by design. In the definition of stpecpy(3) I > >> have currently in libstp, I even tell the compiler to optimize on that condition: > >> <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33> > >> > > > > You could probably optimize out one of the branches along the line of: > > if((dst - 1UL) >= (end - 1UL)) { > > // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value. > > You would need a cast, wouldn't you? Otherwise, you'll get pointer arithmetic. > Pointer arithmetic with NULL is UB. > > > // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true. > > return NULL; > > Returning NULL on truncation would be a possibility, but then we'd need to use > errno to tell the user if the error was truncation or an input NULL (which > reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf(). I'm not sure I see what you mean. Your current logic is: ``` if (dst == end) return NULL; if (dst == NULL) return NULL; ``` Equivalent (since dst >= end || dst == NULL is required) is: ``` if((dst - 1UL) >= (end - 1UL)) { return NULL; } ``` May need to be cast to a `uintptr` or something but don't see what you mean about needing to check errno and such. > > Using errno would probably counter any optimization, since you'd still need one > more branch for setting errno, so I guess it's simpler to just use end for > truncation. > > > Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never > fail, then we could add a [v]stpeprintf(3) that never fails, and then this > function would only bail out on truncation. > > Would it be possible to make __vsnprintf_internal() never fail? What are the > current failing conditions; only a size greater than INT_MAX, or are there more > errors? Don't think its worth reimplementing __vsnprintf_internal to save a single branch here. > > > } > >> > >> alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy > >> ./include/stp/stpe/stpecpy.h:21: > >> inline char *stp_nullable > >> stpecpy(char *stp_nullable dst, char *end, const char *restrict src) > >> { > >> bool trunc; > >> size_t dsize, dlen, slen; > >> > >> slen = strlen(src); > >> > >> if (dst == end) > >> return end; > >> if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf(). > >> return NULL; > >> stp_impossible(dst > end); > >> > >> dsize = end - dst; > >> trunc = (slen >= dsize); > >> dlen = trunc ? dsize - 1 : slen; > >> dst[dlen] = '\0'; > >> > >> return mempcpy(dst, src, dlen) + trunc; > >> } > >> alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible > >> ./include/stp/_compiler.h:14: > >> #define stp_impossible(e) do \ > >> { \ > >> if (e) \ > >> stp_unreachable(); \ > >> } while (0) > >> alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable > >> include/stp/_compiler.h-28-#if defined(unreachable) > >> include/stp/_compiler.h:29:# define stp_unreachable() unreachable() > >> include/stp/_compiler.h-30-#else > >> include/stp/_compiler.h:31:# define stp_unreachable() __builtin_unreachable() > >> include/stp/_compiler.h-32-#endif > >> > >> > >> I'd do that for glibc, but I don't see any facility. Maybe we should add an > >> __impossible() macro to document UB, and help the compiler. > > > > Does it result in any improved codegen? If not seems like > > making it fail more noisily is always a win. > > Both Clang and GCC generate the same code with or without the hint that it's > impossible. Anyway, I'll keep it in my source code because it also helps tell > the programmer that dst>end was taken into consideration and explicitly outlawed. > > The 'end' pointer is expected to be always generated as 'buf + sizeof(buf)'. > Doing something different is not what this API is designed for, and should be > warned by compilers. 'end' should be a pointer to one after the last byte in an > array. Thus, no valid pointer can be greater than end. If you use this API as > expected, which is, only chain it with itself and with stpeprintf(3), then it is > impossible to have dst>end. But as always, GIGO. > > As for the expected result, it would be akin calling strlcpy(3) with a negative > size. It would wrap around size_t, and give something close to 2^64. Both > would result in a buffer overrun, so writing at random memory, and later a > crash, but I don't expect that libc should try to detect if the input to > strlcpy(3) (or actually, any mem*() function) is huge, and neither if input to > stpecpy(3) is similarly broken. > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 0:32 ` Noah Goldstein @ 2022-12-26 0:37 ` Alejandro Colomar 2022-12-26 2:43 ` Noah Goldstein 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-26 0:37 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 2443 bytes --] On 12/26/22 01:32, Noah Goldstein wrote: >>> You could probably optimize out one of the branches along the line of: >>> if((dst - 1UL) >= (end - 1UL)) { >>> // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value. >> >> You would need a cast, wouldn't you? Otherwise, you'll get pointer arithmetic. >> Pointer arithmetic with NULL is UB. >> >>> // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true. >>> return NULL; >> >> Returning NULL on truncation would be a possibility, but then we'd need to use >> errno to tell the user if the error was truncation or an input NULL (which >> reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf(). > > I'm not sure I see what you mean. Your current logic is: > ``` > if (dst == end) > return NULL; > if (dst == NULL) > return NULL; No; current code is: if (dst == end) return end; if (dst == NULL) return NULL; NULL is an error (contents of string are undefined; per vsnprintf(3)'s spec), while 'end' is just truncation, and contents if the string are well defined. > ``` > Equivalent (since dst >= end || dst == NULL is required) is: > ``` > if((dst - 1UL) >= (end - 1UL)) { > return NULL; > } > ``` > May need to be cast to a `uintptr` or something but don't see > what you mean about needing to check errno and such. > >> >> Using errno would probably counter any optimization, since you'd still need one >> more branch for setting errno, so I guess it's simpler to just use end for >> truncation. >> >> >> Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never >> fail, then we could add a [v]stpeprintf(3) that never fails, and then this >> function would only bail out on truncation. >> >> Would it be possible to make __vsnprintf_internal() never fail? What are the >> current failing conditions; only a size greater than INT_MAX, or are there more >> errors? > > Don't think its worth reimplementing __vsnprintf_internal to save a single > branch here. It wouldn't be only for that, but also allowing to write size_t bytes of formatted output. However, I question how useful that is, since you only need that many bytes when you're catenating strings with %s, for which stpecpy(3) can be used; so yes, probably it's not worth it. -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 0:37 ` Alejandro Colomar @ 2022-12-26 2:43 ` Noah Goldstein 2022-12-26 22:25 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Noah Goldstein @ 2022-12-26 2:43 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library On Sun, Dec 25, 2022 at 4:37 PM Alejandro Colomar <alx.manpages@gmail.com> wrote: > > On 12/26/22 01:32, Noah Goldstein wrote: > >>> You could probably optimize out one of the branches along the line of: > >>> if((dst - 1UL) >= (end - 1UL)) { > >>> // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value. > >> > >> You would need a cast, wouldn't you? Otherwise, you'll get pointer arithmetic. > >> Pointer arithmetic with NULL is UB. > >> > >>> // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true. > >>> return NULL; > >> > >> Returning NULL on truncation would be a possibility, but then we'd need to use > >> errno to tell the user if the error was truncation or an input NULL (which > >> reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf(). > > > > I'm not sure I see what you mean. Your current logic is: > > ``` > > if (dst == end) > > return NULL; > > if (dst == NULL) > > return NULL; > > No; current code is: > > if (dst == end) > return end; > if (dst == NULL) > return NULL; I see. The code you commented earlier was NULL for both. Either way you can just make it: ``` if((dst - 1UL) >= (end - 1UL)) { return dst; // either dst == NULL or dst == end. } ``` > > NULL is an error (contents of string are undefined; per vsnprintf(3)'s spec), > while 'end' is just truncation, and contents if the string are well defined. > > > > ``` > > Equivalent (since dst >= end || dst == NULL is required) is: > > ``` > > if((dst - 1UL) >= (end - 1UL)) { > > return NULL; > > } > > ``` > > May need to be cast to a `uintptr` or something but don't see > > what you mean about needing to check errno and such. > > > >> > >> Using errno would probably counter any optimization, since you'd still need one > >> more branch for setting errno, so I guess it's simpler to just use end for > >> truncation. > >> > >> > >> Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never > >> fail, then we could add a [v]stpeprintf(3) that never fails, and then this > >> function would only bail out on truncation. > >> > >> Would it be possible to make __vsnprintf_internal() never fail? What are the > >> current failing conditions; only a size greater than INT_MAX, or are there more > >> errors? > > > > Don't think its worth reimplementing __vsnprintf_internal to save a single > > branch here. > > It wouldn't be only for that, but also allowing to write size_t bytes of > formatted output. However, I question how useful that is, since you only need > that many bytes when you're catenating strings with %s, for which stpecpy(3) can > be used; so yes, probably it's not worth it. > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 2:43 ` Noah Goldstein @ 2022-12-26 22:25 ` Alejandro Colomar 2022-12-26 23:24 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-26 22:25 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 903 bytes --] Hi Noah, On 12/26/22 03:43, Noah Goldstein wrote: > I see. The code you commented earlier was NULL for both. I don't remember; maybe there was a typo... > > Either way you can just make it: > > ``` > if((dst - 1UL) >= (end - 1UL)) { I didn't expect that would be significantly better than `(dst == end || dst == NULL)`. However, I compiled both just to see, and the assembly output for your code is shorter. I'll benchmark both to see if there are performance differences. I wonder why the compiler doesn't generate this code if it's better; it has all the information to decide that it can be transformed into that. Both clang and GCC produce better code with your suggestion (although in the case of GCC, the difference is bigger. Cheers, Alex > return dst; // either dst == NULL or dst == end. > } > ``` -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 22:25 ` Alejandro Colomar @ 2022-12-26 23:24 ` Alejandro Colomar 2022-12-26 23:52 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-26 23:24 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 1116 bytes --] On 12/26/22 23:25, Alejandro Colomar wrote: > Hi Noah, > > On 12/26/22 03:43, Noah Goldstein wrote: > >> I see. The code you commented earlier was NULL for both. > > I don't remember; maybe there was a typo... > >> >> Either way you can just make it: >> >> ``` >> if((dst - 1UL) >= (end - 1UL)) { > > I didn't expect that would be significantly better than `(dst == end || dst == > NULL)`. However, I compiled both just to see, and the assembly output for your > code is shorter. I'll benchmark both to see if there are performance differences. > > I wonder why the compiler doesn't generate this code if it's better; it has all > the information to decide that it can be transformed into that. > > Both clang and GCC produce better code with your suggestion (although in the > case of GCC, the difference is bigger. Self-correction: They produce smaller code for your suggestion, but it seems to be slower code. > > Cheers, > > Alex > >> return dst; // either dst == NULL or dst == end. >> } >> ``` > > -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 23:24 ` Alejandro Colomar @ 2022-12-26 23:52 ` Alejandro Colomar 2022-12-27 0:12 ` Alejandro Colomar 0 siblings, 1 reply; 16+ messages in thread From: Alejandro Colomar @ 2022-12-26 23:52 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 1803 bytes --] On 12/27/22 00:24, Alejandro Colomar wrote: > > > On 12/26/22 23:25, Alejandro Colomar wrote: >> Hi Noah, >> >> On 12/26/22 03:43, Noah Goldstein wrote: >> >>> I see. The code you commented earlier was NULL for both. >> >> I don't remember; maybe there was a typo... >> >>> >>> Either way you can just make it: >>> >>> ``` >>> if((dst - 1UL) >= (end - 1UL)) { >> >> I didn't expect that would be significantly better than `(dst == end || dst == >> NULL)`. However, I compiled both just to see, and the assembly output for >> your code is shorter. I'll benchmark both to see if there are performance >> differences. >> >> I wonder why the compiler doesn't generate this code if it's better; it has >> all the information to decide that it can be transformed into that. >> >> Both clang and GCC produce better code with your suggestion (although in the >> case of GCC, the difference is bigger. > > Self-correction: > > They produce smaller code for your suggestion, but it seems to be slower code. However, and this is interesting, calling strnlen(3) results in faster code even when no truncation occurs; at least for the short string I tested. I suspect it might be because it is already heavily optimized in glibc, and it implicitly simplifies surrounding code: - slen = strlen(src); dsize = end - dst; - trunc = (slen >= dsize); + slen = strnlen(src, dsize); + trunc = (slen == dsize); The generated assembly code is one line smaller (on my system, and phase of the moon), and some small percent faster. :) Cheers, Alex > >> >> Cheers, >> >> Alex >> >>> return dst; // either dst == NULL or dst == end. >>> } >>> ``` >> >> > -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] string: Add stpecpy(3) 2022-12-26 23:52 ` Alejandro Colomar @ 2022-12-27 0:12 ` Alejandro Colomar 0 siblings, 0 replies; 16+ messages in thread From: Alejandro Colomar @ 2022-12-27 0:12 UTC (permalink / raw) To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library [-- Attachment #1.1: Type: text/plain, Size: 1379 bytes --] On 12/27/22 00:52, Alejandro Colomar wrote: > However, and this is interesting, calling strnlen(3) results in faster code even > when no truncation occurs; at least for the short string I tested. > > I suspect it might be because it is already heavily optimized in glibc, and it > implicitly simplifies surrounding code: > > - slen = strlen(src); > dsize = end - dst; > - trunc = (slen >= dsize); > + slen = strnlen(src, dsize); > + trunc = (slen == dsize); > > The generated assembly code is one line smaller (on my system, and phase of the > moon), and some small percent faster. :) I found the reason; it helps simplify considerably the code. Here's the resulting optimized code: char *stp_nullable stpecpy(char *stp_nullable dst, char *end, const char *restrict src) { bool trunc; size_t dsize, dlen, slen; if (dst == end) return end; if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf(). return NULL; stp_impossible(dst > end); dsize = end - dst; slen = strnlen(src, dsize); trunc = (slen == dsize); dlen = slen - trunc; dst[dlen] = '\0'; return mempcpy(dst, src, dlen) + trunc; } See how using strnlen(3) removed the ternary operator. That's a great optimization. :) -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2022-12-27 0:12 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-12-23 23:24 [PATCH 1/1] string: Add stpecpy(3) Wilco Dijkstra 2022-12-24 0:05 ` Alejandro Colomar 2022-12-24 0:26 ` Alejandro Colomar 2022-12-24 2:30 ` stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) Alejandro Colomar 2022-12-24 10:28 ` Alejandro Colomar 2022-12-25 1:52 ` [PATCH 1/1] string: Add stpecpy(3) Noah Goldstein 2022-12-25 14:37 ` Alejandro Colomar 2022-12-25 22:31 ` Noah Goldstein 2022-12-26 0:26 ` Alejandro Colomar 2022-12-26 0:32 ` Noah Goldstein 2022-12-26 0:37 ` Alejandro Colomar 2022-12-26 2:43 ` Noah Goldstein 2022-12-26 22:25 ` Alejandro Colomar 2022-12-26 23:24 ` Alejandro Colomar 2022-12-26 23:52 ` Alejandro Colomar 2022-12-27 0:12 ` Alejandro Colomar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).