Hi Wilco, On 12/23/22 15:59, Wilco Dijkstra wrote: > Hi Alex, > >>> Given strlcpy and strlcat are in POSIX next and therefore bar >>> some extraordinary event will be in glibc, I think we should >>> probably wait until those two land, then see if there's still >>> an appetite for stpecpy in glibc. >> >> I disagree for the following reasons. >> >> strlcpy(3)/strlcat(3) are designed to be _slow_, in exchange for added >> simplicity and safety. That's what Theo told me about them. They didn't care >> about performance. The two performance issues are: > > We'd need actual benchmarks to confirm there is a detectable performance > difference in typical scenarios. For that, we'd first need to discuss what is a typical scenario. And also, it depends a lot on what the compiler can optimize. If I call strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude faster. If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that might be something to benchmark, but we'd also need to discuss what is a good input for the benchmark. > So calling strlcpy slow is premature. Looking > at the proposed stpecpy, it seems it has a lot more branches and special cases > compared to a typical strlcpy, In my definition of stpecpy(), I see 3 branches (the other one is just an unreachable path to let the compiler optimize better). And in the NULL check, we could use __builtin_expect to let it assume it's unlikely. And there's one more branch inside memccpy(3), so that's 3 branches plus one more branch that can be inside __builtin_expect(). After those branches, it's all memcpy(3). In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed stpecpy(3). For completeness, I compared these definitions: alx@asus5775:~/src/bsd/openbsd/src$ grepc -tfd strlcpy lib/ lib/libc/string/strlcpy.c:27: size_t strlcpy(char *dst, const char *src, size_t dsize) { const char *osrc = src; size_t nleft = dsize; /* Copy as many bytes as will fit. */ if (nleft != 0) { while (--nleft != 0) { if ((*dst++ = *src++) == '\0') break; } } /* Not enough room in dst, add NUL and traverse rest of src. */ if (nleft == 0) { if (dsize != 0) *dst = '\0'; /* NUL-terminate dst */ while (*src++) ; } return(src - osrc - 1); /* count does not include NUL */ } alx@asus5775:~/src/alx/libstp/src$ grepc -tfd stpecpy ./stp/stpe/stpecpy.c:15: char *stp_nullable stpecpy(char *stp_nullable dst, char end[0], const char *restrict src) { char *stp_nullable p; if (dst == end) return end; if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf(). return NULL; if (dst > end) stp_unreachable(); p = memccpy(dst, src, '\0', end - dst); if (p != NULL) return p - 1; /* Truncation detected. */ end[-1] = '\0'; return end; } alx@asus5775:~/src/gnu/glibc$ grepc -tfd __memccpy ./string/memccpy.c:30: void * __memccpy (void *dest, const void *src, int c, size_t n) { void *p = memchr (src, c, n); if (p != NULL) return __mempcpy (dest, src, p - src + 1); memcpy (dest, src, n); return NULL; } > and that adds extra overhead. Using memccpy > is risky too since that is often not optimized. Well, with the current memccpy(3) I already suspect it's going to be faster than strlcpy(3). If you optimize it, it would increase the chances that it's faster :) > >> - Traverse the entire input string, to make sure it's a string. stpecpy(3) >> instead only reads what's necessary for the copy; it stops reading after truncation. > > Almost all strings are short and few cases need truncation, so I don't see the issue. > People concerned about performance wouldn't use the standard string functions > anyway. That's true for copying strings without truncation; in such cases, mempcpy(3) is probably the best thing you can use. That's documented in string_copying(7). However, mempcpy(3) doesn't help when you need to truncate. And there are cases where you need to truncate, especially when you need formatting (snprintf(3)). > >> - strlcat(3) finds the terminating null byte; that's something you already know >> where it is, with functions that return a useful pointer (mempcpy(3), stpcpy(3), >> and stpecpy(3)). > > If you know the end of the destination string then don't use concatenation. Easy! I'll quote strlcpy(3) man page: Since it is known how many characters were copied the first time, things can be sped up a bit by using a copy instead of an append: [... code example ...] However, one may question the validity of such optimizations, as they defeat the whole purpose of strlcpy() and strlcat(). As a matter of fact, the first version of this manual page got it wrong. So, not easy. > > In fact compilers could inline the 'dest += strlen (dest)' part of strcat and call > strcpy instead. This allows optimization of the strlen in case you know the size > of the destination string. This is true for strlcpy too, a compiler could just inline > it and optimize the strlen (src). Fair. Compilers already optimize strcat(3) into stpcpy(3), so I expect they could optimize strlcat(3) in the same way. So, let's assume that a compiler can optimize strlcat(3) calls into the correct strlcpy(3) calls: if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf)) goto toolong; if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf)) goto toolong; len = strlcat(buf, "!", sizeof(buf)); if (len >= sizeof(buf)) goto toolong; puts(buf); the code above could be optimized into (assuming I didn't write any bugs, which is likely, due to complexity): n = strlcpy(buf, "Hello ", sizeof(buf)); if (n >= sizeof(buf)) goto toolong; n += strlcpy(buf + n, "world", sizeof(buf) - n); if (n >= sizeof(buf)) goto toolong; n += strlcpy(buf + n, "!", sizeof(buf) - n); if (n >= sizeof(buf)) goto toolong; puts(buf); This means that to the branches in the implementation of strlcpy(3), we need to add one more. strlcpy(3) forces the user to check for truncation after every call, while stpecpy(3) allows checking truncation only once at the end of the chained calls. So, here goes another performance issue of these calls. And, while strlcpy(3)/cat(3) are simpler _when truncating is fine_, truncating is almost never fine. Which means you almost always need to check for truncation. Which means boilerplate code, which in turn makes these calls more complex to use than simply stpecpy(3). Here goes the equivalent code to the above: end = buf + sizeof(buf); p = buf; p = stpecpy(p, end, "Hello "); p = stpecpy(p, end, "world"); p = stpecpy(p, end, "!"); if (p == end) goto toolong; len = p - buf; puts(buf); I find it _way_ more readable than the strlcpy(3)/cat(3) code. Oh, and did I say it has less branches? :) > >> The reason that triggered me wanting to add this function is seeing strncpy(3) >> used for a patch to some glibc internals themselves. Using strlcpy(3)/cat(3) in >> glibc internals would be bad for performance; I would hope that glibc uses the >> optimal internals, even if it also provides slow functions for users. > > Most internal uses are unlikely to be performance critical, and correctness is kind > of important for libraries. Correctness comes from readability. The less code you write, the more correct the code will be. Just see the above for proof that this function actually promotes correct code, which in fact was the main reason for this function. Performance is just a nice side effect. > > IMHO inventing many slightly different non-standard string functions is what > causes performance and correctness issues. People disagree about the semantics > (strlcpy has been argued over for a decade or so). Even if a library supports them, > you never know which implementations are actually well optimized (obviously > this varies between ISA and different libc's). The good thing about stpecpy(3) is that it's difficult to make it slow. The naive implementation is already quite simple and fast. Not saying it's comparable to memcpy(3), but it's likely to be faster than competition, which is strlcpy(3). > So which non-standard string function > is safe to use across all targets and libraries? I covered that in string_copying(7). I guess in a few weeks you'll be able to read it in your favourite unsable distro :) > > In contrast we can be pretty sure that the standard strlen, memcpy etc are both > correct and efficient on all targets/libc's. Sure, but memcpy(3) is not usable in code that needs to truncate. We need to compare against stpncpy(3) (ughhh) and strlcpy(3). > > Cheers, > Wilco Cheers, Alex --