From: Alejandro Colomar <alx.manpages@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>
Subject: Re: [PATCH 1/1] string: Add stpecpy(3)
Date: Fri, 23 Dec 2022 18:03:43 +0100 [thread overview]
Message-ID: <f2eb7368-94be-fba0-6ed6-dd4d84542a95@gmail.com> (raw)
In-Reply-To: <PAWPR08MB8982DF2FE0C57C913536218083E99@PAWPR08MB8982.eurprd08.prod.outlook.com>
[-- Attachment #1.1: Type: text/plain, Size: 9505 bytes --]
Hi Wilco,
On 12/23/22 15:59, Wilco Dijkstra wrote:
> Hi Alex,
>
>>> Given strlcpy and strlcat are in POSIX next and therefore bar
>>> some extraordinary event will be in glibc, I think we should
>>> probably wait until those two land, then see if there's still
>>> an appetite for stpecpy in glibc.
>>
>> I disagree for the following reasons.
>>
>> strlcpy(3)/strlcat(3) are designed to be _slow_, in exchange for added
>> simplicity and safety. That's what Theo told me about them. They didn't care
>> about performance. The two performance issues are:
>
> We'd need actual benchmarks to confirm there is a detectable performance
> difference in typical scenarios.
For that, we'd first need to discuss what is a typical scenario.
And also, it depends a lot on what the compiler can optimize. If I call
strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude
faster.
If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that
might be something to benchmark, but we'd also need to discuss what is a good
input for the benchmark.
> So calling strlcpy slow is premature. Looking
> at the proposed stpecpy, it seems it has a lot more branches and special cases
> compared to a typical strlcpy,
In my definition of stpecpy(), I see 3 branches (the other one is just an
unreachable path to let the compiler optimize better). And in the NULL check,
we could use __builtin_expect to let it assume it's unlikely.
And there's one more branch inside memccpy(3), so that's 3 branches plus one
more branch that can be inside __builtin_expect(). After those branches, it's
all memcpy(3).
In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is
inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed
stpecpy(3).
For completeness, I compared these definitions:
alx@asus5775:~/src/bsd/openbsd/src$ grepc -tfd strlcpy lib/
lib/libc/string/strlcpy.c:27:
size_t
strlcpy(char *dst, const char *src, size_t dsize)
{
const char *osrc = src;
size_t nleft = dsize;
/* Copy as many bytes as will fit. */
if (nleft != 0) {
while (--nleft != 0) {
if ((*dst++ = *src++) == '\0')
break;
}
}
/* Not enough room in dst, add NUL and traverse rest of src. */
if (nleft == 0) {
if (dsize != 0)
*dst = '\0'; /* NUL-terminate dst */
while (*src++)
;
}
return(src - osrc - 1); /* count does not include NUL */
}
alx@asus5775:~/src/alx/libstp/src$ grepc -tfd stpecpy
./stp/stpe/stpecpy.c:15:
char *stp_nullable
stpecpy(char *stp_nullable dst, char end[0], const char *restrict src)
{
char *stp_nullable p;
if (dst == end)
return end;
if (stp_unlikely(dst == NULL)) // Allow chaining with stpeprintf().
return NULL;
if (dst > end)
stp_unreachable();
p = memccpy(dst, src, '\0', end - dst);
if (p != NULL)
return p - 1;
/* Truncation detected. */
end[-1] = '\0';
return end;
}
alx@asus5775:~/src/gnu/glibc$ grepc -tfd __memccpy
./string/memccpy.c:30:
void *
__memccpy (void *dest, const void *src, int c, size_t n)
{
void *p = memchr (src, c, n);
if (p != NULL)
return __mempcpy (dest, src, p - src + 1);
memcpy (dest, src, n);
return NULL;
}
> and that adds extra overhead. Using memccpy
> is risky too since that is often not optimized.
Well, with the current memccpy(3) I already suspect it's going to be faster than
strlcpy(3). If you optimize it, it would increase the chances that it's faster :)
>
>> - Traverse the entire input string, to make sure it's a string. stpecpy(3)
>> instead only reads what's necessary for the copy; it stops reading after truncation.
>
> Almost all strings are short and few cases need truncation, so I don't see the issue.
> People concerned about performance wouldn't use the standard string functions
> anyway.
That's true for copying strings without truncation; in such cases, mempcpy(3) is
probably the best thing you can use. That's documented in string_copying(7).
However, mempcpy(3) doesn't help when you need to truncate. And there are cases
where you need to truncate, especially when you need formatting (snprintf(3)).
>
>> - strlcat(3) finds the terminating null byte; that's something you already know
>> where it is, with functions that return a useful pointer (mempcpy(3), stpcpy(3),
>> and stpecpy(3)).
>
> If you know the end of the destination string then don't use concatenation. Easy!
I'll quote strlcpy(3) man page:
Since it is known how many characters were copied the first time,
things can be sped up a bit by using a copy instead of an append:
[... code example ...]
However, one may question the validity of such optimizations, as they
defeat the whole purpose of strlcpy() and strlcat(). As a matter of
fact, the first version of this manual page got it wrong.
So, not easy.
>
> In fact compilers could inline the 'dest += strlen (dest)' part of strcat and call
> strcpy instead. This allows optimization of the strlen in case you know the size
> of the destination string. This is true for strlcpy too, a compiler could just inline
> it and optimize the strlen (src).
Fair. Compilers already optimize strcat(3) into stpcpy(3), so I expect they
could optimize strlcat(3) in the same way. So, let's assume that a compiler can
optimize strlcat(3) calls into the correct strlcpy(3) calls:
if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
goto toolong;
if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
goto toolong;
len = strlcat(buf, "!", sizeof(buf));
if (len >= sizeof(buf))
goto toolong;
puts(buf);
the code above could be optimized into (assuming I didn't write any bugs, which
is likely, due to complexity):
n = strlcpy(buf, "Hello ", sizeof(buf));
if (n >= sizeof(buf))
goto toolong;
n += strlcpy(buf + n, "world", sizeof(buf) - n);
if (n >= sizeof(buf))
goto toolong;
n += strlcpy(buf + n, "!", sizeof(buf) - n);
if (n >= sizeof(buf))
goto toolong;
puts(buf);
This means that to the branches in the implementation of strlcpy(3), we need to
add one more. strlcpy(3) forces the user to check for truncation after every
call, while stpecpy(3) allows checking truncation only once at the end of the
chained calls. So, here goes another performance issue of these calls.
And, while strlcpy(3)/cat(3) are simpler _when truncating is fine_, truncating
is almost never fine. Which means you almost always need to check for
truncation. Which means boilerplate code, which in turn makes these calls more
complex to use than simply stpecpy(3). Here goes the equivalent code to the above:
end = buf + sizeof(buf);
p = buf;
p = stpecpy(p, end, "Hello ");
p = stpecpy(p, end, "world");
p = stpecpy(p, end, "!");
if (p == end)
goto toolong;
len = p - buf;
puts(buf);
I find it _way_ more readable than the strlcpy(3)/cat(3) code. Oh, and did I
say it has less branches? :)
>
>> The reason that triggered me wanting to add this function is seeing strncpy(3)
>> used for a patch to some glibc internals themselves. Using strlcpy(3)/cat(3) in
>> glibc internals would be bad for performance; I would hope that glibc uses the
>> optimal internals, even if it also provides slow functions for users.
>
> Most internal uses are unlikely to be performance critical, and correctness is kind
> of important for libraries.
Correctness comes from readability. The less code you write, the more correct
the code will be. Just see the above for proof that this function actually
promotes correct code, which in fact was the main reason for this function.
Performance is just a nice side effect.
>
> IMHO inventing many slightly different non-standard string functions is what
> causes performance and correctness issues. People disagree about the semantics
> (strlcpy has been argued over for a decade or so). Even if a library supports them,
> you never know which implementations are actually well optimized (obviously
> this varies between ISA and different libc's).
The good thing about stpecpy(3) is that it's difficult to make it slow. The
naive implementation is already quite simple and fast. Not saying it's
comparable to memcpy(3), but it's likely to be faster than competition, which is
strlcpy(3).
> So which non-standard string function
> is safe to use across all targets and libraries?
I covered that in string_copying(7). I guess in a few weeks you'll be able to
read it in your favourite unsable distro :)
>
> In contrast we can be pretty sure that the standard strlen, memcpy etc are both
> correct and efficient on all targets/libc's.
Sure, but memcpy(3) is not usable in code that needs to truncate. We need to
compare against stpncpy(3) (ughhh) and strlcpy(3).
>
> Cheers,
> Wilco
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2022-12-23 17:03 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-23 14:59 Wilco Dijkstra
2022-12-23 17:03 ` Alejandro Colomar [this message]
2022-12-23 17:27 ` Alejandro Colomar
-- strict thread matches above, loose matches on Subject: below --
2022-12-23 23:24 Wilco Dijkstra
2022-12-24 0:05 ` Alejandro Colomar
2022-12-24 0:26 ` Alejandro Colomar
2022-12-25 1:52 ` Noah Goldstein
2022-12-25 14:37 ` Alejandro Colomar
2022-12-25 22:31 ` Noah Goldstein
2022-12-26 0:26 ` Alejandro Colomar
2022-12-26 0:32 ` Noah Goldstein
2022-12-26 0:37 ` Alejandro Colomar
2022-12-26 2:43 ` Noah Goldstein
2022-12-26 22:25 ` Alejandro Colomar
2022-12-26 23:24 ` Alejandro Colomar
2022-12-26 23:52 ` Alejandro Colomar
2022-12-27 0:12 ` Alejandro Colomar
2022-12-23 18:35 Wilco Dijkstra
2022-12-23 22:40 ` Alejandro Colomar
2022-12-22 21:42 [PATCH 0/1] " Alejandro Colomar
2022-12-22 21:42 ` [PATCH 1/1] " Alejandro Colomar
2022-12-23 7:02 ` Sam James
2022-12-23 12:26 ` Alejandro Colomar
2022-12-23 12:29 ` Alejandro Colomar
2022-12-23 17:21 ` Alejandro Colomar
2022-12-31 15:13 ` Sam James
2022-12-31 15:15 ` Alejandro Colomar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f2eb7368-94be-fba0-6ed6-dd4d84542a95@gmail.com \
--to=alx.manpages@gmail.com \
--cc=Wilco.Dijkstra@arm.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).