From: Alejandro Colomar <alx.manpages@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>
Subject: Re: [PATCH 1/1] string: Add stpecpy(3)
Date: Sat, 24 Dec 2022 01:26:23 +0100 [thread overview]
Message-ID: <91d159d1-d379-af67-6859-bf8e3fa14c72@gmail.com> (raw)
In-Reply-To: <ddb94d90-3340-700c-76eb-7f06fba358bc@gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 4175 bytes --]
On 12/24/22 01:05, Alejandro Colomar wrote:
>
>
> On 12/24/22 00:24, Wilco Dijkstra wrote:
>> Hi Alex,
>>
>>> For that, we'd first need to discuss what is a typical scenario.
>>
>> Like copying/concatenating strings that fit within the buffer. >
>>> And also, it depends a lot on what the compiler can optimize. If I call
>>> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude
>>> faster.
>>
>> If you're trying to say that the 'strcat' variant is bad then yes absolutely -
>
> I must admit that it has good things, if you have a compiler that does the magic
> for you. GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be
> optimized, it's not so bad, and the source code is cleaner to the eye.
>
>> it's better to inline in the compiler or avoid the 'strcat' versions altogether
>> (that's also why I would strongly suggest never to add more 'cat' variants).
>
> inlining in the compiler is a good solution. And yes, I agree on not adding cat
> variants, but at the same time, str functions are problematic in that they can't
> be chained, as opposed to stp functions. That problem shows itself in the
> snprintf(3) bugs I mentioned. Users need a way to catenate easily, even if not
> with a 'cat' function.
>
>> But that doesn't say anything about whether stpecpy is better than strlcpy.
>>
>>> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that
>>> might be something to benchmark, but we'd also need to discuss what is a good
>>> input for the benchmark.
>>
>> The typical case would be copying or concatenating smallish strings to a buffer.
>
> Okay, I'll try to prepare a benchmark.
>
>>
>>> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is
>>> inside a while loop. So, I'd find it very surprising if strlcpy(3) outperformed
>>> stpecpy(3).
>>
>> If that really is the OpenBSD implementation then this proves my point that
>> non-standard string functions are often totally unoptimized.
>
> And not only that, but I find your version much more readable. I don't
> understand how the OpenBSD version was written that way, and hasn't been fixed
> so far.
>
>>
>> A basic implementation of strlcpy would use strlen and memcpy so it is fast
>> on every system without requiring any optimization:
>>
>> size_t
>> strlcpy (char *dst, const char *src, size_t size)
>> {
>> size_t len = strlen (src);
>>
>> if (size == 0)
>> return len;
>> size = len >= size ? size - 1 : len;
>
> I'd use a separate variable dlen, to differentiate it from size. Otherwise, it
> looks like an off-by-one bug just below, since writing at a [size] usually means
> writing past the array.
>
>> dst[size] = 0;
>> memcpy (dst, src, size);
>> return len;
>> }
>
> Then, to compare oranges to oranges, I'll provide the equivalently optimized
> stpecpy(3):
>
> char *
> stpecpy (char *dst, char *end, const char *restrict src)
> {
> size_t dsize;
> size_t dlen;
> size_t slen;
>
> slen = strlen(src);
>
> if (dst == end)
> return NULL;
> if (unlikely(dst == NULL))
> return NULL;
> if (dst > end)
> unreachable();
> dsize = end - dst;
> dlen = slen >= dsize ? dsize - 1 : slen;
> dst[dlen] = 0;
> return mempcpy(dst, src, dlen);
> }
Sorry, I wrote a bug while optimizing: I forgot about the sentinel 'end' return.
Now I think it should be fine (anyway, I'll test it soon):
char *
stpecpy (char *dst, char *end, const char *restrict src)
{
size_t dsize;
size_t dlen;
size_t slen = strlen (src);
bool trunc = false;
if (dst == end)
return NULL;
if (dst == NULL)
return NULL;
dsize = end - dst;
trunc = (slen >= dsize);
dlen = trunc ? dsize - 1 : slen;
dst[dlen] = 0;
return mempcpy(dst, src, dlen) + trunc;
}
This adds a '+' operation, so the difference compared to your strlcpy(3) is
smaller, but stpecpy() still wins, I think.
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2022-12-24 0:26 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-23 23:24 Wilco Dijkstra
2022-12-24 0:05 ` Alejandro Colomar
2022-12-24 0:26 ` Alejandro Colomar [this message]
2022-12-24 2:30 ` stpecpy(3) vs strlcpy(3) benchmark (was: [PATCH 1/1] string: Add stpecpy(3)) Alejandro Colomar
2022-12-24 10:28 ` Alejandro Colomar
2022-12-25 1:52 ` [PATCH 1/1] string: Add stpecpy(3) Noah Goldstein
2022-12-25 14:37 ` Alejandro Colomar
2022-12-25 22:31 ` Noah Goldstein
2022-12-26 0:26 ` Alejandro Colomar
2022-12-26 0:32 ` Noah Goldstein
2022-12-26 0:37 ` Alejandro Colomar
2022-12-26 2:43 ` Noah Goldstein
2022-12-26 22:25 ` Alejandro Colomar
2022-12-26 23:24 ` Alejandro Colomar
2022-12-26 23:52 ` Alejandro Colomar
2022-12-27 0:12 ` Alejandro Colomar
-- strict thread matches above, loose matches on Subject: below --
2022-12-23 18:35 Wilco Dijkstra
2022-12-23 22:40 ` Alejandro Colomar
2022-12-23 14:59 Wilco Dijkstra
2022-12-23 17:03 ` Alejandro Colomar
2022-12-23 17:27 ` Alejandro Colomar
2022-12-22 21:42 [PATCH 0/1] " Alejandro Colomar
2022-12-22 21:42 ` [PATCH 1/1] " Alejandro Colomar
2022-12-23 7:02 ` Sam James
2022-12-23 12:26 ` Alejandro Colomar
2022-12-23 12:29 ` Alejandro Colomar
2022-12-23 17:21 ` Alejandro Colomar
2022-12-31 15:13 ` Sam James
2022-12-31 15:15 ` Alejandro Colomar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=91d159d1-d379-af67-6859-bf8e3fa14c72@gmail.com \
--to=alx.manpages@gmail.com \
--cc=Wilco.Dijkstra@arm.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).