Re: [PATCH 1/1] string: Add stpecpy(3)

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/1] string: Add stpecpy(3)
@ 2022-12-23 18:35 Wilco Dijkstra
  2022-12-23 22:40 ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Wilco Dijkstra @ 2022-12-23 18:35 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: 'GNU C Library'

Hi Alex,

>      if (dst == end)
>          return end;
>      if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
>          return NULL;

> Oh, and the two branches above can be optimized into a branch that returns dst. 

How? There will be 2 branches since you're doing 2 checks here...

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 18:35 [PATCH 1/1] string: Add stpecpy(3) Wilco Dijkstra
@ 2022-12-23 22:40 ` Alejandro Colomar
  0 siblings, 0 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-23 22:40 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'


[-- Attachment #1.1: Type: text/plain, Size: 3969 bytes --]

Hi Wilco,

On 12/23/22 19:35, Wilco Dijkstra wrote:
> Hi Alex,

(a)

> 
>>        if (dst == end)
>>            return end;
>>        if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
>>            return NULL;
> 
>> Oh, and the two branches above can be optimized into a branch that returns dst.
> 
> How? There will be 2 branches since you're doing 2 checks here...

That is equivalent to:

(b)

         if (dst == end)
             return dst;
         if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
             return dst;

which itself is equivalent to:

(c)

         if ((dst == end) || stp_unlikely(dst == NULL))
             return dst;

which still has a branch in ||, due to the shortcut of the boolean operator.
However, the compiler is allowed to transform it into bitwise, since there are 
no side effects, no UB, and the result would be the same:

(d)

         if ((dst == end) | stp_unlikely(dst == NULL))
             return dst;

Which doesn't have a hidden branch.

I tried GCC, and (b) and (c) produce the same assembly code, but slightly 
different than (a) (but almost identical, nothing significant).  (d) produces 
considerably different assembly.  Tried under -O3 -march=native.  I don't know 
enough assembly to judge which is better; I'll copy the results here for the 
curious.  I guess the compiler seems to prefer an extra branch here over 
unconditionally doing bitwise operations; it very likely knows more than I do.

Cheers,

Alex



alx@asus5775:~/src/alx/libstp$ diff -u a.s b.s
--- a.s	2022-12-23 23:27:57.788103834 +0100
+++ b.s	2022-12-23 23:28:13.463919271 +0100
@@ -59,9 +59,9 @@
  	.cfi_offset 3, -16
  	movq	%rsi, %rbx
  	cmpq	%rsi, %rdi
-	je	.L9
-	testq	%rdi, %rdi
  	je	.L12
+	testq	%rdi, %rdi
+	je	.L13
  	movq	%rbx, %rcx
  	movq	%rdx, %rsi
  	xorl	%edx, %edx
@@ -79,7 +79,7 @@
  .L11:
  	.cfi_restore_state
  	movb	$0, -1(%rbx)
-.L9:
+.L12:
  	movq	%rbx, %rax
  	popq	%rbx
  	.cfi_remember_state
@@ -87,7 +87,7 @@
  	ret
  	.p2align 4,,10
  	.p2align 3
-.L12:
+.L13:
  	.cfi_restore_state
  	xorl	%eax, %eax
  	popq	%rbx
alx@asus5775:~/src/alx/libstp$ diff -u b.s c.s
alx@asus5775:~/src/alx/libstp$ diff -u c.s d.s
--- c.s	2022-12-23 23:29:07.315367548 +0100
+++ d.s	2022-12-23 23:28:58.007455133 +0100
@@ -11,16 +11,16 @@
  	.cfi_offset 3, -16
  	movq	%rsi, %rbx
  	cmpq	%rsi, %rdi
-	je	.L2
+	je	.L4
  	testq	%rdi, %rdi
-	je	.L5
+	je	.L4
  	movq	%rbx, %rcx
  	movq	%rdx, %rsi
  	xorl	%edx, %edx
  	subq	%rdi, %rcx
  	call	memccpy@PLT
  	testq	%rax, %rax
-	je	.L4
+	je	.L3
  	decq	%rax
  	popq	%rbx
  	.cfi_remember_state
@@ -28,20 +28,19 @@
  	ret
  	.p2align 4,,10
  	.p2align 3
-.L4:
+.L3:
  	.cfi_restore_state
-	movb	$0, -1(%rbx)
-.L2:
  	movq	%rbx, %rax
+	movb	$0, -1(%rbx)
  	popq	%rbx
  	.cfi_remember_state
  	.cfi_def_cfa_offset 8
  	ret
  	.p2align 4,,10
  	.p2align 3
-.L5:
+.L4:
  	.cfi_restore_state
-	xorl	%eax, %eax
+	movq	%rdi, %rax
  	popq	%rbx
  	.cfi_def_cfa_offset 8
  	ret
@@ -59,16 +58,16 @@
  	.cfi_offset 3, -16
  	movq	%rsi, %rbx
  	cmpq	%rsi, %rdi
-	je	.L12
+	je	.L10
  	testq	%rdi, %rdi
-	je	.L13
+	je	.L10
  	movq	%rbx, %rcx
  	movq	%rdx, %rsi
  	xorl	%edx, %edx
  	subq	%rdi, %rcx
  	call	memccpy@PLT
  	testq	%rax, %rax
-	je	.L11
+	je	.L9
  	decq	%rax
  	popq	%rbx
  	.cfi_remember_state
@@ -76,20 +75,19 @@
  	ret
  	.p2align 4,,10
  	.p2align 3
-.L11:
+.L9:
  	.cfi_restore_state
-	movb	$0, -1(%rbx)
-.L12:
  	movq	%rbx, %rax
+	movb	$0, -1(%rbx)
  	popq	%rbx
  	.cfi_remember_state
  	.cfi_def_cfa_offset 8
  	ret
  	.p2align 4,,10
  	.p2align 3
-.L13:
+.L10:
  	.cfi_restore_state
-	xorl	%eax, %eax
+	movq	%rdi, %rax
  	popq	%rbx
  	.cfi_def_cfa_offset 8
  	ret


> 
> Cheers,
> Wilco

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
@ 2022-12-23 23:24 Wilco Dijkstra
  2022-12-24  0:05 ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Wilco Dijkstra @ 2022-12-23 23:24 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: 'GNU C Library'

Hi Alex,

> For that, we'd first need to discuss what is a typical scenario.

Like copying/concatenating strings that fit within the buffer.

> And also, it depends a lot on what the compiler can optimize.  If I call 
> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude 
> faster.

If you're trying to say that the 'strcat' variant is bad then yes absolutely -
it's better to inline in the compiler or avoid the 'strcat' versions altogether
(that's also why I would strongly suggest never to add more 'cat' variants).
But that doesn't say anything about whether stpecpy is better than strlcpy.

> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that 
> might be something to benchmark, but we'd also need to discuss what is a good 
> input for the benchmark.

The typical case would be copying or concatenating smallish strings to a buffer.

> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is 
> inside a while loop.  So, I'd find it very surprising if strlcpy(3) outperformed 
> stpecpy(3).

If that really is the OpenBSD implementation then this proves my point that
non-standard string functions are often totally unoptimized.

A basic implementation of strlcpy would use strlen and memcpy so it is fast
on every system without requiring any optimization:

size_t
strlcpy (char *dst, const char *src, size_t size)
{
  size_t len = strlen (src);

  if (size == 0)
    return len;
  size = len >= size ? size - 1 : len;
  dst[size] = 0;
  memcpy (dst, src, size);
  return len;
}

> Well, with the current memccpy(3) I already suspect it's going to be faster than 
> strlcpy(3).  If you optimize it, it would increase the chances that it's faster :)

I don't see why it would be any faster given memccpy might also not be
optimized.

> I find it _way_ more readable than the strlcpy(3)/cat(3) code.  Oh, and did I 
> say it has less branches? :)

I'm not so sure about that - you've got 3 call/returns plus at least 4 branches
for each stpecpy (besides whatever memcpy/memchr do). strlcpy has 2 calls/
returns plus one branch. So needing an extra branch in case you need to do
something special for the buffer full case doesn't seem like a major problem.

>> In contrast we can be pretty sure that the standard strlen, memcpy etc are both
>> correct and efficient on all targets/libc's.
>
> Sure, but memcpy(3) is not usable in code that needs to truncate.  We need to 
> compare against stpncpy(3) (ughhh) and strlcpy(3).

The idea is that if we add new string functions, their implementation should use
other string functions that are known to be well optimized for most targets.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 23:24 Wilco Dijkstra
@ 2022-12-24  0:05 ` Alejandro Colomar
  2022-12-24  0:26   ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-24  0:05 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'

[-- Attachment #1.1: Type: text/plain, Size: 6084 bytes --]

On 12/24/22 00:24, Wilco Dijkstra wrote:
> Hi Alex,
> 
>> For that, we'd first need to discuss what is a typical scenario.
> 
> Like copying/concatenating strings that fit within the buffer. >
>> And also, it depends a lot on what the compiler can optimize.  If I call
>> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude
>> faster.
> 
> If you're trying to say that the 'strcat' variant is bad then yes absolutely -

I must admit that it has good things, if you have a compiler that does the magic 
for you.  GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be 
optimized, it's not so bad, and the source code is cleaner to the eye.

> it's better to inline in the compiler or avoid the 'strcat' versions altogether
> (that's also why I would strongly suggest never to add more 'cat' variants).

inlining in the compiler is a good solution.  And yes, I agree on not adding cat 
variants, but at the same time, str functions are problematic in that they can't 
be chained, as opposed to stp functions.  That problem shows itself in the 
snprintf(3) bugs I mentioned.  Users need a way to catenate easily, even if not 
with a 'cat' function.

> But that doesn't say anything about whether stpecpy is better than strlcpy.
> 
>> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that
>> might be something to benchmark, but we'd also need to discuss what is a good
>> input for the benchmark.
> 
> The typical case would be copying or concatenating smallish strings to a buffer.

Okay, I'll try to prepare a benchmark.

> 
>> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is
>> inside a while loop.  So, I'd find it very surprising if strlcpy(3) outperformed
>> stpecpy(3).
> 
> If that really is the OpenBSD implementation then this proves my point that
> non-standard string functions are often totally unoptimized.

And not only that, but I find your version much more readable.  I don't 
understand how the OpenBSD version was written that way, and hasn't been fixed 
so far.

> 
> A basic implementation of strlcpy would use strlen and memcpy so it is fast
> on every system without requiring any optimization:
> 
> size_t
> strlcpy (char *dst, const char *src, size_t size)
> {
>    size_t len = strlen (src);
> 
>    if (size == 0)
>      return len;
>    size = len >= size ? size - 1 : len;

I'd use a separate variable dlen, to differentiate it from size.  Otherwise, it 
looks like an off-by-one bug just below, since writing at a [size] usually means 
writing past the array.

>    dst[size] = 0;
>    memcpy (dst, src, size);
>    return len;
> }

Then, to compare oranges to oranges, I'll provide the equivalently optimized 
stpecpy(3):

char *
stpecpy (char *dst, char *end, const char *restrict src)
{
   size_t dsize;
   size_t dlen;
   size_t slen;

   slen = strlen(src);

   if (dst == end)
     return NULL;
   if (unlikely(dst == NULL))
     return NULL;
   if (dst > end)
     unreachable();
   dsize = end - dst;
   dlen = slen >= dsize ? dsize - 1 : slen;
   dst[dlen] = 0;
   return mempcpy(dst, src, dlen);
}

Now we can really compare them.  (unlikely() is the obvious wrapper over 
__builtin_expect(), and unreachable() is C23's equivalent of 
__builtin_unreachable();  they're just extra optimizations, but can be ignored.)

There are various decissions to take here:

-  We could leave NULL as UB, but I want to handle it for being able to combine 
with stpeprintf().  Although, we could implement stpeprintf() so that it never 
fails (we would need to implement it without the INT_MAX limitation of snprintf(3)).

-  We could call strnlen(3) instead, but strlen(3) is probably faster in the 
average use case, and has the benefit of crashing on invalid input.

The differences with your strlcpy(3) implementation are:

-  NULL check.
-  dsize = end - dst; calculation

Considering that strlcpy(3) chained calls need extra boilerplate at call site 
(remember):

               n = strlcpy(buf, "Hello ", sizeof(buf));
               if (n >= sizeof(buf))
                   goto toolong;
               n += strlcpy(buf + n, "world", sizeof(buf) - n);
               if (n >= sizeof(buf))
                   goto toolong;
               n += strlcpy(buf + n, "!", sizeof(buf) - n);
               if (n >= sizeof(buf))
                   goto toolong;
               puts(buf);

we see that there are in reality more calculations in the case of strlcpy(3); I 
see 2 '+'s and 1 '-' at strlcpy(3) call site, while we only had an extra '-' in 
the stpecpy(3) internals.  The number of conditionals seems to be the same after 
all, except for one single conditional after all the chained stpecpy(3) calls.

So, for equally optimized code, stpecpy(3) seems to win.  It's not hard to 
believe, since they perform the same operation, with the difference that 
strlcpy(3) forces the user to recalculate the buffer size (or really, the 
compiler on behalf of the user, since the intention is that users don't write 
this code, and instead write calls to strlcat(3)).  While it seems very obvious, 
since the source code is so similar that we can see the differences, I'll still 
try to write a benchmark.

And remember that performance is only second to usability, which is the main 
selling point of stpe*() functions.

[...]

>>> In contrast we can be pretty sure that the standard strlen, memcpy etc are both
>>> correct and efficient on all targets/libc's.
>>
>> Sure, but memcpy(3) is not usable in code that needs to truncate.  We need to
>> compare against stpncpy(3) (ughhh) and strlcpy(3).
> 
> The idea is that if we add new string functions, their implementation should use
> other string functions that are known to be well optimized for most targets.

Fair.  Above is my optimized version of stpecpy(3).

> 
> Cheers,
> Wilco

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-24  0:05 ` Alejandro Colomar
@ 2022-12-24  0:26   ` Alejandro Colomar
  2022-12-25  1:52     ` Noah Goldstein
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-24  0:26 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'


[-- Attachment #1.1: Type: text/plain, Size: 4175 bytes --]



On 12/24/22 01:05, Alejandro Colomar wrote:
> 
> 
> On 12/24/22 00:24, Wilco Dijkstra wrote:
>> Hi Alex,
>>
>>> For that, we'd first need to discuss what is a typical scenario.
>>
>> Like copying/concatenating strings that fit within the buffer. >
>>> And also, it depends a lot on what the compiler can optimize.  If I call
>>> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude
>>> faster.
>>
>> If you're trying to say that the 'strcat' variant is bad then yes absolutely -
> 
> I must admit that it has good things, if you have a compiler that does the magic 
> for you.  GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be 
> optimized, it's not so bad, and the source code is cleaner to the eye.
> 
>> it's better to inline in the compiler or avoid the 'strcat' versions altogether
>> (that's also why I would strongly suggest never to add more 'cat' variants).
> 
> inlining in the compiler is a good solution.  And yes, I agree on not adding cat 
> variants, but at the same time, str functions are problematic in that they can't 
> be chained, as opposed to stp functions.  That problem shows itself in the 
> snprintf(3) bugs I mentioned.  Users need a way to catenate easily, even if not 
> with a 'cat' function.
> 
>> But that doesn't say anything about whether stpecpy is better than strlcpy.
>>
>>> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that
>>> might be something to benchmark, but we'd also need to discuss what is a good
>>> input for the benchmark.
>>
>> The typical case would be copying or concatenating smallish strings to a buffer.
> 
> Okay, I'll try to prepare a benchmark.
> 
>>
>>> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is
>>> inside a while loop.  So, I'd find it very surprising if strlcpy(3) outperformed
>>> stpecpy(3).
>>
>> If that really is the OpenBSD implementation then this proves my point that
>> non-standard string functions are often totally unoptimized.
> 
> And not only that, but I find your version much more readable.  I don't 
> understand how the OpenBSD version was written that way, and hasn't been fixed 
> so far.
> 
>>
>> A basic implementation of strlcpy would use strlen and memcpy so it is fast
>> on every system without requiring any optimization:
>>
>> size_t
>> strlcpy (char *dst, const char *src, size_t size)
>> {
>>    size_t len = strlen (src);
>>
>>    if (size == 0)
>>      return len;
>>    size = len >= size ? size - 1 : len;
> 
> I'd use a separate variable dlen, to differentiate it from size.  Otherwise, it 
> looks like an off-by-one bug just below, since writing at a [size] usually means 
> writing past the array.
> 
>>    dst[size] = 0;
>>    memcpy (dst, src, size);
>>    return len;
>> }
> 
> Then, to compare oranges to oranges, I'll provide the equivalently optimized 
> stpecpy(3):
> 
> char *
> stpecpy (char *dst, char *end, const char *restrict src)
> {
>    size_t dsize;
>    size_t dlen;
>    size_t slen;
> 
>    slen = strlen(src);
> 
>    if (dst == end)
>      return NULL;
>    if (unlikely(dst == NULL))
>      return NULL;
>    if (dst > end)
>      unreachable();
>    dsize = end - dst;
>    dlen = slen >= dsize ? dsize - 1 : slen;
>    dst[dlen] = 0;
>    return mempcpy(dst, src, dlen);
> }

Sorry, I wrote a bug while optimizing: I forgot about the sentinel 'end' return. 
  Now I think it should be fine (anyway, I'll test it soon):

char *
stpecpy (char *dst, char *end, const char *restrict src)
{
   size_t dsize;
   size_t dlen;
   size_t slen = strlen (src);
   bool   trunc = false;

   if (dst == end)
     return NULL;
   if (dst == NULL)
     return NULL;
   dsize = end - dst;
   trunc = (slen >= dsize);
   dlen = trunc ? dsize - 1 : slen;
   dst[dlen] = 0;
   return mempcpy(dst, src, dlen) + trunc;
}

This adds a '+' operation, so the difference compared to your strlcpy(3) is 
smaller, but stpecpy() still wins, I think.


-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-24  0:26   ` Alejandro Colomar
@ 2022-12-25  1:52     ` Noah Goldstein
  2022-12-25 14:37       ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Noah Goldstein @ 2022-12-25  1:52 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library

On Fri, Dec 23, 2022 at 4:26 PM Alejandro Colomar via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>
>
> On 12/24/22 01:05, Alejandro Colomar wrote:
> >
> >
> > On 12/24/22 00:24, Wilco Dijkstra wrote:
> >> Hi Alex,
> >>
> >>> For that, we'd first need to discuss what is a typical scenario.
> >>
> >> Like copying/concatenating strings that fit within the buffer. >
> >>> And also, it depends a lot on what the compiler can optimize.  If I call
> >>> strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude
> >>> faster.
> >>
> >> If you're trying to say that the 'strcat' variant is bad then yes absolutely -
> >
> > I must admit that it has good things, if you have a compiler that does the magic
> > for you.  GCC optimizes strcat(3) into stpcpy(3), so if you know that it will be
> > optimized, it's not so bad, and the source code is cleaner to the eye.
> >
> >> it's better to inline in the compiler or avoid the 'strcat' versions altogether
> >> (that's also why I would strongly suggest never to add more 'cat' variants).
> >
> > inlining in the compiler is a good solution.  And yes, I agree on not adding cat
> > variants, but at the same time, str functions are problematic in that they can't
> > be chained, as opposed to stp functions.  That problem shows itself in the
> > snprintf(3) bugs I mentioned.  Users need a way to catenate easily, even if not
> > with a 'cat' function.
> >
> >> But that doesn't say anything about whether stpecpy is better than strlcpy.
> >>
> >>> If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that
> >>> might be something to benchmark, but we'd also need to discuss what is a good
> >>> input for the benchmark.
> >>
> >> The typical case would be copying or concatenating smallish strings to a buffer.
> >
> > Okay, I'll try to prepare a benchmark.
> >
> >>
> >>> In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is
> >>> inside a while loop.  So, I'd find it very surprising if strlcpy(3) outperformed
> >>> stpecpy(3).
> >>
> >> If that really is the OpenBSD implementation then this proves my point that
> >> non-standard string functions are often totally unoptimized.
> >
> > And not only that, but I find your version much more readable.  I don't
> > understand how the OpenBSD version was written that way, and hasn't been fixed
> > so far.
> >
> >>
> >> A basic implementation of strlcpy would use strlen and memcpy so it is fast
> >> on every system without requiring any optimization:
> >>
> >> size_t
> >> strlcpy (char *dst, const char *src, size_t size)
> >> {
> >>    size_t len = strlen (src);
> >>
> >>    if (size == 0)
> >>      return len;
> >>    size = len >= size ? size - 1 : len;
> >
> > I'd use a separate variable dlen, to differentiate it from size.  Otherwise, it
> > looks like an off-by-one bug just below, since writing at a [size] usually means
> > writing past the array.
> >
> >>    dst[size] = 0;
> >>    memcpy (dst, src, size);
> >>    return len;
> >> }
> >
> > Then, to compare oranges to oranges, I'll provide the equivalently optimized
> > stpecpy(3):
> >
> > char *
> > stpecpy (char *dst, char *end, const char *restrict src)
> > {
> >    size_t dsize;
> >    size_t dlen;
> >    size_t slen;
> >
> >    slen = strlen(src);
> >
> >    if (dst == end)
> >      return NULL;
> >    if (unlikely(dst == NULL))
> >      return NULL;
> >    if (dst > end)
> >      unreachable();
> >    dsize = end - dst;
> >    dlen = slen >= dsize ? dsize - 1 : slen;
> >    dst[dlen] = 0;
> >    return mempcpy(dst, src, dlen);
> > }
>
> Sorry, I wrote a bug while optimizing: I forgot about the sentinel 'end' return.
>   Now I think it should be fine (anyway, I'll test it soon):
>
> char *
> stpecpy (char *dst, char *end, const char *restrict src)
> {
>    size_t dsize;
>    size_t dlen;
>    size_t slen = strlen (src);

Imo move `dst == end` and `dst == NULL` check before strlen
and change strlen to `strnlen(src, dsize)`.
>    bool   trunc = false;
>
>    if (dst == end)

Out of curiosity what if `end < dst`?
>      return NULL;
>    if (dst == NULL)
>      return NULL;
>    dsize = end - dst;
>    trunc = (slen >= dsize);
>    dlen = trunc ? dsize - 1 : slen;
>    dst[dlen] = 0;
>    return mempcpy(dst, src, dlen) + trunc;
> }
>
> This adds a '+' operation, so the difference compared to your strlcpy(3) is
> smaller, but stpecpy() still wins, I think.
>
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-25  1:52     ` Noah Goldstein
@ 2022-12-25 14:37       ` Alejandro Colomar
  2022-12-25 22:31         ` Noah Goldstein
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-25 14:37 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library


[-- Attachment #1.1: Type: text/plain, Size: 3160 bytes --]

Hi Noah,

On 12/25/22 02:52, Noah Goldstein wrote:

>> char *
>> stpecpy (char *dst, char *end, const char *restrict src)
>> {
>>     size_t dsize;
>>     size_t dlen;
>>     size_t slen = strlen (src);
> 
> Imo move `dst == end` and `dst == NULL` check before strlen


That's a valid optimization.  Having strlen(3) before has the advantage that you 
make sure that your strings are strings, as strlcpy(3) does.  But since we're 
inventing stpecpy(3), we can choose to be faster.  If anyone wants to instrument 
their code, they can add a temporary wrapper that does that.

> and change strlen to `strnlen(src, dsize)`.

About strnlen(3), I have doubts.  Isn't strlen(3) faster for the common case of 
no truncation or little truncation?  strnlen(3) would optimize for the case 
where you truncate by a large difference.

>>     bool   trunc = false;
>>
>>     if (dst == end)
> 
> Out of curiosity what if `end < dst`?

The behavior is undefined.  That's by design.  In the definition of stpecpy(3) I 
have currently in libstp, I even tell the compiler to optimize on that condition:
<http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33>


alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy
./include/stp/stpe/stpecpy.h:21:
inline char *stp_nullable
stpecpy(char *stp_nullable dst, char *end, const char *restrict src)
{
	bool    trunc;
	size_t  dsize, dlen, slen;

	slen = strlen(src);

	if (dst == end)
		return end;
	if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
		return NULL;
	stp_impossible(dst > end);

	dsize = end - dst;
	trunc = (slen >= dsize);
	dlen = trunc ? dsize - 1 : slen;
	dst[dlen] = '\0';

	return mempcpy(dst, src, dlen) + trunc;
}
alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible
./include/stp/_compiler.h:14:
#define stp_impossible(e)   do                                                \
{                                                                             \
	if (e)                                                                \
		stp_unreachable();                                            \
} while (0)
alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable
include/stp/_compiler.h-28-#if defined(unreachable)
include/stp/_compiler.h:29:# define stp_unreachable()  unreachable()
include/stp/_compiler.h-30-#else
include/stp/_compiler.h:31:# define stp_unreachable()  __builtin_unreachable()
include/stp/_compiler.h-32-#endif


I'd do that for glibc, but I don't see any facility.  Maybe we should add an 
__impossible() macro to document UB, and help the compiler.

Cheers,

Alex

>>       return NULL;
>>     if (dst == NULL)
>>       return NULL;
>>     dsize = end - dst;
>>     trunc = (slen >= dsize);
>>     dlen = trunc ? dsize - 1 : slen;
>>     dst[dlen] = 0;
>>     return mempcpy(dst, src, dlen) + trunc;
>> }
>>
>> This adds a '+' operation, so the difference compared to your strlcpy(3) is
>> smaller, but stpecpy() still wins, I think.
>>
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-25 14:37       ` Alejandro Colomar
@ 2022-12-25 22:31         ` Noah Goldstein
  2022-12-26  0:26           ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Noah Goldstein @ 2022-12-25 22:31 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library

On Sun, Dec 25, 2022 at 6:37 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Noah,
>
> On 12/25/22 02:52, Noah Goldstein wrote:
>
> >> char *
> >> stpecpy (char *dst, char *end, const char *restrict src)
> >> {
> >>     size_t dsize;
> >>     size_t dlen;
> >>     size_t slen = strlen (src);
> >
> > Imo move `dst == end` and `dst == NULL` check before strlen
>
>
> That's a valid optimization.  Having strlen(3) before has the advantage that you
> make sure that your strings are strings, as strlcpy(3) does.  But since we're
> inventing stpecpy(3), we can choose to be faster.  If anyone wants to instrument
> their code, they can add a temporary wrapper that does that.
>
> > and change strlen to `strnlen(src, dsize)`.
>
> About strnlen(3), I have doubts.  Isn't strlen(3) faster for the common case of
> no truncation or little truncation?  strnlen(3) would optimize for the case
> where you truncate by a large difference.

It's faster if strlen(s) <= strnlen(s, N) (maybe up to N + 32).

But generally I think of it like `qsort`. Most data gets n * log(n) behavior
but still it's worth preventing the worst case for minor constant cost.


>
> >>     bool   trunc = false;
> >>
> >>     if (dst == end)
> >
> > Out of curiosity what if `end < dst`?
>
> The behavior is undefined.  That's by design.  In the definition of stpecpy(3) I
> have currently in libstp, I even tell the compiler to optimize on that condition:
> <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33>
>

You could probably optimize out one of the branches along the line of:
if((dst - 1UL) >= (end - 1UL)) {
    // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value.
    // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true.
    return NULL;
}
>
> alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy
> ./include/stp/stpe/stpecpy.h:21:
> inline char *stp_nullable
> stpecpy(char *stp_nullable dst, char *end, const char *restrict src)
> {
>         bool    trunc;
>         size_t  dsize, dlen, slen;
>
>         slen = strlen(src);
>
>         if (dst == end)
>                 return end;
>         if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
>                 return NULL;
>         stp_impossible(dst > end);
>
>         dsize = end - dst;
>         trunc = (slen >= dsize);
>         dlen = trunc ? dsize - 1 : slen;
>         dst[dlen] = '\0';
>
>         return mempcpy(dst, src, dlen) + trunc;
> }
> alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible
> ./include/stp/_compiler.h:14:
> #define stp_impossible(e)   do                                                \
> {                                                                             \
>         if (e)                                                                \
>                 stp_unreachable();                                            \
> } while (0)
> alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable
> include/stp/_compiler.h-28-#if defined(unreachable)
> include/stp/_compiler.h:29:# define stp_unreachable()  unreachable()
> include/stp/_compiler.h-30-#else
> include/stp/_compiler.h:31:# define stp_unreachable()  __builtin_unreachable()
> include/stp/_compiler.h-32-#endif
>
>
> I'd do that for glibc, but I don't see any facility.  Maybe we should add an
> __impossible() macro to document UB, and help the compiler.

Does it result in any improved codegen? If not seems like
making it fail more noisily is always a win.
>
> Cheers,
>
> Alex
>
> >>       return NULL;
> >>     if (dst == NULL)
> >>       return NULL;
> >>     dsize = end - dst;
> >>     trunc = (slen >= dsize);
> >>     dlen = trunc ? dsize - 1 : slen;
> >>     dst[dlen] = 0;
> >>     return mempcpy(dst, src, dlen) + trunc;
> >> }
> >>
> >> This adds a '+' operation, so the difference compared to your strlcpy(3) is
> >> smaller, but stpecpy() still wins, I think.
> >>
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-25 22:31         ` Noah Goldstein
@ 2022-12-26  0:26           ` Alejandro Colomar
  2022-12-26  0:32             ` Noah Goldstein
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-26  0:26 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library

[-- Attachment #1.1: Type: text/plain, Size: 6088 bytes --]

On 12/25/22 23:31, Noah Goldstein wrote:
> On Sun, Dec 25, 2022 at 6:37 AM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Hi Noah,
>>
>> On 12/25/22 02:52, Noah Goldstein wrote:
>>
>>>> char *
>>>> stpecpy (char *dst, char *end, const char *restrict src)
>>>> {
>>>>      size_t dsize;
>>>>      size_t dlen;
>>>>      size_t slen = strlen (src);
>>>
>>> Imo move `dst == end` and `dst == NULL` check before strlen
>>
>>
>> That's a valid optimization.  Having strlen(3) before has the advantage that you
>> make sure that your strings are strings, as strlcpy(3) does.  But since we're
>> inventing stpecpy(3), we can choose to be faster.  If anyone wants to instrument
>> their code, they can add a temporary wrapper that does that.
>>
>>> and change strlen to `strnlen(src, dsize)`.
>>
>> About strnlen(3), I have doubts.  Isn't strlen(3) faster for the common case of
>> no truncation or little truncation?  strnlen(3) would optimize for the case
>> where you truncate by a large difference.
> 
> It's faster if strlen(s) <= strnlen(s, N) (maybe up to N + 32).
> 
> But generally I think of it like `qsort`. Most data gets n * log(n) behavior
> but still it's worth preventing the worst case for minor constant cost.
> 

I.  Maybe it's a good thing.  Since it's a truncating API, I guess optimizing 
for truncation is reasonable.  For common strings, which will be short (size <= 
64), I guess the constant will really be negligible.

> 
>>
>>>>      bool   trunc = false;
>>>>
>>>>      if (dst == end)
>>>
>>> Out of curiosity what if `end < dst`?
>>
>> The behavior is undefined.  That's by design.  In the definition of stpecpy(3) I
>> have currently in libstp, I even tell the compiler to optimize on that condition:
>> <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33>
>>
> 
> You could probably optimize out one of the branches along the line of:
> if((dst - 1UL) >= (end - 1UL)) {
>      // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value.

You would need a cast, wouldn't you?  Otherwise, you'll get pointer arithmetic. 
Pointer arithmetic with NULL is UB.

>      // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true.
>      return NULL;

Returning NULL on truncation would be a possibility, but then we'd need to use 
errno to tell the user if the error was truncation or an input NULL (which 
reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf().

Using errno would probably counter any optimization, since you'd still need one 
more branch for setting errno, so I guess it's simpler to just use end for 
truncation.

Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never 
fail, then we could add a [v]stpeprintf(3) that never fails, and then this 
function would only bail out on truncation.

Would it be possible to make __vsnprintf_internal() never fail?  What are the 
current failing conditions; only a size greater than INT_MAX, or are there more 
errors?

> }
>>
>> alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy
>> ./include/stp/stpe/stpecpy.h:21:
>> inline char *stp_nullable
>> stpecpy(char *stp_nullable dst, char *end, const char *restrict src)
>> {
>>          bool    trunc;
>>          size_t  dsize, dlen, slen;
>>
>>          slen = strlen(src);
>>
>>          if (dst == end)
>>                  return end;
>>          if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
>>                  return NULL;
>>          stp_impossible(dst > end);
>>
>>          dsize = end - dst;
>>          trunc = (slen >= dsize);
>>          dlen = trunc ? dsize - 1 : slen;
>>          dst[dlen] = '\0';
>>
>>          return mempcpy(dst, src, dlen) + trunc;
>> }
>> alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible
>> ./include/stp/_compiler.h:14:
>> #define stp_impossible(e)   do                                                \
>> {                                                                             \
>>          if (e)                                                                \
>>                  stp_unreachable();                                            \
>> } while (0)
>> alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable
>> include/stp/_compiler.h-28-#if defined(unreachable)
>> include/stp/_compiler.h:29:# define stp_unreachable()  unreachable()
>> include/stp/_compiler.h-30-#else
>> include/stp/_compiler.h:31:# define stp_unreachable()  __builtin_unreachable()
>> include/stp/_compiler.h-32-#endif
>>
>>
>> I'd do that for glibc, but I don't see any facility.  Maybe we should add an
>> __impossible() macro to document UB, and help the compiler.
> 
> Does it result in any improved codegen? If not seems like
> making it fail more noisily is always a win.

Both Clang and GCC generate the same code with or without the hint that it's 
impossible.  Anyway, I'll keep it in my source code because it also helps tell 
the programmer that dst>end was taken into consideration and explicitly outlawed.

The 'end' pointer is expected to be always generated as 'buf + sizeof(buf)'. 
Doing something different is not what this API is designed for, and should be 
warned by compilers.  'end' should be a pointer to one after the last byte in an 
array.  Thus, no valid pointer can be greater than end.  If you use this API as 
expected, which is, only chain it with itself and with stpeprintf(3), then it is 
impossible to have dst>end.  But as always, GIGO.

As for the expected result, it would be akin calling strlcpy(3) with a negative 
size.  It would wrap around size_t, and give something close to 2^64.  Both 
would result in a buffer overrun, so writing at random memory, and later a 
crash, but I don't expect that libc should try to detect if the input to 
strlcpy(3) (or actually, any mem*() function) is huge, and neither if input to 
stpecpy(3) is similarly broken.

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26  0:26           ` Alejandro Colomar
@ 2022-12-26  0:32             ` Noah Goldstein
  2022-12-26  0:37               ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Noah Goldstein @ 2022-12-26  0:32 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library

On Sun, Dec 25, 2022 at 4:26 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
>
>
> On 12/25/22 23:31, Noah Goldstein wrote:
> > On Sun, Dec 25, 2022 at 6:37 AM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi Noah,
> >>
> >> On 12/25/22 02:52, Noah Goldstein wrote:
> >>
> >>>> char *
> >>>> stpecpy (char *dst, char *end, const char *restrict src)
> >>>> {
> >>>>      size_t dsize;
> >>>>      size_t dlen;
> >>>>      size_t slen = strlen (src);
> >>>
> >>> Imo move `dst == end` and `dst == NULL` check before strlen
> >>
> >>
> >> That's a valid optimization.  Having strlen(3) before has the advantage that you
> >> make sure that your strings are strings, as strlcpy(3) does.  But since we're
> >> inventing stpecpy(3), we can choose to be faster.  If anyone wants to instrument
> >> their code, they can add a temporary wrapper that does that.
> >>
> >>> and change strlen to `strnlen(src, dsize)`.
> >>
> >> About strnlen(3), I have doubts.  Isn't strlen(3) faster for the common case of
> >> no truncation or little truncation?  strnlen(3) would optimize for the case
> >> where you truncate by a large difference.
> >
> > It's faster if strlen(s) <= strnlen(s, N) (maybe up to N + 32).
> >
> > But generally I think of it like `qsort`. Most data gets n * log(n) behavior
> > but still it's worth preventing the worst case for minor constant cost.
> >
>
> I.  Maybe it's a good thing.  Since it's a truncating API, I guess optimizing
> for truncation is reasonable.  For common strings, which will be short (size <=
> 64), I guess the constant will really be negligible.
>
> >
> >>
> >>>>      bool   trunc = false;
> >>>>
> >>>>      if (dst == end)
> >>>
> >>> Out of curiosity what if `end < dst`?
> >>
> >> The behavior is undefined.  That's by design.  In the definition of stpecpy(3) I
> >> have currently in libstp, I even tell the compiler to optimize on that condition:
> >> <http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/include/stp/stpe/stpecpy.h#n33>
> >>
> >
> > You could probably optimize out one of the branches along the line of:
> > if((dst - 1UL) >= (end - 1UL)) {
> >      // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value.
>
> You would need a cast, wouldn't you?  Otherwise, you'll get pointer arithmetic.
> Pointer arithmetic with NULL is UB.
>
> >      // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true.
> >      return NULL;
>
> Returning NULL on truncation would be a possibility, but then we'd need to use
> errno to tell the user if the error was truncation or an input NULL (which
> reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf().

I'm not sure I see what you mean. Your current logic is:
```
   if (dst == end)
     return NULL;
   if (dst == NULL)
     return NULL;
```
Equivalent (since dst >= end || dst == NULL is required) is:
```
if((dst - 1UL) >= (end - 1UL)) {
    return NULL;
}
```
May need to be cast to a `uintptr` or something but don't see
what you mean about needing to check errno and such.

>
> Using errno would probably counter any optimization, since you'd still need one
> more branch for setting errno, so I guess it's simpler to just use end for
> truncation.
>
>
> Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never
> fail, then we could add a [v]stpeprintf(3) that never fails, and then this
> function would only bail out on truncation.
>
> Would it be possible to make __vsnprintf_internal() never fail?  What are the
> current failing conditions; only a size greater than INT_MAX, or are there more
> errors?

Don't think its worth reimplementing    __vsnprintf_internal to save a single
branch here.
>
> > }
> >>
> >> alx@asus5775:~/src/alx/libstp$ grepc -tfd stpecpy
> >> ./include/stp/stpe/stpecpy.h:21:
> >> inline char *stp_nullable
> >> stpecpy(char *stp_nullable dst, char *end, const char *restrict src)
> >> {
> >>          bool    trunc;
> >>          size_t  dsize, dlen, slen;
> >>
> >>          slen = strlen(src);
> >>
> >>          if (dst == end)
> >>                  return end;
> >>          if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
> >>                  return NULL;
> >>          stp_impossible(dst > end);
> >>
> >>          dsize = end - dst;
> >>          trunc = (slen >= dsize);
> >>          dlen = trunc ? dsize - 1 : slen;
> >>          dst[dlen] = '\0';
> >>
> >>          return mempcpy(dst, src, dlen) + trunc;
> >> }
> >> alx@asus5775:~/src/alx/libstp$ grepc -tm stp_impossible
> >> ./include/stp/_compiler.h:14:
> >> #define stp_impossible(e)   do                                                \
> >> {                                                                             \
> >>          if (e)                                                                \
> >>                  stp_unreachable();                                            \
> >> } while (0)
> >> alx@asus5775:~/src/alx/libstp$ grep -rnC1 define.stp_unreachable
> >> include/stp/_compiler.h-28-#if defined(unreachable)
> >> include/stp/_compiler.h:29:# define stp_unreachable()  unreachable()
> >> include/stp/_compiler.h-30-#else
> >> include/stp/_compiler.h:31:# define stp_unreachable()  __builtin_unreachable()
> >> include/stp/_compiler.h-32-#endif
> >>
> >>
> >> I'd do that for glibc, but I don't see any facility.  Maybe we should add an
> >> __impossible() macro to document UB, and help the compiler.
> >
> > Does it result in any improved codegen? If not seems like
> > making it fail more noisily is always a win.
>
> Both Clang and GCC generate the same code with or without the hint that it's
> impossible.  Anyway, I'll keep it in my source code because it also helps tell
> the programmer that dst>end was taken into consideration and explicitly outlawed.
>
> The 'end' pointer is expected to be always generated as 'buf + sizeof(buf)'.
> Doing something different is not what this API is designed for, and should be
> warned by compilers.  'end' should be a pointer to one after the last byte in an
> array.  Thus, no valid pointer can be greater than end.  If you use this API as
> expected, which is, only chain it with itself and with stpeprintf(3), then it is
> impossible to have dst>end.  But as always, GIGO.
>
> As for the expected result, it would be akin calling strlcpy(3) with a negative
> size.  It would wrap around size_t, and give something close to 2^64.  Both
> would result in a buffer overrun, so writing at random memory, and later a
> crash, but I don't expect that libc should try to detect if the input to
> strlcpy(3) (or actually, any mem*() function) is huge, and neither if input to
> stpecpy(3) is similarly broken.
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26  0:32             ` Noah Goldstein
@ 2022-12-26  0:37               ` Alejandro Colomar
  2022-12-26  2:43                 ` Noah Goldstein
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-26  0:37 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library


[-- Attachment #1.1: Type: text/plain, Size: 2443 bytes --]

On 12/26/22 01:32, Noah Goldstein wrote:
>>> You could probably optimize out one of the branches along the line of:
>>> if((dst - 1UL) >= (end - 1UL)) {
>>>       // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value.
>>
>> You would need a cast, wouldn't you?  Otherwise, you'll get pointer arithmetic.
>> Pointer arithmetic with NULL is UB.
>>
>>>       // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true.
>>>       return NULL;
>>
>> Returning NULL on truncation would be a possibility, but then we'd need to use
>> errno to tell the user if the error was truncation or an input NULL (which
>> reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf().
> 
> I'm not sure I see what you mean. Your current logic is:
> ```
>     if (dst == end)
>       return NULL;
>     if (dst == NULL)
>       return NULL;

No; current code is:

     if (dst == end)
         return end;
     if (dst == NULL)
         return NULL;

NULL is an error (contents of string are undefined; per vsnprintf(3)'s spec), 
while 'end' is just truncation, and contents if the string are well defined.


> ```
> Equivalent (since dst >= end || dst == NULL is required) is:
> ```
> if((dst - 1UL) >= (end - 1UL)) {
>      return NULL;
> }
> ```
> May need to be cast to a `uintptr` or something but don't see
> what you mean about needing to check errno and such.
> 
>>
>> Using errno would probably counter any optimization, since you'd still need one
>> more branch for setting errno, so I guess it's simpler to just use end for
>> truncation.
>>
>>
>> Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never
>> fail, then we could add a [v]stpeprintf(3) that never fails, and then this
>> function would only bail out on truncation.
>>
>> Would it be possible to make __vsnprintf_internal() never fail?  What are the
>> current failing conditions; only a size greater than INT_MAX, or are there more
>> errors?
> 
> Don't think its worth reimplementing    __vsnprintf_internal to save a single
> branch here.

It wouldn't be only for that, but also allowing to write size_t bytes of 
formatted output.  However, I question how useful that is, since you only need 
that many bytes when you're catenating strings with %s, for which stpecpy(3) can 
be used; so yes, probably it's not worth it.

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26  0:37               ` Alejandro Colomar
@ 2022-12-26  2:43                 ` Noah Goldstein
  2022-12-26 22:25                   ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Noah Goldstein @ 2022-12-26  2:43 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Wilco Dijkstra, GNU C Library

On Sun, Dec 25, 2022 at 4:37 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> On 12/26/22 01:32, Noah Goldstein wrote:
> >>> You could probably optimize out one of the branches along the line of:
> >>> if((dst - 1UL) >= (end - 1UL)) {
> >>>       // if dst == NULL, then dst - 1UL -> SIZE_MAX and must be >= any value.
> >>
> >> You would need a cast, wouldn't you?  Otherwise, you'll get pointer arithmetic.
> >> Pointer arithmetic with NULL is UB.
> >>
> >>>       // if dst == end, then (dst - 1UL) >= (end - 1UL) will be true.
> >>>       return NULL;
> >>
> >> Returning NULL on truncation would be a possibility, but then we'd need to use
> >> errno to tell the user if the error was truncation or an input NULL (which
> >> reports an error to a previous vsnprintf(3) call wrapped by [v]stpeprintf().
> >
> > I'm not sure I see what you mean. Your current logic is:
> > ```
> >     if (dst == end)
> >       return NULL;
> >     if (dst == NULL)
> >       return NULL;
>
> No; current code is:
>
>      if (dst == end)
>          return end;
>      if (dst == NULL)
>          return NULL;

I see. The code you commented earlier was NULL for both.

Either way you can just make it:

```
if((dst - 1UL) >= (end - 1UL)) {
 return dst; // either dst == NULL or dst == end.
}
```

>
> NULL is an error (contents of string are undefined; per vsnprintf(3)'s spec),
> while 'end' is just truncation, and contents if the string are well defined.
>
>
> > ```
> > Equivalent (since dst >= end || dst == NULL is required) is:
> > ```
> > if((dst - 1UL) >= (end - 1UL)) {
> >      return NULL;
> > }
> > ```
> > May need to be cast to a `uintptr` or something but don't see
> > what you mean about needing to check errno and such.
> >
> >>
> >> Using errno would probably counter any optimization, since you'd still need one
> >> more branch for setting errno, so I guess it's simpler to just use end for
> >> truncation.
> >>
> >>
> >> Oooor, if we reimplement __vsnprintf_internal(3) to work on size_t and never
> >> fail, then we could add a [v]stpeprintf(3) that never fails, and then this
> >> function would only bail out on truncation.
> >>
> >> Would it be possible to make __vsnprintf_internal() never fail?  What are the
> >> current failing conditions; only a size greater than INT_MAX, or are there more
> >> errors?
> >
> > Don't think its worth reimplementing    __vsnprintf_internal to save a single
> > branch here.
>
> It wouldn't be only for that, but also allowing to write size_t bytes of
> formatted output.  However, I question how useful that is, since you only need
> that many bytes when you're catenating strings with %s, for which stpecpy(3) can
> be used; so yes, probably it's not worth it.
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26  2:43                 ` Noah Goldstein
@ 2022-12-26 22:25                   ` Alejandro Colomar
  2022-12-26 23:24                     ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-26 22:25 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library

[-- Attachment #1.1: Type: text/plain, Size: 903 bytes --]

Hi Noah,

On 12/26/22 03:43, Noah Goldstein wrote:

> I see. The code you commented earlier was NULL for both.

I don't remember; maybe there was a typo...

> 
> Either way you can just make it:
> 
> ```
> if((dst - 1UL) >= (end - 1UL)) {

I didn't expect that would be significantly better than `(dst == end || dst == 
NULL)`.  However, I compiled both just to see, and the assembly output for your 
code is shorter.  I'll benchmark both to see if there are performance differences.

I wonder why the compiler doesn't generate this code if it's better; it has all 
the information to decide that it can be transformed into that.

Both clang and GCC produce better code with your suggestion (although in the 
case of GCC, the difference is bigger.

Cheers,

Alex

>   return dst; // either dst == NULL or dst == end.
> }
> ```

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26 22:25                   ` Alejandro Colomar
@ 2022-12-26 23:24                     ` Alejandro Colomar
  2022-12-26 23:52                       ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-26 23:24 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library


[-- Attachment #1.1: Type: text/plain, Size: 1116 bytes --]



On 12/26/22 23:25, Alejandro Colomar wrote:
> Hi Noah,
> 
> On 12/26/22 03:43, Noah Goldstein wrote:
> 
>> I see. The code you commented earlier was NULL for both.
> 
> I don't remember; maybe there was a typo...
> 
>>
>> Either way you can just make it:
>>
>> ```
>> if((dst - 1UL) >= (end - 1UL)) {
> 
> I didn't expect that would be significantly better than `(dst == end || dst == 
> NULL)`.  However, I compiled both just to see, and the assembly output for your 
> code is shorter.  I'll benchmark both to see if there are performance differences.
> 
> I wonder why the compiler doesn't generate this code if it's better; it has all 
> the information to decide that it can be transformed into that.
> 
> Both clang and GCC produce better code with your suggestion (although in the 
> case of GCC, the difference is bigger.

Self-correction:

They produce smaller code for your suggestion, but it seems to be slower code.

> 
> Cheers,
> 
> Alex
> 
>>   return dst; // either dst == NULL or dst == end.
>> }
>> ```
> 
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26 23:24                     ` Alejandro Colomar
@ 2022-12-26 23:52                       ` Alejandro Colomar
  2022-12-27  0:12                         ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-26 23:52 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library


[-- Attachment #1.1: Type: text/plain, Size: 1803 bytes --]



On 12/27/22 00:24, Alejandro Colomar wrote:
> 
> 
> On 12/26/22 23:25, Alejandro Colomar wrote:
>> Hi Noah,
>>
>> On 12/26/22 03:43, Noah Goldstein wrote:
>>
>>> I see. The code you commented earlier was NULL for both.
>>
>> I don't remember; maybe there was a typo...
>>
>>>
>>> Either way you can just make it:
>>>
>>> ```
>>> if((dst - 1UL) >= (end - 1UL)) {
>>
>> I didn't expect that would be significantly better than `(dst == end || dst == 
>> NULL)`.  However, I compiled both just to see, and the assembly output for 
>> your code is shorter.  I'll benchmark both to see if there are performance 
>> differences.
>>
>> I wonder why the compiler doesn't generate this code if it's better; it has 
>> all the information to decide that it can be transformed into that.
>>
>> Both clang and GCC produce better code with your suggestion (although in the 
>> case of GCC, the difference is bigger.
> 
> Self-correction:
> 
> They produce smaller code for your suggestion, but it seems to be slower code.

However, and this is interesting, calling strnlen(3) results in faster code even 
when no truncation occurs; at least for the short string I tested.

I suspect it might be because it is already heavily optimized in glibc, and it 
implicitly simplifies surrounding code:

-       slen = strlen(src);
         dsize = end - dst;
-       trunc = (slen >= dsize);
+       slen = strnlen(src, dsize);
+       trunc = (slen == dsize);

The generated assembly code is one line smaller (on my system, and phase of the 
moon), and some small percent faster.  :)

Cheers,

Alex


> 
>>
>> Cheers,
>>
>> Alex
>>
>>>   return dst; // either dst == NULL or dst == end.
>>> }
>>> ```
>>
>>
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-26 23:52                       ` Alejandro Colomar
@ 2022-12-27  0:12                         ` Alejandro Colomar
  0 siblings, 0 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-27  0:12 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Wilco Dijkstra, GNU C Library


[-- Attachment #1.1: Type: text/plain, Size: 1379 bytes --]

On 12/27/22 00:52, Alejandro Colomar wrote:
> However, and this is interesting, calling strnlen(3) results in faster code even 
> when no truncation occurs; at least for the short string I tested.
> 
> I suspect it might be because it is already heavily optimized in glibc, and it 
> implicitly simplifies surrounding code:
> 
> -       slen = strlen(src);
>          dsize = end - dst;
> -       trunc = (slen >= dsize);
> +       slen = strnlen(src, dsize);
> +       trunc = (slen == dsize);
> 
> The generated assembly code is one line smaller (on my system, and phase of the 
> moon), and some small percent faster.  :)

I found the reason; it helps simplify considerably the code.  Here's the 
resulting optimized code:


char *stp_nullable
stpecpy(char *stp_nullable dst, char *end, const char *restrict src)
{
	bool    trunc;
	size_t  dsize, dlen, slen;

	if (dst == end)
		return end;
	if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
		return NULL;
	stp_impossible(dst > end);

	dsize = end - dst;
	slen = strnlen(src, dsize);
	trunc = (slen == dsize);
	dlen = slen - trunc;
	dst[dlen] = '\0';

	return mempcpy(dst, src, dlen) + trunc;
}


See how using strnlen(3) removed the ternary operator.  That's a great 
optimization.  :)

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/1] string: Add stpecpy(3)
@ 2022-12-23 14:59 Wilco Dijkstra
  2022-12-23 17:03 ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Wilco Dijkstra @ 2022-12-23 14:59 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: 'GNU C Library'

Hi Alex,

>> Given strlcpy and strlcat are in POSIX next and therefore bar
>> some extraordinary event will be in glibc, I think we should
>> probably wait until those two land, then see if there's still
>> an appetite for stpecpy in glibc.
>
> I disagree for the following reasons.
>
> strlcpy(3)/strlcat(3) are designed to be _slow_, in exchange for added 
> simplicity and safety.  That's what Theo told me about them.  They didn't care 
> about performance.  The two performance issues are:

We'd need actual benchmarks to confirm there is a detectable performance
difference in typical scenarios. So calling strlcpy slow is premature. Looking
at the proposed stpecpy, it seems it has a lot more branches and special cases
compared to a typical strlcpy, and that adds extra overhead. Using memccpy
is risky too since that is often not optimized.

> -  Traverse the entire input string, to make sure it's a string.  stpecpy(3) 
> instead only reads what's necessary for the copy; it stops reading after truncation.

Almost all strings are short and few cases need truncation, so I don't see the issue.
People concerned about performance wouldn't use the standard string functions
anyway.

> -  strlcat(3) finds the terminating null byte; that's something you already know 
> where it is, with functions that return a useful pointer (mempcpy(3), stpcpy(3), 
> and stpecpy(3)).

If you know the end of the destination string then don't use concatenation. Easy!

In fact compilers could inline the 'dest += strlen (dest)' part of strcat and call
strcpy instead. This allows optimization of the strlen in case you know the size
of the destination string. This is true for strlcpy too, a compiler could just inline
it and optimize the strlen (src).

> The reason that triggered me wanting to add this function is seeing strncpy(3) 
> used for a patch to some glibc internals themselves.  Using strlcpy(3)/cat(3) in 
> glibc internals would be bad for performance; I would hope that glibc uses the 
> optimal internals, even if it also provides slow functions for users.

Most internal uses are unlikely to be performance critical, and correctness is kind
of important for libraries.

IMHO inventing many slightly different non-standard string functions is what
causes performance and correctness issues. People disagree about the semantics
(strlcpy has been argued over for a decade or so). Even if a library supports them,
you never know which implementations are actually well optimized (obviously
this varies between ISA and different libc's). So which non-standard string function
is safe to use across all targets and libraries?

In contrast we can be pretty sure that the standard strlen, memcpy etc are both
correct and efficient on all targets/libc's.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 14:59 Wilco Dijkstra
@ 2022-12-23 17:03 ` Alejandro Colomar
  2022-12-23 17:27   ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-23 17:03 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'

[-- Attachment #1.1: Type: text/plain, Size: 9505 bytes --]

Hi Wilco,

On 12/23/22 15:59, Wilco Dijkstra wrote:
> Hi Alex,
> 
>>> Given strlcpy and strlcat are in POSIX next and therefore bar
>>> some extraordinary event will be in glibc, I think we should
>>> probably wait until those two land, then see if there's still
>>> an appetite for stpecpy in glibc.
>>
>> I disagree for the following reasons.
>>
>> strlcpy(3)/strlcat(3) are designed to be _slow_, in exchange for added
>> simplicity and safety.  That's what Theo told me about them.  They didn't care
>> about performance.  The two performance issues are:
> 
> We'd need actual benchmarks to confirm there is a detectable performance
> difference in typical scenarios.

For that, we'd first need to discuss what is a typical scenario.

And also, it depends a lot on what the compiler can optimize.  If I call 
strlcat(3) in a loop, I know that stpecpy(3) is going to be orders of magnitude 
faster.

If I call strlcpy(3) in a loop, doing what an ideal compiler might do, that 
might be something to benchmark, but we'd also need to discuss what is a good 
input for the benchmark.

> So calling strlcpy slow is premature. Looking
> at the proposed stpecpy, it seems it has a lot more branches and special cases
> compared to a typical strlcpy,

In my definition of stpecpy(), I see 3 branches (the other one is just an 
unreachable path to let the compiler optimize better).  And in the NULL check, 
we could use __builtin_expect to let it assume it's unlikely.

And there's one more branch inside memccpy(3), so that's 3 branches plus one 
more branch that can be inside __builtin_expect().  After those branches, it's 
all memcpy(3).

In the OpenBSD definition of strlcpy(), I count 4 branches, and one of them is 
inside a while loop.  So, I'd find it very surprising if strlcpy(3) outperformed 
stpecpy(3).

For completeness, I compared these definitions:

alx@asus5775:~/src/bsd/openbsd/src$ grepc -tfd strlcpy lib/
lib/libc/string/strlcpy.c:27:
size_t
strlcpy(char *dst, const char *src, size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	/* Copy as many bytes as will fit. */
	if (nleft != 0) {
		while (--nleft != 0) {
			if ((*dst++ = *src++) == '\0')
				break;
		}
	}

	/* Not enough room in dst, add NUL and traverse rest of src. */
	if (nleft == 0) {
		if (dsize != 0)
			*dst = '\0';		/* NUL-terminate dst */
		while (*src++)
			;
	}

	return(src - osrc - 1);	/* count does not include NUL */
}

alx@asus5775:~/src/alx/libstp/src$ grepc -tfd stpecpy
./stp/stpe/stpecpy.c:15:
char *stp_nullable
stpecpy(char *stp_nullable dst, char end[0], const char *restrict src)
{
	char  *stp_nullable p;

	if (dst == end)
		return end;
	if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
		return NULL;
	if (dst > end)
		stp_unreachable();

	p = memccpy(dst, src, '\0', end - dst);
	if (p != NULL)
		return p - 1;

	/* Truncation detected. */
	end[-1] = '\0';
	return end;
}

alx@asus5775:~/src/gnu/glibc$ grepc -tfd __memccpy
./string/memccpy.c:30:
void *
__memccpy (void *dest, const void *src, int c, size_t n)
{
   void *p = memchr (src, c, n);

   if (p != NULL)
     return __mempcpy (dest, src, p - src + 1);

   memcpy (dest, src, n);
   return NULL;
}

> and that adds extra overhead. Using memccpy
> is risky too since that is often not optimized.

Well, with the current memccpy(3) I already suspect it's going to be faster than 
strlcpy(3).  If you optimize it, it would increase the chances that it's faster :)

> 
>> -  Traverse the entire input string, to make sure it's a string.  stpecpy(3)
>> instead only reads what's necessary for the copy; it stops reading after truncation.
> 
> Almost all strings are short and few cases need truncation, so I don't see the issue.
> People concerned about performance wouldn't use the standard string functions
> anyway.

That's true for copying strings without truncation; in such cases, mempcpy(3) is 
probably the best thing you can use.  That's documented in string_copying(7).

However, mempcpy(3) doesn't help when you need to truncate.  And there are cases 
where you need to truncate, especially when you need formatting (snprintf(3)).

> 
>> -  strlcat(3) finds the terminating null byte; that's something you already know
>> where it is, with functions that return a useful pointer (mempcpy(3), stpcpy(3),
>> and stpecpy(3)).
> 
> If you know the end of the destination string then don't use concatenation. Easy!

I'll quote strlcpy(3) man page:

        Since it is known how many  characters  were  copied  the  first  time,
        things can be sped up a bit by using a copy instead of an append:

              [... code example ...]

        However,  one  may question the validity of such optimizations, as they
        defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
        fact, the first version of this manual page got it wrong.

So, not easy.

> 
> In fact compilers could inline the 'dest += strlen (dest)' part of strcat and call
> strcpy instead. This allows optimization of the strlen in case you know the size
> of the destination string. This is true for strlcpy too, a compiler could just inline
> it and optimize the strlen (src).

Fair.  Compilers already optimize strcat(3) into stpcpy(3), so I expect they 
could optimize strlcat(3) in the same way.  So, let's assume that a compiler can 
optimize strlcat(3) calls into the correct strlcpy(3) calls:

               if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                   goto toolong;
               if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                   goto toolong;
               len = strlcat(buf, "!", sizeof(buf));
               if (len >= sizeof(buf))
                   goto toolong;
               puts(buf);

the code above could be optimized into (assuming I didn't write any bugs, which 
is likely, due to complexity):

               n = strlcpy(buf, "Hello ", sizeof(buf));
               if (n >= sizeof(buf))
                   goto toolong;
               n += strlcpy(buf + n, "world", sizeof(buf) - n);
               if (n >= sizeof(buf))
                   goto toolong;
               n += strlcpy(buf + n, "!", sizeof(buf) - n);
               if (n >= sizeof(buf))
                   goto toolong;
               puts(buf);

This means that to the branches in the implementation of strlcpy(3), we need to 
add one more.  strlcpy(3) forces the user to check for truncation after every 
call, while stpecpy(3) allows checking truncation only once at the end of the 
chained calls.  So, here goes another performance issue of these calls.

And, while strlcpy(3)/cat(3) are simpler _when truncating is fine_, truncating 
is almost never fine.  Which means you almost always need to check for 
truncation.  Which means boilerplate code, which in turn makes these calls more 
complex to use than simply stpecpy(3).  Here goes the equivalent code to the above:

               end = buf + sizeof(buf);
               p = buf;
               p = stpecpy(p, end, "Hello ");
               p = stpecpy(p, end, "world");
               p = stpecpy(p, end, "!");
               if (p == end)
                   goto toolong;
               len = p - buf;
               puts(buf);

I find it _way_ more readable than the strlcpy(3)/cat(3) code.  Oh, and did I 
say it has less branches? :)

> 
>> The reason that triggered me wanting to add this function is seeing strncpy(3)
>> used for a patch to some glibc internals themselves.  Using strlcpy(3)/cat(3) in
>> glibc internals would be bad for performance; I would hope that glibc uses the
>> optimal internals, even if it also provides slow functions for users.
> 
> Most internal uses are unlikely to be performance critical, and correctness is kind
> of important for libraries.

Correctness comes from readability.  The less code you write, the more correct 
the code will be.  Just see the above for proof that this function actually 
promotes correct code, which in fact was the main reason for this function. 
Performance is just a nice side effect.

> 
> IMHO inventing many slightly different non-standard string functions is what
> causes performance and correctness issues. People disagree about the semantics
> (strlcpy has been argued over for a decade or so). Even if a library supports them,
> you never know which implementations are actually well optimized (obviously
> this varies between ISA and different libc's).

The good thing about stpecpy(3) is that it's difficult to make it slow.  The 
naive implementation is already quite simple and fast.  Not saying it's 
comparable to memcpy(3), but it's likely to be faster than competition, which is 
strlcpy(3).

> So which non-standard string function
> is safe to use across all targets and libraries?

I covered that in string_copying(7).  I guess in a few weeks you'll be able to 
read it in your favourite unsable distro :)

> 
> In contrast we can be pretty sure that the standard strlen, memcpy etc are both
> correct and efficient on all targets/libc's.

Sure, but memcpy(3) is not usable in code that needs to truncate.  We need to 
compare against stpncpy(3) (ughhh) and strlcpy(3).

> 
> Cheers,
> Wilco

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 17:03 ` Alejandro Colomar
@ 2022-12-23 17:27   ` Alejandro Colomar
  0 siblings, 0 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-23 17:27 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'


[-- Attachment #1.1: Type: text/plain, Size: 1379 bytes --]



On 12/23/22 18:03, Alejandro Colomar wrote:
> alx@asus5775:~/src/alx/libstp/src$ grepc -tfd stpecpy
> ./stp/stpe/stpecpy.c:15:
> char *stp_nullable
> stpecpy(char *stp_nullable dst, char end[0], const char *restrict src)
> {
>      char  *stp_nullable p;
> 
>      if (dst == end)
>          return end;
>      if (stp_unlikely(dst == NULL))  // Allow chaining with stpeprintf().
>          return NULL;

Oh, and the two branches above can be optimized into a branch that returns dst. 
I wrote them expanded, for added readability, and allow the compiler reorganize 
it however it pleases.  So, in reality, there are 2 actual branches, plus one 
inside memccpy(3).

>      if (dst > end)
>          stp_unreachable();
> 
>      p = memccpy(dst, src, '\0', end - dst);
>      if (p != NULL)
>          return p - 1;
> 
>      /* Truncation detected. */
>      end[-1] = '\0';
>      return end;
> }
> 
> alx@asus5775:~/src/gnu/glibc$ grepc -tfd __memccpy
> ./string/memccpy.c:30:
> void *
> __memccpy (void *dest, const void *src, int c, size_t n)
> {
>    void *p = memchr (src, c, n);
> 
>    if (p != NULL)
>      return __mempcpy (dest, src, p - src + 1);
> 
>    memcpy (dest, src, n);
>    return NULL;
> }
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/1] string: Add stpecpy(3)
@ 2022-12-22 21:42 Alejandro Colomar
  2022-12-22 21:42 ` [PATCH 1/1] " Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-22 21:42 UTC (permalink / raw)
  To: libc-alpha; +Cc: Alejandro Colomar

Hi!

I recently rewrite the Linux man-pages for string-copying functions, and
just a moment ago, I released the new version.  There are some important
gaps in the string-copying library, and they should be addressed,
however it is done, but ignoring it won't solve the problem.

I know this has been suggested in the past (I did once), and has never
progressed, but I'll try to justify it as much as I can.

The gaps are:

-  No function for copying strings with truncation.  (strlcpy(3) or strscpy(9))
-  No function for catenating strings with truncation.  (strlcat(3))
-  No function for chain-copying strings with truncation.  (stpecpy(3))

   This is similar to strcpy(3)/strcat(3)/stpcpy(3), where stpcpy(3) is
   faster and more versatile than the other two, but it's also slightly
   more complex to use (only slightly).

   We wouldn't need to add the 3, but at least stpecpy(3) or both
   strlcpy(3) and strlcat(3).

   Since stpecpy(3) is significantly faster than the other two, I
   suggest at least adding stpecpy(3).  Also, strlcpy(3)/cat(3) can be
   more easily implemented in terms of stpecpy(3).

There are a few other gaps, but they are much less important, since
there are relatively good workarounds.  I don't want to overload the
discussion either, so I prefer first adding the most necessary function,
and only after that deciding if we want to support other string-copying
functions.

I had added this function this week to a new string library that I was
writing, so I already had a manual page written for it.  I'll copy it
below, to document all the details of the API.o

Cheers,

Alex


---

Alejandro Colomar (1):
  string: Add stpecpy(3)

 string/Makefile  |  1 +
 string/stpecpy.c | 39 +++++++++++++++++++++++++++++++++++++++
 string/string.h  |  7 +++++++
 3 files changed, 47 insertions(+)
 create mode 100644 string/stpecpy.c


stpecpy(3)                 Library Functions Manual                 stpecpy(3)

NAME
       stpecpy, stpecpyx - copy a string with truncation

LIBRARY
       Stp string library (libstp, pkgconf ‐‐cflags ‐‐libs libstp)

SYNOPSIS
       #include <stp/stpe/stpecpy.h>

       char *_Nullable stpecpy(char *_Nullable dst, char end[0],
                               const char *restrict src);
       char *_Nullable stpecpyx(char *_Nullable dst, char end[0],
                               const char *restrict src);

DESCRIPTION
       These functions copy the string pointer to by src, into a string at the
       buffer  pointer  to  by  dst.   If the destination buffer, limited by a
       pointer to its end —one after its last element—, isn’t large enough  to
       hold the copy, the resulting string is truncated.

       stpecpyx(3)  forces a SIGSEGV if the input is not a string, by travers‐
       ing it entirely.

       These  functions  can  be  chained  with  calls  to  stpeprintf(3)  and
       vstpeprintf(3).

       An implementation of these functions might be

           /* This code is in the public domain. */

           char *
           stpecpy(char *dst, char end[0], const char *restrict src)
           {
               char *p;

               if (dst == end || dst == NULL)
                   return dst;

               p = memccpy(dst, src, '\0', end - dst);
               if (p != NULL)
                   return p - 1;

               /* truncation detected */
               end[-1] = '\0';
               return end;
           }

           char *
           stpecpyx(char *dst, char end[0], const char *restrict src)
           {
               if (src[strlen(src)] != '\0')
                   raise(SIGSEGV);

               return stpecpy(dst, end, src);
           }

RETURN VALUE
       NULL   If dst was NULL.

       end
              •  If this call truncated.
              •  If  dst  was equal to end (a previous call to these functions
                 truncated).

       dst + strlen(dst)
              On success, these functions return a pointer to the  terminating
              null byte.

ATTRIBUTES
       For  an  explanation  of  the  terms  used in this section, see attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │stpecpy(3), stpecpyx(3)                     │ Thread safety │ MT‐Safe │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       None.

EXAMPLES
       $ cc ./stpecpy.c $(pkgconf --cflags --libs libbsd-overlay libstp)
       $ ./a.out
       [len = 12]: Hello world!
       $

       // stpecpy.c
       #include <err.h>
       #include <stdio.h>
       #include <stdlib.h>

       #include <stp/stpecpy.h>
       #include <stp/stpeprintf.h>

       int
       main(void)
       {
           char    *p, *end;
           char    buf[BUFSIZ];
           size_t  len;

           end = buf + BUFSIZ;
           p = buf;
           p = stpecpy(p, end, "Hello, ");
           p = stpeprintf(p, end, "%d worlds", 22);
           p = stpecpy(p, end, "!");
           if (p == NULL)
               err(EXIT_FAILURE, "stpeprintf()");
           if (p == end) {
               p--;
               warnx("Truncated");
           }
           len = p - buf;
           printf("[len = %zu]: ", len);
           puts(buf);

           exit(EXIT_SUCCESS);
       }

SEE ALSO
       stpeprintf(3), string_copying(7)

libstp (unreleased)                 (date)                          stpecpy(3)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/1] string: Add stpecpy(3)
  2022-12-22 21:42 [PATCH 0/1] " Alejandro Colomar
@ 2022-12-22 21:42 ` Alejandro Colomar
  2022-12-23  7:02   ` Sam James
  0 siblings, 1 reply; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-22 21:42 UTC (permalink / raw)
  To: libc-alpha; +Cc: Alejandro Colomar

Glibc didn't provide any function that copies a string with truncation.

It only provided strncpy(3) and stpncpy(3), which copy from a string
into a null-padded character sequence at the destination fixed-width
buffer, with truncation.

Those old functions, which don't produce a string, have been misused for
a long time as a poor-man's replacement for strlcpy(3), but doing that
is a source of bugs, since it's hard to calculate the right size that
should be passed to the function, and it's also necessary to explicitly
terminate the buffer with a null byte.  Detecting truncation is yet
another problem.

stpecpy(3), described in the new string_copying(7) manual page, is
similar to OpenBSD's strlcpy(3)/strlcat(3), but:

-  It's simpler to implement.
-  It's faster.
-  It's simpler to detect truncation.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Of course this is still a very early patch.  I just compiled it,
but we'd need to write tests for it.  I didn't want to do all of that
work before the discussion.  Since the source code has been copied from
libstp, I can at least say that the function works, since I already used
that library, but this still needs a lot of work to adapt to glibc, I guess.

 string/Makefile  |  1 +
 string/stpecpy.c | 39 +++++++++++++++++++++++++++++++++++++++
 string/string.h  |  7 +++++++
 3 files changed, 47 insertions(+)
 create mode 100644 string/stpecpy.c

diff --git a/string/Makefile b/string/Makefile
index 938f528b8d..95e9ebce6d 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -73,6 +73,7 @@ routines := \
   sigabbrev_np \
   sigdescr_np \
   stpcpy \
+  stpecpy \
   stpncpy \
   strcasecmp \
   strcasecmp_l \
diff --git a/string/stpecpy.c b/string/stpecpy.c
new file mode 100644
index 0000000000..e6194559ee
--- /dev/null
+++ b/string/stpecpy.c
@@ -0,0 +1,39 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 3.0 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+char *
+stpecpy (char *dest, char *end, const char *restrict src)
+{
+	char  *p;
+
+	if (dest == end)
+		return end;
+	if (dest == NULL)  // Allow chaining with stpeprintf(3).
+		return NULL;
+	if (dest > end)
+		__builtin_unreachable();
+
+	p = memccpy(dest, src, '\0', end - dest);
+	if (p != NULL)
+		return p - 1;
+
+	/* Truncation detected. */
+	end[-1] = '\0';
+	return end;
+}
diff --git a/string/string.h b/string/string.h
index 54dd8344de..966a8cb744 100644
--- a/string/string.h
+++ b/string/string.h
@@ -502,6 +502,13 @@ extern char *stpncpy (char *__restrict __dest,
 #endif
 
 #ifdef	__USE_GNU
+/* Copy the string SRC into a null-terminated string at DEST,
+   truncating if it would run after END.  Return a pointer to
+   the terminating null byte, or END if the string was truncated,
+   or NULL if DEST was NULL. */
+extern char *stpecpy (char *__dest, char *__end, const char *__restrict __src)
+     __THROW __nonnull ((2, 3));
+
 /* Compare S1 and S2 as strings holding name & indices/version numbers.  */
 extern int strverscmp (const char *__s1, const char *__s2)
      __THROW __attribute_pure__ __nonnull ((1, 2));
-- 
2.39.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-22 21:42 ` [PATCH 1/1] " Alejandro Colomar
@ 2022-12-23  7:02   ` Sam James
  2022-12-23 12:26     ` Alejandro Colomar
  0 siblings, 1 reply; 26+ messages in thread
From: Sam James @ 2022-12-23  7:02 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Zack Weinberg via Libc-alpha, Alejandro Colomar

[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]



> On 22 Dec 2022, at 21:42, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> Glibc didn't provide any function that copies a string with truncation.
> 
> It only provided strncpy(3) and stpncpy(3), which copy from a string
> into a null-padded character sequence at the destination fixed-width
> buffer, with truncation.
> 
> Those old functions, which don't produce a string, have been misused for
> a long time as a poor-man's replacement for strlcpy(3), but doing that
> is a source of bugs, since it's hard to calculate the right size that
> should be passed to the function, and it's also necessary to explicitly
> terminate the buffer with a null byte.  Detecting truncation is yet
> another problem.
> 
> stpecpy(3), described in the new string_copying(7) manual page, is
> similar to OpenBSD's strlcpy(3)/strlcat(3), but:
> 
> -  It's simpler to implement.
> -  It's faster.
> -  It's simpler to detect truncation.

Given strlcpy and strlcat are in POSIX next and therefore bar
some extraordinary event will be in glibc, I think we should
probably wait until those two land, then see if there's still
an appetite for stpecpy in glibc.

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 358 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23  7:02   ` Sam James
@ 2022-12-23 12:26     ` Alejandro Colomar
  2022-12-23 12:29       ` Alejandro Colomar
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-23 12:26 UTC (permalink / raw)
  To: Sam James
  Cc: Zack Weinberg via Libc-alpha, Alejandro Colomar, Florian Weimer,
	Paul Eggert


[-- Attachment #1.1: Type: text/plain, Size: 8478 bytes --]

Hi Sam!

On 12/23/22 08:02, Sam James wrote:
> 
> 
>> On 22 Dec 2022, at 21:42, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>
>> Glibc didn't provide any function that copies a string with truncation.
>>
>> It only provided strncpy(3) and stpncpy(3), which copy from a string
>> into a null-padded character sequence at the destination fixed-width
>> buffer, with truncation.
>>
>> Those old functions, which don't produce a string, have been misused for
>> a long time as a poor-man's replacement for strlcpy(3), but doing that
>> is a source of bugs, since it's hard to calculate the right size that
>> should be passed to the function, and it's also necessary to explicitly
>> terminate the buffer with a null byte.  Detecting truncation is yet
>> another problem.
>>
>> stpecpy(3), described in the new string_copying(7) manual page, is
>> similar to OpenBSD's strlcpy(3)/strlcat(3), but:
>>
>> -  It's simpler to implement.
>> -  It's faster.
>> -  It's simpler to detect truncation.
> 
> Given strlcpy and strlcat are in POSIX next and therefore bar
> some extraordinary event will be in glibc, I think we should
> probably wait until those two land, then see if there's still
> an appetite for stpecpy in glibc.

I disagree for the following reasons.

strlcpy(3)/strlcat(3) are designed to be _slow_, in exchange for added 
simplicity and safety.  That's what Theo told me about them.  They didn't care 
about performance.  The two performance issues are:

-  Traverse the entire input string, to make sure it's a string.  stpecpy(3) 
instead only reads what's necessary for the copy; it stops reading after truncation.

-  strlcat(3) finds the terminating null byte; that's something you already know 
where it is, with functions that return a useful pointer (mempcpy(3), stpcpy(3), 
and stpecpy(3)).

While there are many programs that may be fine with the OpenBSD functions, glibc 
should _also_ provide a way to do the operation with an optimal API.  And it's 
in line with glibc providing stpcpy(3) and mempcpy(3) as extensions (stpcpy(3) 
is now in POSIX).


The reason that triggered me wanting to add this function is seeing strncpy(3) 
used for a patch to some glibc internals themselves.  Using strlcpy(3)/cat(3) in 
glibc internals would be bad for performance; I would hope that glibc uses the 
optimal internals, even if it also provides slow functions for users.

There are probably more cases within existing code in glibc.  Just check the 
output of:

     $ grep -rn st.ncpy -A1 | grep -B1 " = '\\\\0'"


Moreover, in the Austin discussion for strlcpy(3)/cat(3), it was mentioned that 
strlcpy(3) has an interface identical to that of snprintf(3), and that "If we 
truly think that this is bad design, should we come up with a new version of 
snprintf() that also doesn't do this? I don't think so.".

Well, I do believe snprintf is also misdesigned, for the same reasons that the 
strlcpy(3) manual page states that you should use strlcpy(3) for catenating 
strings, but rather strlcat(3):

        To detect truncation, perhaps while building a pathname, something like
        the following might be used:

              char *dir, *file, pname[MAXPATHLEN];

              ...

              if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
                      goto toolong;
              if (strlcat(pname, file, sizeof(pname)) >= sizeof(pname))
                      goto toolong;

        Since it is known how many  characters  were  copied  the  first  time,
        things can be sped up a bit by using a copy instead of an append:

              char *dir, *file, pname[MAXPATHLEN];
              size_t n;

              ...

              n = strlcpy(pname, dir, sizeof(pname));
              if (n >= sizeof(pname))
                      goto toolong;
              if (strlcpy(pname + n, file, sizeof(pname) ‐ n) >= sizeof(pname) ‐ n)
                      goto toolong;

        However,  one  may question the validity of such optimizations, as they
        defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
        fact, the first version of this manual page got it wrong.

Guess what?  There's no 'cat' version of snprintf, so users are doomed to write 
buggy code when trying to use it to concatenate after some other string.  I've 
recently been investigating a lot about it, and found invocations of Undefined 
Behavior, and some milder cases of benign off-by-one (on the safe side, by luck) 
errors, in calls to snprintf(3) in several important projects:

-  NGINX Unit:
    -  Undefined Behavior:
       <https://github.com/nginx/unit/issues/795#issuecomment-1345400420>

    -  Wrong truncation detection:
       <https://github.com/nginx/unit/pull/734#discussion_r1043963527>
       <https://github.com/nginx/unit/issues/804>

-  shadow:
    -  off-by-one:
       <https://github.com/shadow-maint/shadow/pull/607>

    -  clever code that looks like a bug but it's not:
       <https://github.com/shadow-maint/shadow/issues/608>

Rather than adding some catenating variant of snprintf(3), I suggest adding a 
single function that has an interface similar to stpcpy(3) and mempcpy(3), and 
identical to stpecpy(3):  stpeprintf(3):

<http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/src/stp/stpe/stpeprintf.c>

I implemented it in terms of vsnprintf(3), so I need to handle a return of -1, 
but if implemented from scratch in glibc, in could be written to not be limited 
to INT_MAX (although I wonder why anyone would want to copy more than INT_MAX as 
a formatted string).


Below is its manual page in libstp.

Cheers,

Alex
---

stpeprintf(3)              Library Functions Manual              stpeprintf(3)

NAME
        stpeprintf, vstpeprintf - create a formatted string with truncation

LIBRARY
        Stp string library (libstp, pkgconf ‐‐cflags ‐‐libs libstp)

SYNOPSIS
        #include <stp/stpe/stpeprintf.h>

        char *_Nullable stpeprintf(char *_Nullable dst, char end[0],
                                   const char *restrict fmt, ...);
        char *_Nullable vstpeprintf(char *_Nullable dst, char end[0],
                                   const char *restrict fmt, va_list ap);

DESCRIPTION
        These functions are almost identical to snprintf(3) and vsnprintf(3).

        The  destination  buffer  is limited by a pointer to its end —one after
        its last element— instead of a size.

        These  functions  can  be  chained  with  calls  to  stpeprintf(3)  and
        vstpeprintf(3).

RETURN VALUE
        NULL
               •  If this function failed (see ERRORS).
               •  If dst was NULL.

        end
               •  If this call truncated.
               •  If  dst  was equal to end (a previous call to these functions
                  truncated).

        dst + strlen(dst)
               On success, these functions return a pointer to the  terminating
               null byte.

ERRORS
        These functions may fail for any of the same reasons as vsnprintf(3).

ATTRIBUTES
        For  an  explanation  of  the  terms  used in this section, see attrib‐
        utes(7).
        ┌────────────────────────────────────────────┬───────────────┬─────────┐
        │Interface                                   │ Attribute     │ Value   │
        ├────────────────────────────────────────────┼───────────────┼─────────┤
        │stpeprintf(3), vstpeprintf(3)               │ Thread safety │ MT‐Safe │
        └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
        None.

EXAMPLES
        See stpecpy(3).

SEE ALSO
        stpecpy(3), string_copying(7)

libstp (unreleased)                 (date)                       stpeprintf(3)








-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 12:26     ` Alejandro Colomar
@ 2022-12-23 12:29       ` Alejandro Colomar
  2022-12-23 17:21       ` Alejandro Colomar
  2022-12-31 15:13       ` Sam James
  2 siblings, 0 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-23 12:29 UTC (permalink / raw)
  To: Sam James
  Cc: Zack Weinberg via Libc-alpha, Alejandro Colomar, Florian Weimer,
	Paul Eggert


[-- Attachment #1.1: Type: text/plain, Size: 1807 bytes --]



On 12/23/22 13:26, Alejandro Colomar wrote:

> Well, I do believe snprintf is also misdesigned, for the same reasons that the 
> strlcpy(3) manual page states that you should use strlcpy(3) for catenating 

typo fix:  s/should/shouldn't/

> strings, but rather strlcat(3):
> 
>         To detect truncation, perhaps while building a pathname, something like
>         the following might be used:
> 
>               char *dir, *file, pname[MAXPATHLEN];
> 
>               ...
> 
>               if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
>                       goto toolong;
>               if (strlcat(pname, file, sizeof(pname)) >= sizeof(pname))
>                       goto toolong;
> 
>         Since it is known how many  characters  were  copied  the  first  time,
>         things can be sped up a bit by using a copy instead of an append:
> 
>               char *dir, *file, pname[MAXPATHLEN];
>               size_t n;
> 
>               ...
> 
>               n = strlcpy(pname, dir, sizeof(pname));
>               if (n >= sizeof(pname))
>                       goto toolong;
>               if (strlcpy(pname + n, file, sizeof(pname) ‐ n) >= sizeof(pname) ‐ n)
>                       goto toolong;
> 
>         However,  one  may question the validity of such optimizations, as they
>         defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
>         fact, the first version of this manual page got it wrong.

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 12:26     ` Alejandro Colomar
  2022-12-23 12:29       ` Alejandro Colomar
@ 2022-12-23 17:21       ` Alejandro Colomar
  2022-12-31 15:13       ` Sam James
  2 siblings, 0 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-23 17:21 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Zack Weinberg via Libc-alpha, Alejandro Colomar, Florian Weimer,
	Paul Eggert, Sam James, Steffen Nurpmeso


[-- Attachment #1.1: Type: text/plain, Size: 2559 bytes --]

Hi Wilco,

On 12/23/22 13:26, Alejandro Colomar wrote:
>         However,  one  may question the validity of such optimizations, as they
>         defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
>         fact, the first version of this manual page got it wrong.
> 
> Guess what?  There's no 'cat' version of snprintf, so users are doomed to write 
> buggy code when trying to use it to concatenate after some other string.  I've 
> recently been investigating a lot about it, and found invocations of Undefined 
> Behavior, and some milder cases of benign off-by-one (on the safe side, by luck) 
> errors, in calls to snprintf(3) in several important projects:
> 
> -  NGINX Unit:
>     -  Undefined Behavior:
>        <https://github.com/nginx/unit/issues/795#issuecomment-1345400420>
> 
>     -  Wrong truncation detection:
>        <https://github.com/nginx/unit/pull/734#discussion_r1043963527>
>        <https://github.com/nginx/unit/issues/804>
> 
> -  shadow:
>     -  off-by-one:
>        <https://github.com/shadow-maint/shadow/pull/607>
> 
>     -  clever code that looks like a bug but it's not:
>        <https://github.com/shadow-maint/shadow/issues/608>

And adding strlcat(3) doesn't address the issue about snprintf(3), which, as 
EdSchouten said in the Austin discussion:

"
- strlcpy() fits within the existing set of functions like a glove. strlcpy(a, 
b, n) behaves identically to snprintf(a, n, "%s", b). The return value always 
corresponds to the number of non-null bytes that would have been written. If we 
truly think that this is bad design, should we come up with a new version of 
snprintf() that also doesn't do this? I don't think so.
"

I only disagree in the last part ("I don't think so").  As I linked in my 
previous message, there have been numerous misuses of snprintf(3), due to the 
fact that it's not designed to be concatenated.  But of course there's no 
alternative, so the only way is using it, and hoping that you didn't introduce a 
bug.

Steffen (was it Nurpmeso?) in that same discussion rebated the claims about 
strlcpy(3) with performance claims that snprintf(3) is slow, but that was the 
least evil.  The real evil with snprintf(3) is that it doesn't have a 'cat' 
complement.

<https://www.austingroupbugs.net/view.php?id=986>

So, I'll send a second revision of the patch set to add stpeprintf(3).

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-23 12:26     ` Alejandro Colomar
  2022-12-23 12:29       ` Alejandro Colomar
  2022-12-23 17:21       ` Alejandro Colomar
@ 2022-12-31 15:13       ` Sam James
  2022-12-31 15:15         ` Alejandro Colomar
  2 siblings, 1 reply; 26+ messages in thread
From: Sam James @ 2022-12-31 15:13 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Alejandro Colomar, Florian Weimer, Paul Eggert, Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1762 bytes --]



> On 23 Dec 2022, at 12:26, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> Hi Sam!
> 
> On 12/23/22 08:02, Sam James wrote:
>>> On 22 Dec 2022, at 21:42, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>> 
>>> Glibc didn't provide any function that copies a string with truncation.
>>> 
>>> It only provided strncpy(3) and stpncpy(3), which copy from a string
>>> into a null-padded character sequence at the destination fixed-width
>>> buffer, with truncation.
>>> 
>>> Those old functions, which don't produce a string, have been misused for
>>> a long time as a poor-man's replacement for strlcpy(3), but doing that
>>> is a source of bugs, since it's hard to calculate the right size that
>>> should be passed to the function, and it's also necessary to explicitly
>>> terminate the buffer with a null byte.  Detecting truncation is yet
>>> another problem.
>>> 
>>> stpecpy(3), described in the new string_copying(7) manual page, is
>>> similar to OpenBSD's strlcpy(3)/strlcat(3), but:
>>> 
>>> -  It's simpler to implement.
>>> -  It's faster.
>>> -  It's simpler to detect truncation.
>> Given strlcpy and strlcat are in POSIX next and therefore bar
>> some extraordinary event will be in glibc, I think we should
>> probably wait until those two land, then see if there's still
>> an appetite for stpecpy in glibc.
> 
> I disagree for the following reasons.
[snip]

Hi Alex,

Thanks for your detailed and thoughtful reply. I'll reflect on your comments here
and in the rest of the thread(s) - but there's some intriguing pointers you've made.

I wasn't trying to be dismissive at all so I hope it didn't come across like that.

Thank you again!

Best,
sam


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 358 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] string: Add stpecpy(3)
  2022-12-31 15:13       ` Sam James
@ 2022-12-31 15:15         ` Alejandro Colomar
  0 siblings, 0 replies; 26+ messages in thread
From: Alejandro Colomar @ 2022-12-31 15:15 UTC (permalink / raw)
  To: Sam James; +Cc: Alejandro Colomar, Florian Weimer, Paul Eggert, Libc-alpha


[-- Attachment #1.1: Type: text/plain, Size: 555 bytes --]

Hey Sam!

On 12/31/22 16:13, Sam James wrote:
[...]

>>
>> I disagree for the following reasons.
> [snip]
> 
> Hi Alex,
> 
> Thanks for your detailed and thoughtful reply. I'll reflect on your comments here
> and in the rest of the thread(s) - but there's some intriguing pointers you've made.
> 
> I wasn't trying to be dismissive at all so I hope it didn't come across like that.

Ahh, no, I didn't take it bad; really :)

> 
> Thank you again!

Cheers!

Alex

> 
> Best,
> sam
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-12-31 15:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-23 18:35 [PATCH 1/1] string: Add stpecpy(3) Wilco Dijkstra
2022-12-23 22:40 ` Alejandro Colomar
  -- strict thread matches above, loose matches on Subject: below --
2022-12-23 23:24 Wilco Dijkstra
2022-12-24  0:05 ` Alejandro Colomar
2022-12-24  0:26   ` Alejandro Colomar
2022-12-25  1:52     ` Noah Goldstein
2022-12-25 14:37       ` Alejandro Colomar
2022-12-25 22:31         ` Noah Goldstein
2022-12-26  0:26           ` Alejandro Colomar
2022-12-26  0:32             ` Noah Goldstein
2022-12-26  0:37               ` Alejandro Colomar
2022-12-26  2:43                 ` Noah Goldstein
2022-12-26 22:25                   ` Alejandro Colomar
2022-12-26 23:24                     ` Alejandro Colomar
2022-12-26 23:52                       ` Alejandro Colomar
2022-12-27  0:12                         ` Alejandro Colomar
2022-12-23 14:59 Wilco Dijkstra
2022-12-23 17:03 ` Alejandro Colomar
2022-12-23 17:27   ` Alejandro Colomar
2022-12-22 21:42 [PATCH 0/1] " Alejandro Colomar
2022-12-22 21:42 ` [PATCH 1/1] " Alejandro Colomar
2022-12-23  7:02   ` Sam James
2022-12-23 12:26     ` Alejandro Colomar
2022-12-23 12:29       ` Alejandro Colomar
2022-12-23 17:21       ` Alejandro Colomar
2022-12-31 15:13       ` Sam James
2022-12-31 15:15         ` Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).