Re: strncpy clarify result may not be null terminated

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* Re: strncpy clarify result may not be null terminated
       [not found]               ` <ZUsoIbhrJar6ojux@dj3ntoo>
@ 2023-11-08  9:51                 ` Alejandro Colomar
  2023-11-08  9:59                   ` Thorsten Kukuk
                                     ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08  9:51 UTC (permalink / raw)
  To: libc-alpha, Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 1930 bytes --]

On Wed, Nov 08, 2023 at 12:18:09AM -0600, Oskari Pirhonen wrote:
> On Tue, Nov 07, 2023 at 17:17:29 +0100, Alejandro Colomar wrote:
> > 
> > I would love to find this API useless, and in that case, I'd go further
> > and add [[deprecated]] in the synopsis, and write a heavy statement in a
> > BUGS section.  But I can't do that while it's still a good function in
> > some cases (even if those cases are bad design, such as utmp(5)).
> > 
> > On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's
> > being deprecated, so maybe we could consider deprecating strncpy(3).
> > 
> > If I see enough proof that all APIs that require this function are
> > deprecated, I'll happily declare the function deprecated as well.
> > (in fact I already did some time ago, but then found this use with
> > utmp(5), which is why I removed the deprecation; see
> > <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>).
> > 
> 
> If you ask me, I'd not mark libc functions as deprecated without some
> kind of consesnsus from the libc maintainers too. They may not go so far
> as to add the `deprecated` attribute in their own headers, at least not
> yet at that point in time, but some kind of written "Yes, please don't
> use this function" would be nice to have before marking them in the man
> pages.

Okay, let's ask them.

Hi glibc developers,

strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
are those deprecated, or is any such API still good for new code?

If all APIs that need strncpy(3) are deprecated, I propose recommending
against its use in new code.

Thanks,
Alex

> 
> - Oskari



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:51                 ` strncpy clarify result may not be null terminated Alejandro Colomar
@ 2023-11-08  9:59                   ` Thorsten Kukuk
  2023-11-08 15:09                     ` Alejandro Colomar
       [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
  2023-11-08 14:06                   ` Zack Weinberg
  2023-11-08 19:04                   ` DJ Delorie
  2 siblings, 2 replies; 77+ messages in thread
From: Thorsten Kukuk @ 2023-11-08  9:59 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man

On Wed, Nov 08, Alejandro Colomar wrote:

> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> are those deprecated, or is any such API still good for new code?

Everything around utmp/utmpx/wtmp/lastlog is deprecated.

openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
and fresh installations don't have that files anymore.
So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives
are e.g. systemd-logind/wtmpdb/lastlog2.

  Thorsten

-- 
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:51                 ` strncpy clarify result may not be null terminated Alejandro Colomar
  2023-11-08  9:59                   ` Thorsten Kukuk
@ 2023-11-08 14:06                   ` Zack Weinberg
  2023-11-08 15:07                     ` Alejandro Colomar
  2023-11-08 19:04                   ` DJ Delorie
  2 siblings, 1 reply; 77+ messages in thread
From: Zack Weinberg @ 2023-11-08 14:06 UTC (permalink / raw)
  To: Alejandro Colomar, GNU libc development, Jonny Grant,
	'linux-man'

>> If you ask me, I'd not mark libc functions as deprecated without some
>> kind of consesnsus from the libc maintainers too.
...
> Okay, let's ask them.
...
> Hi glibc developers,
>
> strncpy(3)
...

Speaking only for myself, I would be very reluctant to declare any standardized function "deprecated" by glibc unless the relevant standards have also made that declaration. This goes double for anything that was in C89.

Also speaking only for myself, the Linux manpages are welcome to discourage the use of any function that you feel is not a wise choice for new programs, but the word "deprecated" should be reserved for cases where there really has been a declaration of deprecation by us and/or the standards. The word "obsolete" should also be used very cautiously; it's broader, but I personally would only use it in situations where there is a direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r).

In the specific cases we're discussing: I would definitely like to see a BUGS or NOTES section in the strncpy(3) manpage, warning people that it's probably not what they want and recommending use of strlen+memcpy instead. I don't know enough about the utmp(x) situation to have a strong opinion, but I do think the manpages need to be very clear that this particular proposed replacement for utmp(x) is Linux-specific and still somewhat experimental.

zw

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 14:06                   ` Zack Weinberg
@ 2023-11-08 15:07                     ` Alejandro Colomar
  2023-11-08 21:35                       ` Carlos O'Donell
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 15:07 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: GNU libc development, Jonny Grant, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 3528 bytes --]

Hi Zack!

On Wed, Nov 08, 2023 at 09:06:48AM -0500, Zack Weinberg wrote:
> >> If you ask me, I'd not mark libc functions as deprecated without some
> >> kind of consesnsus from the libc maintainers too.
> ...
> > Okay, let's ask them.
> ...
> > Hi glibc developers,
> >
> > strncpy(3)
> ...
> 
> Speaking only for myself, I would be very reluctant to declare any
> standardized function "deprecated" by glibc unless the relevant
> standards have also made that declaration.  This goes double for
> anything that was in C89.

I understand your point of view, but disagree with it.  Deprecation by
ISO C or POSIX takes very very long.  We had gets(3) for decades until
they realized it should be removed from the standards.

	STANDARDS
	     POSIX.1‐2008.

	HISTORY
	     C89, POSIX.1‐2001.

	     LSB deprecates gets().  POSIX.1‐2008 marks gets()  obsoles‐
	     cent.  ISO C11 removes the specification of gets() from the
	     C  language, and since glibc 2.16, glibc header files don’t
	     expose the function declaration if the _ISOC11_SOURCE  fea‐
	     ture test macro is defined.

So we had it in ISO C in C89 and C99, and only in C11 they realized it
had to be removed.  POSIX hasn't even removed it yet!  I won't hesitate
to kill a function just because of bureaucracy.

The standard, especially C89, was just a reflection of the commonalities
of most implementation.  It was a burden of implementations to add new
stuff or to remove existing stuff.  Later revisions of the standards
invented more, though.

In this case, since ISO C has no APIs that use strncpy(3), it could (and
should) already deprecate strncpy(3) from ISO C.  POSIX still needs it
while it keeps utmpx(5), because there's no other way to correctly write
to the fixed-width buffers within struct utmpx.

> 
> Also speaking only for myself, the Linux manpages are welcome to
> discourage the use of any function that you feel is not a wise choicei
> for new programs, but the word "deprecated" should be reserved for
> cases where there really has been a declaration of deprecation by us
> and/or the standards.

If a function is deprecated by a standard or other entity, that will be
reflected in the STANDARDS or HISTORY section.  For deprecation by the
manual itself, the SYNOPSIS (and BUGS) sections are fine.  In the end,
the word 'deprecate' isn't any magic.

	From WordNet (r) 3.0 (2006) [wn]:

	  deprecate
	      v 1: express strong disapproval of; deplore

That term applies to strncpy(3).

> The word "obsolete" should also be used very cautiously; it's broader,
> but I personally would only use it in situations where there is a
> direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r).
> 
> In the specific cases we're discussing: I would definitely like to see
> a BUGS or NOTES section in the strncpy(3) manpage, warning people that
> it's probably not what they want and recommending use of strlen+memcpy
> instead. I don't know enough about the utmp(x) situation to have a
> strong opinion, but I do think the manpages need to be very clear that
> this particular proposed replacement for utmp(x) is Linux-specific and
> still somewhat experimental.

But yes, we need to make sure that the APIs that need strncpy(3) are
all deprecated.  If other Unix systems still need utmpx or similar
stuff, strncpy(3) will still be necessary.

Cheers,
Alex

> 
> zw

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:59                   ` Thorsten Kukuk
@ 2023-11-08 15:09                     ` Alejandro Colomar
       [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
  1 sibling, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 15:09 UTC (permalink / raw)
  To: Thorsten Kukuk; +Cc: libc-alpha, Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 899 bytes --]

On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
> On Wed, Nov 08, Alejandro Colomar wrote:
> 
> > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> > Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> > are those deprecated, or is any such API still good for new code?
> 

Hi Thorsten!

> Everything around utmp/utmpx/wtmp/lastlog is deprecated.

Is this a Linux-specific thing?  Do you know if the BSDs also deprecated
utmpx?

Thanks,
Alex

> 
> openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
> and fresh installations don't have that files anymore.
> So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives
> are e.g. systemd-logind/wtmpdb/lastlog2.

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
       [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
@ 2023-11-08 15:44                       ` Thorsten Kukuk
  2023-11-08 17:26                         ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 77+ messages in thread
From: Thorsten Kukuk @ 2023-11-08 15:44 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man

On Wed, Nov 08, Alejandro Colomar wrote:

> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
> > On Wed, Nov 08, Alejandro Colomar wrote:
> > 
> > > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > > and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> > > Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> > > are those deprecated, or is any such API still good for new code?
> > 
> 
> Hi Thorsten!
> 
> > Everything around utmp/utmpx/wtmp/lastlog is deprecated.
> 
> Is this a Linux-specific thing?  Do you know if the BSDs also deprecated
> utmpx?

Beside the design issues of the interface, which are generic, the Y2038
issue is more or less glibc specific and a result of supporting 32bit
and 64bit userland at the same time.
For most other implementations I'm aware of there is no Y2038 problem,
either because they don't support utmp/utmpx/... like musl libc, or they
were able to switch to a 64bit time variable or used that already.
So no need to change anything.
For BSD I don't really know the situation, but as far as I know, they
don't have the problem and thus no need to change anything.

  Thorsten

> Thanks,
> Alex
> 
> > 
> > openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
> > and fresh installations don't have that files anymore.
> > So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives
> > are e.g. systemd-logind/wtmpdb/lastlog2.
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 15:44                       ` Thorsten Kukuk
@ 2023-11-08 17:26                         ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 77+ messages in thread
From: Adhemerval Zanella Netto @ 2023-11-08 17:26 UTC (permalink / raw)
  To: Thorsten Kukuk, Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man

On 08/11/23 12:44, Thorsten Kukuk wrote:
> On Wed, Nov 08, Alejandro Colomar wrote:
> 
>> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
>>> On Wed, Nov 08, Alejandro Colomar wrote:
>>>
>>>> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
>>>> and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
>>>> Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
>>>> are those deprecated, or is any such API still good for new code?
>>>
>>
>> Hi Thorsten!
>>
>>> Everything around utmp/utmpx/wtmp/lastlog is deprecated.
>>
>> Is this a Linux-specific thing?  Do you know if the BSDs also deprecated
>> utmpx?
> 
> Beside the design issues of the interface, which are generic, the Y2038
> issue is more or less glibc specific and a result of supporting 32bit
> and 64bit userland at the same time.
> For most other implementations I'm aware of there is no Y2038 problem,
> either because they don't support utmp/utmpx/... like musl libc, or they
> were able to switch to a 64bit time variable or used that already.
> So no need to change anything.

In fact the glibc utmp y2038 support depends of the ABI, some 64 bit ABIs
decided to be compatible with 32 bits so the utmp files could be read/parsed
by both ABIs (defined by __WORDSIZE_TIME64_COMPAT32).  This required the 
ut_tv field to be define not as a 'struct timeval', but rather with a similar
struct with 32 bit tv_sec (yes, it is a mess and not sure why it was
considered a good idea back then).

It means that for 64 bits that define __WORDSIZE_TIME64_COMPAT32ABI (mips, 
riscv, s390, sparc, powerpc, and x86) the utmp ABI is broken regarding
y2038 support. The ut_tv is also defined depending of the time_t at build 
time (_TIME_BITS), so if you have programs with different time_t support, 
they won't correctly access the utmp (gnulib seems to have some overrides 
to fix it).

Fixing those issues would require a lot of work that I don't think it worth 
for a API with some inherent implementation flaws [1] (most likely it would
require a complete rewrite, which logind basically did).  That's why I am
leaning to complete remove glibc implementation and mimic what musl did
(no-op implementation that return -1/ENOTSUP where applicable). 

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=24492

> For BSD I don't really know the situation, but as far as I know, they
> don't have the problem and thus no need to change anything.
> 
>   Thorsten
> 
>> Thanks,
>> Alex
>>
>>>
>>> openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
>>> and fresh installations don't have that files anymore.
>>> So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives
>>> are e.g. systemd-logind/wtmpdb/lastlog2.
>>
>> -- 
>> <https://www.alejandro-colomar.es/>
> 
> 
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:51                 ` strncpy clarify result may not be null terminated Alejandro Colomar
  2023-11-08  9:59                   ` Thorsten Kukuk
  2023-11-08 14:06                   ` Zack Weinberg
@ 2023-11-08 19:04                   ` DJ Delorie
  2023-11-08 19:40                     ` Alejandro Colomar
  2 siblings, 1 reply; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 19:04 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man

Alejandro Colomar <alx@kernel.org> writes:
> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?

Let's not limit ourselves to glibc APIs.  Tar format, for example, uses
fixed length fields (and my bet is that strncpy was created for it) yet
tar is not part of glibc.

IMHO the solution here is to document strncpy with sufficiently obvious
intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should
ONLY be used for its intended purpose (filling a space-padded but not
null-terminated field)

It is not documentation's purpose to limit programmer's creativity, just
to give them an accurate representation of what the functions do.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:04                   ` DJ Delorie
@ 2023-11-08 19:40                     ` Alejandro Colomar
  2023-11-08 19:58                       ` DJ Delorie
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 19:40 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, jg, linux-man

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

Hi DJ,

On Wed, Nov 08, 2023 at 02:04:45PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> 
> Let's not limit ourselves to glibc APIs.  Tar format, for example, uses
> fixed length fields (and my bet is that strncpy was created for it) yet
> tar is not part of glibc.
> 
> IMHO the solution here is to document strncpy with sufficiently obvious
> intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should
> ONLY be used for its intended purpose (filling a space-padded but not
> null-terminated field)

Indeed.  That's what I did (I think).

DESCRIPTION
     These  functions  copy  the string pointed to by src into a null‐
     padded character sequence at the fixed‐width buffer pointed to by
     dst.  If the destination buffer, limited by its size, isn’t large
     enough to hold the copy,  the  resulting  character  sequence  is
     truncated.

...

CAVEATS
     The name of these functions is confusing.  These  functions  pro‐
     duce   a  null‐padded  character  sequence,  not  a  string  (see
     string_copying(7)).

     It’s impossible to distinguish truncation by the  result  of  the
     call,  from  a  character sequence that just fits the destination
     buffer; truncation should be detected by comparing the length  of
     the input string with the size of the destination buffer.


I refuse to add any hints that strncpy(3) is good for copying strings.

> 
> It is not documentation's purpose to limit programmer's creativity, just
> to give them an accurate representation of what the functions do.

Thanks!

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:40                     ` Alejandro Colomar
@ 2023-11-08 19:58                       ` DJ Delorie
  2023-11-08 20:13                         ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 19:58 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man


Perhaps an example that shows the problem?

EXAMPLES

    strncpy (buf, "1", 5);
    { '1', 0, 0, 0, 0 }

    strncpy (buf, "1234", 5);
    { '1', '2', '3', '4', 0 }

    strncpy (buf, "12345", 5);
    { '1', '2', '3', '4', '5' }

    strncpy (buf, "123456", 5);
    { '1', '2', '3', '4', '5' }

Maybe strcpy and strncpy shouldn't even share man pages, since they're
not as related as we once thought?


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:58                       ` DJ Delorie
@ 2023-11-08 20:13                         ` Alejandro Colomar
  2023-11-08 21:07                           ` DJ Delorie
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 20:13 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, jg, linux-man

[-- Attachment #1: Type: text/plain, Size: 2647 bytes --]

Hi DJ,

On Wed, Nov 08, 2023 at 02:58:24PM -0500, DJ Delorie wrote:
> 
> Perhaps an example that shows the problem?

Maybe.

> 
> EXAMPLES
> 
>     strncpy (buf, "1", 5);
>     { '1', 0, 0, 0, 0 }
> 
>     strncpy (buf, "1234", 5);
>     { '1', '2', '3', '4', 0 }
> 
>     strncpy (buf, "12345", 5);
>     { '1', '2', '3', '4', '5' }
> 
>     strncpy (buf, "123456", 5);
>     { '1', '2', '3', '4', '5' }

Would you mind reading the latest versions of strcpy(3), strncpy(3), and
string_copying(7), as in the git repository, and comment your thoughts?

You don't even need to install the pages from git.  You can read them
with this:

$ git clone https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/
$ cd man-pages/
$ man ./man3/strcpy.3
$ man ./man3/strncpy.3
$ man ./man7/string_copying.7

Also check the examples and suggest if anything could be clearer.

Thanks!

> 
> Maybe strcpy and strncpy shouldn't even share man pages, since they're
> not as related as we once thought?

They don't (anymore):

	$ pwd
	/home/alx/src/linux/man-pages/man-pages/master
	$ git log --oneline -1
	b8584be14 (HEAD -> master, korg/master, alx/main, main) bcmp.3: wfix

	$ grep -e '\.TH ' -e '\.so ' man3/strcpy.3 
	.TH strcpy 3 (date) "Linux man-pages (unreleased)"
	$ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3 
	.so man3/strcpy.3

	$ grep -e '\.TH ' -e '\.so ' man3/strncpy.3 
	.so man3/stpncpy.3
	$ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3 
	.TH stpncpy 3 (date) "Linux man-pages (unreleased)"

The only shared page is string_copying(7), which attempts to clarify all
of this.  It was only in old versions of the Linux man-pages where they
shared page.

	$ pwd
	/home/alx/src/linux/man-pages/man-pages/5/5.13
	$ git log --oneline -1
	091fbf1fe (HEAD, tag: man-pages-5.13) Ready for 5.13

	$ grep -e '\.TH ' -e '\.so ' man3/strcpy.3 
	.TH STRCPY 3  2021-03-22 "GNU" "Linux Programmer's Manual"
	$ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3 
	.TH STPCPY 3  2021-03-22 "GNU" "Linux Programmer's Manual"

	$ grep -e '\.TH ' -e '\.so ' man3/strncpy.3 
	.so man3/strcpy.3
	$ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3 
	.TH STPNCPY 3  2021-03-22 "GNU" "Linux Programmer's Manual"

I've spent the last year working on shadow-utils' string handling code,
while at the same time wrote string_copying(7) as a complete guide to
*cpy() functions, detailing what they do and what they don't, and also
rewrote all the pages for these functions with shorter reference guides
that refer to string_copying(7) for more details.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 20:13                         ` Alejandro Colomar
@ 2023-11-08 21:07                           ` DJ Delorie
  2023-11-08 21:50                             ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 21:07 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man

Alejandro Colomar <alx@kernel.org> writes:
> Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> string_copying(7), as in the git repository, and comment your thoughts?

I think my examples would work well after the first CAVEATS paragaph:

       The name of these functions is confusing.  These functions
       produce a null-padded character sequence, not a string (see
       string_copying(7)), like this:

     strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
     strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
     strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
     strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }

>       These functions copy the string pointed to by src  into  a  null-padded
>       character sequence at the fixed-width buffer pointed to by dst.  If the
>       destination buffer, limited by its size, isn't large enough to hold the
>       copy,  the  resulting character sequence is truncated.

hmmm... perhaps

  These functions copy at most SZ bytes from SRC into a fixed-length
  buffer DST, padding any unwritten bytes in DST with NUL bytes.
  Specifically, if SRC has a NUL byte in the first SZ bytes, copying
  stops there and any remaining bytes in DST are filled with NUL bytes.
  If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
  copied to DST.

This avoids the term "string" completely and emphasises the not-string
nature of the destination.

 stpncpy,  strncpy  - zero a fixed-width buffer and copy a string into a
       character sequence with truncation and zero the rest of it

Or "fill a fixed-width zero-padded buffer with bytes from a string"

That avoids saying "copy a string"

string_copying.7:

> For historic reasons, some standard APIs, such as utmpx(5),

Perhaps "some standard APIs and file formats,, such as utmpx(5) or
tar(1)," ?

> however, those padding null bytes are not part of the character
> sequence.

add ", and may not be present if not needed." ?


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 15:07                     ` Alejandro Colomar
@ 2023-11-08 21:35                       ` Carlos O'Donell
  2023-11-08 22:11                         ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Carlos O'Donell @ 2023-11-08 21:35 UTC (permalink / raw)
  To: Alejandro Colomar, Zack Weinberg
  Cc: GNU libc development, Jonny Grant, 'linux-man'

On 11/8/23 10:07, Alejandro Colomar wrote:
> So we had it in ISO C in C89 and C99, and only in C11 they realized it
> had to be removed.  POSIX hasn't even removed it yet!  I won't hesitate
> to kill a function just because of bureaucracy.

Attempting to get consensus at an international level, across cultural boundaries,
use cases, workloads, and developer workflows is difficult and not intended to be
bureaucracy for the sake of bureaucracy.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 21:07                           ` DJ Delorie
@ 2023-11-08 21:50                             ` Alejandro Colomar
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 21:50 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, jg, linux-man

[-- Attachment #1: Type: text/plain, Size: 2856 bytes --]

Hi DJ,

On Wed, Nov 08, 2023 at 04:07:07PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> > string_copying(7), as in the git repository, and comment your thoughts?
> 
> I think my examples would work well after the first CAVEATS paragaph:
> 
>        The name of these functions is confusing.  These functions
>        produce a null-padded character sequence, not a string (see
>        string_copying(7)), like this:
> 
>      strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
>      strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
>      strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
>      strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }

It fits perfectly there.  And it also merges nicely with the paragraph
below.

> 
> >       These functions copy the string pointed to by src  into  a  null-padded
> >       character sequence at the fixed-width buffer pointed to by dst.  If the
> >       destination buffer, limited by its size, isn't large enough to hold the
> >       copy,  the  resulting character sequence is truncated.
> 
> hmmm... perhaps
> 
>   These functions copy at most SZ bytes from SRC into a fixed-length
>   buffer DST, padding any unwritten bytes in DST with NUL bytes.
>   Specifically, if SRC has a NUL byte in the first SZ bytes, copying
>   stops there and any remaining bytes in DST are filled with NUL bytes.
>   If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
>   copied to DST.
> 
> This avoids the term "string" completely and emphasises the not-string
> nature of the destination.

I don't like that, because it talks a lot about what the function does
in terms of low-level copies of bytes.  That may induce programmers to
try to find an abstraction in terms of strings.

> 
>  stpncpy,  strncpy  - zero a fixed-width buffer and copy a string into a
>        character sequence with truncation and zero the rest of it
> 
> Or "fill a fixed-width zero-padded buffer with bytes from a string"

But this wording is perfect!  I also used a similar wording for the
description.  I'll send a patch in a moment.

> 
> That avoids saying "copy a string"

Yep!

> 
> string_copying.7:
> 
> > For historic reasons, some standard APIs, such as utmpx(5),
> 
> Perhaps "some standard APIs and file formats,, such as utmpx(5) or
> tar(1)," ?

Yes; thanks!

> 
> > however, those padding null bytes are not part of the character
> > sequence.
> 
> add ", and may not be present if not needed." ?

I'm not convinced about this one.  "needed" is not the right word I
think.  For now, I'll add the other suggestions to a patch.  Expect it
in a moment.

Cheers,
Alex 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 21:35                       ` Carlos O'Donell
@ 2023-11-08 22:11                         ` Alejandro Colomar
  2023-11-08 23:31                           ` Paul Eggert
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 22:11 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On Wed, Nov 08, 2023 at 04:35:12PM -0500, Carlos O'Donell wrote:
> On 11/8/23 10:07, Alejandro Colomar wrote:
> > So we had it in ISO C in C89 and C99, and only in C11 they realized it
> > had to be removed.  POSIX hasn't even removed it yet!  I won't hesitate
> > to kill a function just because of bureaucracy.
> 
> Attempting to get consensus at an international level, across cultural boundaries,
> use cases, workloads, and developer workflows is difficult and not intended to be
> bureaucracy for the sake of bureaucracy.

Hi Carlos!

I understand that, and respect ISO's work.  I just don't think we need,
as GNU or Linux projects, to be restricted to the decisions of ISO.  We
can realize that certain functions are bad, and mark them as deprecated
in our scope.  If others want to imitate (ISO might even take it as
"prior art"), then great.

Cheers,
Alex

> 
> -- 
> Cheers,
> Carlos.
> 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 21:50                             ` Alejandro Colomar
@ 2023-11-08 22:17                               ` Alejandro Colomar
  2023-11-08 23:06                                 ` Paul Eggert
                                                   ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 22:17 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Jonny Grant,
	Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 3837 bytes --]

These copy *from* a string.  But the destination is a simple character
sequence within an array; not a string.

Suggested-by: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Resending, including the mailing lists, which I forgot.

 man3/stpncpy.3        | 17 +++++++++++++----
 man7/string_copying.7 | 20 ++++++++++----------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
 stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by
 .IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
 These functions produce a null-padded character sequence,
 not a string (see
 .BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
+strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
+strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
 .P
 It's impossible to distinguish truncation by the result of the call,
 from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .P
@@ -240,14 +236,18 @@ .SS Truncate or not?
 .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
 .SS Null-padded character sequences
 For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
 such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
 use null-padded character sequences in fixed-width buffers.
 To interface with them,
 specialized functions need to be used.
 .P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
 .BR stpncpy (3).
 .P
 To copy from an unterminated string within a fixed-width buffer into a string,
-- 
2.42.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
@ 2023-11-08 23:06                                 ` Paul Eggert
  2023-11-08 23:28                                   ` DJ Delorie
                                                     ` (2 more replies)
  2023-11-09  7:23                                 ` Oskari Pirhonen
                                                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 77+ messages in thread
From: Paul Eggert @ 2023-11-08 23:06 UTC (permalink / raw)
  To: Alejandro Colomar, linux-man
  Cc: libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

On 11/8/23 14:17, Alejandro Colomar wrote:
> These copy*from*  a string

Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be 
a string.

By the way, have you looked at the recent (i.e., this-year) changes to 
the glibc manual's string section? They're relevant.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 23:06                                 ` Paul Eggert
@ 2023-11-08 23:28                                   ` DJ Delorie
  2023-11-09  0:24                                   ` Alejandro Colomar
  2023-11-09 14:11                                   ` Jonny Grant
  2 siblings, 0 replies; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 23:28 UTC (permalink / raw)
  To: Paul Eggert
  Cc: alx, linux-man, libc-alpha, jg, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos

Paul Eggert <eggert@cs.ucla.edu> writes:
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be 
> a string.

But it will be treated as one, for the purposes of this function.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 22:11                         ` Alejandro Colomar
@ 2023-11-08 23:31                           ` Paul Eggert
  2023-11-09  0:29                             ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Paul Eggert @ 2023-11-08 23:31 UTC (permalink / raw)
  To: Alejandro Colomar, Carlos O'Donell
  Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man'

On 11/8/23 14:11, Alejandro Colomar wrote:
> I just don't think we need,
> as GNU or Linux projects, to be restricted to the decisions of ISO.  We
> can realize that certain functions are bad, and mark them as deprecated
> in our scope.

There's enough use of strncpy for the intended use (smallish fixed size 
character arrays that are null padded, not null terminated) that saying 
it's deprecated would likely cause more trouble than it's worth. It's 
not just utmp and tar; it's also socket programming (sun_path) and I'm 
sure other stuff.

Were we designing the C library from scratch I'd agree with you: in that 
context, strncpy would clearly be more trouble than it's worth. But now 
that we're stuck with strncpy we have better things to do than try to 
deprecate it.

Instead of saying "deprecate" I suggest we say something like "This 
function is generally a poor choice for processing strings" and point to 
the longer man page about strings in general. That's what the glibc 
manual does and it works reasonably well.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 23:06                                 ` Paul Eggert
  2023-11-08 23:28                                   ` DJ Delorie
@ 2023-11-09  0:24                                   ` Alejandro Colomar
  2023-11-09 14:11                                   ` Jonny Grant
  2 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09  0:24 UTC (permalink / raw)
  To: Paul Eggert
  Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

Hi Paul,

On Wed, Nov 08, 2023 at 03:06:40PM -0800, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
> > These copy*from*  a string
> 
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a
> string.

Pedantically, true.  But since it's quite rare to copy from a
fixed-width null-padded array into another, I didn't want to waste
space on that and possibly confuse readers.  In such a case, the source
buffer must be at least as large as the destination buffer, and will
likely be the same size (because having fixed-width stuff, why make it
different), so memcpy(3) will probably be simpler.

> 
> By the way, have you looked at the recent (i.e., this-year) changes to the
> glibc manual's string section? They're relevant.

I hadn't; after your message, I have.
<https://sourceware.org/glibc/manual/2.38/html_mono/libc.html#String-and-Array-Utilities>

I like how it connects all the functions, and it explains the concepts
and gives advice (e.g., avoid truncation as it's usually evil), and
compares the different functions.

However, I think it misses a few things:

-  strncpy(3) and strncat(3) are not related at all.  They don't have
   the same relation that strcpy(3) and strcat(3) have.  You can't
   write the following code in any case:

	strncpy(dst, foo, sizeof(dst));
	strncat(dst, bar, sizeof(dst));

   as you would with strcpy(3) or strlcpy(3).

   strncpy(3) and strncat(3) are opposite functions: the former reads
   from a string and writes to a fixed-width null-padded buffer, and the
   latter reads from a fixed-width buffer and writes to a string.  (You
   can use them in other cases, pedantically, as you said above, but
   those cases are rather unreal.)

-  strncpy(3) is in a section that starts by saying:

   > The functions described in this section copy or concatenate the
   > possibly-truncated contents of a string or array to another

   This may mislead programmers to believe it is useful for producing
   strings, when it's not.

In general, I would like the manual to put some more distance between
these functions and the term "string".  As DJ mentioned, it might be
useful to mention utmp(5) and tar(1) as niche use cases for
st[rp]ncpy(3).

And now for some typo:

-  In the following sentence under "5.2 String and Array Conventions":

   > The array arguments and return values for these functions have type
   > void * or wchar_t.

   I believe it meant `void *` or `wchar_t *`

Cheers,

Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 23:31                           ` Paul Eggert
@ 2023-11-09  0:29                             ` Alejandro Colomar
  2023-11-09 10:13                               ` Jonny Grant
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09  0:29 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Carlos O'Donell, Zack Weinberg, GNU libc development,
	Jonny Grant, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 1811 bytes --]

Hi Pail,

On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote:
> On 11/8/23 14:11, Alejandro Colomar wrote:
> > I just don't think we need,
> > as GNU or Linux projects, to be restricted to the decisions of ISO.  We
> > can realize that certain functions are bad, and mark them as deprecated
> > in our scope.
> 
> There's enough use of strncpy for the intended use (smallish fixed size
> character arrays that are null padded, not null terminated) that saying it's
> deprecated would likely cause more trouble than it's worth. It's not just
> utmp and tar; it's also socket programming (sun_path) and I'm sure other
> stuff.
> 
> Were we designing the C library from scratch I'd agree with you: in that
> context, strncpy would clearly be more trouble than it's worth. But now that
> we're stuck with strncpy we have better things to do than try to deprecate
> it.

No, no, I'm not trying to deprecate it.  I was just saying that *iff*
all of its uses were dead, I'd deprecate it.  But they're clearly not
dead, so it's a perfect function for those cases.

> 
> Instead of saying "deprecate" I suggest we say something like "This function
> is generally a poor choice for processing strings" and point to the longer
> man page about strings in general. That's what the glibc manual does and it
> works reasonably well.

Yes, I've done something like this.  string_copying(7) recommends
avoiding fixed-width null-padded buffers in APIs.  But for those use
cases that already exist, this is the function to use.

I'm also refusing to document how to (mis)use this function for
truncating strings.  If one wants to struncate strings, they'll need
functions that were designed to do that (e.g., strlcpy(3)).

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
  2023-11-08 23:06                                 ` Paul Eggert
@ 2023-11-09  7:23                                 ` Oskari Pirhonen
  2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
  2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
  3 siblings, 0 replies; 77+ messages in thread
From: Oskari Pirhonen @ 2023-11-09  7:23 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 4198 bytes --]

On Wed, Nov 08, 2023 at 23:17:07 +0100, Alejandro Colomar wrote:
> These copy *from* a string.  But the destination is a simple character
> sequence within an array; not a string.
> 
> Suggested-by: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---

I like the "with bytes from a string" wording. Good call.

- Oskari

> 
> Resending, including the mailing lists, which I forgot.
> 
>  man3/stpncpy.3        | 17 +++++++++++++----
>  man7/string_copying.7 | 20 ++++++++++----------
>  2 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index b6bbfd0a3..f86ff8c29 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -6,9 +6,8 @@
>  .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
>  .SH NAME
>  stpncpy, strncpy
> -\- zero a fixed-width buffer and
> -copy a string into a character sequence with truncation
> -and zero the rest of it
> +\-
> +fill a fixed-width null-padded buffer with bytes from a string
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ", " \-lc )
> @@ -37,7 +36,7 @@ .SH SYNOPSIS
>          _GNU_SOURCE
>  .fi
>  .SH DESCRIPTION
> -These functions copy the string pointed to by
> +These functions copy bytes from the string pointed to by
>  .I src
>  into a null-padded character sequence at the fixed-width buffer pointed to by
>  .IR dst .
> @@ -110,6 +109,16 @@ .SH CAVEATS
>  These functions produce a null-padded character sequence,
>  not a string (see
>  .BR string_copying (7)).
> +For example:
> +.P
> +.in +4n
> +.EX
> +strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
> +strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
> +strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +.EE
> +.in
>  .P
>  It's impossible to distinguish truncation by the result of the call,
>  from a character sequence that just fits the destination buffer;
> diff --git a/man7/string_copying.7 b/man7/string_copying.7
> index cadf1c539..0e179ba34 100644
> --- a/man7/string_copying.7
> +++ b/man7/string_copying.7
> @@ -41,15 +41,11 @@ .SS Strings
>  .\" ----- SYNOPSIS :: Null-padded character sequences --------/
>  .SS Null-padded character sequences
>  .nf
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +// Fill a fixed-width null-padded buffer with bytes from a string.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
>  const char *restrict " src ,
>  .BI "               size_t " sz );
> -.P
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
>  const char *restrict " src ,
>  .BI "               size_t " sz );
>  .P
> @@ -240,14 +236,18 @@ .SS Truncate or not?
>  .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
>  .SS Null-padded character sequences
>  For historic reasons,
> -some standard APIs,
> +some standard APIs and file formats,
>  such as
> -.BR utmpx (5),
> +.BR utmpx (5)
> +and
> +.BR tar (1),
>  use null-padded character sequences in fixed-width buffers.
>  To interface with them,
>  specialized functions need to be used.
>  .P
> -To copy strings into them, use
> +To copy bytes from strings into these buffers, use
> +.BR strncpy (3)
> +or
>  .BR stpncpy (3).
>  .P
>  To copy from an unterminated string within a fixed-width buffer into a string,
> -- 
> 2.42.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09  0:29                             ` Alejandro Colomar
@ 2023-11-09 10:13                               ` Jonny Grant
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
  2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
  0 siblings, 2 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 10:13 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert
  Cc: Carlos O'Donell, Zack Weinberg, GNU libc development,
	'linux-man'



On 09/11/2023 00:29, Alejandro Colomar wrote:
> Hi Pail,
> 
> On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote:
>> On 11/8/23 14:11, Alejandro Colomar wrote:
>>> I just don't think we need,
>>> as GNU or Linux projects, to be restricted to the decisions of ISO.  We
>>> can realize that certain functions are bad, and mark them as deprecated
>>> in our scope.
>>
>> There's enough use of strncpy for the intended use (smallish fixed size
>> character arrays that are null padded, not null terminated) that saying it's
>> deprecated would likely cause more trouble than it's worth. It's not just
>> utmp and tar; it's also socket programming (sun_path) and I'm sure other
>> stuff.
>>
>> Were we designing the C library from scratch I'd agree with you: in that
>> context, strncpy would clearly be more trouble than it's worth. But now that
>> we're stuck with strncpy we have better things to do than try to deprecate
>> it.
> 
> No, no, I'm not trying to deprecate it.  I was just saying that *iff*
> all of its uses were dead, I'd deprecate it.  But they're clearly not
> dead, so it's a perfect function for those cases.
> 
>>
>> Instead of saying "deprecate" I suggest we say something like "This function
>> is generally a poor choice for processing strings" and point to the longer
>> man page about strings in general. That's what the glibc manual does and it
>> works reasonably well.
> 
> Yes, I've done something like this.  string_copying(7) recommends
> avoiding fixed-width null-padded buffers in APIs.  But for those use
> cases that already exist, this is the function to use.

https://man7.org/linux/man-pages/man7/string_copying.7.html
Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.

How about following the style of the other man pages that put the notes about each function below them? (rather than above)
https://man7.org/linux/man-pages/man3/string.3.html

size_t strlen(const char *s);
Return the length of the string s.


At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:

// Copy/catenate a string.
char *strcpy(char *restrict dst, const char *restrict src);
char *strcat(char *restrict dst, const char *restrict src);


Kind regards
Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
       [not found]               ` <20231109031345.245703-1-mattlloydhouse@gmail.com>
@ 2023-11-09 10:31                 ` Jonny Grant
  2023-11-09 11:38                   ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 10:31 UTC (permalink / raw)
  To: Matthew House; +Cc: Alejandro Colomar, linux-man, GNU C Library

With glibc added

On Thu, 9 Nov 2023 at 03:13, Matthew House <mattlloydhouse@gmail.com> wrote:
>
> On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> > > Man pages aren't read only by people writing new code, but also by people
> > > reading and modifying existing code. And despite your preferences regarding
> > > which functions ought to be used to produce strings, it's a widespread (and
> > > correct) practice to produce a string from the character sequence created
> > > by strncpy(3). There are two ways of doing this, either by setting the last
> > > character of the destination buffer to null if you want to produce a
> > > truncated string, or by testing the last character against zero if you want
> > > to detect truncation and raise an error.
> >
> > It is not strncpy(3) who truncated, but the programmer by adding a NULL
> > in buff[BUFSIZ - 1].  In the following snippet, strncpy(3) will not
> > truncate:
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >
> > And yet your code doing if (cs[2] != '\0') { goto error; } would think
> > it did.  That's because you deformed strncpy(3) to implement a poor
> > man's strlcpy(3).
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >         cs[2] = '\0';  // The truncation is here, not in strncpy(3).
>
> That's indeed a self-consistent interpretation of strncpy(3)'s function,
> but I don't think it's borne out by its formal definition, which I was
> basing my reasoning on. The current Linux man page for strncpy(3) says,
>
>   These functions copy the string pointed to by src into a null-padded
>   character sequence at the fixed-width buffer pointed to by dst. If the
>   destination buffer, limited by its size, isn't large enough to hold the
>   copy, the resulting character sequence is truncated.
>
> Notice how it "copies the string": as your string_copying(7) says, a string
> includes both a character sequence and a final null byte. So I'd ordinarily
> read this definition as saying that strncpy(3) tries to copy src up to and
> including the null byte, but produces a truncated copy of the whole string
> if the destination buffer is too small. Thus, even if the destination
> buffer contains all non-null characters in the original string, then the
> copy has still been "truncated" in this sense.
>
> The ISO C definition, and by extension, the POSIX definition, make this
> interpretation even more explicit:
>
>   The strncpy function copies not more than n characters (characters that
>   follow a null character are not copied) from the array pointed to by s2
>   to the array pointed to by s1.
>
> That is, the terminating null byte is part of the copy, but not anything
> after the terminating null byte.
>
> So one can interpret strncpy(3) as copying a prefix of a character sequence
> into a buffer (and zero-filling the remainder), in which case you're
> correct that truncation cannot be detected. But the function is fomally
> defined as copying a prefix of a string into a buffer (and zero-filling the
> remainder), in which case the string has been truncated if the buffer
> doesn't end in a null byte afterward. It's just that one may not care about
> the terminating null byte being truncated if the user of the result just
> wants the initial character sequence.
>
> > > I'm not aware of any alternative to a strncpy(3)-based snippet for
> > > producing a possibly-truncated copy of a string, except for your preferred
> > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a
> >
> > The Linux kernel has strscpy(3), which is also good, but is not
> > available to user space.
> >
> > > brand-new glibc (nor, by extension, any applications or libraries that want
> >
> > libbsd has provided strlcpy(3) since basically forever.  It is a very
> > portable library.  You don't need a brand-new glibc for having
> > strlcpy(3).
> >
> > <https://libbsd.freedesktop.org/wiki/>
>
> That's a nice library that I didn't know about! Unfortunately, I don't
> think it's a very viable option for the long tail of small libraries I've
> referred to, which generally don't have any sub-dependencies of their own,
> apart from those provided by the platform.
>
> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
> configure scripts from whatever build system you're using (in such a way
> that libbsd can locate libmd), ensuring they're safe for cross-compilation
> if that's a goal, ensuring you bundle them in a way that respects their
> license terms, and ensuring that any user of your library links to the two
> dependencies and doesn't duplicate them. At that point, rolling your own
> strlcpy(3) equivalent definitely sounds like less mental load, at least to
> me.
>
> > > functions); snprintf(3), which has the insidious flaw of not supporting
> > > more than INT_MAX characters on pain of UB, and also produces a warning if
> > > the compiler notices the possible truncation; or strlen(3) + min() +
> > > memcpy(3) + manually adding a null terminator, which is certainly more
> > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> > > that poses a performance problem, but similarly opens up room for
> > > off-by-one errors.
> >
> > More than the performance problem, I'm more worried about the
> > maintainability of strncpy(3).  When 20 years from now, a programmer
> > reading a piece of code full of strncpy(3) wants to migrate to a sane
> > function like strlcpy(3) or strcpy(3), the programmer needs to
> > understand if the zeroing was purposeful or just accidental.  Because
> > by using strlcpy(3), it may start leaking some trailing data if the
> > trailing of the buffer is meaningful to some program.
>
> I didn't see this as an issue in practice when I was reviewing all those
> existing usages of strncpy(3). The vast majority were used in the midst of
> simple string manipulation, where the destination buffer starts as
> uninitialized or zeroed out, and ultimately gets passed into a user
> expecting an ordinary null-terminated string.
>
> (One exception was a few functions that used strncpy(dst, "", len) to zero
> out the buffer, which is thankfully pretty obvious. Another exception was
> the functions that actually used strncpy(3) to produce a null-padded
> character sequence, e.g., when writing a value into a section of a binary.
> But in general, I found that it's usually not difficult to tell when a
> usage is being clever enough that the null padding might be significant.)
>
> In fact, the greater confusion came from the surprisingly common practice
> of using strncpy(3) like it's memcpy(3), by giving it the known length of
> the source string, or of some prefix computed through strchr(3) or similar.
> This is often then followed up by strncat(3) or similar, indicating that
> the writer clearly expects the full length to have non-null characters. But
> if the length computation is separated far enough from the actual call to
> strncpy(3), then it can become unclear whether the source is actually
> expected to have any interior null bytes before the computed length. (So if
> a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest
> that ordinary memcpy(3) be one of them.)
>
> > > For the sake of reference, I looked into a few big C and C++ projects to
> > > see how often a strncpy(3)-based snippet was used to produce a truncated
> > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> > > that I haven't filtered out vendored dependencies, so there's a little bit
> > > of double-counting.) It seems like most codebases that don't ban strncpy(3)
> > > use a derived snippet somewhere or another. Also, I found 3 instances in
> > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> > > checking the last character.
> >
> > I know.  I've been rewriting the code handling strings in shadow-utils
> > for the last year, and ther was a lot of it.  I fixed several small bugs
> > in the process, so I recommend avoiding it.
>
> I can't tell you about your own experience, but in mine, the root cause of
> most string-handling bugs has been excessive cleverness in using the
> standard string functions, rather than the behavior of the functions
> themselves. So one worry of mine is that if strncpy(3) ends up being
> deprecated or whatever, then authors of portable libraries will start
> writing lots of custom memcpy(3)-based replacements to their strncpy(3)-
> based snippets, and more lines of code will introduce more opportunities
> for cleverness.
>
> (This is also why I was confused by your support for strcpy(3) on the
> grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
> that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
> protect us from overruns, not from all the "small bugs" that might ensue
> from people becoming more clever with sizing the destination buffer with
> strcpy(3). Also, if it were truly a panacea, then we'd hardly have to worry
> about the problems of strncpy(3) at all, since it would detect any misuse
> of the function.)

Matthew, thank you for sharing your information.

https://www.gnu.org/software/libc/manual/html_node/Source-Fortification.html

I do find _FORTIFY_SOURCE useful in a developer build, for testing, it
raises SIGABRT and we can get useful coredump. Without that macro, it
would likely still crash or corrupt. However, in my experience in
safety critical applications, we really need to avoid the crashes, so
we'd write user-space functions that do the same sanity checks (in the
same way that fortify does) and then propagate the error back to the
application to report the failure, and log it.

>
> Probably the only way to solve the cleverness issue for good is to have an
> immediately-available, foolproof, performant set of string functions that
> are extremely straightforward to understand and use, flexible enough for
> any use case, and generally agreed to be the first choice for string
> manipulation.

What's the best standardized function for C string copying in your
opinion?  They all seem to have drawbacks, strlcpy truncates (I'd
rather it rejected if it didn't have enough buffer - could cause
issues if the meaning of the string changed due to truncation, eg if
it was a file path). Other alternative functions aren't widely in use.

Kind regards, Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-09 10:13                               ` Jonny Grant
@ 2023-11-09 11:08                                 ` Alejandro Colomar
  2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
  2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
  2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
  1 sibling, 2 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:08 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 796 bytes --]

Hi Jonny,

On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> https://man7.org/linux/man-pages/man7/string_copying.7.html
> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.

Here's why:
<https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/>

Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500):
>> concatenate
> 
> We began fighting this pomposity before v7. There has only been
> backsliding since..
> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
> I invite you to join the battle for simplicity.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 10:13                               ` Jonny Grant
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-09 11:13                                 ` Alejandro Colomar
  2023-11-09 14:05                                   ` Jonny Grant
  1 sibling, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:13 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

Hi Jonny,

On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> On 09/11/2023 00:29, Alejandro Colomar wrote:
> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
> https://man7.org/linux/man-pages/man3/string.3.html
> 
> size_t strlen(const char *s);
> Return the length of the string s.
> 
> 
> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
> 
> // Copy/catenate a string.
> char *strcpy(char *restrict dst, const char *restrict src);
> char *strcat(char *restrict dst, const char *restrict src);

The reason for this presentation is that I want to first look at what
they do, and only then look at the function you need to do that.

So, if you want to copy from a character sequence into a string, you
search for that, and it will tell you what functions you can use for
that (strncat(3) is the only standard one).

If you want to search for a specific function, you can always search
with '/strncpy'.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 10:31                 ` strncpy clarify result may not be null terminated Jonny Grant
@ 2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:43                     ` Alejandro Colomar
                                       ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:38 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 2122 bytes --]

Hi Jonny,

On Thu, Nov 09, 2023 at 10:31:49AM +0000, Jonny Grant wrote:
> > Probably the only way to solve the cleverness issue for good is to have an
> > immediately-available, foolproof, performant set of string functions that
> > are extremely straightforward to understand and use, flexible enough for
> > any use case, and generally agreed to be the first choice for string
> > manipulation.
> 
> What's the best standardized function for C string copying in your

strlcpy(3) will soon be standard.  POSIX.1-202x (Issue 8) will add it,
which is why it's been added recently to glibc.  Hopefully, ISO C3x will
follow (yeah, it's not like tomorrow).

> opinion?  They all seem to have drawbacks, strlcpy truncates (I'd
> rather it rejected if it didn't have enough buffer - could cause
> issues if the meaning of the string changed due to truncation, eg if
> it was a file path). Other alternative functions aren't widely in use.

If you are consistent in checking the return value of strlcpy(3) and
reporting an error, it's the best standard alternative nowadays.
snprintf(3), except for using int instead of size_t, has an equivalent
API, and is in C99, in case that means something.

If you would want to write something based on Michael Kerrisk's article,
you could do this:

	ssize_t
	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
	{
		if (strlen(src) < dsize)
			return -1;

		strcpy(dst, src);
	}

You may also want to calculate 'dsize' automagically, to avoid human
error, in case it's an array, so you could write a macro on top of it:

	#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))

These are just small wrappers over standard functions, so you shouldn't
have problems adding them to your project.

This is my long term plan for shadow-utils, indeed.  I'm first
transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
padding, and later will use this strxcpy() to remove the truncated
strings to avoid misinterpretation.

Cheers,
Alex

> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
@ 2023-11-09 12:43                     ` Alejandro Colomar
  2023-11-09 12:51                     ` Xi Ruoyao
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 12:43 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

On Thu, Nov 09, 2023 at 12:38:37PM +0100, Alejandro Colomar wrote:
> If you would want to write something based on Michael Kerrisk's article,
> you could do this:
> 
> 	ssize_t
> 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> 	{
> 		if (strlen(src) < dsize)

Heh, here's my off-by-one bug of the day.  Good thing is I can fix it in
a single place; unlike calling strncpy(3) all the time.

This should have been <=.

Cheers,
Alex

> 			return -1;
> 
> 		strcpy(dst, src);
> 	}
> 
> You may also want to calculate 'dsize' automagically, to avoid human
> error, in case it's an array, so you could write a macro on top of it:
> 
> 	#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
> 
> These are just small wrappers over standard functions, so you shouldn't
> have problems adding them to your project.
> 
> This is my long term plan for shadow-utils, indeed.  I'm first
> transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
> padding, and later will use this strxcpy() to remove the truncated
> strings to avoid misinterpretation.

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:43                     ` Alejandro Colomar
@ 2023-11-09 12:51                     ` Xi Ruoyao
  2023-11-09 14:01                       ` Alejandro Colomar
  2023-11-09 18:11                     ` Paul Eggert
  2023-11-10 11:23                     ` Jonny Grant
  3 siblings, 1 reply; 77+ messages in thread
From: Xi Ruoyao @ 2023-11-09 12:51 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On Thu, 2023-11-09 at 12:38 +0100, Alejandro Colomar wrote:
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.
> snprintf(3), except for using int instead of size_t, has an equivalent
> API, and is in C99, in case that means something.

Yes, you can always create your own wrapper instead of demanding a
standard function which must be implemented by every libc.

> If you would want to write something based on Michael Kerrisk's article,
> you could do this:

> 	ssize_t
> 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> 	{
> 		if (strlen(src) < dsize)
> 			return -1;
> 
> 		strcpy(dst, src);
> 	}

I'd like to add __attribute__ ((warn_unused_result)) for this wrapper as
well.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 12:51                     ` Xi Ruoyao
@ 2023-11-09 14:01                       ` Alejandro Colomar
  0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 14:01 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1089 bytes --]

On Thu, Nov 09, 2023 at 08:51:34PM +0800, Xi Ruoyao wrote:
> On Thu, 2023-11-09 at 12:38 +0100, Alejandro Colomar wrote:
> > If you are consistent in checking the return value of strlcpy(3) and
> > reporting an error, it's the best standard alternative nowadays.
> > snprintf(3), except for using int instead of size_t, has an equivalent
> > API, and is in C99, in case that means something.
> 
> Yes, you can always create your own wrapper instead of demanding a
> standard function which must be implemented by every libc.
> 
> > If you would want to write something based on Michael Kerrisk's article,
> > you could do this:
> 
> > 	ssize_t
> > 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> > 	{
> > 		if (strlen(src) < dsize)
> > 			return -1;
> > 
> > 		strcpy(dst, src);
> > 	}
> 
> I'd like to add __attribute__ ((warn_unused_result)) for this wrapper as
> well.

Indeed.  Thanks!

> 
> -- 
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
@ 2023-11-09 14:05                                   ` Jonny Grant
  2023-11-09 15:04                                     ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:05 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'



On 09/11/2023 11:13, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
>> On 09/11/2023 00:29, Alejandro Colomar wrote:
>> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>> size_t strlen(const char *s);
>> Return the length of the string s.
>>
>>
>> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
>>
>> // Copy/catenate a string.
>> char *strcpy(char *restrict dst, const char *restrict src);
>> char *strcat(char *restrict dst, const char *restrict src);
> 
> The reason for this presentation is that I want to first look at what
> they do, and only then look at the function you need to do that.

That appears different to the man page convention. It looks odd especially with the extra // that I don't recall other pages having in the description, usually that would be for examples. Consistency is best, but I'll leave it with you.
Kind regards
Jonny

> 
> So, if you want to copy from a character sequence into a string, you
> search for that, and it will tell you what functions you can use for
> that (strncat(3) is the only standard one).
> 
> If you want to search for a specific function, you can always search
> with '/strncpy'.
> 
> Cheers,
> Alex
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-09 14:06                                   ` Jonny Grant
  2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
  1 sibling, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:06 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'



On 09/11/2023 11:08, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
>> https://man7.org/linux/man-pages/man7/string_copying.7.html
>> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.
> 
> Here's why:
> <https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/>
> 
> Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500):
>>> concatenate
>>
>> We began fighting this pomposity before v7. There has only been
>> backsliding since..
>> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
>> I invite you to join the battle for simplicity.
> 
> Cheers,
> Alex
> 

Looks like it's already been discussed. Where a term is already in use, it's a question if to change the commonly used term. Technical documents seem to be mostly 'concatenate'. Looks like people have already decided on going with 'catenate'.
Kind regards
Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 23:06                                 ` Paul Eggert
  2023-11-08 23:28                                   ` DJ Delorie
  2023-11-09  0:24                                   ` Alejandro Colomar
@ 2023-11-09 14:11                                   ` Jonny Grant
  2023-11-09 14:35                                     ` Alejandro Colomar
  2 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:11 UTC (permalink / raw)
  To: Paul Eggert, Alejandro Colomar, linux-man
  Cc: libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell



On 08/11/2023 23:06, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
>> These copy*from*  a string
> 
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> 
> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.

That's a great reference page Paul, lots of useful information in the manual.
https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html

Re this man page:

https://man7.org/linux/man-pages/man3/string.3.html

 Obsolete functions
       char *strncpy(char dest[restrict .n], const char src[restrict .n],
                     size_t n);
              Copy at most n bytes from string src to dest, returning a
              pointer to the start of dest.


It could clarify
"Copy at most n bytes from string src to ARRAY dest, returning a
pointer to the start of ARRAY dest."

(caps for my emphasis in this email)

Kind regards
Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 14:11                                   ` Jonny Grant
@ 2023-11-09 14:35                                     ` Alejandro Colomar
  2023-11-09 14:47                                       ` Jonny Grant
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 14:35 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

Hi Jonny,

On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
> On 08/11/2023 23:06, Paul Eggert wrote:
> > On 11/8/23 14:17, Alejandro Colomar wrote:
> >> These copy*from*  a string
> > 
> > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> > 
> > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
> 
> That's a great reference page Paul, lots of useful information in the manual.
> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
> 
> Re this man page:
> 
> https://man7.org/linux/man-pages/man3/string.3.html
> 
>  Obsolete functions
>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>                      size_t n);
>               Copy at most n bytes from string src to dest, returning a
>               pointer to the start of dest.

Uh, I forgot about that page.  I'll have a look at it and update it.  At
least, I need to remove that "Obsolete functions".

> 
> 
> It could clarify
> "Copy at most n bytes from string src to ARRAY dest, returning a
> pointer to the start of ARRAY dest."

I think I prefer DJ's suggestion:

"Fill a fixed‐width null‐padded buffer with bytes from a string."

Thanks!
Alex

> 
> (caps for my emphasis in this email)
> 
> Kind regards
> Jonny

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 14:35                                     ` Alejandro Colomar
@ 2023-11-09 14:47                                       ` Jonny Grant
  2023-11-09 15:02                                         ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell



On 09/11/2023 14:35, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
>> On 08/11/2023 23:06, Paul Eggert wrote:
>>> On 11/8/23 14:17, Alejandro Colomar wrote:
>>>> These copy*from*  a string
>>>
>>> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>>>
>>> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>>
>> That's a great reference page Paul, lots of useful information in the manual.
>> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>>
>> Re this man page:
>>
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>>  Obsolete functions
>>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>>                      size_t n);
>>               Copy at most n bytes from string src to dest, returning a
>>               pointer to the start of dest.
> 
> Uh, I forgot about that page.  I'll have a look at it and update it.  At
> least, I need to remove that "Obsolete functions".
> 
>>
>>
>> It could clarify
>> "Copy at most n bytes from string src to ARRAY dest, returning a
>> pointer to the start of ARRAY dest."
> 
> I think I prefer DJ's suggestion:
> 
> "Fill a fixed‐width null‐padded buffer with bytes from a string."

Better to make it clear it's null-padded after?

"Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

I'll leave it with you.

Kind regards
Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 14:47                                       ` Jonny Grant
@ 2023-11-09 15:02                                         ` Alejandro Colomar
  2023-11-09 17:30                                           ` DJ Delorie
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:02 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 756 bytes --]

On Thu, Nov 09, 2023 at 02:47:05PM +0000, Jonny Grant wrote:
> >> It could clarify
> >> "Copy at most n bytes from string src to ARRAY dest, returning a
> >> pointer to the start of ARRAY dest."
> > 
> > I think I prefer DJ's suggestion:
> > 
> > "Fill a fixed‐width null‐padded buffer with bytes from a string."
> 
> Better to make it clear it's null-padded after?
> 
> "Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

Yes, that looks even better.  And I wasn't very happy with "bytes".
Maybe:

"Fill a fixed-width buffer with characters from a string and pad with
null bytes."

Thanks,
Alex

> 
> I'll leave it with you.
> 
> Kind regards
> Jonny

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 14:05                                   ` Jonny Grant
@ 2023-11-09 15:04                                     ` Alejandro Colomar
  0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:04 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 1545 bytes --]

On Thu, Nov 09, 2023 at 02:05:38PM +0000, Jonny Grant wrote:
> 
> 
> On 09/11/2023 11:13, Alejandro Colomar wrote:
> > Hi Jonny,
> > 
> > On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> >> On 09/11/2023 00:29, Alejandro Colomar wrote:
> >> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
> >> https://man7.org/linux/man-pages/man3/string.3.html
> >>
> >> size_t strlen(const char *s);
> >> Return the length of the string s.
> >>
> >>
> >> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
> >>
> >> // Copy/catenate a string.
> >> char *strcpy(char *restrict dst, const char *restrict src);
> >> char *strcat(char *restrict dst, const char *restrict src);
> > 
> > The reason for this presentation is that I want to first look at what
> > they do, and only then look at the function you need to do that.
> 
> That appears different to the man page convention. It looks odd especially with the extra // that I don't recall other pages having in the description, usually that would be for examples. Consistency is best, but I'll leave it with you.

The difference is that you're comparing to man3 pages, which document
specific functions.  string_copying(7) instead documents how to copy
functions, and specific functions are only means to that end.  I'll keep
it this way.

Thanks,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 1/2] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
  2023-11-08 23:06                                 ` Paul Eggert
  2023-11-09  7:23                                 ` Oskari Pirhonen
@ 2023-11-09 15:20                                 ` Alejandro Colomar
  2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
  3 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Oskari Pirhonen,
	Jonny Grant, Matthew House, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Paul Eggert, Xi Ruoyao

These copy *from* a string.  But the destination is a simple character
sequence within an array; not a string.

Suggested-by: DJ Delorie <dj@redhat.com>
Acked-by: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Patch 1/2 is just a resend, with more CCs.
Patch 2/2 is a new one further clarifying the wording, after Jonny's
suggestions.

 man3/stpncpy.3        | 17 +++++++++++++----
 man7/string_copying.7 | 20 ++++++++++----------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
 stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by
 .IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
 These functions produce a null-padded character sequence,
 not a string (see
 .BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
+strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
+strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
 .P
 It's impossible to distinguish truncation by the result of the call,
 from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .P
@@ -240,14 +236,18 @@ .SS Truncate or not?
 .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
 .SS Null-padded character sequences
 For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
 such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
 use null-padded character sequences in fixed-width buffers.
 To interface with them,
 specialized functions need to be used.
 .P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
 .BR stpncpy (3).
 .P
 To copy from an unterminated string within a fixed-width buffer into a string,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
                                                   ` (2 preceding siblings ...)
  2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
@ 2023-11-09 15:20                                 ` Alejandro Colomar
  2023-11-10  5:47                                   ` Oskari Pirhonen
  3 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Jonny Grant, DJ Delorie,
	Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Paul Eggert, Xi Ruoyao

The previous wording could be interpreted as if the nulls were already
in place.  Clarify that it's this function which pads with null bytes.

Also, it copies "characters" from the src string.  That's a bit more
specific than copying "bytes", and makes it clearer that the terminating
null byte in src is not part of the copy.

Suggested-by: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpncpy.3        | 10 ++++++----
 man3/string.3         | 11 ++---------
 man7/string_copying.7 |  3 ++-
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index f86ff8c29..3cf4eb371 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -7,7 +7,8 @@
 .SH NAME
 stpncpy, strncpy
 \-
-fill a fixed-width null-padded buffer with bytes from a string
+fill a fixed-width buffer with characters from a string
+and pad with null bytes
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -36,10 +37,11 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy bytes from the string pointed to by
+These functions copy characters from the string pointed to by
 .I src
-into a null-padded character sequence at the fixed-width buffer pointed to by
-.IR dst .
+into a character sequence at the fixed-width buffer pointed to by
+.IR dst ,
+and pad with null bytes.
 If the destination buffer,
 limited by its size,
 isn't large enough to hold the copy,
diff --git a/man3/string.3 b/man3/string.3
index aba5efd2b..bd8b342a6 100644
--- a/man3/string.3
+++ b/man3/string.3
@@ -179,21 +179,14 @@ .SH SYNOPSIS
 .I n
 bytes to
 .IR dest .
-.SS Obsolete functions
 .TP
 .nf
 .BI "char *strncpy(char " dest "[restrict ." n "], \
 const char " src "[restrict ." n ],
 .BI "       size_t " n );
 .fi
-Copy at most
-.I n
-bytes from string
-.I src
-to
-.IR dest ,
-returning a pointer to the start of
-.IR dest .
+Fill a fixed‐width buffer with characters from a string
+and pad with null bytes.
 .SH DESCRIPTION
 The string functions perform operations on null-terminated
 strings.
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0e179ba34..865271c6f 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,7 +41,8 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Fill a fixed-width null-padded buffer with bytes from a string.
+// Fill a fixed-width buffer with characters from a string
+// and pad with null bytes.
 .BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-- 
2.42.0


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 15:02                                         ` Alejandro Colomar
@ 2023-11-09 17:30                                           ` DJ Delorie
  2023-11-09 17:54                                             ` Andreas Schwab
                                                               ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: DJ Delorie @ 2023-11-09 17:30 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos

Alejandro Colomar <alx@kernel.org> writes:
> "Fill a fixed-width buffer with characters from a string and pad with
> null bytes."

The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
nul/NUL is a character, null/NULL is a pointer.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 17:30                                           ` DJ Delorie
@ 2023-11-09 17:54                                             ` Andreas Schwab
  2023-11-09 18:00                                             ` Alejandro Colomar
  2023-11-09 19:42                                             ` Jonny Grant
  2 siblings, 0 replies; 77+ messages in thread
From: Andreas Schwab @ 2023-11-09 17:54 UTC (permalink / raw)
  To: DJ Delorie
  Cc: Alejandro Colomar, jg, eggert, linux-man, libc-alpha,
	mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack,
	g.branden.robinson, carlos

On Nov 09 2023, DJ Delorie wrote:

> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

NUL is the ASCII abbreviation for Null (see RFC 20).

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 17:30                                           ` DJ Delorie
  2023-11-09 17:54                                             ` Andreas Schwab
@ 2023-11-09 18:00                                             ` Alejandro Colomar
  2023-11-09 19:42                                             ` Jonny Grant
  2 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 18:00 UTC (permalink / raw)
  To: DJ Delorie
  Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos

[-- Attachment #1: Type: text/plain, Size: 2519 bytes --]

Hi DJ,

On Thu, Nov 09, 2023 at 12:30:17PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > "Fill a fixed-width buffer with characters from a string and pad with
> > null bytes."
> 
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

Here's what man-pages(7) (written by Michael Kerrisk) says:

   NULL, NUL, null pointer, and null byte
     A null pointer is a pointer that points to nothing, and  is  nor‐
     mally  indicated by the constant NULL.  On the other hand, NUL is
     the null byte, a byte with the value 0, represented in C via  the
     character constant '\0'.

     The  preferred  term  for the pointer is "null pointer" or simply
     "NULL"; avoid writing "NULL pointer".

     The preferred term for the byte is "null  byte".   Avoid  writing
     "NUL",  since  it is too easily confused with "NULL".  Avoid also
     the terms "zero byte" and "null character".  The byte that termi‐
     nates a C string should be described  as  "the  terminating  null
     byte";  strings  may be described as "null‐terminated", but avoid
     the use of "NUL‐terminated".


I don't necessarily agree with all of that, but mostly.  I don't agree
with not saying null character, because as well as we have the null wide
character (L'\0'), using null character for '\0' makes it symmetric.

Other than that, I mostly agree with Michael.  Here's what I think of
these terms:

-  NULL is a null pointer constant (as well as 0 is another null pointer
   constant).

-  A null pointer is a more generic term that includes a run-time null
   pointer as well. 

-  The null byte is 0.

-  The null character, '\0', is composed of a null byte.

-  The null wide character, L'\0' is composed of several null bytes.

-  NUL is the ASCII name of the null byte, or maybe is it null character
   here?  It's a bit muddy.

I use null byte for padding, and null character for the string
terminator, to make a stronger difference between strings and
null-padded fixed-width arrays.  I need to review string_copying(7) to
make sure I was consistent in this regard.

Colloquially, I find it fine to write NULL instead of null pointer (even
for non-constant cases), and NUL instead of any of "null character",
"null byte", or "null wide character", but for being precise, I prefer
"null something".

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:43                     ` Alejandro Colomar
  2023-11-09 12:51                     ` Xi Ruoyao
@ 2023-11-09 18:11                     ` Paul Eggert
  2023-11-09 23:48                       ` Alejandro Colomar
  2023-11-10 11:23                     ` Jonny Grant
  3 siblings, 1 reply; 77+ messages in thread
From: Paul Eggert @ 2023-11-09 18:11 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On 2023-11-09 03:38, Alejandro Colomar wrote:
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.

Not necessarily. strlcpy is subject to denial-of-service attacks if the 
attacker has control of the source string and can attack by using long 
source strings. strncpy, as bad as it is, does not have this problem.

Instead of this:

    if (strlcpy (dst, src, dstsize) == dstsize)
      return failure;

applications that want want to copy a string into a small nonempty 
fixed-size buffer, failing if the string doesn't fit, should do 
something like this:

    if (strncpy (dst, src, dstsize)[dstsize - 1])
      return failure;

This avoids the denial-of-service attack and is portable all the way 
back to K&R C.

It's unfortunate that strlcpy was misdesigned but here we are.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 17:30                                           ` DJ Delorie
  2023-11-09 17:54                                             ` Andreas Schwab
  2023-11-09 18:00                                             ` Alejandro Colomar
@ 2023-11-09 19:42                                             ` Jonny Grant
  2 siblings, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 19:42 UTC (permalink / raw)
  To: DJ Delorie, Alejandro Colomar
  Cc: eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos



On 09/11/2023 17:30, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
>> "Fill a fixed-width buffer with characters from a string and pad with
>> null bytes."
> 
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
> 

NUL would be a big improvement.

Kind regards, Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 18:11                     ` Paul Eggert
@ 2023-11-09 23:48                       ` Alejandro Colomar
  2023-11-10  5:36                         ` Paul Eggert
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 23:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1126 bytes --]

On Thu, Nov 09, 2023 at 10:11:10AM -0800, Paul Eggert wrote:
> On 2023-11-09 03:38, Alejandro Colomar wrote:
> > If you are consistent in checking the return value of strlcpy(3) and
> > reporting an error, it's the best standard alternative nowadays.
> 
> Not necessarily. strlcpy is subject to denial-of-service attacks if the
> attacker has control of the source string and can attack by using long
> source strings. strncpy, as bad as it is, does not have this problem.

Interesting thing.  I'd then just use strlen(3)+strcpy(3), avoiding
strncpy(3).

> 
> Instead of this:
> 
>    if (strlcpy (dst, src, dstsize) == dstsize)
>      return failure;
> 
> applications that want want to copy a string into a small nonempty
> fixed-size buffer, failing if the string doesn't fit, should do something
> like this:
> 
>    if (strncpy (dst, src, dstsize)[dstsize - 1])
>      return failure;
> 
> This avoids the denial-of-service attack and is portable all the way back to
> K&R C.
> 
> It's unfortunate that strlcpy was misdesigned but here we are.
> 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 23:48                       ` Alejandro Colomar
@ 2023-11-10  5:36                         ` Paul Eggert
  2023-11-10 11:05                           ` Alejandro Colomar
  2023-11-10 11:36                           ` Jonny Grant
  0 siblings, 2 replies; 77+ messages in thread
From: Paul Eggert @ 2023-11-10  5:36 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-09 15:48, Alejandro Colomar wrote:
> I'd then just use strlen(3)+strcpy(3), avoiding
> strncpy(3).

But that is vulnerable to the same denial-of-service attack that strlcpy 
is vulnerable to. You'd need strnlen+strcpy instead.

The strncpy approach I suggested is simpler, and (though this doesn't 
matter much in practice) is typically significantly faster than 
strnlen+strcpy in the typical case where the destination is a small 
fixed-size buffer.

Although strncpy is not a good design, it's often simpler or faster or 
safer than later "improvements".

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
  2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
@ 2023-11-10  5:47                                   ` Oskari Pirhonen
  2023-11-10 10:47                                     ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Oskari Pirhonen @ 2023-11-10  5:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao

[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]

On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote:
> The previous wording could be interpreted as if the nulls were already
> in place.  Clarify that it's this function which pads with null bytes.
> 
> Also, it copies "characters" from the src string.  That's a bit more
> specific than copying "bytes", and makes it clearer that the terminating
> null byte in src is not part of the copy.
> 
> Suggested-by: Jonny Grant <jg@jguk.org>
> Cc: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Xi Ruoyao <xry111@xry111.site>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>  man3/stpncpy.3        | 10 ++++++----
>  man3/string.3         | 11 ++---------
>  man7/string_copying.7 |  3 ++-
>  3 files changed, 10 insertions(+), 14 deletions(-)
> 

... snip ...

> diff --git a/man3/string.3 b/man3/string.3
> index aba5efd2b..bd8b342a6 100644
> --- a/man3/string.3
> +++ b/man3/string.3
> @@ -179,21 +179,14 @@ .SH SYNOPSIS
>  .I n
>  bytes to
>  .IR dest .
> -.SS Obsolete functions

If you're removing this section ...

>  .TP
>  .nf
>  .BI "char *strncpy(char " dest "[restrict ." n "], \
>  const char " src "[restrict ." n ],
>  .BI "       size_t " n );
>  .fi
> -Copy at most
> -.I n
> -bytes from string
> -.I src
> -to
> -.IR dest ,
> -returning a pointer to the start of
> -.IR dest .
> +Fill a fixed‐width buffer with characters from a string
> +and pad with null bytes.

... shouldn't you also move the rest of this up to keep it alphabetized?

- Oskari

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
  2023-11-10  5:47                                   ` Oskari Pirhonen
@ 2023-11-10 10:47                                     ` Alejandro Colomar
  0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 10:47 UTC (permalink / raw)
  To: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao

[-- Attachment #1: Type: text/plain, Size: 2310 bytes --]

On Thu, Nov 09, 2023 at 11:47:34PM -0600, Oskari Pirhonen wrote:
> On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote:
> > The previous wording could be interpreted as if the nulls were already
> > in place.  Clarify that it's this function which pads with null bytes.
> > 
> > Also, it copies "characters" from the src string.  That's a bit more
> > specific than copying "bytes", and makes it clearer that the terminating
> > null byte in src is not part of the copy.
> > 
> > Suggested-by: Jonny Grant <jg@jguk.org>
> > Cc: DJ Delorie <dj@redhat.com>
> > Cc: Jonny Grant <jg@jguk.org>
> > Cc: Matthew House <mattlloydhouse@gmail.com>
> > Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> > Cc: Thorsten Kukuk <kukuk@suse.com>
> > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> > Cc: Zack Weinberg <zack@owlfolio.org>
> > Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> > Cc: Carlos O'Donell <carlos@redhat.com>
> > Cc: Paul Eggert <eggert@cs.ucla.edu>
> > Cc: Xi Ruoyao <xry111@xry111.site>
> > Signed-off-by: Alejandro Colomar <alx@kernel.org>
> > ---
> >  man3/stpncpy.3        | 10 ++++++----
> >  man3/string.3         | 11 ++---------
> >  man7/string_copying.7 |  3 ++-
> >  3 files changed, 10 insertions(+), 14 deletions(-)
> > 
> 
> ... snip ...
> 
> > diff --git a/man3/string.3 b/man3/string.3
> > index aba5efd2b..bd8b342a6 100644
> > --- a/man3/string.3
> > +++ b/man3/string.3
> > @@ -179,21 +179,14 @@ .SH SYNOPSIS
> >  .I n
> >  bytes to
> >  .IR dest .
> > -.SS Obsolete functions
> 
> If you're removing this section ...
> 
> >  .TP
> >  .nf
> >  .BI "char *strncpy(char " dest "[restrict ." n "], \
> >  const char " src "[restrict ." n ],
> >  .BI "       size_t " n );
> >  .fi
> > -Copy at most
> > -.I n
> > -bytes from string
> > -.I src
> > -to
> > -.IR dest ,
> > -returning a pointer to the start of
> > -.IR dest .
> > +Fill a fixed‐width buffer with characters from a string
> > +and pad with null bytes.
> 
> ... shouldn't you also move the rest of this up to keep it alphabetized?

Hi Oskari,

Sure!  I was trying to find a pattern in the order, but didn't see it
yesterday.  Thanks!  :)

Cheers,
Alex

> 
> - Oskari



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10  5:36                         ` Paul Eggert
@ 2023-11-10 11:05                           ` Alejandro Colomar
  2023-11-10 11:47                             ` Alejandro Colomar
  2023-11-10 17:58                             ` Paul Eggert
  2023-11-10 11:36                           ` Jonny Grant
  1 sibling, 2 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

Hi Paul,

On Thu, Nov 09, 2023 at 09:36:43PM -0800, Paul Eggert wrote:
> On 2023-11-09 15:48, Alejandro Colomar wrote:
> > I'd then just use strlen(3)+strcpy(3), avoiding
> > strncpy(3).

Heh, brain fart on my side.

> 
> But that is vulnerable to the same denial-of-service attack that strlcpy is
> vulnerable to. You'd need strnlen+strcpy instead.
> 
> The strncpy approach I suggested is simpler, and (though this doesn't matter

Yeah, although you can always wrap strnlen(3)+memcpy(3) in a strxcpy()
inline function and have it even simpler.

Rewriting the strxcpy() wrapper I wrote the other day to not be
vulnerable to DoS, and hoping I get it right today.

[[nodiscard]]
inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen;

	slen = strnlen(src, dsize);
	if (slen >= dsize)
		return -1;

	memcpy(dst, src, slen + 1);

	return slen;
}

Hopefully, it won't be so bad in terms of performance.  And it is still
protected by fortification of memcpy(3).  And thanks to [[nodiscard]],
it should be hard to misuse.

> much in practice) is typically significantly faster than strnlen+strcpy in
> the typical case where the destination is a small fixed-size buffer.
> 
> Although strncpy is not a good design, it's often simpler or faster or safer
> than later "improvements".

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
       [not found]               ` <CACKs7VDsTdSNwbC6+2LtQ67J_eJiD814xkw2_5XM1Q=iMjLXJA@mail.gmail.com>
@ 2023-11-10 11:06                 ` Jonny Grant
  0 siblings, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-10 11:06 UTC (permalink / raw)
  To: Stefan Puiu, Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library

On 10/11/2023 10:40, Stefan Puiu wrote:
> Hi Alex,
> 
> On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> [.....]
>> strncpy(3):
>> CAVEATS
>>      The  name  of  these  functions  is confusing.  These functions produce a
>>      null‐padded character sequence, not a string (see string_copying(7)).
> 
> I'm a bit confused by this distinction. Isn't a null-padded sequence
> technically also null-terminated? If there's a '0' at the end, then
> it's a string, in my understanding. Or was the intention to say "a
> character sequence that may be null-padded", where the case in which
> there's no padding at all being the reason for the distinction?

This is a null padded sequence of characters in an array:

char buf[4] = {'a', '\0', '\0', '\0'};

I'm sure we are all well aware from this long email thread, strncpy is designed to fill fixed sized arrays, and pad with NUL bytes '\0' if any space left. Otherwise, the array buffer is left not padded.. there in lies the trouble, a possibly not terminated sequence of characters. Someone thought saving the extra byte was a good idea. It would have been better if that programmer had crafted their own local function rather than put out the strncpy function which is similarly named to strcpy(), they could have called it copy_to_array_nul_pad().

// a not terminated array - using printf, or strlen will carry on reading off down the memory until it finds a NUL byte '\0', perhaps reading out side the addressable space of the process, causing a SEGV.
char buf[4] = {'a', 'b', 'c', 'd'};

Hope that helps.

Kind regards, Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
                                       ` (2 preceding siblings ...)
  2023-11-09 18:11                     ` Paul Eggert
@ 2023-11-10 11:23                     ` Jonny Grant
  3 siblings, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-10 11:23 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library



On 09/11/2023 11:38, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 10:31:49AM +0000, Jonny Grant wrote:
>>> Probably the only way to solve the cleverness issue for good is to have an
>>> immediately-available, foolproof, performant set of string functions that
>>> are extremely straightforward to understand and use, flexible enough for
>>> any use case, and generally agreed to be the first choice for string
>>> manipulation.
>>
>> What's the best standardized function for C string copying in your
> 
> strlcpy(3) will soon be standard.  POSIX.1-202x (Issue 8) will add it,
> which is why it's been added recently to glibc.  Hopefully, ISO C3x will
> follow (yeah, it's not like tomorrow).
> 
>> opinion?  They all seem to have drawbacks, strlcpy truncates (I'd
>> rather it rejected if it didn't have enough buffer - could cause
>> issues if the meaning of the string changed due to truncation, eg if
>> it was a file path). Other alternative functions aren't widely in use.
> 
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.
> snprintf(3), except for using int instead of size_t, has an equivalent
> API, and is in C99, in case that means something.
> 
> If you would want to write something based on Michael Kerrisk's article,
> you could do this:
> 
> 	ssize_t
> 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> 	{
> 		if (strlen(src) < dsize)
> 			return -1;
> 
> 		strcpy(dst, src);
> 	}
> 
> You may also want to calculate 'dsize' automagically, to avoid human
> error, in case it's an array, so you could write a macro on top of it:
> 
> 	#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
> 
> These are just small wrappers over standard functions, so you shouldn't
> have problems adding them to your project.
> 
> This is my long term plan for shadow-utils, indeed.  I'm first
> transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
> padding, and later will use this strxcpy() to remove the truncated
> strings to avoid misinterpretation.
> 
> Cheers,
> Alex
> 
>>
>> Kind regards, Jonny
> 

Yes, I like to look for a libc library function before writing my own wrapper, but I would consider something like strxcpy.

snprintf will truncate if not enough space, but will then return the number of bytes that would have been written had there not been truncation. So one could use snprintf on an array buffer on the stack, and then if truncation, discard the buffer and return an error, otherwise carry on using the string (that wasn't truncated).

Re strlcpy I see BSD man page gives some examples how to check for truncation by strlcpy. Perhaps examples could be added to linux kernel man page.
https://man.freebsd.org/cgi/man.cgi?query=strlcat&sektion=3

Kind regards, Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10  5:36                         ` Paul Eggert
  2023-11-10 11:05                           ` Alejandro Colomar
@ 2023-11-10 11:36                           ` Jonny Grant
  2023-11-10 13:15                             ` Alejandro Colomar
  1 sibling, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-10 11:36 UTC (permalink / raw)
  To: Paul Eggert, Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library



On 10/11/2023 05:36, Paul Eggert wrote:
> On 2023-11-09 15:48, Alejandro Colomar wrote:
>> I'd then just use strlen(3)+strcpy(3), avoiding
>> strncpy(3).
> 
> But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead.
> 
> The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer.
> 
> Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements".

As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls:

1) cost of any initial strnlen() reading memory to determine input src size
2) accepts a src_max_size to actually try to copy from src
3) does not truncate by writing anything to the buffer if there isn't enough space in the dest_max_size to fit src_max_size
4) check for NULL pointers
5) probably other thing I've overlooked

Something like this API:
int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t * dest_written);
These sizes are including any NUL terminating byte.

0 on success, or an an error code like EINVAL, or ERANGE if would truncate

All comments welcome.

Kind regards, Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:05                           ` Alejandro Colomar
@ 2023-11-10 11:47                             ` Alejandro Colomar
  2023-11-10 17:58                             ` Paul Eggert
  1 sibling, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:47 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1666 bytes --]

On Fri, Nov 10, 2023 at 12:05:31PM +0100, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Thu, Nov 09, 2023 at 09:36:43PM -0800, Paul Eggert wrote:
> > On 2023-11-09 15:48, Alejandro Colomar wrote:
> > > I'd then just use strlen(3)+strcpy(3), avoiding
> > > strncpy(3).
> 
> Heh, brain fart on my side.
> 
> > 
> > But that is vulnerable to the same denial-of-service attack that strlcpy is
> > vulnerable to. You'd need strnlen+strcpy instead.
> > 
> > The strncpy approach I suggested is simpler, and (though this doesn't matter
> 
> Yeah, although you can always wrap strnlen(3)+memcpy(3) in a strxcpy()
> inline function and have it even simpler.
> 
> Rewriting the strxcpy() wrapper I wrote the other day to not be
> vulnerable to DoS, and hoping I get it right today.
> 
> [[nodiscard]]
> inline ssize_t
> strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
> {
> 	size_t  slen;
> 
> 	slen = strnlen(src, dsize);
> 	if (slen >= dsize)

Oops:  s/>=/==/

> 		return -1;
> 
> 	memcpy(dst, src, slen + 1);
> 
> 	return slen;
> }
> 
> Hopefully, it won't be so bad in terms of performance.  And it is still
> protected by fortification of memcpy(3).  And thanks to [[nodiscard]],
> it should be hard to misuse.
> 
> > much in practice) is typically significantly faster than strnlen+strcpy in
> > the typical case where the destination is a small fixed-size buffer.
> > 
> > Although strncpy is not a good design, it's often simpler or faster or safer
> > than later "improvements".
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:36                           ` Jonny Grant
@ 2023-11-10 13:15                             ` Alejandro Colomar
  0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 13:15 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 3255 bytes --]

Hi Jonny,

On Fri, Nov 10, 2023 at 11:36:20AM +0000, Jonny Grant wrote:
> 
> 
> On 10/11/2023 05:36, Paul Eggert wrote:
> > On 2023-11-09 15:48, Alejandro Colomar wrote:
> >> I'd then just use strlen(3)+strcpy(3), avoiding
> >> strncpy(3).
> > 
> > But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead.
> > 
> > The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer.
> > 
> > Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements".
> 
> As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls:
> 
> 1) cost of any initial strnlen() reading memory to determine input src size
> 2) accepts a src_max_size to actually try to copy from src
> 3) does not truncate by writing anything to the buffer if there isn't enough space in the dest_max_size to fit src_max_size
> 4) check for NULL pointers
> 5) probably other thing I've overlooked
> 
> Something like this API:
> int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t * dest_written);
> These sizes are including any NUL terminating byte.
> 
> 0 on success, or an an error code like EINVAL, or ERANGE if would truncate

-  Linux kernel's strscpy() returns -E2BIG if it would truncate.  You
   may want to follow suit if you want such an errno(3) code.

   However, I think it's simpler to return the "standard" user-space
   error return value: -1

   If you'd need to distinguish error reasons, you could distinguish
   error codes, but for a string-copying function I think it's not so
   useful.

-  Why specify the src buffer size?  If you're copying strings, then you
   know it'll be null-terminated, so strnlen(3) will not overrun.  If
   you're not copying strings, then you'll need a different function
   that reads from a non-string.  The only standard such function is
   strncat(3), which reads from a fixed-width null-padded buffer, and
   writes to a string.  You may want to write a function similar to
   strncat(3) that doesn't catenate, if you want to just copy; I call
   that function zustr2stp(), and you can find an implementation in
   string_copying(7).

-  You can reuse the return value for the dest_written value with
   ssize_t.  Just return -1 on error and the string length on success.
   That's how most libc functions behave.

-  Regarding NULL checks, it depends on how you program.  I wouldn't add
   them, but if you want to avoid crashes at all costs, it may be
   necessary for you.  You could do a wrapper over strxcpy():


	inline ssize_t
	strxcpy0(char *restrict dst, const char *restrict src, size_t dsize)
	{
		if (dst == NULL || src == NULL)
			return -1;

		return strxcpy(dst, src, dsize);
	}

   I used 0 in the name to mark that this function checks for null
   pointers.

Cheers,
Alex

> 
> All comments welcome.
> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:05                           ` Alejandro Colomar
  2023-11-10 11:47                             ` Alejandro Colomar
@ 2023-11-10 17:58                             ` Paul Eggert
  2023-11-10 18:36                               ` Alejandro Colomar
  2023-11-10 19:52                               ` Alejandro Colomar
  1 sibling, 2 replies; 77+ messages in thread
From: Paul Eggert @ 2023-11-10 17:58 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-10 03:05, Alejandro Colomar wrote:
> Hopefully, it won't be so bad in terms of performance.

It's significantly slower than strncpy for typical use (smallish 
fixed-size destination buffers). So just use strncpy for that. It may be 
bad, but it's better than the alternatives you've mentioned. You can 
package strncpy inside a [[nodiscard]] inline wrapper if you like.

More importantly, the manual should not push strlcpy as being superior 
or being in any way a "fix" for strncpy's problems. strlcpy is worse 
than strncpy in important ways and besides - as mentioned in the glibc 
manual - neither function is a good choice for string processing.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 17:58                             ` Paul Eggert
@ 2023-11-10 18:36                               ` Alejandro Colomar
  2023-11-10 20:19                                 ` Alejandro Colomar
  2023-11-10 19:52                               ` Alejandro Colomar
  1 sibling, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 18:36 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 2030 bytes --]

Hi Paul,


On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> On 2023-11-10 03:05, Alejandro Colomar wrote:
> > Hopefully, it won't be so bad in terms of performance.
> 
> It's significantly slower than strncpy for typical use (smallish fixed-size
> destination buffers). So just use strncpy for that. It may be bad, but it's
> better than the alternatives you've mentioned. You can package strncpy
> inside a [[nodiscard]] inline wrapper if you like.
> 
> More importantly, the manual should not push strlcpy as being superior or
> being in any way a "fix" for strncpy's problems. strlcpy is worse than
> strncpy in important ways and besides - as mentioned in the glibc manual -
> neither function is a good choice for string processing.

Hmmmm, that sounds convincing.  How about this as a starting point?

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 3cf4eb371..3aff18106 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -67,6 +67,38 @@ .SH DESCRIPTION
 }
 .EE
 .in
+.\"
+.SS Copying a string with truncation
+Although this function wasn't designed to copy a string with truncation,
+it can be used with appropriate care for that purpose.
+Such use is prone to off-by-one bugs,
+so it is recommended that you write a wrapper function
+that encloses all the danger.
+.P
+.in +4n
+.EX
+[[nodiscard]]
+inline ssize_t
+strxcpy(char *restrict dst, const char *restrict src, char dsize)
+{
+    char  *p;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0')
+        return -1;
+
+    return p - dst - 1;
+}
+.EE
+.in
+You could implement a similar function in terms of
+.BR strlen (3)
+and
+.BR memcpy (3),
+or in terms of
+.BR strlcpy (3),
+and it would be simpler,
+but this implementation is faster.
 .SH RETURN VALUE
 .TP
 .BR strncpy ()


I used stpncpy(3), assuming it will have the same performance of
strncpy(3), because it can be used to return the length.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 17:58                             ` Paul Eggert
  2023-11-10 18:36                               ` Alejandro Colomar
@ 2023-11-10 19:52                               ` Alejandro Colomar
  2023-11-10 22:14                                 ` Paul Eggert
  1 sibling, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 19:52 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> On 2023-11-10 03:05, Alejandro Colomar wrote:
> > Hopefully, it won't be so bad in terms of performance.
> 
> It's significantly slower than strncpy for typical use (smallish fixed-size
> destination buffers). So just use strncpy for that. It may be bad, but it's

Do you have any numbers?  I'm curious to see strnlen+memcpy vs stpncpy
for buffers of some typical sizes (say 80 and BUFSIZ) under amd64 and
arm64 (two typical archs).  Are we talking of 1%, 10%, or 100%?

> better than the alternatives you've mentioned. You can package strncpy
> inside a [[nodiscard]] inline wrapper if you like.
> 
> More importantly, the manual should not push strlcpy as being superior or
> being in any way a "fix" for strncpy's problems. strlcpy is worse than
> strncpy in important ways and besides - as mentioned in the glibc manual -
> neither function is a good choice for string processing.

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 18:36                               ` Alejandro Colomar
@ 2023-11-10 20:19                                 ` Alejandro Colomar
  2023-11-10 23:44                                   ` Jonny Grant
  0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 20:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 3565 bytes --]

Hi Paul,

On Fri, Nov 10, 2023 at 07:36:33PM +0100, Alejandro Colomar wrote:
> Hi Paul,
> 
> 
> On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> > On 2023-11-10 03:05, Alejandro Colomar wrote:
> > > Hopefully, it won't be so bad in terms of performance.
> > 
> > It's significantly slower than strncpy for typical use (smallish fixed-size
> > destination buffers). So just use strncpy for that. It may be bad, but it's
> > better than the alternatives you've mentioned. You can package strncpy
> > inside a [[nodiscard]] inline wrapper if you like.
> > 
> > More importantly, the manual should not push strlcpy as being superior or
> > being in any way a "fix" for strncpy's problems. strlcpy is worse than
> > strncpy in important ways and besides - as mentioned in the glibc manual -
> > neither function is a good choice for string processing.
> 
> Hmmmm, that sounds convincing.  How about this as a starting point?

Something slightly better:

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 3cf4eb371..8ffedae01 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -67,6 +67,88 @@ .SH DESCRIPTION
 }
 .EE
 .in
+.\"
+.SS Producing a string in a fixed-width buffer
+Programs should normally avoid arbitrary string limitations.
+However, some programs may need to write strings into fixed-width buffers.
+.P
+Although this function wasn't designed to produce a string,
+it can be used with appropriate care for that purpose.
+There are two main cases where it can be useful:
+.IP \[bu] 3
+Copying a string into a new string in a fixed-width buffer,
+preventing buffer overflow.
+.IP \[bu]
+Copying a string into a new string in a fixed-width buffer,
+with truncation.
+.P
+Using
+.BR strncpy (3)
+in any of those cases is prone to several classes of bugs,
+so it is recommended that you write a wrapper function
+that encloses all the dangers.
+.TP
+Copying a string preventing buffer overflow
+.in +4n
+.EX
+[[nodiscard]]
+inline ssize_t
+strxcpy(char *restrict dst, const char *restrict src, char dsize)
+{
+    char  *p;
+
+    if (dsize == 0)
+        return -1;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0')
+        return -1;
+
+    return p - dst;
+}
+.EE
+.in
+.P
+If it returns -1,
+the contents of
+.I dst
+are undefined,
+and the program should handle the error.
+.P
+You could implement a similar function in terms of
+.BR strlen (3)
+and
+.BR memcpy (3),
+or in terms of
+.BR strlcpy (3),
+and it would be simpler,
+but this implementation is faster.
+.\"
+.TP
+Copying a string with truncation
+Truncation is almost always a bug.
+However, in the few cases where it is not a bug,
+you can use the following function.
+.in +4n
+.EX
+inline ssize_t
+strtcpy(char *restrict dst, const char *restrict src, char dsize)
+{
+    char  *p;
+
+    if (dsize == 0)
+        return -1;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0') {
+        dst[dsize - 1] = '\0';
+        p--;
+    }
+
+    return p - dst;
+}
+.EE
+.in
 .SH RETURN VALUE
 .TP
 .BR strncpy ()


However, note how many branches we need to make a function that handles
all corner cases.  Is it still faster than strnlen+memcpy?  stpncpy must
be heavily optimized for that.  Also, strnlen(3) might be optimized out
by the compiler in many cases, so maybe in real code it would be better
to use memcpy.  I'd very much like to see some numbers.

Thanks,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 19:52                               ` Alejandro Colomar
@ 2023-11-10 22:14                                 ` Paul Eggert
  2023-11-11 21:13                                   ` Alejandro Colomar
  0 siblings, 1 reply; 77+ messages in thread
From: Paul Eggert @ 2023-11-10 22:14 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1133 bytes --]

On 2023-11-10 11:52, Alejandro Colomar wrote:

> Do you have any numbers?

It depends on size of course. With programs like 'tar' (one of the few 
programs that actually needs something like strncpy) the destination 
buffer is usually fairly small (32 bytes or less) though some of them 
are 100 bytes. I used 16 bytes in the following shell transcript:

$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do 
echo; echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done

strnlen+strcpy:

real	0m0.411s
user	0m0.411s
sys	0m0.000s

strnlen+memcpy:

real	0m0.392s
user	0m0.388s
sys	0m0.004s

strncpy:

real	0m0.300s
user	0m0.300s
sys	0m0.000s

stpncpy:

real	0m0.326s
user	0m0.326s
sys	0m0.000s

strlcpy:

real	0m0.623s
user	0m0.623s
sys	0m0.000s


... where a.out was generated by compiling the attached program with gcc 
-O2 on Ubuntu 23.10 64-bit on a Xeon W-1350.

I wouldn't take these numbers all that seriously, as microbenchmarks 
like these are not that informative these days. Still, for a typical 
case one should not assume strncpy must be slower merely because it has 
more work to do; quite the contrary.

[-- Attachment #2: strncpy-bench.c --]
[-- Type: text/x-csrc, Size: 1090 bytes --]

#include <stdlib.h>
#include <string.h>


int
main (int argc, char **argv)
{
  if (argc != 5)
    return 2;
  long bufsize = atol (argv[1]);
  char *buf = malloc (bufsize);
  long n = atol (argv[2]);
  char const *a = argv[3];
  if (strcmp (argv[4], "strnlen+strcpy") == 0)
    {
      for (long i = 0; i < n; i++)
	{
	  if (strnlen (a, bufsize) == bufsize)
	    return 1;
	  strcpy (buf, a);
	}
    }
  else if (strcmp (argv[4], "strnlen+memcpy") == 0)
    {
      for (long i = 0; i < n; i++)
	{
	  size_t alen = strnlen (a, bufsize);
	  if (alen == bufsize)
	    return 1;
	  memcpy (buf, a, alen + 1);
	}
    }
  else if (strcmp (argv[4], "strncpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (strncpy (buf, a, bufsize)[bufsize - 1])
	  return 1;
    }
  else if (strcmp (argv[4], "stpncpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (stpncpy (buf, a, bufsize) == buf + bufsize)
	  return 1;
    }
  else if (strcmp (argv[4], "strlcpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (strlcpy (buf, a, bufsize) == bufsize)
	  return 1;
    }
  else
    return 2;
}

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 20:19                                 ` Alejandro Colomar
@ 2023-11-10 23:44                                   ` Jonny Grant
  0 siblings, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-10 23:44 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert; +Cc: Matthew House, linux-man, GNU C Library



On 10/11/2023 20:19, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Fri, Nov 10, 2023 at 07:36:33PM +0100, Alejandro Colomar wrote:
>> Hi Paul,
>>
>>
>> On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
>>> On 2023-11-10 03:05, Alejandro Colomar wrote:
>>>> Hopefully, it won't be so bad in terms of performance.
>>>
>>> It's significantly slower than strncpy for typical use (smallish fixed-size
>>> destination buffers). So just use strncpy for that. It may be bad, but it's
>>> better than the alternatives you've mentioned. You can package strncpy
>>> inside a [[nodiscard]] inline wrapper if you like.
>>>
>>> More importantly, the manual should not push strlcpy as being superior or
>>> being in any way a "fix" for strncpy's problems. strlcpy is worse than
>>> strncpy in important ways and besides - as mentioned in the glibc manual -
>>> neither function is a good choice for string processing.
>>
>> Hmmmm, that sounds convincing.  How about this as a starting point?
> 
> Something slightly better:
> 
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index 3cf4eb371..8ffedae01 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -67,6 +67,88 @@ .SH DESCRIPTION
>  }
>  .EE
>  .in
> +.\"
> +.SS Producing a string in a fixed-width buffer
> +Programs should normally avoid arbitrary string limitations.
> +However, some programs may need to write strings into fixed-width buffers.
> +.P
> +Although this function wasn't designed to produce a string,
> +it can be used with appropriate care for that purpose.
> +There are two main cases where it can be useful:
> +.IP \[bu] 3
> +Copying a string into a new string in a fixed-width buffer,
> +preventing buffer overflow.
> +.IP \[bu]
> +Copying a string into a new string in a fixed-width buffer,
> +with truncation.
> +.P
> +Using
> +.BR strncpy (3)
> +in any of those cases is prone to several classes of bugs,
> +so it is recommended that you write a wrapper function
> +that encloses all the dangers.

Some feedback about last line: "that covers all the risks" is clearer.

> +.TP
> +Copying a string preventing buffer overflow
> +.in +4n
> +.EX
> +[[nodiscard]]
> +inline ssize_t
> +strxcpy(char *restrict dst, const char *restrict src, char dsize)
> +{
> +    char  *p;
> +
> +    if (dsize == 0)
> +        return -1;
> +
> +    p = stpncpy(dst, src, dsize);
> +    if (dst[dsize - 1] != '\0')
> +        return -1;
> +
> +    return p - dst;
> +}
> +.EE
> +.in
> +.P
> +If it returns -1,
> +the contents of
> +.I dst
> +are undefined,
> +and the program should handle the error.
> +.P
> +You could implement a similar function in terms of
> +.BR strlen (3)
> +and
> +.BR memcpy (3),
> +or in terms of
> +.BR strlcpy (3),
> +and it would be simpler,
> +but this implementation is faster.

I suggest to add a little more information, could append "because it accesses less memory".

> +.\"
> +.TP
> +Copying a string with truncation
> +Truncation is almost always a bug.
> +However, in the few cases where it is not a bug,
> +you can use the following function.
> +.in +4n
> +.EX
> +inline ssize_t
> +strtcpy(char *restrict dst, const char *restrict src, char dsize)
> +{
> +    char  *p;
> +
> +    if (dsize == 0)
> +        return -1;
> +
> +    p = stpncpy(dst, src, dsize);
> +    if (dst[dsize - 1] != '\0') {
> +        dst[dsize - 1] = '\0';
> +        p--;
> +    }
> +
> +    return p - dst;
> +}
> +.EE
> +.in
>  .SH RETURN VALUE
>  .TP
>  .BR strncpy ()
> 
> 
> However, note how many branches we need to make a function that handles
> all corner cases.  Is it still faster than strnlen+memcpy?  stpncpy must
> be heavily optimized for that.  Also, strnlen(3) might be optimized out
> by the compiler in many cases, so maybe in real code it would be better
> to use memcpy.  I'd very much like to see some numbers.

A benchmark test would show performance. Can't be that many lines of code in a loop to measure this.

strnlen_s is in the C standard Annex K, but strnlen didn't make it in yet, even C23.

Kind regards
Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 22:14                                 ` Paul Eggert
@ 2023-11-11 21:13                                   ` Alejandro Colomar
  2023-11-11 22:20                                     ` Paul Eggert
  2023-11-12  9:52                                     ` Jonny Grant
  0 siblings, 2 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-11 21:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 6922 bytes --]

Hi Paul,

On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
> On 2023-11-10 11:52, Alejandro Colomar wrote:
> 
> > Do you have any numbers?
> 
> It depends on size of course. With programs like 'tar' (one of the few
> programs that actually needs something like strncpy) the destination buffer
> is usually fairly small (32 bytes or less) though some of them are 100
> bytes. I used 16 bytes in the following shell transcript:
> 
> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
> 
> strnlen+strcpy:
> 
> real	0m0.411s
> user	0m0.411s
> sys	0m0.000s
> 
> strnlen+memcpy:
> 
> real	0m0.392s
> user	0m0.388s
> sys	0m0.004s
> 
> strncpy:
> 
> real	0m0.300s
> user	0m0.300s
> sys	0m0.000s
> 
> stpncpy:
> 
> real	0m0.326s
> user	0m0.326s
> sys	0m0.000s
> 
> strlcpy:
> 
> real	0m0.623s
> user	0m0.623s
> sys	0m0.000s
> 
> 
> ... where a.out was generated by compiling the attached program with gcc -O2
> on Ubuntu 23.10 64-bit on a Xeon W-1350.
> 
> I wouldn't take these numbers all that seriously, as microbenchmarks like
> these are not that informative these days. Still, for a typical case one
> should not assume strncpy must be slower merely because it has more work to
> do; quite the contrary.

Thanks for the benchmarck!  Yeah, I won't take it as the last word, but
it shows the growth order (and its cause) of the different alternatives.

I'd like to point out some curious things about it:

-  strnlen+strcpy is slower than strnlen+memcpy.

   The compiler has all the information necessary here, so I don't see
   why it's not optimizing out the strcpy(3) into a simple memcpy(3).
   AFAICS, it's a missed optimization.  Even with -O3, it misses the
   optimization.

-  strncpy is slower than stpncpy in my computer.

   stpncpy is in fact the fastest call in my computer.

   Was strncpy(3) optimized in a recent version of glibc that you have?
   I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
   just luck?  I'm curious.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 16 100000000 abcdefghijk $i;
	  done;

	strnlen+strcpy:

	real	0m0.188s
	user	0m0.184s
	sys	0m0.004s

	strnlen+memcpy:

	real	0m0.148s
	user	0m0.148s
	sys	0m0.000s

	strncpy:

	real	0m0.157s
	user	0m0.157s
	sys	0m0.000s

	stpncpy:

	real	0m0.135s
	user	0m0.135s
	sys	0m0.000s

	memccpy:

	real	0m0.208s
	user	0m0.208s
	sys	0m0.000s

	strlcpy:

	real	0m0.322s
	user	0m0.322s
	sys	0m0.000s

-  strlcpy(3) is very heavy.  Much more than I expected.  See some tests
   with larger strings.  The main growth of strlcpy(3) comes from slen.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.242s
	user	0m0.242s
	sys	0m0.000s

	strnlen+memcpy:

	real	0m0.190s
	user	0m0.186s
	sys	0m0.004s

	strncpy:

	real	0m0.174s
	user	0m0.173s
	sys	0m0.000s

	stpncpy:

	real	0m0.170s
	user	0m0.166s
	sys	0m0.004s

	memccpy:

	real	0m0.253s
	user	0m0.249s
	sys	0m0.004s

	strlcpy:

	real	0m1.385s
	user	0m1.385s
	sys	0m0.000s

-  strncpy(3) also gets heavy compared to strnlen+memcpy.
   Considering how small the difference with memcpy is for small
   strings, I wouldn't recommend it instead of memcpy, except for
   micro-optimizations.  The main growth of strncpy(3) comes from dsize.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.234s
	user	0m0.233s
	sys	0m0.001s

	strnlen+memcpy:

	real	0m0.192s
	user	0m0.192s
	sys	0m0.000s

	strncpy:

	real	0m0.268s
	user	0m0.268s
	sys	0m0.000s

	stpncpy:

	real	0m0.267s
	user	0m0.267s
	sys	0m0.000s

	memccpy:

	real	0m0.257s
	user	0m0.256s
	sys	0m0.001s

	strlcpy:

	real	0m1.574s
	user	0m1.574s
	sys	0m0.000s

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.227s
	user	0m0.227s
	sys	0m0.000s

	strnlen+memcpy:

	real	0m0.190s
	user	0m0.190s
	sys	0m0.000s

	strncpy:

	real	0m1.400s
	user	0m1.399s
	sys	0m0.000s

	stpncpy:

	real	0m1.398s
	user	0m1.398s
	sys	0m0.000s

	memccpy:

	real	0m0.256s
	user	0m0.256s
	sys	0m0.000s

	strlcpy:

	real	0m1.184s
	user	0m1.184s
	sys	0m0.000s


-  strnlen(3)+memcpy(3) becomes the fastest when dsize grows a bit over
   a few hundred bytes, and is only a few 10%'s slower than the fastest
   for smaller buffers.

   It is also the most semantically correct (together with
   strnlen+strcpy), avoiding unnecessary dead code (padding).  This
   should get the main backing from the manual pages.

   However, it can be useful to document typical alternatives to prevent
   mistakes from users.  Especially, since some micro-optimizations may
   favor uses of strncpy(3).

Cheers,
Alex   

> #include <stdlib.h>
> #include <string.h>
> 
> 
> int
> main (int argc, char **argv)
> {
>   if (argc != 5)
>     return 2;
>   long bufsize = atol (argv[1]);
>   char *buf = malloc (bufsize);
>   long n = atol (argv[2]);
>   char const *a = argv[3];
>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	{
> 	  if (strnlen (a, bufsize) == bufsize)
> 	    return 1;
> 	  strcpy (buf, a);
> 	}
>     }
>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	{
> 	  size_t alen = strnlen (a, bufsize);
> 	  if (alen == bufsize)
> 	    return 1;
> 	  memcpy (buf, a, alen + 1);
> 	}
>     }
>   else if (strcmp (argv[4], "strncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (strncpy (buf, a, bufsize)[bufsize - 1])
> 	  return 1;
>     }
>   else if (strcmp (argv[4], "stpncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (stpncpy (buf, a, bufsize) == buf + bufsize)
> 	  return 1;
>     }

I've added the following one for completeness.  Especially now that
it'll be in C2x.

  else if (strcmp (argv[4], "memccpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (memccpy (buf, a, 0, bufsize) == NULL)
	  return 1;
    }

>   else if (strcmp (argv[4], "strlcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (strlcpy (buf, a, bufsize) == bufsize)

This should have been >= bufsize, right?

> 	  return 1;
>     }
>   else
>     return 2;
> }


-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 21:13                                   ` Alejandro Colomar
@ 2023-11-11 22:20                                     ` Paul Eggert
  2023-11-12  9:52                                     ` Jonny Grant
  1 sibling, 0 replies; 77+ messages in thread
From: Paul Eggert @ 2023-11-11 22:20 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-11 13:13, Alejandro Colomar wrote:
>     Was strncpy(3) optimized in a recent version of glibc that you have?

Ubuntu 23.10 currently uses glibc 2.38-1ubuntu6. Fortification is on by 
default, so __builtin___strncpy_chk is involved.

Again, I wouldn't take these numbers too seriously. It's just a 
microbenchmark.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 0/2] Expand BUGS section of string_copying(7).
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
       [not found] ` <ZUacobMq0l_O8gjg@debian>
@ 2023-11-12  9:17 ` Alejandro Colomar
  2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12  9:17 UTC (permalink / raw)
  To: linux-man; +Cc: Alejandro Colomar, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

Hi,

After Paul showing important problems of strlcpy(3) (and strlcat(3)),
I've written something in string_copying(7)'s BUGS to warn against them.

Cheers,
Alex

Alejandro Colomar (2):
  string_copying.7: BUGS: *cat(3) functions aren't always bad
  string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance
    problems

 man7/string_copying.7 | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
       [not found] ` <ZUacobMq0l_O8gjg@debian>
  2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
@ 2023-11-12  9:18 ` Alejandro Colomar
  2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12  9:18 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
	DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

The compiler will sometimes optimize them to normal *cpy(3) functions,
since the length of dst is usually known, if the previous *cpy(3) is
visible to the compiler.  And they provide for cleaner code.  If you
know that they'll get optimized, you could use them.

Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 1637ebc91..0254fbba6 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -592,8 +592,14 @@ .SH BUGS
 All catenation functions share the same performance problem:
 .UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
 Shlemiel the painter
 .UE .
+As a mitigation,
+compilers are able to transform some calls to catenation functions
+into normal copy functions,
+since
+.I strlen(dst)
+is usually a byproduct of the previous copy.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
                   ` (2 preceding siblings ...)
  2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-12  9:18 ` Alejandro Colomar
  2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12  9:18 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
	DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

Also point to BUGS from other sections that talk about these functions.

These functions are doomed due to the design decision of mirroring
snprintf(3)'s return value.  They must return strlen(src), which makes
them terribly slow, and vulnerable to DoS if an attacker can control
strlen(src).

A better design would have been to return -1 when truncating.

Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0254fbba6..cb3910db0 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -226,9 +226,9 @@ .SS Truncate or not?
 .IP \[bu]
 .BR strlcpy (3bsd)
 and
 .BR strlcat (3bsd)
-are similar, but less efficient when chained.
+are similar, but have important performance problems; see BUGS.
 .IP \[bu]
 .BR stpncpy (3)
 and
 .BR strncpy (3)
@@ -417,8 +417,10 @@ .SS Functions
 the resulting string is truncated
 (but it is guaranteed to be null-terminated).
 They return the length of the total string they tried to create.
 .IP
+Check BUGS before using these functions.
+.IP
 .BR stpecpy (3)
 is a simpler alternative to these functions.
 .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
 .TP
@@ -598,8 +600,22 @@ .SH BUGS
 into normal copy functions,
 since
 .I strlen(dst)
 is usually a byproduct of the previous copy.
+.P
+.BR strlcpy (3)
+and
+.BR strlcat (3)
+need to read the entire
+.I src
+string,
+even if the destination buffer is small.
+This makes them vulnerable to Denial of Service (DoS) attacks
+if an attacker can control the length of the
+.I src
+string.
+And if not,
+they're still unnecessarily slow.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 21:13                                   ` Alejandro Colomar
  2023-11-11 22:20                                     ` Paul Eggert
@ 2023-11-12  9:52                                     ` Jonny Grant
  2023-11-12 10:59                                       ` Alejandro Colomar
  1 sibling, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-12  9:52 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert; +Cc: Matthew House, linux-man, GNU C Library



On 11/11/2023 21:13, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
>> On 2023-11-10 11:52, Alejandro Colomar wrote:
>>
>>> Do you have any numbers?
>>
>> It depends on size of course. With programs like 'tar' (one of the few
>> programs that actually needs something like strncpy) the destination buffer
>> is usually fairly small (32 bytes or less) though some of them are 100
>> bytes. I used 16 bytes in the following shell transcript:
>>
>> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
>> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
>>
>> strnlen+strcpy:
>>
>> real	0m0.411s
>> user	0m0.411s
>> sys	0m0.000s
>>
>> strnlen+memcpy:
>>
>> real	0m0.392s
>> user	0m0.388s
>> sys	0m0.004s
>>
>> strncpy:
>>
>> real	0m0.300s
>> user	0m0.300s
>> sys	0m0.000s
>>
>> stpncpy:
>>
>> real	0m0.326s
>> user	0m0.326s
>> sys	0m0.000s
>>
>> strlcpy:
>>
>> real	0m0.623s
>> user	0m0.623s
>> sys	0m0.000s
>>
>>
>> ... where a.out was generated by compiling the attached program with gcc -O2
>> on Ubuntu 23.10 64-bit on a Xeon W-1350.
>>
>> I wouldn't take these numbers all that seriously, as microbenchmarks like
>> these are not that informative these days. Still, for a typical case one
>> should not assume strncpy must be slower merely because it has more work to
>> do; quite the contrary.
> 
> Thanks for the benchmarck!  Yeah, I won't take it as the last word, but
> it shows the growth order (and its cause) of the different alternatives.
> 
> I'd like to point out some curious things about it:
> 
> -  strnlen+strcpy is slower than strnlen+memcpy.
> 
>    The compiler has all the information necessary here, so I don't see
>    why it's not optimizing out the strcpy(3) into a simple memcpy(3).
>    AFAICS, it's a missed optimization.  Even with -O3, it misses the
>    optimization.
> 
> -  strncpy is slower than stpncpy in my computer.
> 
>    stpncpy is in fact the fastest call in my computer.
> 
>    Was strncpy(3) optimized in a recent version of glibc that you have?
>    I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
>    just luck?  I'm curious.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 16 100000000 abcdefghijk $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.188s
> 	user	0m0.184s
> 	sys	0m0.004s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.148s
> 	user	0m0.148s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m0.157s
> 	user	0m0.157s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.135s
> 	user	0m0.135s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.208s
> 	user	0m0.208s
> 	sys	0m0.000s
> 
> 	strlcpy:
> 
> 	real	0m0.322s
> 	user	0m0.322s
> 	sys	0m0.000s
> 
> -  strlcpy(3) is very heavy.  Much more than I expected.  See some tests
>    with larger strings.  The main growth of strlcpy(3) comes from slen.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.242s
> 	user	0m0.242s
> 	sys	0m0.000s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.190s
> 	user	0m0.186s
> 	sys	0m0.004s
> 
> 	strncpy:
> 
> 	real	0m0.174s
> 	user	0m0.173s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.170s
> 	user	0m0.166s
> 	sys	0m0.004s
> 
> 	memccpy:
> 
> 	real	0m0.253s
> 	user	0m0.249s
> 	sys	0m0.004s
> 
> 	strlcpy:
> 
> 	real	0m1.385s
> 	user	0m1.385s
> 	sys	0m0.000s
> 
> -  strncpy(3) also gets heavy compared to strnlen+memcpy.
>    Considering how small the difference with memcpy is for small
>    strings, I wouldn't recommend it instead of memcpy, except for
>    micro-optimizations.  The main growth of strncpy(3) comes from dsize.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.234s
> 	user	0m0.233s
> 	sys	0m0.001s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.192s
> 	user	0m0.192s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m0.268s
> 	user	0m0.268s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.267s
> 	user	0m0.267s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.257s
> 	user	0m0.256s
> 	sys	0m0.001s
> 
> 	strlcpy:
> 
> 	real	0m1.574s
> 	user	0m1.574s
> 	sys	0m0.000s
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.227s
> 	user	0m0.227s
> 	sys	0m0.000s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.190s
> 	user	0m0.190s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m1.400s
> 	user	0m1.399s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m1.398s
> 	user	0m1.398s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.256s
> 	user	0m0.256s
> 	sys	0m0.000s
> 
> 	strlcpy:
> 
> 	real	0m1.184s
> 	user	0m1.184s
> 	sys	0m0.000s
> 
> 
> -  strnlen(3)+memcpy(3) becomes the fastest when dsize grows a bit over
>    a few hundred bytes, and is only a few 10%'s slower than the fastest
>    for smaller buffers.
> 
>    It is also the most semantically correct (together with
>    strnlen+strcpy), avoiding unnecessary dead code (padding).  This
>    should get the main backing from the manual pages.
> 
>    However, it can be useful to document typical alternatives to prevent
>    mistakes from users.  Especially, since some micro-optimizations may
>    favor uses of strncpy(3).
> 
> Cheers,
> Alex   
> 
>> #include <stdlib.h>
>> #include <string.h>
>>
>>
>> int
>> main (int argc, char **argv)
>> {
>>   if (argc != 5)
>>     return 2;
>>   long bufsize = atol (argv[1]);
>>   char *buf = malloc (bufsize);
>>   long n = atol (argv[2]);
>>   char const *a = argv[3];
>>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	{
>> 	  if (strnlen (a, bufsize) == bufsize)
>> 	    return 1;
>> 	  strcpy (buf, a);
>> 	}
>>     }
>>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	{
>> 	  size_t alen = strnlen (a, bufsize);
>> 	  if (alen == bufsize)
>> 	    return 1;
>> 	  memcpy (buf, a, alen + 1);
>> 	}
>>     }
>>   else if (strcmp (argv[4], "strncpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (strncpy (buf, a, bufsize)[bufsize - 1])
>> 	  return 1;
>>     }
>>   else if (strcmp (argv[4], "stpncpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (stpncpy (buf, a, bufsize) == buf + bufsize)
>> 	  return 1;
>>     }
> 
> I've added the following one for completeness.  Especially now that
> it'll be in C2x.
> 
>   else if (strcmp (argv[4], "memccpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (memccpy (buf, a, 0, bufsize) == NULL)
> 	  return 1;
>     }
> 
>>   else if (strcmp (argv[4], "strlcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (strlcpy (buf, a, bufsize) == bufsize)
> 
> This should have been >= bufsize, right?
> 
>> 	  return 1;
>>     }
>>   else
>>     return 2;
>> }
> 
> 

Maybe we're gonna need a bigger benchmark.

Probably there existing studies. Or could patch something like SQLite Benchmark to utilise each string function just for measurements. Hopefully it moves around at least 2GB of strings to give some meaningful comparison timings.

As Paul mentioned, strlcpy is a poor choice for processing strings. Could rely on their guidance as they already measured.
https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html

Maybe the strlcpy API is easier, safer for programmers; but the compiler can't figure out that the programmer already knew src string length. So the strlcpy does a strlen() and wastes time reading over memory. If the src length is known, can just memcpy.


When I've benchmarked things, reducing the memory accesses for read, write boosted performance, also looked at the cycles taken, of course cache and alignment all play a part too.

Maybe could suggest in your man page programmers should keep track of the src size ? - to save the cost of the strlen().

At least the strlen functions are optimized:
glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time.
glibc/strlen.c searches 4 bytes at a time.

glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?

memcpy (dest, src, size);
dest[size - 1] = '\0';

Kind regards, Jonny

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-12  9:52                                     ` Jonny Grant
@ 2023-11-12 10:59                                       ` Alejandro Colomar
  0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 10:59 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 4572 bytes --]

Hi Jonny,

On Sun, Nov 12, 2023 at 09:52:20AM +0000, Jonny Grant wrote:
[... some micro-benchmarks...]

> 
> Maybe we're gonna need a bigger benchmark.

Not really.

> 
> Probably there existing studies. Or could patch something like SQLite
> Benchmark to utilise each string function just for measurements.
> Hopefully it moves around at least 2GB of strings to give some
> meaningful comparison timings.

I wasn't so interested in the small differences between functions.
What this micro-benchmark showed clearly, without needing much more info
to be conclusive, is the first order of growth of each of the functions:

-  strlcpy(3)'s first order growth corresponds to strlen(src).  That's
   due to returning strlen(src), which proves to be a poor API.

-  strncpy(3)'s first order growth corresponds to sizeof(dst).  That's
   of course due to the zeroing.  If sizeof(dst) is kept very small, you
   could live with it.  When the size grows to more or less 4 KiB, this
   drag becomes meaningful.

-  strnlen(3)+*cpy() first order growth corresponds to
   strnlen(src, sizeof(dst)), which is the fastest order of growth
   you can get from a truncating string-copying function (except if you
   keep track of your slen manually and call directly memcpy(3)).

Of course, first order of growth ignores second order of growth and so
on, which for small inputs can be important.  That is, O(x^3) is bigger
than O(x^2), but x3 + x2 can be smaller than 5*x2 for small x.

> 
> As Paul mentioned, strlcpy is a poor choice for processing strings.\
> Could rely on their guidance as they already measured.
> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html

Indeed.  I've added important notices in BUGS about it, and recommended
against.

> 
> Maybe the strlcpy API is easier, safer for programmers; but the
> compiler can't figure out that the programmer already knew src string
> length.  So the strlcpy does a strlen() and wastes time reading over
> memory.  If the src length is known, can just memcpy.

I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
suffer its problems.  It should be even safer and easier to use, and its
first order of growth is better.  I'll send a patch for review in a
moment.

> When I've benchmarked things, reducing the memory accesses for read,
> write boosted performance, also looked at the cycles taken, of course
> cache and alignment all play a part too.

If one wants to micro-optimize for their use case, its none of my
business.  I provide a function that should be safe and relatively fast
for all use cases, which libc doesn't.

> Maybe could suggest in your man page programmers should keep track of
> the src size ? - to save the cost of the strlen().

No.  Optimizations are not my business.  Writing good APIs should make
these optimizations low value so that they aren't done, except for the
most performance-critical programs.

The problem comes when libc doesn't provide anything usable, and the
user has no guidance on where to start.  Then, programmers start being
clever, usually too clever.  That's why I think the man-pages should go
ahead and write wrapper functions such as strtcpy() and stpecpy()
aound libc functions; these wrappers should provide a fast and safe
starting point for most programs.

It's true that memcpy(3) is the fastest function one can use, but it
requires the programmer to be rather careful with the lengths of the
strings.  I don't think keeping track of all those little details is
what the common programmer should do.

> 
> At least the strlen functions are optimized:
> glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time.
> glibc/strlen.c searches 4 bytes at a time.
> 
> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
> 
> memcpy (dest, src, size);
> dest[size - 1] = '\0';

-1's in the source code make up for off-by-one bugs.  APIs should be
written so that common use doesn't involve manually writing -1 if
possible.

I acknowledge the performance benefits of this construction, and have
used it myself in NGINX code, but I also find it very dangerous, which
is why I recommend using a wrapper over it:

	char *
	ustr2stp(char *restrict dst, const char *restrict src, size_t len)
	{
		char  *p;

		p = mempcpy(dst, src, len);
		*p = '\0';

		return p;
	}

Cheers,
Alex

> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 0/3] Improve string_copying(7)
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
                   ` (3 preceding siblings ...)
  2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
  2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
  To: linux-man, Guillem Jover
  Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
	DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]


Hi,

v3:

-  Patches 1/3 and 2/3 are identical to v2, except that I CCd libbsd's
   maintainer (Guillem) in 2/3 so he's aware that we're documenting BUGS
   for strlcpy(3).  Since the strlcpy(3bsd) manual page is part of
   libbsd, it may be interesting to also add a BUGS section in that
   page.

-  Add 3/3, which adds strtcpy(3), a function almost identical to
   strscpy(9), and very similar to strlcpy(3), which doesn't share its
   bugs.

Cheers,
Alex

Alejandro Colomar (3):
  string_copying.7: BUGS: *cat(3) functions aren't always bad
  string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance
    problems
  strtcpy.3, string_copying.7: Add strtcpy(3)

 man3/strtcpy.3        |   1 +
 man7/string_copying.7 | 121 +++++++++++++++++++++++++++++++-----------
 2 files changed, 92 insertions(+), 30 deletions(-)
 create mode 100644 man3/strtcpy.3

-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
                   ` (4 preceding siblings ...)
  2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
  2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
  2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
	Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
	Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

The compiler will sometimes optimize them to normal *cpy(3) functions,
since the length of dst is usually known, if the previous *cpy(3) is
visible to the compiler.  And they provide for cleaner code.  If you
know that they'll get optimized, you could use them.

Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 1637ebc91..0254fbba6 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -592,8 +592,14 @@ .SH BUGS
 All catenation functions share the same performance problem:
 .UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
 Shlemiel the painter
 .UE .
+As a mitigation,
+compilers are able to transform some calls to catenation functions
+into normal copy functions,
+since
+.I strlen(dst)
+is usually a byproduct of the previous copy.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
                   ` (5 preceding siblings ...)
  2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
  2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
	Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
	Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

Also point to BUGS from other sections that talk about these functions.

These functions are doomed due to the design decision of mirroring
snprintf(3)'s return value.  They must return strlen(src), which makes
them terribly slow, and vulnerable to DoS if an attacker can control
strlen(src).

A better design would have been to return -1 when truncating.

Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0254fbba6..cb3910db0 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -226,9 +226,9 @@ .SS Truncate or not?
 .IP \[bu]
 .BR strlcpy (3bsd)
 and
 .BR strlcat (3bsd)
-are similar, but less efficient when chained.
+are similar, but have important performance problems; see BUGS.
 .IP \[bu]
 .BR stpncpy (3)
 and
 .BR strncpy (3)
@@ -417,8 +417,10 @@ .SS Functions
 the resulting string is truncated
 (but it is guaranteed to be null-terminated).
 They return the length of the total string they tried to create.
 .IP
+Check BUGS before using these functions.
+.IP
 .BR stpecpy (3)
 is a simpler alternative to these functions.
 .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
 .TP
@@ -598,8 +600,22 @@ .SH BUGS
 into normal copy functions,
 since
 .I strlen(dst)
 is usually a byproduct of the previous copy.
+.P
+.BR strlcpy (3)
+and
+.BR strlcat (3)
+need to read the entire
+.I src
+string,
+even if the destination buffer is small.
+This makes them vulnerable to Denial of Service (DoS) attacks
+if an attacker can control the length of the
+.I src
+string.
+And if not,
+they're still unnecessarily slow.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3)
       [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
                   ` (6 preceding siblings ...)
  2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
@ 2023-11-12 11:27 ` Alejandro Colomar
  7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:27 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
	Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
	Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 7496 bytes --]

Add this new truncating string-copying function.  It intends to fully
replace strlcpy(3), which has important bugs (documented in the
preceeding commit).

It is almost identical to Linux kernel's strscpy(9), so reduce the
documentation of strscpy(9) in this page to the minimum, giving
preference to strtcpy(3).  Provide a reference implementation, since no
libc provides it.

Providing an easy, safe, and relatively fast truncating string-copying
function should prevent users from rolling their own, in which they
might introduce bugs accidentally.  We already made enough mistakes
while discussing these functions, so it's certainly not something that
should be written often.

Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strtcpy.3        |  1 +
 man7/string_copying.7 | 97 ++++++++++++++++++++++++++++++-------------
 2 files changed, 69 insertions(+), 29 deletions(-)
 create mode 100644 man3/strtcpy.3

diff --git a/man3/strtcpy.3 b/man3/strtcpy.3
new file mode 100644
index 000000000..beb850746
--- /dev/null
+++ b/man3/strtcpy.3
@@ -0,0 +1 @@
+.so man7/string_copying.7
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cb3910db0..4f609e480 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -6,8 +6,9 @@
 .\" ----- NAME :: -----------------------------------------------------/
 .SH NAME
 stpcpy,
 strcpy, strcat,
+strtcpy,
 stpecpy,
 strlcpy, strlcat,
 stpncpy,
 strncpy,
@@ -30,8 +31,11 @@ .SS Strings
 // Chain-copy a string with truncation.
 .BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src );
 .P
 // Copy/catenate a string with truncation.
+.BI "size_t strtcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
 .BI "size_t strlcpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .BI "size_t strlcat(char " dst "[restrict ." sz "], \
@@ -220,10 +224,10 @@ .SS Truncate or not?
 .P
 Functions that truncate:
 .IP \[bu] 3
 .BR stpecpy (3)
-is the most efficient string copy function that performs truncation.
-It only requires to check for truncation once after all chained calls.
+.IP \[bu]
+.BR strtcpy (3)
 .IP \[bu]
 .BR strlcpy (3bsd)
 and
 .BR strlcat (3bsd)
@@ -326,8 +330,10 @@ .SS String vs character sequence
 .IP \[bu]
 .BR strcpy (3),
 .BR strcat (3)
 .IP \[bu]
+.BR strtcpy (3)
+.IP \[bu]
 .BR stpecpy (3)
 .IP \[bu]
 .BR strlcpy (3bsd),
 .BR strlcat (3bsd)
@@ -390,12 +396,24 @@ .SS Functions
 The return value is useless.
 .IP
 .BR stpcpy (3)
 is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: strtcpy(3) ----------------------/
+.TP
+.BR strtcpy (3)
+Copy the input string into a destination string.
+If the destination buffer isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the string,
+or \-1 if it truncated.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
 .\" ----- DESCRIPTION :: Functions :: stpecpy(3) ----------------------/
 .TP
 .BR stpecpy (3)
-Copy the input string into a destination string.
+Chain-copy the input string into a destination string.
 If the destination buffer,
 limited by a pointer to its end,
 isn't large enough to hold the copy,
 the resulting string is truncated
@@ -419,10 +437,12 @@ .SS Functions
 They return the length of the total string they tried to create.
 .IP
 Check BUGS before using these functions.
 .IP
+.BR strtcpy (3)
+and
 .BR stpecpy (3)
-is a simpler alternative to these functions.
+are better alternatives to these functions.
 .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
 .TP
 .BR stpncpy (3)
 Copy the input string into
@@ -542,8 +562,17 @@ .SH RETURN VALUE
 .BR ustpcpy (3)
 A pointer to one after the last character
 in the destination character sequence.
 .TP
+.BR strtcpy (3)
+The length of the string.
+When truncation occurs, it returns \-1.
+When
+.I dsize
+is
+.BR 0 ,
+it also returns \-1.
+.TP
 .BR strlcpy (3bsd)
 .TQ
 .BR strlcat (3bsd)
 The length of the total string that they tried to create
@@ -562,25 +591,14 @@ .SH RETURN VALUE
 which is useless.
 .\" ----- NOTES :: strscpy(9) -----------------------------------------/
 .SH NOTES
 The Linux kernel has an internal function for copying strings,
-which is similar to
-.BR stpecpy (3),
-except that it can't be chained:
-.TP
-.BR strscpy (9)
-Copy the input string into a destination string.
-If the destination buffer,
-limited by its size,
-isn't large enough to hold the copy,
-the resulting string is truncated
-(but it is guaranteed to be null-terminated).
-It returns the length of the destination string, or
+.BR strscpy (9),
+which is identical to
+.BR strtcpy (3),
+except that it returns
 .B \-E2BIG
-on truncation.
-.IP
-.BR stpecpy (3)
-is a simpler and faster alternative to this function.
+instead of \-1.
 .\" ----- CAVEATS :: --------------------------------------------------/
 .SH CAVEATS
 Don't mix chain calls to truncating and non-truncating functions.
 It is conceptually wrong
@@ -640,8 +658,17 @@ .SH EXAMPLES
 strcat(buf, "!");
 len = strlen(buf);
 puts(buf);
 .EE
+.\" ----- EXAMPLES :: strtcpy(3) --------------------------------------/
+.TP
+.BR strtcpy (3)
+.EX
+len = strtcpy(buf, "Hello world!", sizeof(buf));
+if (len == \-1)
+    goto toolong;
+puts(buf);
+.EE
 .\" ----- EXAMPLES :: stpecpy(3) --------------------------------------/
 .TP
 .BR stpecpy (3)
 .EX
@@ -671,17 +698,8 @@ .SH EXAMPLES
 if (len >= sizeof(buf))
     goto toolong;
 puts(buf);
 .EE
-.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
-.TP
-.BR strscpy (9)
-.EX
-len = strscpy(buf, "Hello world!", sizeof(buf));
-if (len == \-E2BIG)
-    goto toolong;
-puts(buf);
-.EE
 .\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
 .TP
 .BR stpncpy (3)
 .EX
@@ -765,8 +783,29 @@ .SS Implementations
 .in +4n
 .EX
 /* This code is in the public domain. */
 \&
+.\" ----- EXAMPLES :: Implementations :: strtcpy(3) -------------------/
+ssize_t
+.IR strtcpy "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+    bool    trunc;
+    char    *p;
+    size_t  dlen, slen;
+\&
+    if (dsize == 0)
+        return \-1;
+\&
+    slen = strnlen(src, dsize);
+    trunc = (slen == dsize);
+    dlen = slen \- trunc;
+\&
+    p = mempcpy(dst, src, dlen);
+    *p = \[aq]\e0\[aq];
+
+    return trunc ? \-1 : slen;
+}
+\&
 .\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
 char *
 .IR stpecpy "(char *dst, char end[0], const char *restrict src)"
 {
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
  2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
@ 2023-11-27 14:33                                   ` Zack Weinberg
  2023-11-27 15:08                                     ` Alejandro Colomar
  1 sibling, 1 reply; 77+ messages in thread
From: Zack Weinberg @ 2023-11-27 14:33 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, GNU libc development,
	'linux-man'

[all attribution deleted because it was so tangled I couldn't make
sense of it]

>> Rather than "catenation", in my experience "concatenation" is the
>> common term
...
> We began fighting this pomposity before v7. There has only been
> backsliding since. "Catenate" is crisper, means the same thing,

[English pedant mode on]

"Concatenate" is the correct term; "catenate" means something completely
different, probably "hang between two posts like a chain".  You can't
chop prefixes off a Latinate word and have it still mean the same thing.

[English pedant mode off]

Also, and much more importantly, "concatenate" is used at least 100x
more often than "catenate" in modern English, and that means it's the
word that a randomly selected reader of the manpages is more likely to
know, and, therefore, the word that the manpages should be using.

https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3

zw

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
@ 2023-11-27 15:08                                     ` Alejandro Colomar
  2023-11-27 15:13                                       ` Alejandro Colomar
  2023-11-27 16:59                                       ` G. Branden Robinson
  0 siblings, 2 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-27 15:08 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 2002 bytes --]

Hi Zack,

On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> [all attribution deleted because it was so tangled I couldn't make
> sense of it]
> 
> >> Rather than "catenation", in my experience "concatenation" is the
> >> common term

The above was Jonny Grant.

> > We began fighting this pomposity before v7. There has only been
> > backsliding since. "Catenate" is crisper, means the same thing,

The above was Doug McIlroy.

> [English pedant mode on]
> 
> "Concatenate" is the correct term; "catenate" means something completely
> different, probably "hang between two posts like a chain".  You can't
> chop prefixes off a Latinate word and have it still mean the same thing.

[Latin pedant mode on]

contatenate comes from the Latin concatenare.  The prefix "con-" means
"join", "together", and "catena" means "chain".
<https://en.wiktionary.org/wiki/concatenate>

catenate comes from the Latin catenare, which AFAICS, seems a synonym.
It just drops the redundant "con-" prefix, since "catena" already
implies it.
<https://en.wiktionary.org/wiki/catenate>

English isn't as propense as other Latin languages to have such synonyms
where one of them simply adds a redundant prefix or suffix, but Catalan
or Spanish for example have several such cases.

[Latin pedant mode off]

> [English pedant mode off]
> 
> Also, and much more importantly, "concatenate" is used at least 100x
> more often than "catenate" in modern English, and that means it's the
> word that a randomly selected reader of the manpages is more likely to
> know, and, therefore, the word that the manpages should be using.
> 
> https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3

Heh, Paul sent a patch for changing it to append, which I applied, since
it reads better, even if it removes the mnemonics of cat for catenate.  :)

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 15:08                                     ` Alejandro Colomar
@ 2023-11-27 15:13                                       ` Alejandro Colomar
  2023-11-27 16:59                                       ` G. Branden Robinson
  1 sibling, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-27 15:13 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]

On Mon, Nov 27, 2023 at 04:08:17PM +0100, Alejandro Colomar wrote:
> Hi Zack,
> 
> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> > [all attribution deleted because it was so tangled I couldn't make
> > sense of it]
> > 
> > >> Rather than "catenation", in my experience "concatenation" is the
> > >> common term
> 
> The above was Jonny Grant.
> 
> > > We began fighting this pomposity before v7. There has only been
> > > backsliding since. "Catenate" is crisper, means the same thing,
> 
> The above was Doug McIlroy.
> 
> > [English pedant mode on]
> > 
> > "Concatenate" is the correct term; "catenate" means something completely
> > different, probably "hang between two posts like a chain".  You can't
> > chop prefixes off a Latinate word and have it still mean the same thing.
> 
> [Latin pedant mode on]
> 
> contatenate comes from the Latin concatenare.  The prefix "con-" means
> "join", "together", and "catena" means "chain".
> <https://en.wiktionary.org/wiki/concatenate>
> 
> catenate comes from the Latin catenare, which AFAICS, seems a synonym.
> It just drops the redundant "con-" prefix, since "catena" already
> implies it.
> <https://en.wiktionary.org/wiki/catenate>
> 
> English isn't as propense as other Latin languages to have such synonyms

s/other//

> where one of them simply adds a redundant prefix or suffix, but Catalan
> or Spanish for example have several such cases.
> 
> [Latin pedant mode off]
> 
> > [English pedant mode off]
> > 
> > Also, and much more importantly, "concatenate" is used at least 100x
> > more often than "catenate" in modern English, and that means it's the
> > word that a randomly selected reader of the manpages is more likely to
> > know, and, therefore, the word that the manpages should be using.
> > 
> > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
> 
> Heh, Paul sent a patch for changing it to append, which I applied, since
> it reads better, even if it removes the mnemonics of cat for catenate.  :)
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 15:08                                     ` Alejandro Colomar
  2023-11-27 15:13                                       ` Alejandro Colomar
@ 2023-11-27 16:59                                       ` G. Branden Robinson
  2023-11-27 18:35                                         ` Zack Weinberg
  1 sibling, 1 reply; 77+ messages in thread
From: G. Branden Robinson @ 2023-11-27 16:59 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Zack Weinberg, Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 5481 bytes --]

At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> > [all attribution deleted because it was so tangled I couldn't make
> > sense of it]

This elision was pretty poor form, given that one of the people whose
attribution (and opinion) Zack discarded was a relevant authority: M.
Douglas McIlroy, an alum of the Bell Labs Computing Science Research
Center and editor of the Seventh Edition Unix Programmer's Manual.

> > > We began fighting this pomposity before v7. There has only been
> > > backsliding since. "Catenate" is crisper, means the same thing,
> 
> The above was Doug McIlroy.
> 
> > [English pedant mode on]
> > 
> > "Concatenate" is the correct term; "catenate" means something
> > completely different, probably "hang between two posts like a
> > chain".  You can't chop prefixes off a Latinate word and have it
> > still mean the same thing.

In some cases, you can.  Witness the case of "flammable"/inflammable",
which are synonymous.  The former term arose because the prefix "in-"
alters meaning in multiple ways in English[1] (maybe Latin, too).  The
coinage of "flammable" later became important in the labeling and
transport of hazardous materials.  Some pedants must despair of this
linguistic innovation, perhaps viewing the prospect of handlers of such
materials burning to death as a just punishment for their lack of
morphological and etymological sophistication.  If you don't want to die
like a prole, get an English degree, eh?[2]

Here, the "con-" prefix is duplicative.  It doesn't pay its freight.

> > [English pedant mode off]

When one discards all other authorities, all that remains is one's own.
I trust we can recognize the parallels here with Dunning-Krugeresque
self-regard.

> > Also, and much more importantly, "concatenate" is used at least 100x
> > more often than "catenate" in modern English, and that means it's
> > the word that a randomly selected reader of the manpages is more
> > likely to know, and, therefore, the word that the manpages should be
> > using.

Man pages are specialized technical literature demanding a bespoke
vocabulary.  Some employment of jargon is inescapable, even necessary.
In any case, "catenate" has ~50 years of attestation in this domain
alone, which constitutes approximately the entire history of Unix
discourse.

If you apply this sort of frequency analysis to contrast man page and
general English corpora more broadly, I predict that you'll find many
candidates for terminological replacement that you would _not_ embrace.

For instance...[3]

https://books.google.com/ngrams/graph?content=open+source%2Cfree+software&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3
https://books.google.com/ngrams/graph?content=emacs%2Cvi&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3

Zack also overlooks the process by which speakers and readers of a
language grapple with unfamiliar words that they encounter unexpectedly.
Before undertaking to reach for dictionaries (online or otherwise), many
readers morphophonemically analyze them to see if they can infer their
meanings from familiar components.[4]

> > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
> 
> Heh, Paul sent a patch for changing it to append, which I applied,
> since it reads better, even if it removes the mnemonics of cat for
> catenate.  :)

In Unix culture, one will need to remain conversant with the term
"catenate" to know why cat(1) is not named "concat(1)".  ;-)

"Concatenate" may end up prevailing even in *nix man pages; languages do
not necessarily evolve in directions that maximize lexical economy.[5]

But to change one's usage based on the break room reasoning put on offer
in this thread is a terrible idea.

Regards,
Branden

[1] https://www.saturdayeveningpost.com/2023/02/in-a-word-flammable-inflammable-and-nonflammable/

[2] ...where the first-order factor in determining your academic merit
    will be your facility with the ideas of 20th-century French
    political philosophers.

[3] One can complain that the second example suffers from a confounding
    effect given one of the terms' appearance as a roman numeral.
    Precisely.  Google Ngram Viewer is not sensitive to context.  Zack's
    use of it is a makeweight recourse to cloak an opinion grounded on
    personal preference in a shroud of false objectivity.

[4] I see this practice offered as advice in numerous resources, and it
    reflects my own approach as a native English speaker who acquired
    language before the availability of computerized (let alone
    hyperlinked) dictionaries in the home, but in a perfunctory search I
    couldn't turn up any _studies_ of what readers _actually do_.  One
    technique that could arise from Zack's approach would be to obtain
    an English word list sorted by frequency, strike off known words
    until encountering an unfamiliar one, learn it, then resume the
    process until the unfamiliar word that actually came up is reached.
    (This way you can be more confident in your own writing and speech
    that you don't use an obscure word where a more common one
    suffices.)  How well do we suppose such a process might work?

[5] certainly not if _my_ emails play any part in that evolution <drum fill>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 16:59                                       ` G. Branden Robinson
@ 2023-11-27 18:35                                         ` Zack Weinberg
  2023-11-27 23:45                                           ` G. Branden Robinson
  0 siblings, 1 reply; 77+ messages in thread
From: Zack Weinberg @ 2023-11-27 18:35 UTC (permalink / raw)
  To: G. Branden Robinson, Alejandro Colomar
  Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote:
> At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
>> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
>> > [English pedant mode on]
>> >
>> > "Concatenate" is the correct term; "catenate" means something
>> > completely different, probably "hang between two posts like a
>> > chain".  You can't chop prefixes off a Latinate word and have it
>> > still mean the same thing.
>
> In some cases, you can.  Witness the case of "flammable"/inflammable",
> which are synonymous.

Yeah, and (after seeing Alejandro's reply) I did look up both
"concatenate" and "catenate" and find that they are synonymous in
English and both are attested from the 1600s.

**But I had to look that up.**

I cannot recall ever encountering the word "catenate" prior to this
thread, and my knee-jerk reaction was "typo."  Based on actual
experience trying, and mostly failing, to teach college undergraduates
to read man pages, I believe someone new to English technical
documentation would have a different, much more troublesome knee-jerk
reaction: "There must be some subtle reason why this documentation is
using an unfamiliar term 'catenate', instead of 'concatenate' that I
already know." Followed by wasting a bunch of time trying to research
that unfamiliar term, and when they find it's an exact synonym, adding
another tick mark to their mental tally for "manpages are badly written
and hard to understand."

> Man pages are specialized technical literature demanding a bespoke
> vocabulary.  Some employment of jargon is inescapable, even necessary.
> In any case, "catenate" has ~50 years of attestation in this domain
> alone, which constitutes approximately the entire history of Unix
> discourse.

This is no excuse.  Specialized technical jargon is only appropriate
when there is an actual difference in meaning.  (Thus, your "open
source" vs "free software" counterpoint is bogus.)

> Zack also overlooks the process by which speakers and readers of a
> language grapple with unfamiliar words that they encounter
> unexpectedly. Before undertaking to reach for dictionaries (online or
> otherwise), many readers morphophonemically analyze them to see if
> they can infer their meanings from familiar components.[4]

In grappling with general literature, yes.  In grappling with technical
writing, *no*, and again I am speaking from direct experience as an
educator.  Readers who encounter an unfamiliar word in technical
documents will most probably assume that the word has a precise meaning
that they must learn, and that they *cannot* deduce that meaning from
context. If they can't find a definition -- and they might not even try
looking in a general dictionary, since they may assume that the relevant
definition is too specialized to appear there; also it seems to me that
schoolchildren are not being taught how to use dictionaries anymore --
*they will give up on the entire document*.

Yes, this is bad.  It's an instance of learned helplessness, and it's
going to take decades and major educational reform at the grade-school
level to fix.  But there's one thing we, authors of technical documents,
can do about it right now, and that is embrace plain talk.  For example,
whenever there really is no difference of meaning, the most common word
in general usage is the word that should be used.

> In Unix culture, one will need to remain conversant with the term
> "catenate" to know why cat(1) is not named "concat(1)".  ;-)

This is how I would teach it: 'concat' is too long for Kernighan
and Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already
in use as an abbreviation for 'console' (not in Unix itself, but in
other contemporary OSes); and 'cat' is the next three letters of
"concatenate".  So that's what they picked.

zw

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 18:35                                         ` Zack Weinberg
@ 2023-11-27 23:45                                           ` G. Branden Robinson
  0 siblings, 0 replies; 77+ messages in thread
From: G. Branden Robinson @ 2023-11-27 23:45 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Alejandro Colomar, Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 10006 bytes --]

Hi Zack,

At 2023-11-27T13:35:01-0500, Zack Weinberg wrote:
> On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote:
> > At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
> >> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> >> > [English pedant mode on]
> >> >
> >> > "Concatenate" is the correct term; "catenate" means something
> >> > completely different, probably "hang between two posts like a
> >> > chain".  You can't chop prefixes off a Latinate word and have it
> >> > still mean the same thing.
> >
> > In some cases, you can.  Witness the case of
> > "flammable"/inflammable", which are synonymous.
> 
> Yeah, and (after seeing Alejandro's reply) I did look up both
> "concatenate" and "catenate" and find that they are synonymous in
> English and both are attested from the 1600s.
> 
> **But I had to look that up.**

That's not a bug.  When we stop learning, our brains die.

> I cannot recall ever encountering the word "catenate" prior to this
> thread, and my knee-jerk reaction was "typo."

The patellar reflex is not a reliable guide to purposeful development.

> Based on actual experience trying, and mostly failing, to teach
> college undergraduates to read man pages,

I empathize with you here.  I have a bit of background in teaching and a
bit more in man page composition.  Over the years my emotional response
to being frustrated that I have to quote a man page to other software
professionals in an email or message board has evolved into relief that
I have material of reasonable quality to quote to people...when that
happens.  Sometimes a person raises an issue and my internal Gilbert
Gottfried yells, "you FOOL![1]  That's plainly documented in--wait, uh,
give me a second.  Uh...sh*t, I need to write a patch to this man page."

> I believe someone new to English technical documentation would have a
> different, much more troublesome knee-jerk reaction: "There must be
> some subtle reason why this documentation is using an unfamiliar term
> 'catenate', instead of 'concatenate' that I already know." Followed by
> wasting a bunch of time trying to research that unfamiliar term, and
> when they find it's an exact synonym, adding another tick mark to
> their mental tally for "manpages are badly written and hard to
> understand."

I think your hypothesis is sorely in need of testing.  My own feeling is
that unfamiliarity with standard English vocabulary is well down the
list of things that people find frustrating about man pages, if we take
the product of annoyance level times the number of people perceiving a
defect.

> > Man pages are specialized technical literature demanding a bespoke
> > vocabulary.  Some employment of jargon is inescapable, even
> > necessary.  In any case, "catenate" has ~50 years of attestation in
> > this domain alone, which constitutes approximately the entire
> > history of Unix discourse.
> 
> This is no excuse.  Specialized technical jargon is only appropriate
> when there is an actual difference in meaning.  (Thus, your "open
> source" vs "free software" counterpoint is bogus.)

I offered them in a tongue-in-cheek effort at humor.  I don't regard
"Emacs" and "vi" as synonymous, either.  Also I know they'll take away
your GNU card if you claim "open source" and "free software"
equivalence.[2]

Analogously, "disenfranchise" and "disfranchise" are also synonymous,
and I prefer the latter to the former for the same reason, popularity be
damned.

> > Before undertaking to reach for dictionaries (online or otherwise),
> > many readers morphophonemically analyze them to see if they can
> > infer their meanings from familiar components.
> 
> In grappling with general literature, yes.  In grappling with
> technical writing, *no*, and again I am speaking from direct
> experience as an educator.  Readers who encounter an unfamiliar word
> in technical documents will most probably assume that the word has a
> precise meaning that they must learn, and that they *cannot* deduce
> that meaning from context.

If that's the case, then our field is doing a crap job at terminology
selection.  (Stop the presses, right?)

> If they can't find a definition -- and they might not even try looking
> in a general dictionary, since they may assume that the relevant
> definition is too specialized to appear there; also it seems to me
> that schoolchildren are not being taught how to use dictionaries
> anymore

Enough of them seem to be using urbandictionary.com that the concept
remains familiar.

> -- *they will give up on the entire document*.
> 
> Yes, this is bad.  It's an instance of learned helplessness, and it's
> going to take decades and major educational reform at the grade-school
> level to fix.  But there's one thing we, authors of technical
> documents, can do about it right now, and that is embrace plain talk.
> For example, whenever there really is no difference of meaning, the
> most common word in general usage is the word that should be used.

Again I'm going to have to disagree with you.  Where we can
morphologically simplify without loss of meaning, I think that fits a
meaning of "plain talk" that is reasonably robust across the many
cultural contexts in which English is used.  Your popularity metric is
vulnerable to sampling biases, particularly of the geographical sort.
And the plainer the talk, the more it is exposed to confounding regional
factors.  When I moved to Australia, I had a frustrating experience at
the grocery store.  I need to replace a light bulb.  No sign anywhere in
the store helped me.  While searching fruitlessly, I vaguely noted a
sign for "globes", and a thought that didn't quite reach the top of my
brain observed that globes are a damned weird thing to sell in a
grocery--but hey, it's Australia, maybe they need a _reminder_ that
they're hanging from the Earth's underbelly.[9]  After a few more
minutes, these two threads joined.

Q:  How many seppos does it take to screw in a light bulb?
A:  What's gardening got to do with it?

> > In Unix culture, one will need to remain conversant with the term
> > "catenate" to know why cat(1) is not named "concat(1)".  ;-)
> 
> This is how I would teach it: 'concat' is too long for Kernighan and
> Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already in
> use as an abbreviation for 'console' (not in Unix itself, but in other
> contemporary OSes); and 'cat' is the next three letters of
> "concatenate".  So that's what they picked.

Please don't teach that.  There's a lot about it I find dubious.

1.  Thompson was the primary human force for extreme terseness in Unix
    culture, as far as I can tell from my readings in CSRC history.
    (There were other technical and ergonomic forces driving it, like
    low line speeds and the Fortran linker on the PDP-11--which C
    initially re-used--being limited to six significant characters in
    external identifiers.)  Kernighan's own writings suggest that he
    preferred clear labels over cryptic ones (see his _The Elements of
    Programming Style_, with Plauger; _Software Tools_, also with
    Plauger; and _The Unix Programming Environment_, with Pike).  I
    speculate that Thompson reasoned that he'd never need more than
    26*26 commands anyway, so there was no reason to use an encoding
    space larger than that to denote them.[3]

2.  "ASR33" is a misleading misnomer in a couple of respects.  You're
    referring to a Western Electric Teletype Model 33.  "ASR" is neither
    a manufacturer nor a model, but a configuration option.
    Specifically, "ASR" devices didn't have keyboards--just a paper tape
    punch and reader--so they were not much used for Unix development.
    "KSR" (keyboard send and receive) was the relevant configuration.

3.  The Bell Labs CSRC didn't use Model 33s anyway.  Western Electric
    was also part of the Bell monopoly, and by late 1972 at the latest,
    Labs personnel got to drive Cadillacs--the Model 37, and moreover
    the ones used to produce Unix had the "Greek" character set
    extension.[4]  You will find references to both devices in the
    Seventh Edition man pages, but the terminal driver was "tuned for
    Teletype Model 37's"[5], and troff(1) named it as a supported
    terminal device rather than the 33.[6] That said, Model 33s were
    supported, and widely used at Unix installations outside the Labs.

4.  Your deployment of "CON" to refer to the console device may be
    anachronistic.  I can't find any evidence that Multics used this
    name for it.  I'm not familiar enough with IBM's OS offerings over
    the decades to be able to navigate online material about it.  Many
    people likely know that MS-DOS called its console device that, but
    cat(1) is about a decade older than that product.[7][8]

Regards
Branden

[1] https://www.youtube.com/watch?v=2NpTmKmWdzk
[2] https://www.gnu.org/philosophy/open-source-misses-the-point.en.html

[3] I base this surmise on more than an attempt at mind reading.  See
    the first footnote on page 6 of McIlroy's "A Research Unix Reader".
    https://www.cs.dartmouth.edu/~doug/reader.pdf

[4] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V3/man/man7/greek.7
[5] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man4/tty.4
[6] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/troff.1
[7] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1
[8] https://www.os2museum.com/wp/dos/dos-1-0-and-1-1/

[9] I'm teasing.  I'd have loved an "upside-down" globe, not least as a
    reminder that the melting of the Antarctic ice sheets will pour
    inundating destruction down on most of us thanks to the superior
    qualities of billionaires.  I already had a counter-clockwise clock,
    but didn't take it with me to Oz.  Also the moon is wrong there.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2023-11-27 23:45 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
     [not found] ` <ZUacobMq0l_O8gjg@debian>
     [not found]   ` <aeb55af5-1017-4ffd-9824-30b43d5748e3@jguk.org>
     [not found]     ` <ZUgl2HPJvUge7XYN@debian>
     [not found]       ` <d40fffcb-524d-44b6-a252-b55a8ddc9fee@jguk.org>
     [not found]         ` <ZUo6btEFD_z_3NcF@devuan>
     [not found]           ` <929865e3-17b4-49c4-8fa9-8383885e9904@jguk.org>
     [not found]             ` <ZUpjI1AHNOMOjdFk@devuan>
     [not found]               ` <ZUsoIbhrJar6ojux@dj3ntoo>
2023-11-08  9:51                 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-08  9:59                   ` Thorsten Kukuk
2023-11-08 15:09                     ` Alejandro Colomar
     [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
2023-11-08 15:44                       ` Thorsten Kukuk
2023-11-08 17:26                         ` Adhemerval Zanella Netto
2023-11-08 14:06                   ` Zack Weinberg
2023-11-08 15:07                     ` Alejandro Colomar
2023-11-08 21:35                       ` Carlos O'Donell
2023-11-08 22:11                         ` Alejandro Colomar
2023-11-08 23:31                           ` Paul Eggert
2023-11-09  0:29                             ` Alejandro Colomar
2023-11-09 10:13                               ` Jonny Grant
2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
2023-11-27 15:08                                     ` Alejandro Colomar
2023-11-27 15:13                                       ` Alejandro Colomar
2023-11-27 16:59                                       ` G. Branden Robinson
2023-11-27 18:35                                         ` Zack Weinberg
2023-11-27 23:45                                           ` G. Branden Robinson
2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-09 14:05                                   ` Jonny Grant
2023-11-09 15:04                                     ` Alejandro Colomar
2023-11-08 19:04                   ` DJ Delorie
2023-11-08 19:40                     ` Alejandro Colomar
2023-11-08 19:58                       ` DJ Delorie
2023-11-08 20:13                         ` Alejandro Colomar
2023-11-08 21:07                           ` DJ Delorie
2023-11-08 21:50                             ` Alejandro Colomar
2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
2023-11-08 23:06                                 ` Paul Eggert
2023-11-08 23:28                                   ` DJ Delorie
2023-11-09  0:24                                   ` Alejandro Colomar
2023-11-09 14:11                                   ` Jonny Grant
2023-11-09 14:35                                     ` Alejandro Colomar
2023-11-09 14:47                                       ` Jonny Grant
2023-11-09 15:02                                         ` Alejandro Colomar
2023-11-09 17:30                                           ` DJ Delorie
2023-11-09 17:54                                             ` Andreas Schwab
2023-11-09 18:00                                             ` Alejandro Colomar
2023-11-09 19:42                                             ` Jonny Grant
2023-11-09  7:23                                 ` Oskari Pirhonen
2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
2023-11-10  5:47                                   ` Oskari Pirhonen
2023-11-10 10:47                                     ` Alejandro Colomar
     [not found]           ` <20231108021240.176996-1-mattlloydhouse@gmail.com>
     [not found]             ` <ZUvilH5kuQfTuZjy@debian>
     [not found]               ` <20231109031345.245703-1-mattlloydhouse@gmail.com>
2023-11-09 10:31                 ` strncpy clarify result may not be null terminated Jonny Grant
2023-11-09 11:38                   ` Alejandro Colomar
2023-11-09 12:43                     ` Alejandro Colomar
2023-11-09 12:51                     ` Xi Ruoyao
2023-11-09 14:01                       ` Alejandro Colomar
2023-11-09 18:11                     ` Paul Eggert
2023-11-09 23:48                       ` Alejandro Colomar
2023-11-10  5:36                         ` Paul Eggert
2023-11-10 11:05                           ` Alejandro Colomar
2023-11-10 11:47                             ` Alejandro Colomar
2023-11-10 17:58                             ` Paul Eggert
2023-11-10 18:36                               ` Alejandro Colomar
2023-11-10 20:19                                 ` Alejandro Colomar
2023-11-10 23:44                                   ` Jonny Grant
2023-11-10 19:52                               ` Alejandro Colomar
2023-11-10 22:14                                 ` Paul Eggert
2023-11-11 21:13                                   ` Alejandro Colomar
2023-11-11 22:20                                     ` Paul Eggert
2023-11-12  9:52                                     ` Jonny Grant
2023-11-12 10:59                                       ` Alejandro Colomar
2023-11-10 11:36                           ` Jonny Grant
2023-11-10 13:15                             ` Alejandro Colomar
2023-11-10 11:23                     ` Jonny Grant
     [not found]               ` <CACKs7VDsTdSNwbC6+2LtQ67J_eJiD814xkw2_5XM1Q=iMjLXJA@mail.gmail.com>
2023-11-10 11:06                 ` Jonny Grant
2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).