* Re: strncpy clarify result may not be null terminated
[not found] ` <ZUsoIbhrJar6ojux@dj3ntoo>
@ 2023-11-08 9:51 ` Alejandro Colomar
2023-11-08 9:59 ` Thorsten Kukuk
` (2 more replies)
0 siblings, 3 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 9:51 UTC
To: libc-alpha, Jonny Grant, linux-man
On Wed, Nov 08, 2023 at 12:18:09AM -0600, Oskari Pirhonen wrote:
> On Tue, Nov 07, 2023 at 17:17:29 +0100, Alejandro Colomar wrote:
> >
> > I would love to find this API useless, and in that case, I'd go further
> > and add [[deprecated]] in the synopsis, and write a heavy statement in a
> > BUGS section. But I can't do that while it's still a good function in
> > some cases (even if those cases are bad design, such as utmp(5)).
> >
> > On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's
> > being deprecated, so maybe we could consider deprecating strncpy(3).
> >
> > If I see enough proof that all APIs that require this function are
> > deprecated, I'll happily declare the function deprecated as well.
> > (in fact I already did some time ago, but then found this use with
> > utmp(5), which is why I removed the deprecation; see
> > <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>).
> >
>
> If you ask me, I'd not mark libc functions as deprecated without some
> kind of consensus from the libc maintainers too. They may not go so far
> as to add the `deprecated` attribute in their own headers, at least not
> yet at that point in time, but some kind of written "Yes, please don't
> use this function" would be nice to have before marking them in the man
> pages.
Okay, let's ask them.
Hi glibc developers,
strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
are those deprecated, or is any such API still good for new code?
If all APIs that need strncpy(3) are deprecated, I propose recommending
against its use in new code.
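For illustration, a minimal sketch of the fixed-width use case described above. It assumes a hypothetical `struct record` whose `name` member mimics the null-padded, not necessarily null-terminated fields of `struct utmpx` (such as `ut_user`); `record_set_name()` and `record_name_len()` are hypothetical helpers, not libc APIs.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <assert.h>
#include <string.h>

/* Hypothetical record with a fixed-width, null-padded field, in the style
   of ut_user in struct utmpx.  A name of exactly 8 bytes fills the field,
   so the field is NOT guaranteed to be null-terminated. */
struct record {
    char name[8];
};

/* Store src in the field.  strncpy(3) copies at most sizeof(r->name) bytes
   and fills the rest of the field with '\0'.  A longer src would be cut
   off silently, so this sketch rejects it instead. */
static int record_set_name(struct record *r, const char *src)
{
    assert(r != NULL && src != NULL);
    if (strlen(src) > sizeof(r->name))
        return -1;
    strncpy(r->name, src, sizeof(r->name));
    return 0;
}

/* Length of the stored name; strnlen(3) never reads past the field. */
static size_t record_name_len(const struct record *r)
{
    return strnlen(r->name, sizeof(r->name));
}
```

To print such a field, pass the length explicitly, e.g. `printf("%.*s\n", (int) record_name_len(&r), r.name);`, rather than treating `name` as a string.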
Thanks,
Alex
>
> - Oskari
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-08 9:51 ` strncpy clarify result may not be null terminated Alejandro Colomar
@ 2023-11-08 9:59 ` Thorsten Kukuk
2023-11-08 15:09 ` Alejandro Colomar
[not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
2023-11-08 14:06 ` Zack Weinberg
2023-11-08 19:04 ` DJ Delorie
2 siblings, 2 replies; 77+ messages in thread
From: Thorsten Kukuk @ 2023-11-08 9:59 UTC
To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man
On Wed, Nov 08, Alejandro Colomar wrote:
> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
> Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> are those deprecated, or is any such API still good for new code?
Everything around utmp/utmpx/wtmp/lastlog is deprecated.
openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
and fresh installations don't have those files anymore.
So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
are e.g. systemd-logind/wtmpdb/lastlog2.
Thorsten
--
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)
* Re: strncpy clarify result may not be null terminated
2023-11-08 9:59 ` Thorsten Kukuk
@ 2023-11-08 15:09 ` Alejandro Colomar
[not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
1 sibling, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 15:09 UTC
To: Thorsten Kukuk; +Cc: libc-alpha, Jonny Grant, linux-man
On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
> On Wed, Nov 08, Alejandro Colomar wrote:
>
> > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
> > Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> > are those deprecated, or is any such API still good for new code?
>
Hi Thorsten!
> Everything around utmp/utmpx/wtmp/lastlog is deprecated.
Is this a Linux-specific thing? Do you know if the BSDs also deprecated
utmpx?
Thanks,
Alex
>
> openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
> and fresh installations don't have those files anymore.
> So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
> are e.g. systemd-logind/wtmpdb/lastlog2.
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
[not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
@ 2023-11-08 15:44 ` Thorsten Kukuk
2023-11-08 17:26 ` Adhemerval Zanella Netto
0 siblings, 1 reply; 77+ messages in thread
From: Thorsten Kukuk @ 2023-11-08 15:44 UTC
To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man
On Wed, Nov 08, Alejandro Colomar wrote:
> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
> > On Wed, Nov 08, Alejandro Colomar wrote:
> >
> > > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > > and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
> > > Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> > > are those deprecated, or is any such API still good for new code?
> >
>
> Hi Thorsten!
>
> > Everything around utmp/utmpx/wtmp/lastlog is deprecated.
>
> Is this a Linux-specific thing? Do you know if the BSDs also deprecated
> utmpx?
Besides the design issues of the interface, which are generic, the Y2038
issue is more or less glibc specific and a result of supporting 32bit
and 64bit userland at the same time.
For most other implementations I'm aware of there is no Y2038 problem,
either because they don't support utmp/utmpx/... like musl libc, or they
were able to switch to a 64bit time variable or used that already.
So no need to change anything.
For BSD I don't really know the situation, but as far as I know, they
don't have the problem and thus no need to change anything.
Thorsten
> Thanks,
> Alex
>
> >
> > openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
> > and fresh installations don't have those files anymore.
> > So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
> > are e.g. systemd-logind/wtmpdb/lastlog2.
>
> --
> <https://www.alejandro-colomar.es/>
--
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)
* Re: strncpy clarify result may not be null terminated
2023-11-08 15:44 ` Thorsten Kukuk
@ 2023-11-08 17:26 ` Adhemerval Zanella Netto
0 siblings, 0 replies; 77+ messages in thread
From: Adhemerval Zanella Netto @ 2023-11-08 17:26 UTC
To: Thorsten Kukuk, Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man
On 08/11/23 12:44, Thorsten Kukuk wrote:
> On Wed, Nov 08, Alejandro Colomar wrote:
>
>> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
>>> On Wed, Nov 08, Alejandro Colomar wrote:
>>>
>>>> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
>>>> and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
>>>> Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
>>>> are those deprecated, or is any such API still good for new code?
>>>
>>
>> Hi Thorsten!
>>
>>> Everything around utmp/utmpx/wtmp/lastlog is deprecated.
>>
>> Is this a Linux-specific thing? Do you know if the BSDs also deprecated
>> utmpx?
>
> Besides the design issues of the interface, which are generic, the Y2038
> issue is more or less glibc specific and a result of supporting 32bit
> and 64bit userland at the same time.
> For most other implementations I'm aware of there is no Y2038 problem,
> either because they don't support utmp/utmpx/... like musl libc, or they
> were able to switch to a 64bit time variable or used that already.
> So no need to change anything.
In fact the glibc utmp y2038 support depends on the ABI: some 64-bit ABIs
decided to be compatible with 32 bits so the utmp files could be read/parsed
by both ABIs (defined by __WORDSIZE_TIME64_COMPAT32). This required the
ut_tv field to be defined not as a 'struct timeval', but rather as a similar
struct with a 32-bit tv_sec (yes, it is a mess and I'm not sure why it was
considered a good idea back then).
It means that for the 64-bit ABIs that define __WORDSIZE_TIME64_COMPAT32 (mips,
riscv, s390, sparc, powerpc, and x86) the utmp ABI is broken regarding
y2038 support. The ut_tv field is also defined depending on the time_t at build
time (_TIME_BITS), so if you have programs with different time_t support,
they won't correctly access the utmp (gnulib seems to have some overrides
to fix it).
Fixing those issues would require a lot of work that I don't think is worth
it for an API with some inherent implementation flaws [1] (most likely it would
require a complete rewrite, which logind basically did). That's why I am
leaning towards completely removing the glibc implementation and mimicking what
musl did (a no-op implementation that returns -1/ENOTSUP where applicable).
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=24492
> For BSD I don't really know the situation, but as far as I know, they
> don't have the problem and thus no need to change anything.
>
> Thorsten
>
>> Thanks,
>> Alex
>>
>>>
>>> openSUSE Tumbleweed and MicroOS are no longer using nor supporting them
>>> and fresh installations don't have those files anymore.
>>> So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
>>> are e.g. systemd-logind/wtmpdb/lastlog2.
>>
>> --
>> <https://www.alejandro-colomar.es/>
>
>
>
* Re: strncpy clarify result may not be null terminated
2023-11-08 9:51 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-08 9:59 ` Thorsten Kukuk
@ 2023-11-08 14:06 ` Zack Weinberg
2023-11-08 15:07 ` Alejandro Colomar
2023-11-08 19:04 ` DJ Delorie
2 siblings, 1 reply; 77+ messages in thread
From: Zack Weinberg @ 2023-11-08 14:06 UTC
To: Alejandro Colomar, GNU libc development, Jonny Grant,
'linux-man'
>> If you ask me, I'd not mark libc functions as deprecated without some
>> kind of consensus from the libc maintainers too.
...
> Okay, let's ask them.
...
> Hi glibc developers,
>
> strncpy(3)
...
Speaking only for myself, I would be very reluctant to declare any standardized function "deprecated" by glibc unless the relevant standards have also made that declaration. This goes double for anything that was in C89.
Also speaking only for myself, the Linux manpages are welcome to discourage the use of any function that you feel is not a wise choice for new programs, but the word "deprecated" should be reserved for cases where there really has been a declaration of deprecation by us and/or the standards. The word "obsolete" should also be used very cautiously; it's broader, but I personally would only use it in situations where there is a direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r).
In the specific cases we're discussing: I would definitely like to see a BUGS or NOTES section in the strncpy(3) manpage, warning people that it's probably not what they want and recommending use of strlen+memcpy instead. I don't know enough about the utmp(x) situation to have a strong opinion, but I do think the manpages need to be very clear that this particular proposed replacement for utmp(x) is Linux-specific and still somewhat experimental.
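A minimal sketch of the strlen+memcpy approach suggested above, for ordinary null-terminated strings. `copy_string()` is a hypothetical helper, and refusing (rather than truncating) a source that does not fit is a design choice of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy a null-terminated string into a buffer of
   dst_size bytes using strlen(3) + memcpy(3).  Unlike strncpy(3), the
   result is always null-terminated and no padding is written.
   Returns 0 on success, or -1 (leaving dst untouched) if src plus its
   terminator does not fit. */
static int copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);
    len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);  /* + 1 copies the terminating '\0' */
    return 0;
}
```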
zw
* Re: strncpy clarify result may not be null terminated
2023-11-08 14:06 ` Zack Weinberg
@ 2023-11-08 15:07 ` Alejandro Colomar
2023-11-08 21:35 ` Carlos O'Donell
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 15:07 UTC
To: Zack Weinberg; +Cc: GNU libc development, Jonny Grant, 'linux-man'
Hi Zack!
On Wed, Nov 08, 2023 at 09:06:48AM -0500, Zack Weinberg wrote:
> >> If you ask me, I'd not mark libc functions as deprecated without some
> >> kind of consensus from the libc maintainers too.
> ...
> > Okay, let's ask them.
> ...
> > Hi glibc developers,
> >
> > strncpy(3)
> ...
>
> Speaking only for myself, I would be very reluctant to declare any
> standardized function "deprecated" by glibc unless the relevant
> standards have also made that declaration. This goes double for
> anything that was in C89.
I understand your point of view, but disagree with it. Deprecation by
ISO C or POSIX takes very very long. We had gets(3) for decades until
they realized it should be removed from the standards.
    STANDARDS
           POSIX.1‐2008.
    HISTORY
           C89, POSIX.1‐2001.
           LSB deprecates gets().  POSIX.1‐2008 marks gets() obsolescent.
           ISO C11 removes the specification of gets() from the C
           language, and since glibc 2.16, glibc header files don’t
           expose the function declaration if the _ISOC11_SOURCE feature
           test macro is defined.
So we had it in ISO C in C89 and C99, and only in C11 they realized it
had to be removed. POSIX hasn't even removed it yet! I won't hesitate
to kill a function just because of bureaucracy.
The standard, especially C89, was just a reflection of the commonalities
of most implementations. It was up to implementations to add new
stuff or to remove existing stuff.
invented more, though.
In this case, since ISO C has no APIs that use strncpy(3), it could (and
should) already deprecate strncpy(3) from ISO C. POSIX still needs it
while it keeps utmpx(5), because there's no other way to correctly write
to the fixed-width buffers within struct utmpx.
>
> Also speaking only for myself, the Linux manpages are welcome to
> discourage the use of any function that you feel is not a wise choice
> for new programs, but the word "deprecated" should be reserved for
> cases where there really has been a declaration of deprecation by us
> and/or the standards.
If a function is deprecated by a standard or other entity, that will be
reflected in the STANDARDS or HISTORY section. For deprecation by the
manual itself, the SYNOPSIS (and BUGS) sections are fine. In the end,
the word 'deprecate' isn't any magic.
From WordNet (r) 3.0 (2006) [wn]:
deprecate
v 1: express strong disapproval of; deplore
That term applies to strncpy(3).
> The word "obsolete" should also be used very cautiously; it's broader,
> but I personally would only use it in situations where there is a
> direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r).
>
> In the specific cases we're discussing: I would definitely like to see
> a BUGS or NOTES section in the strncpy(3) manpage, warning people that
> it's probably not what they want and recommending use of strlen+memcpy
> instead. I don't know enough about the utmp(x) situation to have a
> strong opinion, but I do think the manpages need to be very clear that
> this particular proposed replacement for utmp(x) is Linux-specific and
> still somewhat experimental.
But yes, we need to make sure that the APIs that need strncpy(3) are
all deprecated. If other Unix systems still need utmpx or similar
stuff, strncpy(3) will still be necessary.
Cheers,
Alex
>
> zw
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-08 15:07 ` Alejandro Colomar
@ 2023-11-08 21:35 ` Carlos O'Donell
2023-11-08 22:11 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: Carlos O'Donell @ 2023-11-08 21:35 UTC
To: Alejandro Colomar, Zack Weinberg
Cc: GNU libc development, Jonny Grant, 'linux-man'
On 11/8/23 10:07, Alejandro Colomar wrote:
> So we had it in ISO C in C89 and C99, and only in C11 they realized it
> had to be removed. POSIX hasn't even removed it yet! I won't hesitate
> to kill a function just because of bureaucracy.
Attempting to get consensus at an international level, across cultural boundaries,
use cases, workloads, and developer workflows is difficult and not intended to be
bureaucracy for the sake of bureaucracy.
--
Cheers,
Carlos.
* Re: strncpy clarify result may not be null terminated
2023-11-08 21:35 ` Carlos O'Donell
@ 2023-11-08 22:11 ` Alejandro Colomar
2023-11-08 23:31 ` Paul Eggert
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 22:11 UTC
To: Carlos O'Donell
Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man'
On Wed, Nov 08, 2023 at 04:35:12PM -0500, Carlos O'Donell wrote:
> On 11/8/23 10:07, Alejandro Colomar wrote:
> > So we had it in ISO C in C89 and C99, and only in C11 they realized it
> > had to be removed. POSIX hasn't even removed it yet! I won't hesitate
> > to kill a function just because of bureaucracy.
>
> Attempting to get consensus at an international level, across cultural boundaries,
> use cases, workloads, and developer workflows is difficult and not intended to be
> bureaucracy for the sake of bureaucracy.
Hi Carlos!
I understand that, and respect ISO's work. I just don't think we need,
as GNU or Linux projects, to be restricted to the decisions of ISO. We
can realize that certain functions are bad, and mark them as deprecated
in our scope. If others want to imitate (ISO might even take it as
"prior art"), then great.
Cheers,
Alex
>
> --
> Cheers,
> Carlos.
>
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-08 22:11 ` Alejandro Colomar
@ 2023-11-08 23:31 ` Paul Eggert
2023-11-09 0:29 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: Paul Eggert @ 2023-11-08 23:31 UTC
To: Alejandro Colomar, Carlos O'Donell
Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man'
On 11/8/23 14:11, Alejandro Colomar wrote:
> I just don't think we need,
> as GNU or Linux projects, to be restricted to the decisions of ISO. We
> can realize that certain functions are bad, and mark them as deprecated
> in our scope.
There's enough use of strncpy for the intended use (smallish fixed size
character arrays that are null padded, not null terminated) that saying
it's deprecated would likely cause more trouble than it's worth. It's
not just utmp and tar; it's also socket programming (sun_path) and I'm
sure other stuff.
Were we designing the C library from scratch I'd agree with you: in that
context, strncpy would clearly be more trouble than it's worth. But now
that we're stuck with strncpy we have better things to do than try to
deprecate it.
Instead of saying "deprecate" I suggest we say something like "This
function is generally a poor choice for processing strings" and point to
the longer man page about strings in general. That's what the glibc
manual does and it works reasonably well.
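A minimal sketch of the sun_path case mentioned above: `sun_path` in `struct sockaddr_un` is a fixed-size char array, and strncpy(3) fills it without overrunning it. `make_unix_addr()` is a hypothetical helper. It conservatively requires room for a terminating null byte, since whether a path may fill `sun_path` exactly without one can differ between systems, and it does not set other members some systems define (e.g. `sun_len` on the BSDs).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill in a Unix-domain socket address for path.  Returns 0 on success,
   or -1 if path (plus its terminating '\0') does not fit in sun_path. */
static int make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    assert(addr != NULL && path != NULL);
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    addr->sun_family = AF_UNIX;
    /* Copies path and null-pads the rest of sun_path; the length check
       above guarantees at least one '\0' at the end. */
    strncpy(addr->sun_path, path, sizeof(addr->sun_path));
    return 0;
}
```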
* Re: strncpy clarify result may not be null terminated
2023-11-08 23:31 ` Paul Eggert
@ 2023-11-09 0:29 ` Alejandro Colomar
2023-11-09 10:13 ` Jonny Grant
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 0:29 UTC
To: Paul Eggert
Cc: Carlos O'Donell, Zack Weinberg, GNU libc development,
Jonny Grant, 'linux-man'
Hi Paul,
On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote:
> On 11/8/23 14:11, Alejandro Colomar wrote:
> > I just don't think we need,
> > as GNU or Linux projects, to be restricted to the decisions of ISO. We
> > can realize that certain functions are bad, and mark them as deprecated
> > in our scope.
>
> There's enough use of strncpy for the intended use (smallish fixed size
> character arrays that are null padded, not null terminated) that saying it's
> deprecated would likely cause more trouble than it's worth. It's not just
> utmp and tar; it's also socket programming (sun_path) and I'm sure other
> stuff.
>
> Were we designing the C library from scratch I'd agree with you: in that
> context, strncpy would clearly be more trouble than it's worth. But now that
> we're stuck with strncpy we have better things to do than try to deprecate
> it.
No, no, I'm not trying to deprecate it. I was just saying that *iff*
all of its uses were dead, I'd deprecate it. But they're clearly not
dead, so it's a perfect function for those cases.
>
> Instead of saying "deprecate" I suggest we say something like "This function
> is generally a poor choice for processing strings" and point to the longer
> man page about strings in general. That's what the glibc manual does and it
> works reasonably well.
Yes, I've done something like this. string_copying(7) recommends
avoiding fixed-width null-padded buffers in APIs. But for those use
cases that already exist, this is the function to use.
I'm also refusing to document how to (mis)use this function for
truncating strings. If one wants to truncate strings, they'll need
functions that were designed to do that (e.g., strlcpy(3)).
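As a contrast to misusing strncpy(3) for truncation, a minimal sketch of a function designed to truncate, following the strlcpy(3) semantics: at most `size - 1` bytes are copied, the result is always null-terminated when `size > 0`, and `strlen(src)` is returned so truncation can be detected. `copy_truncate()` is a hypothetical stand-in for a libc that does not provide strlcpy(3) (glibc only added it in 2.38).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical strlcpy(3)-like helper.  Returns strlen(src); a return
   value >= size means the copy in dst was truncated. */
static size_t copy_truncate(char *dst, const char *src, size_t size)
{
    size_t src_len;

    assert(src != NULL);
    src_len = strlen(src);
    if (size > 0) {
        /* Copy as much as fits, leaving room for the terminator. */
        size_t n = src_len < size - 1 ? src_len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller checks for truncation with `if (copy_truncate(buf, src, sizeof buf) >= sizeof buf)`.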
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-09 0:29 ` Alejandro Colomar
@ 2023-11-09 10:13 ` Jonny Grant
2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar
0 siblings, 2 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 10:13 UTC
To: Alejandro Colomar, Paul Eggert
Cc: Carlos O'Donell, Zack Weinberg, GNU libc development,
'linux-man'
On 09/11/2023 00:29, Alejandro Colomar wrote:
> Hi Paul,
>
> On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote:
>> On 11/8/23 14:11, Alejandro Colomar wrote:
>>> I just don't think we need,
>>> as GNU or Linux projects, to be restricted to the decisions of ISO. We
>>> can realize that certain functions are bad, and mark them as deprecated
>>> in our scope.
>>
>> There's enough use of strncpy for the intended use (smallish fixed size
>> character arrays that are null padded, not null terminated) that saying it's
>> deprecated would likely cause more trouble than it's worth. It's not just
>> utmp and tar; it's also socket programming (sun_path) and I'm sure other
>> stuff.
>>
>> Were we designing the C library from scratch I'd agree with you: in that
>> context, strncpy would clearly be more trouble than it's worth. But now that
>> we're stuck with strncpy we have better things to do than try to deprecate
>> it.
>
> No, no, I'm not trying to deprecate it. I was just saying that *iff*
> all of its uses were dead, I'd deprecate it. But they're clearly not
> dead, so it's a perfect function for those cases.
>
>>
>> Instead of saying "deprecate" I suggest we say something like "This function
>> is generally a poor choice for processing strings" and point to the longer
>> man page about strings in general. That's what the glibc manual does and it
>> works reasonably well.
>
> Yes, I've done something like this. string_copying(7) recommends
> avoiding fixed-width null-padded buffers in APIs. But for those use
> cases that already exist, this is the function to use.
https://man7.org/linux/man-pages/man7/string_copying.7.html
Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.
How about following the style of the other man pages that put the notes about each function below them? (rather than above)
https://man7.org/linux/man-pages/man3/string.3.html
size_t strlen(const char *s);
Return the length of the string s.
At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
// Copy/catenate a string.
char *strcpy(char *restrict dst, const char *restrict src);
char *strcat(char *restrict dst, const char *restrict src);
Kind regards
Jonny
* catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-09 10:13 ` Jonny Grant
@ 2023-11-09 11:08 ` Alejandro Colomar
2023-11-09 14:06 ` catenate vs concatenate Jonny Grant
2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar
1 sibling, 2 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:08 UTC
To: Jonny Grant
Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
GNU libc development, 'linux-man'
Hi Jonny,
On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> https://man7.org/linux/man-pages/man7/string_copying.7.html
> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.
Here's why:
<https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/>
Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500):
>> concatenate
>
> We began fighting this pomposity before v7. There has only been
> backsliding since.
> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
> I invite you to join the battle for simplicity.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: catenate vs concatenate
2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-09 14:06 ` Jonny Grant
2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
1 sibling, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:06 UTC
To: Alejandro Colomar
Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
GNU libc development, 'linux-man'
On 09/11/2023 11:08, Alejandro Colomar wrote:
> Hi Jonny,
>
> On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
>> https://man7.org/linux/man-pages/man7/string_copying.7.html
>> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.
>
> Here's why:
> <https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/>
>
> Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500):
>>> concatenate
>>
>> We began fighting this pomposity before v7. There has only been
>> backsliding since..
>> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
>> I invite you to join the battle for simplicity.
>
> Cheers,
> Alex
>
Looks like it's already been discussed. Where a term is already in common use, it's debatable whether to change it. Technical documents seem to mostly use 'concatenate'. Looks like people have already decided on going with 'catenate'.
Kind regards
Jonny
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-09 14:06 ` catenate vs concatenate Jonny Grant
@ 2023-11-27 14:33 ` Zack Weinberg
2023-11-27 15:08 ` Alejandro Colomar
1 sibling, 1 reply; 77+ messages in thread
From: Zack Weinberg @ 2023-11-27 14:33 UTC
To: Alejandro Colomar, Jonny Grant
Cc: Paul Eggert, Carlos O'Donell, GNU libc development,
'linux-man'
[all attribution deleted because it was so tangled I couldn't make
sense of it]
>> Rather than "catenation", in my experience "concatenation" is the
>> common term
...
> We began fighting this pomposity before v7. There has only been
> backsliding since. "Catenate" is crisper, means the same thing,
[English pedant mode on]
"Concatenate" is the correct term; "catenate" means something completely
different, probably "hang between two posts like a chain". You can't
chop prefixes off a Latinate word and have it still mean the same thing.
[English pedant mode off]
Also, and much more importantly, "concatenate" is used at least 100x
more often than "catenate" in modern English, and that means it's the
word that a randomly selected reader of the manpages is more likely to
know, and, therefore, the word that the manpages should be using.
https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
zw
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
@ 2023-11-27 15:08 ` Alejandro Colomar
2023-11-27 15:13 ` Alejandro Colomar
2023-11-27 16:59 ` G. Branden Robinson
0 siblings, 2 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-27 15:08 UTC
To: Zack Weinberg
Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
GNU libc development, 'linux-man'
Hi Zack,
On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> [all attribution deleted because it was so tangled I couldn't make
> sense of it]
>
> >> Rather than "catenation", in my experience "concatenation" is the
> >> common term
The above was Jonny Grant.
> > We began fighting this pomposity before v7. There has only been
> > backsliding since. "Catenate" is crisper, means the same thing,
The above was Doug McIlroy.
> [English pedant mode on]
>
> "Concatenate" is the correct term; "catenate" means something completely
> different, probably "hang between two posts like a chain". You can't
> chop prefixes off a Latinate word and have it still mean the same thing.
[Latin pedant mode on]
concatenate comes from the Latin concatenare. The prefix "con-" means
"join", "together", and "catena" means "chain".
<https://en.wiktionary.org/wiki/concatenate>
catenate comes from the Latin catenare, which, AFAICS, seems to be a synonym.
It just drops the redundant "con-" prefix, since "catena" already
implies it.
<https://en.wiktionary.org/wiki/catenate>
English isn't as prone as Latin languages to having such synonyms
where one of them simply adds a redundant prefix or suffix, but Catalan
or Spanish for example have several such cases.
[Latin pedant mode off]
> [English pedant mode off]
>
> Also, and much more importantly, "concatenate" is used at least 100x
> more often than "catenate" in modern English, and that means it's the
> word that a randomly selected reader of the manpages is more likely to
> know, and, therefore, the word that the manpages should be using.
>
> https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
Heh, Paul sent a patch for changing it to "append", which I applied, since
it reads better, even if it removes the mnemonic of cat for catenate. :)
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-27 15:08 ` Alejandro Colomar
@ 2023-11-27 15:13 ` Alejandro Colomar
2023-11-27 16:59 ` G. Branden Robinson
1 sibling, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-27 15:13 UTC (permalink / raw)
To: Zack Weinberg
Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
GNU libc development, 'linux-man'
[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]
On Mon, Nov 27, 2023 at 04:08:17PM +0100, Alejandro Colomar wrote:
> Hi Zack,
>
> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> > [all attribution deleted because it was so tangled I couldn't make
> > sense of it]
> >
> > >> Rather than "catenation", in my experience "concatenation" is the
> > >> common term
>
> The above was Jonny Grant.
>
> > > We began fighting this pomposity before v7. There has only been
> > > backsliding since. "Catenate" is crisper, means the same thing,
>
> The above was Doug McIlroy.
>
> > [English pedant mode on]
> >
> > "Concatenate" is the correct term; "catenate" means something completely
> > different, probably "hang between two posts like a chain". You can't
> > chop prefixes off a Latinate word and have it still mean the same thing.
>
> [Latin pedant mode on]
>
> concatenate comes from the Latin concatenare. The prefix "con-" means
> "join", "together", and "catena" means "chain".
> <https://en.wiktionary.org/wiki/concatenate>
>
> catenate comes from the Latin catenare, which AFAICS, seems a synonym.
> It just drops the redundant "con-" prefix, since "catena" already
> implies it.
> <https://en.wiktionary.org/wiki/catenate>
>
> English isn't as propense as other Latin languages to have such synonyms
s/other//
> where one of them simply adds a redundant prefix or suffix, but Catalan
> or Spanish for example have several such cases.
>
> [Latin pedant mode off]
>
> > [English pedant mode off]
> >
> > Also, and much more importantly, "concatenate" is used at least 100x
> > more often than "catenate" in modern English, and that means it's the
> > word that a randomly selected reader of the manpages is more likely to
> > know, and, therefore, the word that the manpages should be using.
> >
> > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
>
> Heh, Paul sent a patch for changing it to append, which I applied, since
> it reads better, even if it removes the mnemonics of cat for catenate. :)
>
> Cheers,
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
<https://www.alejandro-colomar.es/>
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-27 15:08 ` Alejandro Colomar
2023-11-27 15:13 ` Alejandro Colomar
@ 2023-11-27 16:59 ` G. Branden Robinson
2023-11-27 18:35 ` Zack Weinberg
1 sibling, 1 reply; 77+ messages in thread
From: G. Branden Robinson @ 2023-11-27 16:59 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Zack Weinberg, Jonny Grant, Paul Eggert, Carlos O'Donell,
GNU libc development, 'linux-man'
[-- Attachment #1: Type: text/plain, Size: 5481 bytes --]
At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> > [all attribution deleted because it was so tangled I couldn't make
> > sense of it]
This elision was pretty poor form, given that one of the people whose
attribution (and opinion) Zack discarded was a relevant authority: M.
Douglas McIlroy, an alum of the Bell Labs Computing Science Research
Center and editor of the Seventh Edition Unix Programmer's Manual.
> > > We began fighting this pomposity before v7. There has only been
> > > backsliding since. "Catenate" is crisper, means the same thing,
>
> The above was Doug McIlroy.
>
> > [English pedant mode on]
> >
> > "Concatenate" is the correct term; "catenate" means something
> > completely different, probably "hang between two posts like a
> > chain". You can't chop prefixes off a Latinate word and have it
> > still mean the same thing.
In some cases, you can. Witness the case of "flammable"/"inflammable",
which are synonymous. The former term arose because the prefix "in-"
alters meaning in multiple ways in English[1] (maybe Latin, too). The
coinage of "flammable" later became important in the labeling and
transport of hazardous materials. Some pedants must despair of this
linguistic innovation, perhaps viewing the prospect of handlers of such
materials burning to death as a just punishment for their lack of
morphological and etymological sophistication. If you don't want to die
like a prole, get an English degree, eh?[2]
Here, the "con-" prefix is duplicative. It doesn't pay its freight.
> > [English pedant mode off]
When one discards all other authorities, all that remains is one's own.
I trust we can recognize the parallels here with Dunning-Krugeresque
self-regard.
> > Also, and much more importantly, "concatenate" is used at least 100x
> > more often than "catenate" in modern English, and that means it's
> > the word that a randomly selected reader of the manpages is more
> > likely to know, and, therefore, the word that the manpages should be
> > using.
Man pages are specialized technical literature demanding a bespoke
vocabulary. Some employment of jargon is inescapable, even necessary.
In any case, "catenate" has ~50 years of attestation in this domain
alone, which constitutes approximately the entire history of Unix
discourse.
If you apply this sort of frequency analysis to contrast man page and
general English corpora more broadly, I predict that you'll find many
candidates for terminological replacement that you would _not_ embrace.
For instance...[3]
https://books.google.com/ngrams/graph?content=open+source%2Cfree+software&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3
https://books.google.com/ngrams/graph?content=emacs%2Cvi&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3
Zack also overlooks the process by which speakers and readers of a
language grapple with unfamiliar words that they encounter unexpectedly.
Before undertaking to reach for dictionaries (online or otherwise), many
readers morphophonemically analyze them to see if they can infer their
meanings from familiar components.[4]
> > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
>
> Heh, Paul sent a patch for changing it to append, which I applied,
> since it reads better, even if it removes the mnemonics of cat for
> catenate. :)
In Unix culture, one will need to remain conversant with the term
"catenate" to know why cat(1) is not named "concat(1)". ;-)
"Concatenate" may end up prevailing even in *nix man pages; languages do
not necessarily evolve in directions that maximize lexical economy.[5]
But to change one's usage based on the break room reasoning put on offer
in this thread is a terrible idea.
Regards,
Branden
[1] https://www.saturdayeveningpost.com/2023/02/in-a-word-flammable-inflammable-and-nonflammable/
[2] ...where the first-order factor in determining your academic merit
will be your facility with the ideas of 20th-century French
political philosophers.
[3] One can complain that the second example suffers from a confounding
effect given one of the terms' appearance as a roman numeral.
Precisely. Google Ngram Viewer is not sensitive to context. Zack's
use of it is a makeweight recourse to cloak an opinion grounded on
personal preference in a shroud of false objectivity.
[4] I see this practice offered as advice in numerous resources, and it
reflects my own approach as a native English speaker who acquired
language before the availability of computerized (let alone
hyperlinked) dictionaries in the home, but in a perfunctory search I
couldn't turn up any _studies_ of what readers _actually do_. One
technique that could arise from Zack's approach would be to obtain
an English word list sorted by frequency, strike off known words
until encountering an unfamiliar one, learn it, then resume the
process until the unfamiliar word that actually came up is reached.
(This way you can be more confident in your own writing and speech
that you don't use an obscure word where a more common one
suffices.) How well do we suppose such a process might work?
[5] certainly not if _my_ emails play any part in that evolution <drum fill>
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-27 16:59 ` G. Branden Robinson
@ 2023-11-27 18:35 ` Zack Weinberg
2023-11-27 23:45 ` G. Branden Robinson
0 siblings, 1 reply; 77+ messages in thread
From: Zack Weinberg @ 2023-11-27 18:35 UTC (permalink / raw)
To: G. Branden Robinson, Alejandro Colomar
Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
GNU libc development, 'linux-man'
On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote:
> At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
>> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
>> > [English pedant mode on]
>> >
>> > "Concatenate" is the correct term; "catenate" means something
>> > completely different, probably "hang between two posts like a
>> > chain". You can't chop prefixes off a Latinate word and have it
>> > still mean the same thing.
>
> In some cases, you can. Witness the case of "flammable"/"inflammable",
> which are synonymous.
Yeah, and (after seeing Alejandro's reply) I did look up both
"concatenate" and "catenate" and find that they are synonymous in
English and both are attested from the 1600s.
**But I had to look that up.**
I cannot recall ever encountering the word "catenate" prior to this
thread, and my knee-jerk reaction was "typo." Based on actual
experience trying, and mostly failing, to teach college undergraduates
to read man pages, I believe someone new to English technical
documentation would have a different, much more troublesome knee-jerk
reaction: "There must be some subtle reason why this documentation is
using an unfamiliar term 'catenate', instead of 'concatenate' that I
already know." Followed by wasting a bunch of time trying to research
that unfamiliar term, and when they find it's an exact synonym, adding
another tick mark to their mental tally for "manpages are badly written
and hard to understand."
> Man pages are specialized technical literature demanding a bespoke
> vocabulary. Some employment of jargon is inescapable, even necessary.
> In any case, "catenate" has ~50 years of attestation in this domain
> alone, which constitutes approximately the entire history of Unix
> discourse.
This is no excuse. Specialized technical jargon is only appropriate
when there is an actual difference in meaning. (Thus, your "open
source" vs "free software" counterpoint is bogus.)
> Zack also overlooks the process by which speakers and readers of a
> language grapple with unfamiliar words that they encounter
> unexpectedly. Before undertaking to reach for dictionaries (online or
> otherwise), many readers morphophonemically analyze them to see if
> they can infer their meanings from familiar components.[4]
In grappling with general literature, yes. In grappling with technical
writing, *no*, and again I am speaking from direct experience as an
educator. Readers who encounter an unfamiliar word in technical
documents will most probably assume that the word has a precise meaning
that they must learn, and that they *cannot* deduce that meaning from
context. If they can't find a definition -- and they might not even try
looking in a general dictionary, since they may assume that the relevant
definition is too specialized to appear there; also it seems to me that
schoolchildren are not being taught how to use dictionaries anymore --
*they will give up on the entire document*.
Yes, this is bad. It's an instance of learned helplessness, and it's
going to take decades and major educational reform at the grade-school
level to fix. But there's one thing we, authors of technical documents,
can do about it right now, and that is embrace plain talk. For example,
whenever there really is no difference of meaning, the most common word
in general usage is the word that should be used.
> In Unix culture, one will need to remain conversant with the term
> "catenate" to know why cat(1) is not named "concat(1)". ;-)
This is how I would teach it: 'concat' is too long for Kernighan
and Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already
in use as an abbreviation for 'console' (not in Unix itself, but in
other contemporary OSes); and 'cat' is the next three letters of
"concatenate". So that's what they picked.
zw
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
2023-11-27 18:35 ` Zack Weinberg
@ 2023-11-27 23:45 ` G. Branden Robinson
0 siblings, 0 replies; 77+ messages in thread
From: G. Branden Robinson @ 2023-11-27 23:45 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Jonny Grant, Paul Eggert, Carlos O'Donell,
GNU libc development, 'linux-man'
[-- Attachment #1: Type: text/plain, Size: 10006 bytes --]
Hi Zack,
At 2023-11-27T13:35:01-0500, Zack Weinberg wrote:
> On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote:
> > At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
> >> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> >> > [English pedant mode on]
> >> >
> >> > "Concatenate" is the correct term; "catenate" means something
> >> > completely different, probably "hang between two posts like a
> >> > chain". You can't chop prefixes off a Latinate word and have it
> >> > still mean the same thing.
> >
> > In some cases, you can. Witness the case of
> > "flammable"/"inflammable", which are synonymous.
>
> Yeah, and (after seeing Alejandro's reply) I did look up both
> "concatenate" and "catenate" and find that they are synonymous in
> English and both are attested from the 1600s.
>
> **But I had to look that up.**
That's not a bug. When we stop learning, our brains die.
> I cannot recall ever encountering the word "catenate" prior to this
> thread, and my knee-jerk reaction was "typo."
The patellar reflex is not a reliable guide to purposeful development.
> Based on actual experience trying, and mostly failing, to teach
> college undergraduates to read man pages,
I empathize with you here. I have a bit of background in teaching and a
bit more in man page composition. Over the years my emotional response
to being frustrated that I have to quote a man page to other software
professionals in an email or message board has evolved into relief that
I have material of reasonable quality to quote to people...when that
happens. Sometimes a person raises an issue and my internal Gilbert
Gottfried yells, "you FOOL![1] That's plainly documented in--wait, uh,
give me a second. Uh...sh*t, I need to write a patch to this man page."
> I believe someone new to English technical documentation would have a
> different, much more troublesome knee-jerk reaction: "There must be
> some subtle reason why this documentation is using an unfamiliar term
> 'catenate', instead of 'concatenate' that I already know." Followed by
> wasting a bunch of time trying to research that unfamiliar term, and
> when they find it's an exact synonym, adding another tick mark to
> their mental tally for "manpages are badly written and hard to
> understand."
I think your hypothesis is sorely in need of testing. My own feeling is
that unfamiliarity with standard English vocabulary is well down the
list of things that people find frustrating about man pages, if we take
the product of annoyance level times the number of people perceiving a
defect.
> > Man pages are specialized technical literature demanding a bespoke
> > vocabulary. Some employment of jargon is inescapable, even
> > necessary. In any case, "catenate" has ~50 years of attestation in
> > this domain alone, which constitutes approximately the entire
> > history of Unix discourse.
>
> This is no excuse. Specialized technical jargon is only appropriate
> when there is an actual difference in meaning. (Thus, your "open
> source" vs "free software" counterpoint is bogus.)
I offered them in a tongue-in-cheek effort at humor. I don't regard
"Emacs" and "vi" as synonymous, either. Also I know they'll take away
your GNU card if you claim "open source" and "free software"
equivalence.[2]
Analogously, "disenfranchise" and "disfranchise" are also synonymous,
and I prefer the latter to the former for the same reason, popularity be
damned.
> > Before undertaking to reach for dictionaries (online or otherwise),
> > many readers morphophonemically analyze them to see if they can
> > infer their meanings from familiar components.
>
> In grappling with general literature, yes. In grappling with
> technical writing, *no*, and again I am speaking from direct
> experience as an educator. Readers who encounter an unfamiliar word
> in technical documents will most probably assume that the word has a
> precise meaning that they must learn, and that they *cannot* deduce
> that meaning from context.
If that's the case, then our field is doing a crap job at terminology
selection. (Stop the presses, right?)
> If they can't find a definition -- and they might not even try looking
> in a general dictionary, since they may assume that the relevant
> definition is too specialized to appear there; also it seems to me
> that schoolchildren are not being taught how to use dictionaries
> anymore
Enough of them seem to be using urbandictionary.com that the concept
remains familiar.
> -- *they will give up on the entire document*.
>
> Yes, this is bad. It's an instance of learned helplessness, and it's
> going to take decades and major educational reform at the grade-school
> level to fix. But there's one thing we, authors of technical
> documents, can do about it right now, and that is embrace plain talk.
> For example, whenever there really is no difference of meaning, the
> most common word in general usage is the word that should be used.
Again I'm going to have to disagree with you. Where we can
morphologically simplify without loss of meaning, I think that fits a
meaning of "plain talk" that is reasonably robust across the many
cultural contexts in which English is used. Your popularity metric is
vulnerable to sampling biases, particularly of the geographical sort.
And the plainer the talk, the more it is exposed to confounding regional
factors. When I moved to Australia, I had a frustrating experience at
the grocery store. I needed to replace a light bulb. No sign anywhere in
the store helped me. While searching fruitlessly, I vaguely noted a
sign for "globes", and a thought that didn't quite reach the top of my
brain observed that globes are a damned weird thing to sell in a
grocery--but hey, it's Australia, maybe they need a _reminder_ that
they're hanging from the Earth's underbelly.[9] After a few more
minutes, these two threads joined.
Q: How many seppos does it take to screw in a light bulb?
A: What's gardening got to do with it?
> > In Unix culture, one will need to remain conversant with the term
> > "catenate" to know why cat(1) is not named "concat(1)". ;-)
>
> This is how I would teach it: 'concat' is too long for Kernighan and
> Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already in
> use as an abbreviation for 'console' (not in Unix itself, but in other
> contemporary OSes); and 'cat' is the next three letters of
> "concatenate". So that's what they picked.
Please don't teach that. There's a lot about it I find dubious.
1. Thompson was the primary human force for extreme terseness in Unix
culture, as far as I can tell from my readings in CSRC history.
(There were other technical and ergonomic forces driving it, like
low line speeds and the Fortran linker on the PDP-11--which C
initially re-used--being limited to six significant characters in
external identifiers.) Kernighan's own writings suggest that he
preferred clear labels over cryptic ones (see his _The Elements of
Programming Style_, with Plauger; _Software Tools_, also with
Plauger; and _The Unix Programming Environment_, with Pike). I
speculate that Thompson reasoned that he'd never need more than
26*26 commands anyway, so there was no reason to use an encoding
space larger than that to denote them.[3]
2. "ASR33" is a misleading misnomer in a couple of respects. You're
referring to a Western Electric Teletype Model 33. "ASR" is neither
a manufacturer nor a model, but a configuration option.
Specifically, "ASR" devices didn't have keyboards--just a paper tape
punch and reader--so they were not much used for Unix development.
"KSR" (keyboard send and receive) was the relevant configuration.
3. The Bell Labs CSRC didn't use Model 33s anyway. Western Electric
was also part of the Bell monopoly, and by late 1972 at the latest,
Labs personnel got to drive Cadillacs--the Model 37, and moreover
the ones used to produce Unix had the "Greek" character set
extension.[4] You will find references to both devices in the
Seventh Edition man pages, but the terminal driver was "tuned for
Teletype Model 37's"[5], and troff(1) named it as a supported
terminal device rather than the 33.[6] That said, Model 33s were
supported, and widely used at Unix installations outside the Labs.
4. Your deployment of "CON" to refer to the console device may be
anachronistic. I can't find any evidence that Multics used this
name for it. I'm not familiar enough with IBM's OS offerings over
the decades to be able to navigate online material about it. Many
people likely know that MS-DOS called its console device that, but
cat(1) is about a decade older than that product.[7][8]
Regards
Branden
[1] https://www.youtube.com/watch?v=2NpTmKmWdzk
[2] https://www.gnu.org/philosophy/open-source-misses-the-point.en.html
[3] I base this surmise on more than an attempt at mind reading. See
the first footnote on page 6 of McIlroy's "A Research Unix Reader".
https://www.cs.dartmouth.edu/~doug/reader.pdf
[4] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V3/man/man7/greek.7
[5] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man4/tty.4
[6] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/troff.1
[7] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1
[8] https://www.os2museum.com/wp/dos/dos-1-0-and-1-1/
[9] I'm teasing. I'd have loved an "upside-down" globe, not least as a
reminder that the melting of the Antarctic ice sheets will pour
inundating destruction down on most of us thanks to the superior
qualities of billionaires. I already had a counter-clockwise clock,
but didn't take it with me to Oz. Also the moon is wrong there.
* Re: strncpy clarify result may not be null terminated
2023-11-09 10:13 ` Jonny Grant
2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-09 11:13 ` Alejandro Colomar
2023-11-09 14:05 ` Jonny Grant
1 sibling, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:13 UTC (permalink / raw)
To: Jonny Grant
Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
GNU libc development, 'linux-man'
[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]
Hi Jonny,
On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> On 09/11/2023 00:29, Alejandro Colomar wrote:
> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
> https://man7.org/linux/man-pages/man3/string.3.html
>
> size_t strlen(const char *s);
> Return the length of the string s.
>
>
> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
>
> // Copy/catenate a string.
> char *strcpy(char *restrict dst, const char *restrict src);
> char *strcat(char *restrict dst, const char *restrict src);
The reason for this presentation is that I want to first look at what
they do, and only then look at the function you need to do that.
So, if you want to copy from a character sequence into a string, you
search for that, and it will tell you what functions you can use for
that (strncat(3) is the only standard one).
If you want to search for a specific function, you can always search
with '/strncpy'.
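As a rough illustration of the sequence-to-string case above, here is a
minimal sketch. It assumes a hypothetical record type `struct rec` with
an 8-byte null-padded field, and a hypothetical helper `seq_to_str()`;
neither comes from the man page. The idea is that strncat(3) reads at
most n bytes from the source and always null-terminates the result, so
appending to an empty string copies a possibly unterminated fixed-width
field into a proper string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-width record: `name` is null-padded, but when the
   name uses all 8 bytes there is no terminating null byte. */
struct rec {
    char name[8];
};

/*
 * Copy the character sequence at src (at most ssize bytes, stopping
 * early at a null byte) into the string dst of dsize bytes.  strncat(3)
 * always writes a terminating null byte, so appending to an empty
 * string turns it into a sequence-to-string copy.  Returns dst.
 */
static char *
seq_to_str(char *dst, size_t dsize, const char *src, size_t ssize)
{
    assert(dsize > ssize);  /* need room for ssize bytes plus the null */
    dst[0] = '\0';
    return strncat(dst, src, ssize);
}
```

Typical use would size the destination as `sizeof r.name + 1` and pass
`sizeof r.name` as the sequence length.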
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar
@ 2023-11-09 14:05 ` Jonny Grant
2023-11-09 15:04 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:05 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
GNU libc development, 'linux-man'
On 09/11/2023 11:13, Alejandro Colomar wrote:
> Hi Jonny,
>
> On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
>> On 09/11/2023 00:29, Alejandro Colomar wrote:
>> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>> size_t strlen(const char *s);
>> Return the length of the string s.
>>
>>
>> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
>>
>> // Copy/catenate a string.
>> char *strcpy(char *restrict dst, const char *restrict src);
>> char *strcat(char *restrict dst, const char *restrict src);
>
> The reason for this presentation is that I want to first look at what
> they do, and only then look at the function you need to do that.
That appears different from the man page convention. It looks odd, especially with the extra // that I don't recall other pages having in the description; usually that would be for examples. Consistency is best, but I'll leave it with you.
Kind regards
Jonny
>
> So, if you want to copy from a character sequence into a string, you
> search for that, and it will tell you what functions you can use for
> that (strncat(3) is the only standard one).
>
> If you want to search for a specific function, you can always search
> with '/strncpy'.
>
> Cheers,
> Alex
>
* Re: strncpy clarify result may not be null terminated
2023-11-09 14:05 ` Jonny Grant
@ 2023-11-09 15:04 ` Alejandro Colomar
0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:04 UTC (permalink / raw)
To: Jonny Grant
Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
GNU libc development, 'linux-man'
[-- Attachment #1: Type: text/plain, Size: 1545 bytes --]
On Thu, Nov 09, 2023 at 02:05:38PM +0000, Jonny Grant wrote:
>
>
> On 09/11/2023 11:13, Alejandro Colomar wrote:
> > Hi Jonny,
> >
> > On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> >> On 09/11/2023 00:29, Alejandro Colomar wrote:
> >> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
> >> https://man7.org/linux/man-pages/man3/string.3.html
> >>
> >> size_t strlen(const char *s);
> >> Return the length of the string s.
> >>
> >>
> >> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
> >>
> >> // Copy/catenate a string.
> >> char *strcpy(char *restrict dst, const char *restrict src);
> >> char *strcat(char *restrict dst, const char *restrict src);
> >
> > The reason for this presentation is that I want to first look at what
> > they do, and only then look at the function you need to do that.
>
> That appears different to the man page convention. It looks odd especially with the extra // that I don't recall other pages having in the description, usually that would be for examples. Consistency is best, but I'll leave it with you.
The difference is that you're comparing to man3 pages, which document
specific functions. string_copying(7) instead documents how to copy
strings, and specific functions are only a means to that end. I'll keep
it this way.
Thanks,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-08 9:51 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-08 9:59 ` Thorsten Kukuk
2023-11-08 14:06 ` Zack Weinberg
@ 2023-11-08 19:04 ` DJ Delorie
2023-11-08 19:40 ` Alejandro Colomar
2 siblings, 1 reply; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 19:04 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man
Alejandro Colomar <alx@kernel.org> writes:
> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
Let's not limit ourselves to glibc APIs. Tar format, for example, uses
fixed-length fields (and my bet is that strncpy was created for it), yet
tar is not part of glibc.
IMHO the solution here is to document strncpy with sufficiently obvious
intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should
ONLY be used for its intended purpose (filling a null-padded field that
need not be null-terminated).
It is not documentation's purpose to limit programmers' creativity, just
to give them an accurate representation of what the functions do.
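To make the "intended purpose" concrete, here is a minimal sketch of
filling a fixed-length, null-padded field with strncpy(3). The struct
`hdr`, its 16-byte `name` field, and the helper `hdr_set_name()` are
hypothetical, only loosely inspired by the kind of header field
mentioned above for tar; they are not the real tar header layout. The
sketch rejects over-long input up front, because strncpy(3) would
otherwise truncate silently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-width field; name and size are illustrative only. */
struct hdr {
    char name[16];
};

/*
 * Fill h->name from the string src.  strncpy(3) always writes exactly
 * sizeof(h->name) bytes: the characters of src, then null padding.
 * If strlen(src) == sizeof(h->name), the field is completely full and
 * has no terminator, which is acceptable for this kind of field.
 * Longer input would be silently truncated, so it is rejected.
 */
static bool
hdr_set_name(struct hdr *h, const char *src)
{
    assert(h != NULL && src != NULL);
    if (strlen(src) > sizeof(h->name))
        return false;  /* would be truncated */
    strncpy(h->name, src, sizeof(h->name));
    return true;
}
```

Note that the field is never treated as a string afterwards; reading it
back needs a length-bounded function, not strlen(3) or strcpy(3).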
* Re: strncpy clarify result may not be null terminated
2023-11-08 19:04 ` DJ Delorie
@ 2023-11-08 19:40 ` Alejandro Colomar
2023-11-08 19:58 ` DJ Delorie
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 19:40 UTC (permalink / raw)
To: DJ Delorie; +Cc: libc-alpha, jg, linux-man
[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]
Hi DJ,
On Wed, Nov 08, 2023 at 02:04:45PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > and `struct utmpx`. Is there any other libc API that needs strncpy(3)?
>
> Let's not limit ourselves to glibc APIs. Tar format, for example, uses
> fixed length fields (and my bet is that strncpy was created for it) yet
> tar is not part of glibc.
>
> IMHO the solution here is to document strncpy with sufficiently obvious
> intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should
> ONLY be used for its intended purpose (filling a space-padded but not
> null-terminated field)
Indeed. That's what I did (I think).
DESCRIPTION
These functions copy the string pointed to by src into a null‐
padded character sequence at the fixed‐width buffer pointed to by
dst. If the destination buffer, limited by its size, isn’t large
enough to hold the copy, the resulting character sequence is
truncated.
...
CAVEATS
The name of these functions is confusing. These functions
produce a null‐padded character sequence, not a string (see
string_copying(7)).
It’s impossible to distinguish truncation by the result of the
call, from a character sequence that just fits the destination
buffer; truncation should be detected by comparing the length of
the input string with the size of the destination buffer.
I refuse to add any hints that strncpy(3) is good for copying strings.
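Following the CAVEATS advice above, a caller who needs to know about truncation has to inspect the source, not the result. A minimal sketch, assuming the source is a proper string; `copy_field()` is a hypothetical helper, not a libc API:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: fill the fixed-width buffer DST (SZ bytes) from
   the string SRC and report whether SRC was truncated.  The bytes in
   DST cannot tell us: a source of exactly SZ characters and a longer
   one both leave DST full, with no null byte.  So compare the length
   of SRC against SZ instead. */
static bool copy_field(char *dst, const char *src, size_t sz)
{
    bool truncated = strlen(src) > sz;

    strncpy(dst, src, sz);
    return truncated;
}
```

With a 5-byte buffer, "12345" is reported as not truncated even though the buffer ends up with no null byte, while "123456" is reported as truncated; both leave the same 5 bytes in DST.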
>
> It is not documentation's purpose to limit programmer's creativity, just
> to give them an accurate representation of what the functions do.
Thanks!
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-08 19:40 ` Alejandro Colomar
@ 2023-11-08 19:58 ` DJ Delorie
2023-11-08 20:13 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 19:58 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man
Perhaps an example that shows the problem?
EXAMPLES
strncpy (buf, "1", 5);
{ '1', 0, 0, 0, 0 }
strncpy (buf, "1234", 5);
{ '1', '2', '3', '4', 0 }
strncpy (buf, "12345", 5);
{ '1', '2', '3', '4', '5' }
strncpy (buf, "123456", 5);
{ '1', '2', '3', '4', '5' }
Maybe strcpy and strncpy shouldn't even share man pages, since they're
not as related as we once thought?
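The table above can be reproduced with a short C program. This is only a sketch: `dump()` and `show_examples()` are hypothetical helper names, and `dump()` prints null bytes as 0 to match the notation used in the table.

```c
#include <stdio.h>
#include <string.h>

/* Print the N bytes of BUF in the table's notation:
   non-null characters quoted, null bytes as 0. */
static void dump(const char *buf, size_t n)
{
    putchar('{');
    for (size_t i = 0; i < n; i++) {
        fputs(i == 0 ? " " : ", ", stdout);
        if (buf[i] == '\0')
            putchar('0');
        else
            printf("'%c'", buf[i]);
    }
    puts(" }");
}

/* Run the four rows of the table against a 5-byte buffer.
   Prints:
     strncpy(buf, "1", 5) -> { '1', 0, 0, 0, 0 }
     strncpy(buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
     strncpy(buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
     strncpy(buf, "123456", 5) -> { '1', '2', '3', '4', '5' }
*/
static void show_examples(void)
{
    const char *srcs[] = { "1", "1234", "12345", "123456" };
    char buf[5];

    for (size_t i = 0; i < sizeof srcs / sizeof srcs[0]; i++) {
        strncpy(buf, srcs[i], sizeof buf);
        printf("strncpy(buf, \"%s\", 5) -> ", srcs[i]);
        dump(buf, sizeof buf);
    }
}
```

The last two rows are indistinguishable, which is exactly the truncation caveat.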
* Re: strncpy clarify result may not be null terminated
2023-11-08 19:58 ` DJ Delorie
@ 2023-11-08 20:13 ` Alejandro Colomar
2023-11-08 21:07 ` DJ Delorie
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 20:13 UTC (permalink / raw)
To: DJ Delorie; +Cc: libc-alpha, jg, linux-man
[-- Attachment #1: Type: text/plain, Size: 2647 bytes --]
Hi DJ,
On Wed, Nov 08, 2023 at 02:58:24PM -0500, DJ Delorie wrote:
>
> Perhaps an example that shows the problem?
Maybe.
>
> EXAMPLES
>
> strncpy (buf, "1", 5);
> { '1', 0, 0, 0, 0 }
>
> strncpy (buf, "1234", 5);
> { '1', '2', '3', '4', 0 }
>
> strncpy (buf, "12345", 5);
> { '1', '2', '3', '4', '5' }
>
> strncpy (buf, "123456", 5);
> { '1', '2', '3', '4', '5' }
Would you mind reading the latest versions of strcpy(3), strncpy(3), and
string_copying(7), as in the git repository, and comment your thoughts?
You don't even need to install the pages from git. You can read them
with this:
$ git clone https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/
$ cd man-pages/
$ man ./man3/strcpy.3
$ man ./man3/strncpy.3
$ man ./man7/string_copying.7
Also check the examples and suggest if anything could be clearer.
Thanks!
>
> Maybe strcpy and strncpy shouldn't even share man pages, since they're
> not as related as we once thought?
They don't (anymore):
$ pwd
/home/alx/src/linux/man-pages/man-pages/master
$ git log --oneline -1
b8584be14 (HEAD -> master, korg/master, alx/main, main) bcmp.3: wfix
$ grep -e '\.TH ' -e '\.so ' man3/strcpy.3
.TH strcpy 3 (date) "Linux man-pages (unreleased)"
$ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3
.so man3/strcpy.3
$ grep -e '\.TH ' -e '\.so ' man3/strncpy.3
.so man3/stpncpy.3
$ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3
.TH stpncpy 3 (date) "Linux man-pages (unreleased)"
The only shared page is string_copying(7), which attempts to clarify all
of this. Only in old versions of the Linux man-pages did they share a
page.
$ pwd
/home/alx/src/linux/man-pages/man-pages/5/5.13
$ git log --oneline -1
091fbf1fe (HEAD, tag: man-pages-5.13) Ready for 5.13
$ grep -e '\.TH ' -e '\.so ' man3/strcpy.3
.TH STRCPY 3 2021-03-22 "GNU" "Linux Programmer's Manual"
$ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3
.TH STPCPY 3 2021-03-22 "GNU" "Linux Programmer's Manual"
$ grep -e '\.TH ' -e '\.so ' man3/strncpy.3
.so man3/strcpy.3
$ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3
.TH STPNCPY 3 2021-03-22 "GNU" "Linux Programmer's Manual"
I've spent the last year working on shadow-utils' string handling code,
while at the same time writing string_copying(7) as a complete guide to
the *cpy() functions, detailing what they do and what they don't, and
rewriting all the pages for these functions as shorter reference guides
that refer to string_copying(7) for more details.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
2023-11-08 20:13 ` Alejandro Colomar
@ 2023-11-08 21:07 ` DJ Delorie
2023-11-08 21:50 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 21:07 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man
Alejandro Colomar <alx@kernel.org> writes:
> Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> string_copying(7), as in the git repository, and comment your thoughts?
I think my examples would work well after the first CAVEATS paragraph:
The name of these functions is confusing. These functions
produce a null-padded character sequence, not a string (see
string_copying(7)), like this:
strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }
> These functions copy the string pointed to by src into a null-padded
> character sequence at the fixed-width buffer pointed to by dst. If the
> destination buffer, limited by its size, isn't large enough to hold the
> copy, the resulting character sequence is truncated.
hmmm... perhaps
These functions copy at most SZ bytes from SRC into a fixed-length
buffer DST, padding any unwritten bytes in DST with NUL bytes.
Specifically, if SRC has a NUL byte in the first SZ bytes, copying
stops there and any remaining bytes in DST are filled with NUL bytes.
If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
copied to DST.
This avoids the term "string" completely and emphasises the not-string
nature of the destination.
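The byte-level wording above can be checked against a straightforward implementation. This is an illustrative sketch only: `sketch_strncpy()` is a hypothetical name, and glibc's real strncpy(3) is implemented differently, but it should produce the same bytes in DST.

```c
#include <stddef.h>

/* Hypothetical byte-by-byte version of strncpy(3), following the
   description above.  Not the glibc implementation. */
static char *sketch_strncpy(char *dst, const char *src, size_t sz)
{
    size_t i = 0;

    /* Copy from SRC until its first null byte, or until SZ bytes. */
    for (; i < sz && src[i] != '\0'; i++)
        dst[i] = src[i];

    /* Fill every remaining byte of DST with null bytes. */
    for (; i < sz; i++)
        dst[i] = '\0';

    return dst;
}
```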
stpncpy, strncpy - zero a fixed-width buffer and copy a string into a
character sequence with truncation and zero the rest of it
Or "fill a fixed-width zero-padded buffer with bytes from a string"
That avoids saying "copy a string"
string_copying.7:
> For historic reasons, some standard APIs, such as utmpx(5),
Perhaps "some standard APIs and file formats, such as utmpx(5) or
tar(1)," ?
> however, those padding null bytes are not part of the character
> sequence.
add ", and may not be present if not needed." ?
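Since the padding null bytes are not part of the character sequence, a reader of such a field has to compute the sequence length itself, bounded by the field width. A minimal sketch using only ISO C memchr(3); `field_len()` is a hypothetical helper (POSIX strnlen(3) computes the same value):

```c
#include <string.h>

/* Hypothetical helper: length of the character sequence stored in the
   null-padded fixed-width buffer FIELD of SZ bytes.  That is the number
   of bytes before the first null byte, or SZ if the sequence fills the
   whole buffer and there is no null byte at all. */
static size_t field_len(const char *field, size_t sz)
{
    const char *nul = memchr(field, '\0', sz);

    return nul != NULL ? (size_t)(nul - field) : sz;
}
```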
* Re: strncpy clarify result may not be null terminated
2023-11-08 21:07 ` DJ Delorie
@ 2023-11-08 21:50 ` Alejandro Colomar
2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 21:50 UTC (permalink / raw)
To: DJ Delorie; +Cc: libc-alpha, jg, linux-man
[-- Attachment #1: Type: text/plain, Size: 2856 bytes --]
Hi DJ,
On Wed, Nov 08, 2023 at 04:07:07PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> > string_copying(7), as in the git repository, and comment your thoughts?
>
> I think my examples would work well after the first CAVEATS paragraph:
>
> The name of these functions is confusing. These functions
> produce a null-padded character sequence, not a string (see
> string_copying(7)), like this:
>
> strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
> strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
> strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
> strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }
It fits perfectly there. And it also merges nicely with the paragraph
below.
>
> > These functions copy the string pointed to by src into a null-padded
> > character sequence at the fixed-width buffer pointed to by dst. If the
> > destination buffer, limited by its size, isn't large enough to hold the
> > copy, the resulting character sequence is truncated.
>
> hmmm... perhaps
>
> These functions copy at most SZ bytes from SRC into a fixed-length
> buffer DST, padding any unwritten bytes in DST with NUL bytes.
> Specifically, if SRC has a NUL byte in the first SZ bytes, copying
> stops there and any remaining bytes in DST are filled with NUL bytes.
> If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
> copied to DST.
>
> This avoids the term "string" completely and emphasises the not-string
> nature of the destination.
I don't like that, because it talks a lot about what the function does
in terms of low-level copies of bytes. That may induce programmers to
try to find an abstraction in terms of strings.
>
> stpncpy, strncpy - zero a fixed-width buffer and copy a string into a
> character sequence with truncation and zero the rest of it
>
> Or "fill a fixed-width zero-padded buffer with bytes from a string"
But this wording is perfect! I also used a similar wording for the
description. I'll send a patch in a moment.
>
> That avoids saying "copy a string"
Yep!
>
> string_copying.7:
>
> > For historic reasons, some standard APIs, such as utmpx(5),
>
> Perhaps "some standard APIs and file formats, such as utmpx(5) or
> tar(1)," ?
Yes; thanks!
>
> > however, those padding null bytes are not part of the character
> > sequence.
>
> add ", and may not be present if not needed." ?
I'm not convinced about this one. "needed" is not the right word I
think. For now, I'll add the other suggestions to a patch. Expect it
in a moment.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 21:50 ` Alejandro Colomar
@ 2023-11-08 22:17 ` Alejandro Colomar
2023-11-08 23:06 ` Paul Eggert
` (3 more replies)
0 siblings, 4 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-08 22:17 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Jonny Grant,
Matthew House, Oskari Pirhonen, Thorsten Kukuk,
Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
Carlos O'Donell
[-- Attachment #1: Type: text/plain, Size: 3837 bytes --]
These copy *from* a string. But the destination is a simple character
sequence within an array; not a string.
Suggested-by: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
Resending, including the mailing lists, which I forgot.
man3/stpncpy.3 | 17 +++++++++++++----
man7/string_copying.7 | 20 ++++++++++----------
2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
.TH stpncpy 3 (date) "Linux man-pages (unreleased)"
.SH NAME
stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
_GNU_SOURCE
.fi
.SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
.I src
into a null-padded character sequence at the fixed-width buffer pointed to by
.IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
These functions produce a null-padded character sequence,
not a string (see
.BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5); // { \[aq]1\[aq], 0, 0, 0, 0 }
+strncpy(buf, "1234", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 }
+strncpy(buf, "12345", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
.P
It's impossible to distinguish truncation by the result of the call,
from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
.\" ----- SYNOPSIS :: Null-padded character sequences --------/
.SS Null-padded character sequences
.nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
const char *restrict " src ,
.BI " size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
const char *restrict " src ,
.BI " size_t " sz );
.P
@@ -240,14 +236,18 @@ .SS Truncate or not?
.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
.SS Null-padded character sequences
For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
use null-padded character sequences in fixed-width buffers.
To interface with them,
specialized functions need to be used.
.P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
.BR stpncpy (3).
.P
To copy from an unterminated string within a fixed-width buffer into a string,
--
2.42.0
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
@ 2023-11-08 23:06 ` Paul Eggert
2023-11-08 23:28 ` DJ Delorie
` (2 more replies)
2023-11-09 7:23 ` Oskari Pirhonen
` (2 subsequent siblings)
3 siblings, 3 replies; 77+ messages in thread
From: Paul Eggert @ 2023-11-08 23:06 UTC (permalink / raw)
To: Alejandro Colomar, linux-man
Cc: libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
Zack Weinberg, G. Branden Robinson, Carlos O'Donell
On 11/8/23 14:17, Alejandro Colomar wrote:
> These copy *from* a string
Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be
a string.
By the way, have you looked at the recent (i.e., this-year) changes to
the glibc manual's string section? They're relevant.
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 23:06 ` Paul Eggert
@ 2023-11-08 23:28 ` DJ Delorie
2023-11-09 0:24 ` Alejandro Colomar
2023-11-09 14:11 ` Jonny Grant
2 siblings, 0 replies; 77+ messages in thread
From: DJ Delorie @ 2023-11-08 23:28 UTC (permalink / raw)
To: Paul Eggert
Cc: alx, linux-man, libc-alpha, jg, mattlloydhouse, xxc3ncoredxx,
kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos
Paul Eggert <eggert@cs.ucla.edu> writes:
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be
> a string.
But it will be treated as one, for the purposes of this function.
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 23:06 ` Paul Eggert
2023-11-08 23:28 ` DJ Delorie
@ 2023-11-09 0:24 ` Alejandro Colomar
2023-11-09 14:11 ` Jonny Grant
2 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 0:24 UTC (permalink / raw)
To: Paul Eggert
Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
Zack Weinberg, G. Branden Robinson, Carlos O'Donell
[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]
Hi Paul,
On Wed, Nov 08, 2023 at 03:06:40PM -0800, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
> > These copy *from* a string
>
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a
> string.
Pedantically, true. But since it's quite rare to copy from a
fixed-width null-padded array into another, I didn't want to waste
space on that and possibly confuse readers. In such a case, the source
buffer must be at least as large as the destination buffer, and will
likely be the same size (because having fixed-width stuff, why make it
different), so memcpy(3) will probably be simpler.
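To illustrate the case discussed here: when the source is itself a fixed-width null-padded array, it need not be a string, because strncpy(3) never reads more than its size argument from SRC. A sketch under two assumptions: both arrays have the same hypothetical size `FIELD_SZ`, and the source is properly null-padded. Under those assumptions memcpy(3) produces the same bytes.

```c
#include <string.h>

/* Hypothetical field width, as in a utmpx-like record. */
#define FIELD_SZ 8

/* SRC is a null-padded fixed-width array and may lack a terminator.
   strncpy() stays in bounds: it reads at most FIELD_SZ bytes of SRC. */
static void copy_field_strncpy(char dst[FIELD_SZ], const char src[FIELD_SZ])
{
    strncpy(dst, src, FIELD_SZ);
}

/* Same result when SRC is already null-padded (every byte after the
   first null byte is also null), and simpler to reason about. */
static void copy_field_memcpy(char dst[FIELD_SZ], const char src[FIELD_SZ])
{
    memcpy(dst, src, FIELD_SZ);
}
```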
>
> By the way, have you looked at the recent (i.e., this-year) changes to the
> glibc manual's string section? They're relevant.
I hadn't; after your message, I have.
<https://sourceware.org/glibc/manual/2.38/html_mono/libc.html#String-and-Array-Utilities>
I like how it connects all the functions, and it explains the concepts
and gives advice (e.g., avoid truncation as it's usually evil), and
compares the different functions.
However, I think it misses a few things:
- strncpy(3) and strncat(3) are not related at all. They don't have
the same relation that strcpy(3) and strcat(3) have. You can't
write the following code in any case:
strncpy(dst, foo, sizeof(dst));
strncat(dst, bar, sizeof(dst));
as you would with strcpy(3) or strlcpy(3).
strncpy(3) and strncat(3) are opposite functions: the former reads
from a string and writes to a fixed-width null-padded buffer, and the
latter reads from a fixed-width buffer and writes to a string. (You
can use them in other cases, pedantically, as you said above, but
those cases are rather unreal.)
- strncpy(3) is in a section that starts by saying:
> The functions described in this section copy or concatenate the
> possibly-truncated contents of a string or array to another
This may mislead programmers to believe it is useful for producing
strings, when it's not.
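The opposite direction described in the first point can be sketched as follows: strncat(3) reads at most SZ bytes from a fixed-width buffer, which need not contain a null byte, and always writes a terminated string. `field_to_string()` is a hypothetical helper, and DST is assumed to have room for SZ + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: convert the null-padded fixed-width array FIELD
   (SZ bytes, possibly without a terminator) into a string in DST.
   Starting from an empty DST, strncat() appends at most SZ bytes from
   FIELD, stops early at a null byte, and always terminates DST. */
static char *field_to_string(char *dst, const char *field, size_t sz)
{
    dst[0] = '\0';
    return strncat(dst, field, sz);
}
```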
In general, I would like the manual to put some more distance between
these functions and the term "string". As DJ mentioned, it might be
useful to mention utmp(5) and tar(1) as niche use cases for
st[rp]ncpy(3).
And now for a typo:
- In the following sentence under "5.2 String and Array Conventions":
> The array arguments and return values for these functions have type
> void * or wchar_t.
I believe it meant `void *` or `wchar_t *`.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 23:06 ` Paul Eggert
2023-11-08 23:28 ` DJ Delorie
2023-11-09 0:24 ` Alejandro Colomar
@ 2023-11-09 14:11 ` Jonny Grant
2023-11-09 14:35 ` Alejandro Colomar
2 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:11 UTC (permalink / raw)
To: Paul Eggert, Alejandro Colomar, linux-man
Cc: libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell
On 08/11/2023 23:06, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
>> These copy *from* a string
>
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>
> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
That's a great reference page Paul, lots of useful information in the manual.
https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
Re this man page:
https://man7.org/linux/man-pages/man3/string.3.html
Obsolete functions
char *strncpy(char dest[restrict .n], const char src[restrict .n],
size_t n);
Copy at most n bytes from string src to dest, returning a
pointer to the start of dest.
It could clarify
"Copy at most n bytes from string src to ARRAY dest, returning a
pointer to the start of ARRAY dest."
(caps for my emphasis in this email)
Kind regards
Jonny
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 14:11 ` Jonny Grant
@ 2023-11-09 14:35 ` Alejandro Colomar
2023-11-09 14:47 ` Jonny Grant
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 14:35 UTC (permalink / raw)
To: Jonny Grant
Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
Zack Weinberg, G. Branden Robinson, Carlos O'Donell
[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]
Hi Jonny,
On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
> On 08/11/2023 23:06, Paul Eggert wrote:
> > On 11/8/23 14:17, Alejandro Colomar wrote:
> >> These copy *from* a string
> >
> > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> >
> > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>
> That's a great reference page Paul, lots of useful information in the manual.
> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>
> Re this man page:
>
> https://man7.org/linux/man-pages/man3/string.3.html
>
> Obsolete functions
> char *strncpy(char dest[restrict .n], const char src[restrict .n],
> size_t n);
> Copy at most n bytes from string src to dest, returning a
> pointer to the start of dest.
Uh, I forgot about that page. I'll have a look at it and update it. At
least, I need to remove that "Obsolete functions".
>
>
> It could clarify
> "Copy at most n bytes from string src to ARRAY dest, returning a
> pointer to the start of ARRAY dest."
I think I prefer DJ's suggestion:
"Fill a fixed‐width null‐padded buffer with bytes from a string."
Thanks!
Alex
>
> (caps for my emphasis in this email)
>
> Kind regards
> Jonny
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 14:35 ` Alejandro Colomar
@ 2023-11-09 14:47 ` Jonny Grant
2023-11-09 15:02 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 14:47 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
Zack Weinberg, G. Branden Robinson, Carlos O'Donell
On 09/11/2023 14:35, Alejandro Colomar wrote:
> Hi Jonny,
>
> On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
>> On 08/11/2023 23:06, Paul Eggert wrote:
>>> On 11/8/23 14:17, Alejandro Colomar wrote:
>>>> These copy *from* a string
>>>
>>> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>>>
>>> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>>
>> That's a great reference page Paul, lots of useful information in the manual.
>> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>>
>> Re this man page:
>>
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>> Obsolete functions
>> char *strncpy(char dest[restrict .n], const char src[restrict .n],
>> size_t n);
>> Copy at most n bytes from string src to dest, returning a
>> pointer to the start of dest.
>
> Uh, I forgot about that page. I'll have a look at it and update it. At
> least, I need to remove that "Obsolete functions".
>
>>
>>
>> It could clarify
>> "Copy at most n bytes from string src to ARRAY dest, returning a
>> pointer to the start of ARRAY dest."
>
> I think I prefer DJ's suggestion:
>
> "Fill a fixed‐width null‐padded buffer with bytes from a string."
Better to make it clear it's null-padded after?
"Fill a fixed‐width buffer with bytes from a string and pad with null bytes."
I'll leave it with you.
Kind regards
Jonny
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 14:47 ` Jonny Grant
@ 2023-11-09 15:02 ` Alejandro Colomar
2023-11-09 17:30 ` DJ Delorie
0 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:02 UTC (permalink / raw)
To: Jonny Grant
Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
Zack Weinberg, G. Branden Robinson, Carlos O'Donell
[-- Attachment #1: Type: text/plain, Size: 756 bytes --]
On Thu, Nov 09, 2023 at 02:47:05PM +0000, Jonny Grant wrote:
> >> It could clarify
> >> "Copy at most n bytes from string src to ARRAY dest, returning a
> >> pointer to the start of ARRAY dest."
> >
> > I think I prefer DJ's suggestion:
> >
> > "Fill a fixed‐width null‐padded buffer with bytes from a string."
>
> Better to make it clear it's null-padded after?
>
> "Fill a fixed‐width buffer with bytes from a string and pad with null bytes."
Yes, that looks even better. And I wasn't very happy with "bytes".
Maybe:
"Fill a fixed-width buffer with characters from a string and pad with
null bytes."
Thanks,
Alex
>
> I'll leave it with you.
>
> Kind regards
> Jonny
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 15:02 ` Alejandro Colomar
@ 2023-11-09 17:30 ` DJ Delorie
2023-11-09 17:54 ` Andreas Schwab
` (2 more replies)
0 siblings, 3 replies; 77+ messages in thread
From: DJ Delorie @ 2023-11-09 17:30 UTC (permalink / raw)
To: Alejandro Colomar
Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos
Alejandro Colomar <alx@kernel.org> writes:
> "Fill a fixed-width buffer with characters from a string and pad with
> null bytes."
The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
nul/NUL is a character, null/NULL is a pointer.
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 17:30 ` DJ Delorie
@ 2023-11-09 17:54 ` Andreas Schwab
2023-11-09 18:00 ` Alejandro Colomar
2023-11-09 19:42 ` Jonny Grant
2 siblings, 0 replies; 77+ messages in thread
From: Andreas Schwab @ 2023-11-09 17:54 UTC (permalink / raw)
To: DJ Delorie
Cc: Alejandro Colomar, jg, eggert, linux-man, libc-alpha,
mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack,
g.branden.robinson, carlos
On Nov 09 2023, DJ Delorie wrote:
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
NUL is the ASCII abbreviation for Null (see RFC 20).
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 17:30 ` DJ Delorie
2023-11-09 17:54 ` Andreas Schwab
@ 2023-11-09 18:00 ` Alejandro Colomar
2023-11-09 19:42 ` Jonny Grant
2 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 18:00 UTC (permalink / raw)
To: DJ Delorie
Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos
[-- Attachment #1: Type: text/plain, Size: 2519 bytes --]
Hi DJ,
On Thu, Nov 09, 2023 at 12:30:17PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > "Fill a fixed-width buffer with characters from a string and pad with
> > null bytes."
>
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
Here's what man-pages(7) (written by Michael Kerrisk) says:
NULL, NUL, null pointer, and null byte
A null pointer is a pointer that points to nothing, and is
normally indicated by the constant NULL. On the other hand, NUL is
the null byte, a byte with the value 0, represented in C via the
character constant '\0'.
The preferred term for the pointer is "null pointer" or simply
"NULL"; avoid writing "NULL pointer".
The preferred term for the byte is "null byte". Avoid writing
"NUL", since it is too easily confused with "NULL". Avoid also
the terms "zero byte" and "null character". The byte that
terminates a C string should be described as "the terminating null
byte"; strings may be described as "null‐terminated", but avoid
the use of "NUL‐terminated".
I don't necessarily agree with all of that, but mostly. I disagree with
avoiding "null character": just as we have the null wide character
(L'\0'), using "null character" for '\0' keeps the terms symmetric.
Other than that, I mostly agree with Michael. Here's what I think of
these terms:
- NULL is a null pointer constant (as well as 0 is another null pointer
constant).
- A null pointer is a more generic term that includes a run-time null
pointer as well.
- The null byte is 0.
- The null character, '\0', is composed of a null byte.
- The null wide character, L'\0', is composed of several null bytes.
- NUL is the ASCII name of the null byte, or is it the null character
  here? It's a bit muddy.
I use null byte for padding, and null character for the string
terminator, to make a stronger difference between strings and
null-padded fixed-width arrays. I need to review string_copying(7) to
make sure I was consistent in this regard.
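As a rough illustration of the terms listed above in C: the function name `terms_demo()` is hypothetical. The wide-character check assumes the usual all-zero-bits representation of L'\0'. sizeof(wchar_t) is commonly 4 on Linux but 2 on some other platforms.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical demo of the terminology; returns 1 if every check holds.
   - NULL (and 0) are null pointer constants; p holds a null pointer.
   - '\0' is the null character: a single null byte.
   - L'\0' is the null wide character: sizeof(wchar_t) bytes, all of
     them null bytes on the usual representations. */
static int terms_demo(void)
{
    char *p = NULL;
    char nc = '\0';
    wchar_t nwc = L'\0';
    unsigned char zeros[sizeof(wchar_t)] = { 0 };

    return p == 0                 /* 0 is also a null pointer constant */
        && nc == 0                /* the null byte has the value 0 */
        && sizeof nc == 1
        && memcmp(&nwc, zeros, sizeof nwc) == 0;
}
```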
Colloquially, I find it fine to write NULL instead of null pointer (even
for non-constant cases), and NUL instead of any of "null character",
"null byte", or "null wide character", but for being precise, I prefer
"null something".
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-09 17:30 ` DJ Delorie
2023-11-09 17:54 ` Andreas Schwab
2023-11-09 18:00 ` Alejandro Colomar
@ 2023-11-09 19:42 ` Jonny Grant
2 siblings, 0 replies; 77+ messages in thread
From: Jonny Grant @ 2023-11-09 19:42 UTC (permalink / raw)
To: DJ Delorie, Alejandro Colomar
Cc: eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos
On 09/11/2023 17:30, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
>> "Fill a fixed-width buffer with characters from a string and pad with
>> null bytes."
>
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
>
NUL would be a big improvement.
Kind regards, Jonny
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
2023-11-08 23:06 ` Paul Eggert
@ 2023-11-09 7:23 ` Oskari Pirhonen
2023-11-09 15:20 ` [PATCH v2 1/2] " Alejandro Colomar
2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
3 siblings, 0 replies; 77+ messages in thread
From: Oskari Pirhonen @ 2023-11-09 7:23 UTC (permalink / raw)
To: Alejandro Colomar
Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell
On Wed, Nov 08, 2023 at 23:17:07 +0100, Alejandro Colomar wrote:
> These copy *from* a string. But the destination is a simple character
> sequence within an array, not a string.
>
> Suggested-by: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
I like the "with bytes from a string" wording. Good call.
- Oskari
>
> Resending, including the mailing lists, which I forgot.
>
> man3/stpncpy.3 | 17 +++++++++++++----
> man7/string_copying.7 | 20 ++++++++++----------
> 2 files changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index b6bbfd0a3..f86ff8c29 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -6,9 +6,8 @@
> .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
> .SH NAME
> stpncpy, strncpy
> -\- zero a fixed-width buffer and
> -copy a string into a character sequence with truncation
> -and zero the rest of it
> +\-
> +fill a fixed-width null-padded buffer with bytes from a string
> .SH LIBRARY
> Standard C library
> .RI ( libc ", " \-lc )
> @@ -37,7 +36,7 @@ .SH SYNOPSIS
> _GNU_SOURCE
> .fi
> .SH DESCRIPTION
> -These functions copy the string pointed to by
> +These functions copy bytes from the string pointed to by
> .I src
> into a null-padded character sequence at the fixed-width buffer pointed to by
> .IR dst .
> @@ -110,6 +109,16 @@ .SH CAVEATS
> These functions produce a null-padded character sequence,
> not a string (see
> .BR string_copying (7)).
> +For example:
> +.P
> +.in +4n
> +.EX
> +strncpy(buf, "1", 5); // { \[aq]1\[aq], 0, 0, 0, 0 }
> +strncpy(buf, "1234", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 }
> +strncpy(buf, "12345", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +strncpy(buf, "123456", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +.EE
> +.in
> .P
> It's impossible to distinguish truncation by the result of the call,
> from a character sequence that just fits the destination buffer;
> diff --git a/man7/string_copying.7 b/man7/string_copying.7
> index cadf1c539..0e179ba34 100644
> --- a/man7/string_copying.7
> +++ b/man7/string_copying.7
> @@ -41,15 +41,11 @@ .SS Strings
> .\" ----- SYNOPSIS :: Null-padded character sequences --------/
> .SS Null-padded character sequences
> .nf
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +// Fill a fixed-width null-padded buffer with bytes from a string.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
> const char *restrict " src ,
> .BI " size_t " sz );
> -.P
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> const char *restrict " src ,
> .BI " size_t " sz );
> .P
> @@ -240,14 +236,18 @@ .SS Truncate or not?
> .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
> .SS Null-padded character sequences
> For historic reasons,
> -some standard APIs,
> +some standard APIs and file formats,
> such as
> -.BR utmpx (5),
> +.BR utmpx (5)
> +and
> +.BR tar (1),
> use null-padded character sequences in fixed-width buffers.
> To interface with them,
> specialized functions need to be used.
> .P
> -To copy strings into them, use
> +To copy bytes from strings into these buffers, use
> +.BR strncpy (3)
> +or
> .BR stpncpy (3).
> .P
> To copy from an unterminated string within a fixed-width buffer into a string,
> --
> 2.42.0
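The CAVEATS examples in the patch can be checked directly, and they also show what a reader of such a field has to do. A minimal sketch, assuming a hypothetical helper field_to_string() (not a libc function): it turns a null-padded fixed-width field, which may lack a terminating null byte, back into a string by stopping at the first null byte or at the field width, whichever comes first.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the characters of a null-padded field of
 * sz bytes into dst and terminate it.  dst must hold at least sz + 1 bytes.
 */
static char *field_to_string(char *restrict dst, const char *restrict field,
                             size_t sz)
{
    /* First padding byte, if any; a full field has none. */
    const char *nul = memchr(field, '\0', sz);
    size_t len = nul != NULL ? (size_t)(nul - field) : sz;

    memcpy(dst, field, len);
    dst[len] = '\0';
    return dst;
}
```

After strncpy(buf, "123456", 5), buf holds the five characters '1' to '5' and no null byte at all, so passing buf to strlen(3) or printf("%s") would read past the end; field_to_string(out, buf, 5) instead yields the string "12345".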
* [PATCH v2 1/2] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
2023-11-08 23:06 ` Paul Eggert
2023-11-09 7:23 ` Oskari Pirhonen
@ 2023-11-09 15:20 ` Alejandro Colomar
2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
3 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Oskari Pirhonen,
Jonny Grant, Matthew House, Thorsten Kukuk,
Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
Carlos O'Donell, Paul Eggert, Xi Ruoyao
These copy *from* a string. But the destination is a simple character
sequence within an array, not a string.
Suggested-by: DJ Delorie <dj@redhat.com>
Acked-by: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
Patch 1/2 is just a resend, with more CCs.
Patch 2/2 is a new one further clarifying the wording, after Jonny's
suggestions.
man3/stpncpy.3 | 17 +++++++++++++----
man7/string_copying.7 | 20 ++++++++++----------
2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
.TH stpncpy 3 (date) "Linux man-pages (unreleased)"
.SH NAME
stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
_GNU_SOURCE
.fi
.SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
.I src
into a null-padded character sequence at the fixed-width buffer pointed to by
.IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
These functions produce a null-padded character sequence,
not a string (see
.BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5); // { \[aq]1\[aq], 0, 0, 0, 0 }
+strncpy(buf, "1234", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 }
+strncpy(buf, "12345", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
.P
It's impossible to distinguish truncation by the result of the call,
from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
.\" ----- SYNOPSIS :: Null-padded character sequences --------/
.SS Null-padded character sequences
.nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
const char *restrict " src ,
.BI " size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
const char *restrict " src ,
.BI " size_t " sz );
.P
@@ -240,14 +236,18 @@ .SS Truncate or not?
.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
.SS Null-padded character sequences
For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
use null-padded character sequences in fixed-width buffers.
To interface with them,
specialized functions need to be used.
.P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
.BR stpncpy (3).
.P
To copy from an unterminated string within a fixed-width buffer into a string,
--
2.42.0
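As the CAVEATS text in this patch says, the result of the call cannot distinguish truncation from a sequence that exactly fills the buffer, so a caller who cares must compare the source length with the buffer size. A minimal sketch, assuming a hypothetical wrapper fill_field() (not a libc or man-pages function):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical wrapper: fill dst[0..sz-1] with the characters of src and
 * pad the rest with null bytes.  Returns false if src was truncated.
 * The bytes in dst look the same for an exact fit (strlen(src) == sz)
 * and for truncation, so the decision is made from the source length.
 */
static bool fill_field(char *restrict dst, const char *restrict src, size_t sz)
{
    strncpy(dst, src, sz);
    return strlen(src) <= sz;
}
```

Note that strlen(src) scans all of src; where POSIX is available, strnlen(src, sz + 1) > sz would be an equivalent test that reads at most sz + 1 bytes.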
* [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
` (2 preceding siblings ...)
2023-11-09 15:20 ` [PATCH v2 1/2] " Alejandro Colomar
@ 2023-11-09 15:20 ` Alejandro Colomar
2023-11-10 5:47 ` Oskari Pirhonen
3 siblings, 1 reply; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, Jonny Grant, DJ Delorie,
Matthew House, Oskari Pirhonen, Thorsten Kukuk,
Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
Carlos O'Donell, Paul Eggert, Xi Ruoyao
The previous wording could be interpreted as if the nulls were already
in place. Clarify that it's this function which pads with null bytes.
Also, it copies "characters" from the src string. That's a bit more
specific than copying "bytes", and makes it clearer that the terminating
null byte in src is not part of the copy.
Suggested-by: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man3/stpncpy.3 | 10 ++++++----
man3/string.3 | 11 ++---------
man7/string_copying.7 | 3 ++-
3 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index f86ff8c29..3cf4eb371 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -7,7 +7,8 @@
.SH NAME
stpncpy, strncpy
\-
-fill a fixed-width null-padded buffer with bytes from a string
+fill a fixed-width buffer with characters from a string
+and pad with null bytes
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
@@ -36,10 +37,11 @@ .SH SYNOPSIS
_GNU_SOURCE
.fi
.SH DESCRIPTION
-These functions copy bytes from the string pointed to by
+These functions copy characters from the string pointed to by
.I src
-into a null-padded character sequence at the fixed-width buffer pointed to by
-.IR dst .
+into a character sequence at the fixed-width buffer pointed to by
+.IR dst ,
+and pad with null bytes.
If the destination buffer,
limited by its size,
isn't large enough to hold the copy,
diff --git a/man3/string.3 b/man3/string.3
index aba5efd2b..bd8b342a6 100644
--- a/man3/string.3
+++ b/man3/string.3
@@ -179,21 +179,14 @@ .SH SYNOPSIS
.I n
bytes to
.IR dest .
-.SS Obsolete functions
.TP
.nf
.BI "char *strncpy(char " dest "[restrict ." n "], \
const char " src "[restrict ." n ],
.BI " size_t " n );
.fi
-Copy at most
-.I n
-bytes from string
-.I src
-to
-.IR dest ,
-returning a pointer to the start of
-.IR dest .
+Fill a fixed‐width buffer with characters from a string
+and pad with null bytes.
.SH DESCRIPTION
The string functions perform operations on null-terminated
strings.
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0e179ba34..865271c6f 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,7 +41,8 @@ .SS Strings
.\" ----- SYNOPSIS :: Null-padded character sequences --------/
.SS Null-padded character sequences
.nf
-// Fill a fixed-width null-padded buffer with bytes from a string.
+// Fill a fixed-width buffer with characters from a string
+// and pad with null bytes.
.BI "char *strncpy(char " dst "[restrict ." sz "], \
const char *restrict " src ,
.BI " size_t " sz );
--
2.42.0
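To make the "pad with null bytes" wording concrete, here is an illustrative re-implementation (the name sketch_strncpy() is hypothetical; real code should call strncpy(3)) that leaves the same bytes in dst as strncpy(3). It copies the characters of src and then writes null bytes until exactly sz bytes have been written, so the padding comes from the call itself, not from whatever dst held before.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative sketch of strncpy(3)'s effect on dst: copy characters
 * from src, then pad with null bytes.  Exactly sz bytes of dst are written.
 */
static char *sketch_strncpy(char *restrict dst, const char *restrict src,
                            size_t sz)
{
    size_t i = 0;

    for (; i < sz && src[i] != '\0'; i++)  /* characters from src */
        dst[i] = src[i];
    for (; i < sz; i++)                    /* padding written by the call */
        dst[i] = '\0';
    return dst;
}
```

Filling a 6-byte buffer with 'x' and then calling sketch_strncpy(buf, "ab", 6) leaves { 'a', 'b', 0, 0, 0, 0 }: none of the old 'x' bytes survive.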
* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
@ 2023-11-10 5:47 ` Oskari Pirhonen
2023-11-10 10:47 ` Alejandro Colomar
0 siblings, 1 reply; 77+ messages in thread
From: Oskari Pirhonen @ 2023-11-10 5:47 UTC (permalink / raw)
To: Alejandro Colomar
Cc: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao
On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote:
> The previous wording could be interpreted as if the nulls were already
> in place. Clarify that it's this function which pads with null bytes.
>
> Also, it copies "characters" from the src string. That's a bit more
> specific than copying "bytes", and makes it clearer that the terminating
> null byte in src is not part of the copy.
>
> Suggested-by: Jonny Grant <jg@jguk.org>
> Cc: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Xi Ruoyao <xry111@xry111.site>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
> man3/stpncpy.3 | 10 ++++++----
> man3/string.3 | 11 ++---------
> man7/string_copying.7 | 3 ++-
> 3 files changed, 10 insertions(+), 14 deletions(-)
>
... snip ...
> diff --git a/man3/string.3 b/man3/string.3
> index aba5efd2b..bd8b342a6 100644
> --- a/man3/string.3
> +++ b/man3/string.3
> @@ -179,21 +179,14 @@ .SH SYNOPSIS
> .I n
> bytes to
> .IR dest .
> -.SS Obsolete functions
If you're removing this section ...
> .TP
> .nf
> .BI "char *strncpy(char " dest "[restrict ." n "], \
> const char " src "[restrict ." n ],
> .BI " size_t " n );
> .fi
> -Copy at most
> -.I n
> -bytes from string
> -.I src
> -to
> -.IR dest ,
> -returning a pointer to the start of
> -.IR dest .
> +Fill a fixed‐width buffer with characters from a string
> +and pad with null bytes.
... shouldn't you also move the rest of this up to keep it alphabetized?
- Oskari
* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
2023-11-10 5:47 ` Oskari Pirhonen
@ 2023-11-10 10:47 ` Alejandro Colomar
0 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-10 10:47 UTC (permalink / raw)
To: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao
On Thu, Nov 09, 2023 at 11:47:34PM -0600, Oskari Pirhonen wrote:
> On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote:
> > The previous wording could be interpreted as if the nulls were already
> > in place. Clarify that it's this function which pads with null bytes.
> >
> > Also, it copies "characters" from the src string. That's a bit more
> > specific than copying "bytes", and makes it clearer that the terminating
> > null byte in src is not part of the copy.
> >
> > Suggested-by: Jonny Grant <jg@jguk.org>
> > Cc: DJ Delorie <dj@redhat.com>
> > Cc: Jonny Grant <jg@jguk.org>
> > Cc: Matthew House <mattlloydhouse@gmail.com>
> > Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> > Cc: Thorsten Kukuk <kukuk@suse.com>
> > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> > Cc: Zack Weinberg <zack@owlfolio.org>
> > Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> > Cc: Carlos O'Donell <carlos@redhat.com>
> > Cc: Paul Eggert <eggert@cs.ucla.edu>
> > Cc: Xi Ruoyao <xry111@xry111.site>
> > Signed-off-by: Alejandro Colomar <alx@kernel.org>
> > ---
> > man3/stpncpy.3 | 10 ++++++----
> > man3/string.3 | 11 ++---------
> > man7/string_copying.7 | 3 ++-
> > 3 files changed, 10 insertions(+), 14 deletions(-)
> >
>
> ... snip ...
>
> > diff --git a/man3/string.3 b/man3/string.3
> > index aba5efd2b..bd8b342a6 100644
> > --- a/man3/string.3
> > +++ b/man3/string.3
> > @@ -179,21 +179,14 @@ .SH SYNOPSIS
> > .I n
> > bytes to
> > .IR dest .
> > -.SS Obsolete functions
>
> If you're removing this section ...
>
> > .TP
> > .nf
> > .BI "char *strncpy(char " dest "[restrict ." n "], \
> > const char " src "[restrict ." n ],
> > .BI " size_t " n );
> > .fi
> > -Copy at most
> > -.I n
> > -bytes from string
> > -.I src
> > -to
> > -.IR dest ,
> > -returning a pointer to the start of
> > -.IR dest .
> > +Fill a fixed‐width buffer with characters from a string
> > +and pad with null bytes.
>
> ... shouldn't you also move the rest of this up to keep it alphabetized?
Hi Oskari,
Sure! I was trying to find a pattern in the order, but didn't see it
yesterday. Thanks! :)
Cheers,
Alex
>
> - Oskari
--
<https://www.alejandro-colomar.es/>
* [PATCH 0/2] Expand BUGS section of string_copying(7).
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
[not found] ` <ZUacobMq0l_O8gjg@debian>
@ 2023-11-12 9:17 ` Alejandro Colomar
2023-11-12 9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
` (5 subsequent siblings)
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 9:17 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, libc-alpha
Hi,
After Paul pointed out important problems with strlcpy(3) (and strlcat(3)),
I've written something in string_copying(7)'s BUGS section to warn against them.
Cheers,
Alex
Alejandro Colomar (2):
string_copying.7: BUGS: *cat(3) functions aren't always bad
string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance
problems
man7/string_copying.7 | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
--
2.42.0
* [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
[not found] ` <ZUacobMq0l_O8gjg@debian>
2023-11-12 9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
@ 2023-11-12 9:18 ` Alejandro Colomar
2023-11-12 9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
` (4 subsequent siblings)
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 9:18 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab
The compiler will sometimes optimize them into normal *cpy(3) calls,
since the length of dst is usually known if the previous *cpy(3) call is
visible to the compiler. They also make for cleaner code. If you know
that they'll get optimized, you can use them.
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man7/string_copying.7 | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 1637ebc91..0254fbba6 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -592,8 +592,14 @@ .SH BUGS
All catenation functions share the same performance problem:
.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
Shlemiel the painter
.UE .
+As a mitigation,
+compilers are able to transform some calls to catenation functions
+into normal copy functions,
+since
+.I strlen(dst)
+is usually a byproduct of the previous copy.
.\" ----- EXAMPLES :: -------------------------------------------------/
.SH EXAMPLES
The following are examples of correct use of each of these functions.
.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
--
2.42.0
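The Shlemiel problem, and why turning catenation into copying helps, shows up when comparing a chain of strcat(3) calls with a chain of stpcpy(3) calls. A sketch under stated assumptions: join_strcat() and join_stpcpy() are hypothetical helpers, the caller must supply a dst large enough for the result, and whether a particular compiler rewrites the strcat(3) chain is not guaranteed.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */
#include <assert.h>
#include <string.h>

/* Quadratic: every strcat(3) call rescans dst from the start to find its end. */
static char *join_strcat(char *dst, const char *const parts[], size_t n)
{
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strcat(dst, parts[i]);
    return dst;
}

/* Linear: stpcpy(3) returns a pointer to the new terminating null byte,
   which is exactly where the next copy has to start. */
static char *join_stpcpy(char *dst, const char *const parts[], size_t n)
{
    char *end = dst;

    *end = '\0';
    for (size_t i = 0; i < n; i++)
        end = stpcpy(end, parts[i]);
    return dst;
}
```

Both produce the same string; the difference is that join_stpcpy() already knows strlen(dst) at each step, which is the byproduct the patch says a compiler can exploit.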
* [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
` (2 preceding siblings ...)
2023-11-12 9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-12 9:18 ` Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
` (3 subsequent siblings)
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 9:18 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab
Also point to BUGS from other sections that talk about these functions.
These functions are doomed due to the design decision of mirroring
snprintf(3)'s return value. They must return strlen(src), which makes
them terribly slow, and vulnerable to DoS if an attacker can control
strlen(src).
A better design would have been to return -1 when truncating.
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man7/string_copying.7 | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0254fbba6..cb3910db0 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -226,9 +226,9 @@ .SS Truncate or not?
.IP \[bu]
.BR strlcpy (3bsd)
and
.BR strlcat (3bsd)
-are similar, but less efficient when chained.
+are similar, but have important performance problems; see BUGS.
.IP \[bu]
.BR stpncpy (3)
and
.BR strncpy (3)
@@ -417,8 +417,10 @@ .SS Functions
the resulting string is truncated
(but it is guaranteed to be null-terminated).
They return the length of the total string they tried to create.
.IP
+Check BUGS before using these functions.
+.IP
.BR stpecpy (3)
is a simpler alternative to these functions.
.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
.TP
@@ -598,8 +600,22 @@ .SH BUGS
into normal copy functions,
since
.I strlen(dst)
is usually a byproduct of the previous copy.
+.P
+.BR strlcpy (3)
+and
+.BR strlcat (3)
+need to read the entire
+.I src
+string,
+even if the destination buffer is small.
+This makes them vulnerable to Denial of Service (DoS) attacks
+if an attacker can control the length of the
+.I src
+string.
+And if not,
+they're still unnecessarily slow.
.\" ----- EXAMPLES :: -------------------------------------------------/
.SH EXAMPLES
The following are examples of correct use of each of these functions.
.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
--
2.42.0
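The cost of mirroring snprintf(3)'s return value, and the "return -1 when truncating" alternative from the commit message, can be sketched side by side. Both functions are hypothetical illustrations, not the libbsd or glibc strlcpy(3) and not the strtcpy(3) proposed later in this thread; sketch_copy_trunc() relies on POSIX strnlen(3) so that it reads at most dsize bytes of src.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* strlcpy(3)-like semantics: must return strlen(src), so it always
   scans all of src, even when dst is tiny. */
static size_t sketch_strlcpy(char *restrict dst, const char *restrict src,
                             size_t dsize)
{
    size_t slen = strlen(src);

    if (dsize != 0) {
        size_t n = slen < dsize - 1 ? slen : dsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return slen;
}

/* Alternative design: return -1 on truncation, so that no more than
   dsize bytes of src are ever read. */
static ptrdiff_t sketch_copy_trunc(char *restrict dst, const char *restrict src,
                                   size_t dsize)
{
    if (dsize == 0)
        return -1;

    size_t slen = strnlen(src, dsize);
    bool trunc = (slen == dsize);
    size_t n = trunc ? dsize - 1 : slen;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return trunc ? -1 : (ptrdiff_t)slen;
}
```

Copying a multi-megabyte src into a 4-byte dst, sketch_strlcpy() walks the whole source to compute its return value, while sketch_copy_trunc() stops after 4 bytes; both leave the same truncated, null-terminated string in dst.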
* [PATCH v2 0/3] Improve string_copying(7)
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
` (3 preceding siblings ...)
2023-11-12 9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
` (2 subsequent siblings)
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
To: linux-man, Guillem Jover
Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab
Hi,
v2:
- Patches 1/3 and 2/3 are identical to v1, except that I CC'd libbsd's
maintainer (Guillem) in 2/3 so he's aware that we're documenting BUGS
for strlcpy(3). Since the strlcpy(3bsd) manual page is part of
libbsd, it may be interesting to also add a BUGS section in that
page.
- Add 3/3, which adds strtcpy(3): a function almost identical to
strscpy(9) and very similar to strlcpy(3), but without strlcpy(3)'s
bugs.
Cheers,
Alex
Alejandro Colomar (3):
string_copying.7: BUGS: *cat(3) functions aren't always bad
string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance
problems
strtcpy.3, string_copying.7: Add strtcpy(3)
man3/strtcpy.3 | 1 +
man7/string_copying.7 | 121 +++++++++++++++++++++++++++++++-----------
2 files changed, 92 insertions(+), 30 deletions(-)
create mode 100644 man3/strtcpy.3
--
2.42.0
* [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
` (4 preceding siblings ...)
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
Andreas Schwab
The compiler will sometimes optimize them into normal *cpy(3) calls,
since the length of dst is usually known if the previous *cpy(3) call is
visible to the compiler. They also make for cleaner code. If you know
that they'll get optimized, you can use them.
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man7/string_copying.7 | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 1637ebc91..0254fbba6 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -592,8 +592,14 @@ .SH BUGS
All catenation functions share the same performance problem:
.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
Shlemiel the painter
.UE .
+As a mitigation,
+compilers are able to transform some calls to catenation functions
+into normal copy functions,
+since
+.I strlen(dst)
+is usually a byproduct of the previous copy.
.\" ----- EXAMPLES :: -------------------------------------------------/
.SH EXAMPLES
The following are examples of correct use of each of these functions.
.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
--
2.42.0
* [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
` (5 preceding siblings ...)
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
Andreas Schwab
[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]
Also point to BUGS from other sections that talk about these functions.
These functions are doomed due to the design decision of mirroring
snprintf(3)'s return value. They must return strlen(src), which makes
them terribly slow, and vulnerable to DoS if an attacker can control
strlen(src).
A better design would have been to return -1 when truncating.
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man7/string_copying.7 | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0254fbba6..cb3910db0 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -226,9 +226,9 @@ .SS Truncate or not?
.IP \[bu]
.BR strlcpy (3bsd)
and
.BR strlcat (3bsd)
-are similar, but less efficient when chained.
+are similar, but have important performance problems; see BUGS.
.IP \[bu]
.BR stpncpy (3)
and
.BR strncpy (3)
@@ -417,8 +417,10 @@ .SS Functions
the resulting string is truncated
(but it is guaranteed to be null-terminated).
They return the length of the total string they tried to create.
.IP
+Check BUGS before using these functions.
+.IP
.BR stpecpy (3)
is a simpler alternative to these functions.
.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
.TP
@@ -598,8 +600,22 @@ .SH BUGS
into normal copy functions,
since
.I strlen(dst)
is usually a byproduct of the previous copy.
+.P
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+need to read the entire
+.I src
+string,
+even if the destination buffer is small.
+This makes them vulnerable to Denial of Service (DoS) attacks
+if an attacker can control the length of the
+.I src
+string.
+Even when an attacker can't,
+they are still unnecessarily slow.
.\" ----- EXAMPLES :: -------------------------------------------------/
.SH EXAMPLES
The following are examples of correct use of each of these functions.
.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
--
2.42.0
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
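A minimal C sketch of strlcpy(3bsd)'s documented semantics makes the performance argument in this patch concrete. The sketch is a hypothetical stand-in named my_strlcpy(). It is neither the BSD code nor the glibc code. The return value has to be strlen(src), so the sketch calls strlen(3) on the entire source before it considers sz. That is the unbounded read the new BUGS text warns about.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strlcpy(3bsd), named my_strlcpy() to avoid
 * clashing with libc.  It copies at most sz - 1 bytes and always
 * null-terminates when sz > 0, but it must return strlen(src), so it
 * scans all of src regardless of how small dst is.
 */
static size_t
my_strlcpy(char *restrict dst, const char *restrict src, size_t sz)
{
	size_t  slen, n;

	slen = strlen(src);      /* O(strlen(src)), independent of sz */
	if (sz == 0)
		return slen;         /* nothing written, not even '\0' */

	n = (slen < sz) ? slen : sz - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';

	return slen;             /* >= sz means the copy was truncated */
}
```

A caller detects truncation with `if (my_strlcpy(buf, src, sizeof(buf)) >= sizeof(buf))`. This matches the strlcpy(3bsd) entry in EXAMPLES.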
* [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3)
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
` (6 preceding siblings ...)
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
@ 2023-11-12 11:27 ` Alejandro Colomar
7 siblings, 0 replies; 77+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:27 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
Andreas Schwab
[-- Attachment #1: Type: text/plain, Size: 7496 bytes --]
Add this new truncating string-copying function. It is intended to fully
replace strlcpy(3), which has important bugs (documented in the
preceding commit).
It is almost identical to the Linux kernel's strscpy(9), so reduce the
documentation of strscpy(9) in this page to the minimum, giving
preference to strtcpy(3). Provide a reference implementation, since no
libc provides it.
Providing an easy, safe, and relatively fast truncating string-copying
function should prevent users from rolling their own, in which they
might introduce bugs accidentally. We already made enough mistakes
while discussing these functions, so it's certainly not something that
should be written often.
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man3/strtcpy.3 | 1 +
man7/string_copying.7 | 97 ++++++++++++++++++++++++++++++-------------
2 files changed, 69 insertions(+), 29 deletions(-)
create mode 100644 man3/strtcpy.3
diff --git a/man3/strtcpy.3 b/man3/strtcpy.3
new file mode 100644
index 000000000..beb850746
--- /dev/null
+++ b/man3/strtcpy.3
@@ -0,0 +1 @@
+.so man7/string_copying.7
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cb3910db0..4f609e480 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -6,8 +6,9 @@
.\" ----- NAME :: -----------------------------------------------------/
.SH NAME
stpcpy,
strcpy, strcat,
+strtcpy,
stpecpy,
strlcpy, strlcat,
stpncpy,
strncpy,
@@ -30,8 +31,11 @@ .SS Strings
// Chain-copy a string with truncation.
.BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src );
.P
// Copy/catenate a string with truncation.
+.BI "ssize_t strtcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI " size_t " sz );
.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
const char *restrict " src ,
.BI " size_t " sz );
.BI "size_t strlcat(char " dst "[restrict ." sz "], \
@@ -220,10 +224,10 @@ .SS Truncate or not?
.P
Functions that truncate:
.IP \[bu] 3
.BR stpecpy (3)
-is the most efficient string copy function that performs truncation.
-It only requires to check for truncation once after all chained calls.
+.IP \[bu]
+.BR strtcpy (3)
.IP \[bu]
.BR strlcpy (3bsd)
and
.BR strlcat (3bsd)
@@ -326,8 +330,10 @@ .SS String vs character sequence
.IP \[bu]
.BR strcpy (3),
.BR strcat (3)
.IP \[bu]
+.BR strtcpy (3)
+.IP \[bu]
.BR stpecpy (3)
.IP \[bu]
.BR strlcpy (3bsd),
.BR strlcat (3bsd)
@@ -390,12 +396,24 @@ .SS Functions
The return value is useless.
.IP
.BR stpcpy (3)
is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: strtcpy(3) ----------------------/
+.TP
+.BR strtcpy (3)
+Copy the input string into a destination string.
+If the destination buffer isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the copied string,
+or \-1 on truncation.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
.\" ----- DESCRIPTION :: Functions :: stpecpy(3) ----------------------/
.TP
.BR stpecpy (3)
-Copy the input string into a destination string.
+Chain-copy the input string into a destination string.
If the destination buffer,
limited by a pointer to its end,
isn't large enough to hold the copy,
the resulting string is truncated
@@ -419,10 +437,12 @@ .SS Functions
They return the length of the total string they tried to create.
.IP
Check BUGS before using these functions.
.IP
+.BR strtcpy (3)
+and
.BR stpecpy (3)
-is a simpler alternative to these functions.
+are better alternatives to these functions.
.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
.TP
.BR stpncpy (3)
Copy the input string into
@@ -542,8 +562,17 @@ .SH RETURN VALUE
.BR ustpcpy (3)
A pointer to one after the last character
in the destination character sequence.
.TP
+.BR strtcpy (3)
+The length of the copied string.
+When truncation occurs, it returns \-1.
+When
+.I sz
+is
+.BR 0 ,
+it also returns \-1.
+.TP
.BR strlcpy (3bsd)
.TQ
.BR strlcat (3bsd)
The length of the total string that they tried to create
@@ -562,25 +591,14 @@ .SH RETURN VALUE
which is useless.
.\" ----- NOTES :: strscpy(9) -----------------------------------------/
.SH NOTES
The Linux kernel has an internal function for copying strings,
-which is similar to
-.BR stpecpy (3),
-except that it can't be chained:
-.TP
-.BR strscpy (9)
-Copy the input string into a destination string.
-If the destination buffer,
-limited by its size,
-isn't large enough to hold the copy,
-the resulting string is truncated
-(but it is guaranteed to be null-terminated).
-It returns the length of the destination string, or
+.BR strscpy (9),
+which is identical to
+.BR strtcpy (3),
+except that it returns
.B \-E2BIG
-on truncation.
-.IP
-.BR stpecpy (3)
-is a simpler and faster alternative to this function.
+instead of \-1.
.\" ----- CAVEATS :: --------------------------------------------------/
.SH CAVEATS
Don't mix chain calls to truncating and non-truncating functions.
It is conceptually wrong
@@ -640,8 +658,17 @@ .SH EXAMPLES
strcat(buf, "!");
len = strlen(buf);
puts(buf);
.EE
+.\" ----- EXAMPLES :: strtcpy(3) --------------------------------------/
+.TP
+.BR strtcpy (3)
+.EX
+len = strtcpy(buf, "Hello world!", sizeof(buf));
+if (len == \-1)
+ goto toolong;
+puts(buf);
+.EE
.\" ----- EXAMPLES :: stpecpy(3) --------------------------------------/
.TP
.BR stpecpy (3)
.EX
@@ -671,17 +698,8 @@ .SH EXAMPLES
if (len >= sizeof(buf))
goto toolong;
puts(buf);
.EE
-.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
-.TP
-.BR strscpy (9)
-.EX
-len = strscpy(buf, "Hello world!", sizeof(buf));
-if (len == \-E2BIG)
- goto toolong;
-puts(buf);
-.EE
.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
.TP
.BR stpncpy (3)
.EX
@@ -765,8 +783,29 @@ .SS Implementations
.in +4n
.EX
/* This code is in the public domain. */
\&
+.\" ----- EXAMPLES :: Implementations :: strtcpy(3) -------------------/
+ssize_t
+.IR strtcpy "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+ bool trunc;
+ char *p;
+ size_t dlen, slen;
+\&
+ if (dsize == 0)
+ return \-1;
+\&
+ slen = strnlen(src, dsize);
+ trunc = (slen == dsize);
+ dlen = slen \- trunc;
+\&
+ p = mempcpy(dst, src, dlen);
+ *p = \[aq]\e0\[aq];
+
+ return trunc ? \-1 : slen;
+}
+\&
.\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
char *
.IR stpecpy "(char *dst, char end[0], const char *restrict src)"
{
--
2.42.0
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
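The reference implementation in this patch uses mempcpy(3), which is a GNU extension. The version below is a portable sketch of the same algorithm that uses memcpy(3) instead. It assumes POSIX.1-2008 for strnlen(3) and ssize_t. It is named strtcpy_sketch() here, a hypothetical name, so that it cannot collide with any future libc strtcpy(). strnlen(src, sz) reads at most sz bytes of src, so the cost is bounded by the destination size rather than by strlen(src).

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) and ssize_t */
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/*
 * Portable sketch of the strtcpy(3) reference implementation, using
 * memcpy(3) instead of mempcpy(3).  The name strtcpy_sketch() is
 * hypothetical.  Returns the length of the copied string, or -1 if
 * the result was truncated or sz is 0.
 */
static ssize_t
strtcpy_sketch(char *restrict dst, const char *restrict src, size_t sz)
{
	bool    trunc;
	size_t  dlen, slen;

	if (sz == 0)
		return -1;            /* no room even for the null byte */

	slen = strnlen(src, sz);      /* reads at most sz bytes of src */
	trunc = (slen == sz);         /* src plus its '\0' doesn't fit */
	dlen = slen - trunc;          /* reserve room for the '\0' */

	memcpy(dst, src, dlen);
	dst[dlen] = '\0';

	return trunc ? -1 : (ssize_t) slen;
}
```

Usage mirrors the strtcpy(3) entry in EXAMPLES: `if (strtcpy_sketch(buf, src, sizeof(buf)) == -1) goto toolong;`.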
end of thread, other threads:[~2023-11-27 23:45 UTC | newest]
Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
[not found] <cfbd8674-fe6a-4430-95f1-ec8bde7da32e@jguk.org>
[not found] ` <ZUacobMq0l_O8gjg@debian>
[not found] ` <aeb55af5-1017-4ffd-9824-30b43d5748e3@jguk.org>
[not found] ` <ZUgl2HPJvUge7XYN@debian>
[not found] ` <d40fffcb-524d-44b6-a252-b55a8ddc9fee@jguk.org>
[not found] ` <ZUo6btEFD_z_3NcF@devuan>
[not found] ` <929865e3-17b4-49c4-8fa9-8383885e9904@jguk.org>
[not found] ` <ZUpjI1AHNOMOjdFk@devuan>
[not found] ` <ZUsoIbhrJar6ojux@dj3ntoo>
2023-11-08 9:51 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-08 9:59 ` Thorsten Kukuk
2023-11-08 15:09 ` Alejandro Colomar
[not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
2023-11-08 15:44 ` Thorsten Kukuk
2023-11-08 17:26 ` Adhemerval Zanella Netto
2023-11-08 14:06 ` Zack Weinberg
2023-11-08 15:07 ` Alejandro Colomar
2023-11-08 21:35 ` Carlos O'Donell
2023-11-08 22:11 ` Alejandro Colomar
2023-11-08 23:31 ` Paul Eggert
2023-11-09 0:29 ` Alejandro Colomar
2023-11-09 10:13 ` Jonny Grant
2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-09 14:06 ` catenate vs concatenate Jonny Grant
2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
2023-11-27 15:08 ` Alejandro Colomar
2023-11-27 15:13 ` Alejandro Colomar
2023-11-27 16:59 ` G. Branden Robinson
2023-11-27 18:35 ` Zack Weinberg
2023-11-27 23:45 ` G. Branden Robinson
2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-09 14:05 ` Jonny Grant
2023-11-09 15:04 ` Alejandro Colomar
2023-11-08 19:04 ` DJ Delorie
2023-11-08 19:40 ` Alejandro Colomar
2023-11-08 19:58 ` DJ Delorie
2023-11-08 20:13 ` Alejandro Colomar
2023-11-08 21:07 ` DJ Delorie
2023-11-08 21:50 ` Alejandro Colomar
2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
2023-11-08 23:06 ` Paul Eggert
2023-11-08 23:28 ` DJ Delorie
2023-11-09 0:24 ` Alejandro Colomar
2023-11-09 14:11 ` Jonny Grant
2023-11-09 14:35 ` Alejandro Colomar
2023-11-09 14:47 ` Jonny Grant
2023-11-09 15:02 ` Alejandro Colomar
2023-11-09 17:30 ` DJ Delorie
2023-11-09 17:54 ` Andreas Schwab
2023-11-09 18:00 ` Alejandro Colomar
2023-11-09 19:42 ` Jonny Grant
2023-11-09 7:23 ` Oskari Pirhonen
2023-11-09 15:20 ` [PATCH v2 1/2] " Alejandro Colomar
2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
2023-11-10 5:47 ` Oskari Pirhonen
2023-11-10 10:47 ` Alejandro Colomar
[not found] ` <20231108021240.176996-1-mattlloydhouse@gmail.com>
[not found] ` <ZUvilH5kuQfTuZjy@debian>
[not found] ` <20231109031345.245703-1-mattlloydhouse@gmail.com>
2023-11-09 10:31 ` strncpy clarify result may not be null terminated Jonny Grant
2023-11-09 11:38 ` Alejandro Colomar
2023-11-09 12:43 ` Alejandro Colomar
2023-11-09 12:51 ` Xi Ruoyao
2023-11-09 14:01 ` Alejandro Colomar
2023-11-09 18:11 ` Paul Eggert
2023-11-09 23:48 ` Alejandro Colomar
2023-11-10 5:36 ` Paul Eggert
2023-11-10 11:05 ` Alejandro Colomar
2023-11-10 11:47 ` Alejandro Colomar
2023-11-10 17:58 ` Paul Eggert
2023-11-10 18:36 ` Alejandro Colomar
2023-11-10 20:19 ` Alejandro Colomar
2023-11-10 23:44 ` Jonny Grant
2023-11-10 19:52 ` Alejandro Colomar
2023-11-10 22:14 ` Paul Eggert
2023-11-11 21:13 ` Alejandro Colomar
2023-11-11 22:20 ` Paul Eggert
2023-11-12 9:52 ` Jonny Grant
2023-11-12 10:59 ` Alejandro Colomar
2023-11-10 11:36 ` Jonny Grant
2023-11-10 13:15 ` Alejandro Colomar
2023-11-10 11:23 ` Jonny Grant
[not found] ` <CACKs7VDsTdSNwbC6+2LtQ67J_eJiD814xkw2_5XM1Q=iMjLXJA@mail.gmail.com>
2023-11-10 11:06 ` Jonny Grant
2023-11-12 9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
2023-11-12 9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12 9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar