public inbox for libc-alpha@sourceware.org
* question about Glibc extensions
@ 2020-05-20 16:20 Martin Sebor
  2020-05-20 16:38 ` Florian Weimer
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Martin Sebor @ 2020-05-20 16:20 UTC (permalink / raw)
  To: GNU C Library

I'm wondering how to go about figuring out what Glibc's expected
behavior is in cases where the standard (e.g., C or POSIX) doesn't
say.  Should the Glibc manual take precedence over the Linux
Programmer's Manual (http://man7.org/linux/man-pages) or should
it be the other way around?

For example, C (and POSIX) requires the first argument to mbstowcs
to be a valid non-null pointer, but the mbstowcs man page says it
can be null.  The Glibc manual doesn't mention it.

Another example is the POSIX readlink function which is required
to set errno to EINVAL if (and only if) the path argument names
a file that is not a symbolic link, but the Linux man page
says it also sets it to EINVAL when the (unsigned) bufsiz
argument is not positive.  The Glibc manual doesn't mention
this behavior.

(Both of these have come up while adding the GCC 10 access attribute
to Glibc's APIs as GCC warns about uses of these extensions.)

Thanks
Martin

PS Where does material for the Linux man pages that describe Glibc
specifics come from?


* Re: question about Glibc extensions
  2020-05-20 16:20 question about Glibc extensions Martin Sebor
@ 2020-05-20 16:38 ` Florian Weimer
  2020-05-20 17:30   ` Martin Sebor
  2020-05-20 17:32 ` Andreas Schwab
  2020-05-21 15:53 ` The GNU C Library Manual - Authoritative or not? Carlos O'Donell
  2 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-05-20 16:38 UTC (permalink / raw)
  To: Martin Sebor via Libc-alpha

* Martin Sebor via Libc-alpha:

> For example, C (and POSIX) requires the first argument to mbstowcs
> to be a valid non-null pointer, but the mbstowcs man page says it
> can be null.  The Glibc manual doesn't mention it.

It's an XSI extension in POSIX:

| [XSI] [Option Start] If pwcs is a null pointer, mbstowcs() shall
| return the length required to convert the entire array regardless of
| the value of n, but no values are stored. [Option End]
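
A minimal sketch of how a caller might rely on the XSI behaviour quoted
above: ask mbstowcs for the length with a null destination, then
allocate and convert.  The helper names wide_length and to_wide are
hypothetical and are not part of glibc; n is passed as 0 in the sizing
call since the quoted text says the value of n is disregarded.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of wide characters needed for s, not counting the terminator.
   Relies on the XSI extension: a null destination means "count only".
   Returns (size_t) -1 if s holds an invalid multibyte sequence. */
size_t wide_length(const char *s)
{
    assert(s != NULL);
    return mbstowcs(NULL, s, 0);
}

/* Hypothetical helper: allocate and fill a wide copy of s.
   The caller frees the result; NULL means conversion or allocation failed. */
wchar_t *to_wide(const char *s)
{
    size_t n = wide_length(s);
    if (n == (size_t) -1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    /* n + 1 leaves room for the terminating null wide character. */
    if (mbstowcs(w, s, n + 1) == (size_t) -1) {
        free(w);
        return NULL;
    }
    return w;
}
```

A program would normally call setlocale (LC_ALL, "") first; without it
the conversion runs in the "C" locale, which is enough for plain ASCII.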

> Another example is the POSIX readlink function which is required
> to set errno to EINVAL if (and only if) the path argument names
> a file that is not a symbolic link,

We could probably set the length argument to MIN (INT_MAX, bufsize)
before passing it to the kernel.
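
As a rough sketch of that idea (not glibc's actual readlink, and with
the hypothetical names clamp_bufsiz and readlink_clamped), a wrapper
could cap the size before making the call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Cap a caller-supplied buffer size at INT_MAX, i.e. MIN (INT_MAX, bufsiz). */
size_t clamp_bufsiz(size_t bufsiz)
{
    return bufsiz < (size_t) INT_MAX ? bufsiz : (size_t) INT_MAX;
}

/* Hypothetical wrapper: same contract as readlink, but an oversized
   bufsiz is clamped instead of being handed to the kernel as-is. */
ssize_t readlink_clamped(const char *path, char *buf, size_t bufsiz)
{
    assert(path != NULL && buf != NULL);
    return readlink(path, buf, clamp_bufsiz(bufsiz));
}
```

Clamping is safe for the caller because readlink never needs to store
more bytes than the link contents, so a smaller limit only matters for
links longer than INT_MAX bytes.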

> PS Where does material for the Linux man pages that describe Glibc
> specifics come from?

The glibc source code, for the most part.

Thanks,
Florian



* Re: question about Glibc extensions
  2020-05-20 16:38 ` Florian Weimer
@ 2020-05-20 17:30   ` Martin Sebor
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Sebor @ 2020-05-20 17:30 UTC (permalink / raw)
  To: Florian Weimer, Martin Sebor via Libc-alpha

On 5/20/20 10:38 AM, Florian Weimer wrote:
> * Martin Sebor via Libc-alpha:
> 
>> For example, C (and POSIX) requires the first argument to mbstowcs
>> to be a valid non-null pointer, but the mbstowcs man page says it
>> can be null.  The Glibc manual doesn't mention it.
> 
> It's an XSI extension in POSIX:
> 
> | [XSI] [Option Start] If pwcs is a null pointer, mbstowcs() shall
> | return the length required to convert the entire array regardless of
> | the value of n, but no values are stored. [Option End]

I see, thanks.  I overlooked that sentence.

> 
>> Another example is the POSIX readlink function which is required
>> to set errno to EINVAL if (and only if) the path argument names
>> a file that is not a symbolic link,
> 
> We could probably set the length argument to MIN (INT_MAX, bufsize)
> before passing it to the kernel.
> 
>> PS Where does material for the Linux man pages that describe Glibc
>> specifics come from?
> 
> The glibc source code, for the most part.

Thanks.  I'm not sure it answers my questions but it has helped me
get a clearer picture.  Here's my impression (please correct me if
I'm wrong).

Even though documentation.html makes it sound like the authoritative
documentation is the Glibc manual, and the Linux man pages are just
one of a number of other sources of information, in reality the man
pages are more complete, and because of the close ties between
the projects can be trusted to be up to date.  So the man pages are
the de facto authoritative reference.

Thanks again!
Martin


* Re: question about Glibc extensions
  2020-05-20 16:20 question about Glibc extensions Martin Sebor
  2020-05-20 16:38 ` Florian Weimer
@ 2020-05-20 17:32 ` Andreas Schwab
  2020-05-20 19:35   ` Martin Sebor
  2020-05-21 15:53 ` The GNU C Library Manual - Authoritative or not? Carlos O'Donell
  2 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2020-05-20 17:32 UTC (permalink / raw)
  To: Martin Sebor via Libc-alpha

On May 20 2020, Martin Sebor via Libc-alpha wrote:

> Another example is the POSIX readlink function which is required
> to set errno to EINVAL if (and only if) the path argument names
> a file that is not a symbolic link

This is wrong, there is no "only if".  Implementations are free to
define additional error conditions.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: question about Glibc extensions
  2020-05-20 17:32 ` Andreas Schwab
@ 2020-05-20 19:35   ` Martin Sebor
  2020-05-20 20:22     ` Andreas Schwab
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Sebor @ 2020-05-20 19:35 UTC (permalink / raw)
  To: Andreas Schwab, Martin Sebor via Libc-alpha

On 5/20/20 11:32 AM, Andreas Schwab wrote:
> On May 20 2020, Martin Sebor via Libc-alpha wrote:
> 
>> Another example is the POSIX readlink function which is required
>> to set errno to EINVAL if (and only if) the path argument names
>> a file that is not a symbolic link
> 
> This is wrong, there is no "only if".  Implementations are free to
> define additional error conditions.

The point of my post was to ask a general question about Glibc
documentation.  I chose these particular examples because they
illustrate what seems like a discrepancy between the Glibc manual,
the Linux man page, and C and/or POSIX, and also because they have
both come up as a result of the attribute access changes.

We discussed the specific case of readlink during the review of
the attribute access patch.  You're right that POSIX implementations
are free to define additional error conditions.  But they are only
permitted to do it "if and only if all those error conditions can
always be treated identically to the error conditions as described."
(Quoted from 2.3 Error Numbers).

For readlink, using EINVAL to also mean "buffer size is greater
than SSIZE_MAX" implies programs cannot interpret that error value
alone as meaning the path doesn't refer to a symbolic link.
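
To illustrate the ambiguity, here is a sketch of how a caller could
separate the two meanings of EINVAL without relying on errno alone.
The function read_link and the enum link_result are hypothetical names,
and the follow-up lstat call is an assumption about how a program might
disambiguate, not something either manual prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum link_result {
    LINK_OK,           /* *len holds the number of bytes stored in buf */
    LINK_NOT_SYMLINK,  /* path exists but is not a symbolic link */
    LINK_BAD_BUFSIZ,   /* bufsiz is 0 or exceeds SSIZE_MAX */
    LINK_ERROR         /* anything else; errno is as readlink set it */
};

enum link_result read_link(const char *path, char *buf, size_t bufsiz,
                           ssize_t *len)
{
    assert(path != NULL && buf != NULL && len != NULL);

    /* Reject the sizes whose handling differs between implementations,
       so that a later EINVAL cannot be about the buffer size. */
    if (bufsiz == 0 || bufsiz > (size_t) SSIZE_MAX)
        return LINK_BAD_BUFSIZ;

    ssize_t n = readlink(path, buf, bufsiz);
    if (n >= 0) {
        *len = n;
        return LINK_OK;
    }

    if (errno == EINVAL) {
        /* Confirm the cause.  This is racy if the file can be replaced
           between the two calls; a sketch, not a hardened routine. */
        int saved = errno;
        struct stat st;
        if (lstat(path, &st) == 0 && !S_ISLNK(st.st_mode))
            return LINK_NOT_SYMLINK;
        errno = saved;
    }
    return LINK_ERROR;
}
```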

I suppose an argument could be made that POSIX leaving the readlink
behavior up to implementations when bufsize is SSIZE_MAX includes
setting other errno values, even if that would seem to go against
the blanket "if and only" condition.  If that's so I would expect
the canonical Glibc API documentation for readlink to document its
choice (as conforming implementations are required to do).

But again, the details of the readlink example are secondary
to the bigger question I had about what should be considered
the authoritative reference for Glibc APIs.  (I think we've
now established the answer is the Linux man-pages project.)

Martin


* Re: question about Glibc extensions
  2020-05-20 19:35   ` Martin Sebor
@ 2020-05-20 20:22     ` Andreas Schwab
  2020-05-21  1:00       ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2020-05-20 20:22 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Martin Sebor via Libc-alpha

On May 20 2020, Martin Sebor wrote:

> For readlink, using EINVAL to also mean "buffer size is greater
> than SSIZE_MAX" implies programs cannot interpret that error value
> alone as meaning the path doesn't refer to a symbolic link.

You cannot do that anyway.  There is no one-to-one mapping between error
conditions and error numbers.  The only guarantee is that a defined
error condition results in a unique error number, not the other way
round.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: question about Glibc extensions
  2020-05-20 20:22     ` Andreas Schwab
@ 2020-05-21  1:00       ` Rich Felker
  0 siblings, 0 replies; 26+ messages in thread
From: Rich Felker @ 2020-05-21  1:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Martin Sebor, Martin Sebor via Libc-alpha

On Wed, May 20, 2020 at 10:22:55PM +0200, Andreas Schwab wrote:
> On May 20 2020, Martin Sebor wrote:
> 
> > For readlink, using EINVAL to also mean "buffer size is greater
> > than SSIZE_MAX" implies programs cannot interpret that error value
> > alone as meaning the path doesn't refer to a symbolic link.
> 
> You cannot do that anyway.  There is no one-to-one mapping between error
> conditions and error numbers.  The only guarantee is that a defined
> error condition results in a unique error number, not the other way
> round.

POSIX does have something to say about this:

    "Implementations may generate error numbers listed here under
    circumstances other than those described, if and only if all those
    error conditions can always be treated identically to the error
    conditions as described in this volume of POSIX.1-2017."

but just before that, it also says:

    "Implementations may support additional errors not included in
    this list, may generate errors included in this list under
    circumstances other than those described here, or may contain
    extensions or limitations that prevent some errors from
    occurring."

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_03


* The GNU C Library Manual - Authoritative or not?
  2020-05-20 16:20 question about Glibc extensions Martin Sebor
  2020-05-20 16:38 ` Florian Weimer
  2020-05-20 17:32 ` Andreas Schwab
@ 2020-05-21 15:53 ` Carlos O'Donell
  2020-05-21 17:46   ` Martin Sebor
  2020-05-22 12:54   ` Florian Weimer
  2 siblings, 2 replies; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-21 15:53 UTC (permalink / raw)
  To: Martin Sebor, GNU C Library

On 5/20/20 12:20 PM, Martin Sebor via Libc-alpha wrote:
> I'm wondering how to go about figuring out what Glibc's expected
> behavior is in cases where the standard (e.g., C or POSIX) doesn't
> say.  Should the Glibc manual take precedence over the Linux
> Programmer's Manual (http://man7.org/linux/man-pages) or should
> it be the other way around?
> 
> For example, C (and POSIX) requires the first argument to mbstowcs
> to be a valid non-null pointer, but the mbstowcs man page says it
> can be null.  The Glibc manual doesn't mention it.
> 
> Another example is the POSIX readlink function which is required
> to set errno to EINVAL if (and only if) the path argument names
> a file that is not a symbolic link, but the Linux man page
> says it also sets it to EINVAL when the (unsigned) bufsiz
> argument is not positive.  The Glibc manual doesn't mention
> this behavior.
> 
> (Both of these have come up while adding the GCC 10 access attribute
> to Glibc's APIs as GCC warns about uses of these extensions.)
> 
> Thanks
> Martin
> 
> PS Where does material for the Linux man pages that describe Glibc
> specifics come from?

Do these answer your questions?

"What is the authoritative source for public glibc APIs?"
https://sourceware.org/glibc/wiki/FAQ#What_is_the_authoritative_source_for_public_glibc_APIs.3F

"What other sources of documentation about glibc are available?"
https://sourceware.org/glibc/wiki/FAQ#What_other_sources_of_documentation_about_glibc_are_available.3F

-- 
Cheers,
Carlos.



* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-21 15:53 ` The GNU C Library Manual - Authoritative or not? Carlos O'Donell
@ 2020-05-21 17:46   ` Martin Sebor
  2020-05-21 22:11     ` Carlos O'Donell
  2020-05-22  0:20     ` Rich Felker
  2020-05-22 12:54   ` Florian Weimer
  1 sibling, 2 replies; 26+ messages in thread
From: Martin Sebor @ 2020-05-21 17:46 UTC (permalink / raw)
  To: Carlos O'Donell, GNU C Library

On 5/21/20 9:53 AM, Carlos O'Donell wrote:
> On 5/20/20 12:20 PM, Martin Sebor via Libc-alpha wrote:
>> I'm wondering how to go about figuring out what Glibc's expected
>> behavior is in cases where the standard (e.g., C or POSIX) doesn't
>> say.  Should the Glibc manual take precedence over the Linux
>> Programmer's Manual (http://man7.org/linux/man-pages) or should
>> it be the other way around?
>>
>> For example, C (and POSIX) requires the first argument to mbstowcs
>> to be a valid non-null pointer, but the mbstowcs man page says it
>> can be null.  The Glibc manual doesn't mention it.
>>
>> Another example is the POSIX readlink function which is required
>> to set errno to EINVAL if (and only if) the path argument names
>> a file that is not a symbolic link, but the Linux man page
>> says it also sets it to EINVAL when the (unsigned) bufsiz
>> argument is not positive.  The Glibc manual doesn't mention
>> this behavior.
>>
>> (Both of these have come up while adding the GCC 10 access attribute
>> to Glibc's APIs as GCC warns about uses of these extensions.)
>>
>> Thanks
>> Martin
>>
>> PS Where does material for the Linux man pages that describe Glibc
>> specifics come from?
> 
> Do these answer your questions?

They do.  Unfortunately, the answers are wrong or at least unhelpful.
The two examples that prompted the questions show the Glibc manual
is less complete and less accurate than the Linux man pages (or than
POSIX that the man pages are derived from).  So in reality it cannot
realistically be taken as authoritative (at least in these two cases).
My impression is that this will hold in general.  There are functions
the manual doesn't even mention (e.g., readlink's cousin readlinkat).

I realize these are just omissions that could be fixed.  But until
they are, the manual cannot very well be taken as authoritative.

Martin

> "What is the authoritative source for public glibc APIs?"
> https://sourceware.org/glibc/wiki/FAQ#What_is_the_authoritative_source_for_public_glibc_APIs.3F
> 
> "What other sources of documentation about glibc are available?"
> https://sourceware.org/glibc/wiki/FAQ#What_other_sources_of_documentation_about_glibc_are_available.3F
> 



* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-21 17:46   ` Martin Sebor
@ 2020-05-21 22:11     ` Carlos O'Donell
  2020-05-22  0:22       ` Martin Sebor
  2020-05-22  0:20     ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-21 22:11 UTC (permalink / raw)
  To: Martin Sebor, GNU C Library

On 5/21/20 1:46 PM, Martin Sebor wrote:
> They do.  Unfortunately, the answers are wrong or at least unhelpful.

Which answer is wrong? I'd be happy to correct anything that is wrong.

As a contributor to the GNU Toolchain I encourage you to help us add
to the manual where we think we need to document such interfaces.

I also suggest a tactical approach: add only the interface you're
interested in improving.

This is not a duplication of effort IMO. The manual and the linux man
pages solve different needs. The manual is task-oriented, covering
sections of the standard APIs and how they could and should be used
together, while the linux man pages are API references only
(in isolation to the larger set of APIs).

One should be able to `info libc mbstowcs` and get accurate information.

Given that you raised the issue and I wanted to be helpful, I tried to
add the docs and a test case (because we have zero test cases for this):

tst-mbstowcs.c: In function ‘do_test’:
tst-mbstowcs.c:44:12: error: argument 1 is null but the corresponding size argument 3 value is 24 [-Werror=nonnull]
   44 |   result = mbstowcs (NULL, string, len);
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../include/stdlib.h:15,
                 from tst-mbstowcs.c:19:
../stdlib/stdlib.h:933:15: note: in a call to function ‘mbstowcs’ declared with attribute ‘write_only (1, 3)’
  933 | extern size_t mbstowcs (wchar_t *__restrict  __pwcs,
      |               ^~~~~~~~
cc1: all warnings being treated as errors

Which is expected given the markup. Because of the ability for pwcs to
be NULL, what can we do here? Remove the markup for the first argument?
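
One possible shape of that change, sketched on a stand-in function
rather than on the real stdlib.h declaration: keep an access attribute
on the source argument only, so that a null destination no longer
conflicts with the markup.  The names widen_ascii and ATTR_ACCESS are
hypothetical, and the guard assumes the attribute is only understood by
GCC 10 and later.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Expand to GCC's access attribute where available, to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 10
# define ATTR_ACCESS(...) __attribute__((access(__VA_ARGS__)))
#else
# define ATTR_ACCESS(...)
#endif

/* Only the source is annotated.  Marking dst as write_only (1, 3) would
   make GCC diagnose the null-destination call shown in the test above. */
size_t widen_ascii(wchar_t *dst, const char *src, size_t n)
    ATTR_ACCESS(read_only, 2);

/* Stand-in with mbstowcs-like semantics for plain ASCII input:
   a null dst means "return the required length, whatever n is". */
size_t widen_ascii(wchar_t *dst, const char *src, size_t n)
{
    assert(src != NULL);
    size_t i = 0;

    if (dst == NULL) {
        while (src[i] != '\0')
            ++i;
        return i;
    }

    for (; i < n && src[i] != '\0'; ++i)
        dst[i] = (wchar_t) (unsigned char) src[i];
    if (i < n)
        dst[i] = L'\0';   /* terminate only if there is room */
    return i;
}
```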

> The two examples that prompted the questions show the Glibc manual
> is less complete and less accurate than the Linux man pages (or than
> POSIX that the man pages are derived from). 

Yes, for mbstowcs the manual doesn't say anything about the first
argument.

All of the following functions behave the same way:
* mbstowcs (XSI extension)
* mbsrtowcs 
* mbsnrtowcs
* wcsrtombs
* wcsnrtombs 

The last 4 functions are all worded in ISO C11 to operate correctly with
a NULL dst, but not mbstowcs, which is what I assume this discussion is
really about, that glibc doesn't quite meet what ISO C11 requires? All of
the "if dst is not null" language is missing from mbstowcs in C11.
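
For comparison, a small sketch of the null-dst form that ISO C does
spell out, using mbsrtowcs.  The helper name count_wide is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Count the wide characters in s (terminator excluded) without storing
   anything.  Returns (size_t) -1 on an invalid multibyte sequence. */
size_t count_wide(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial state */

    const char *p = s;   /* not updated by the call when dst is null */
    return mbsrtowcs(NULL, &p, 0, &state);
}
```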

In mbsrtowcs we have this note from 2001 by Ulrich:

We have 0 tests for mbstowcs that use NULL as the first argument in
the implementation. So we have no regression test for breaking this
behaviour AFAIK.

In 1996 when Roland implemented mbsrtowcs (the precursor for the _l
variant and the backend to this function) we had this code:

30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 32)   size_t result = 0;
...
30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 45)       if (dst != NULL)
30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 46)           dst[result] = (wchar_t) **src;
30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 47)       ++result;
...
30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 62)   return result;

So the very first implementation already supported this behaviour.

Following XSI conformance we set _XOPEN_UNIX 1 and _XOPEN_VERSION 700 so
we do indicate this behaviour to applications looking to check for such
semantics.
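
A sketch of how an application might check for this, assuming it looks
at the macros from <unistd.h> and, at run time, at sysconf.  The
function names are hypothetical; the values returned depend on the
platform and on the feature test macros in effect.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <unistd.h>

/* Compile-time view: the XSI issue the headers advertise, or -1 if the
   headers do not advertise XSI at all. */
long xsi_version(void)
{
#if defined(_XOPEN_UNIX) && defined(_XOPEN_VERSION)
    return _XOPEN_VERSION;
#else
    return -1L;
#endif
}

/* Run-time view of the same information; -1 means not supported. */
long xsi_version_runtime(void)
{
    long v = sysconf(_SC_XOPEN_VERSION);
    assert(v >= -1);   /* sysconf reports absence as -1 */
    return v;
}
```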

> So in reality it cannot
> realistically be taken as authoritative (at least in these two cases).

I don't see how that follows. Could you expand on this please?

The fact that the manual says nothing means there is no authoritative
answer for this. You can rely on external documentation to help.

My preference would be for us to improve the project documentation.

You can say that the linux man pages represent general user expectations
and you would be right, and that is one factor we take into consideration
when making changes to the API and ABI.

> My impression is that this will hold in general.  There are functions
> the manual doesn't even mention (e.g., readlink's cousin readlinkat).

Correct, please see manual/filesys.texi:

@c FIXME these are undocumented:
@c faccessat
@c fchmodat
@c fchownat
@c futimesat
@c fstatat (there's a commented-out safety assessment for this one)
@c statx
@c mkdirat
@c mkfifoat
@c name_to_handle_at
@c openat
@c open_by_handle_at
@c readlinkat
@c renameat
@c renameat2
@c scandirat
@c symlinkat
@c unlinkat
@c utimensat
@c mknodat

I just reviewed Florian's patches for several *at functions:

https://patchwork.sourceware.org/project/glibc/patch/87d07p9v73.fsf@mid.deneb.enyo.de/

I think we're at v2 for that patch.

> I realize these are just omissions that could be fixed.  But until
> they are, the manual cannot very well be taken as authoritative.

I look forward to the day when all the APIs are covered.

However, until that day, it does not follow that because the manual is
incomplete it is not the authoritative reference.

Missing information does not make it less authoritative, just incomplete.

You can come to the project and ask for clarification, and that act in
and of itself helps us prioritize what to work on next.

-- 
Cheers,
Carlos.



* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-21 17:46   ` Martin Sebor
  2020-05-21 22:11     ` Carlos O'Donell
@ 2020-05-22  0:20     ` Rich Felker
  2020-05-22 10:30       ` Michael Kerrisk
  1 sibling, 1 reply; 26+ messages in thread
From: Rich Felker @ 2020-05-22  0:20 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Carlos O'Donell, GNU C Library

On Thu, May 21, 2020 at 11:46:39AM -0600, Martin Sebor via Libc-alpha wrote:
> On 5/21/20 9:53 AM, Carlos O'Donell wrote:
> >On 5/20/20 12:20 PM, Martin Sebor via Libc-alpha wrote:
> >>I'm wondering how to go about figuring out what Glibc's expected
> >>behavior is in cases where the standard (e.g., C or POSIX) doesn't
> >>say.  Should the Glibc manual take precedence over the Linux
> >>Programmer's Manual (http://man7.org/linux/man-pages) or should
> >>it be the other way around?
> >>
> >>For example, C (and POSIX) requires the first argument to mbstowcs
> >>to be a valid non-null pointer, but the mbstowcs man page says it
> >>can be null.  The Glibc manual doesn't mention it.
> >>
> >>Another example is the POSIX readlink function which is required
> >>to set errno to EINVAL if (and only if) the path argument names
> >>a file that is not a symbolic link, but the Linux man page
> >>says it also sets it to EINVAL when the (unsigned) bufsiz
> >>argument is not positive.  The Glibc manual doesn't mention
> >>this behavior.
> >>
> >>(Both of these have come up while adding the GCC 10 access attribute
> >>to Glibc's APIs as GCC warns about uses of these extensions.)
> >>
> >>Thanks
> >>Martin
> >>
> >>PS Where does material for the Linux man pages that describe Glibc
> >>specifics come from?
> >
> >Do these answer your questions?
> 
> They do.  Unfortunately, the answers are wrong or at least unhelpful.
> The two examples that prompted the questions show the Glibc manual
> is less complete and less accurate than the Linux man pages (or than
> POSIX that the man pages are derived from).  So in reality it cannot
> realistically be taken as authoritative (at least in these two cases).
> My impression is that this will hold in general.  There are functions
> the manual doesn't even mention (e.g., readlink's cousin readlinkat).
> 
> I realize these are just omissions that could be fixed.  But until
> they are, the manual cannot very well be taken as authoritative.

Less complete does not mean less accurate; rather the opposite, at
least in general. Unofficial documentation that documents specific
observed behaviors that are not officially documented contracts is
inherently *less accurate* because the information it's providing is
potentially subject to change.

Rich


* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-21 22:11     ` Carlos O'Donell
@ 2020-05-22  0:22       ` Martin Sebor
  2020-05-22 12:35         ` Carlos O'Donell
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Sebor @ 2020-05-22  0:22 UTC (permalink / raw)
  To: Carlos O'Donell, GNU C Library

On 5/21/20 4:11 PM, Carlos O'Donell wrote:
> On 5/21/20 1:46 PM, Martin Sebor wrote:
>> They do.  Unfortunately, the answers are wrong or at least unhelpful.
> 
> Which answer is wrong? I'd be happy to correct anything that is wrong.

There's a difference between saying "we want the Glibc manual to be
the authoritative reference on the implementation" versus "the manual
is the authoritative reference."

Clearly, on the details I was interested in (the readlink and mbstowcs
examples), the manual is not as good a reference as the Linux man pages
because it doesn't cover them.  (I would call that authoritative but
I don't insist on using that word.)

> 
> As a contributor to the GNU Toolchain I encourage you to help us add
> to the manual where we think we need to document such interfaces.
> 
> I also suggest a tactical approach: add only the interface you're
> interested in improving.

Thanks, but no.  I wasn't looking for a project, just for an answer
to what I thought was a simple question :)  It would be a considerable
effort to bring the manual up to par with the Linux man pages.  I see
little point in investing it into duplicating what already exists
elsewhere and what according to documentation.html many of you are
already contributing to.  My suggestion instead is to declare
the Linux man pages the reference and treat the manual as a user
guide.

> 
> This is not a duplication of effort IMO. The manual and the linux man
> pages solve different needs. The manual is task-oriented, covering
> sections of the standard APIs and how they could and should be used
> together, while the linux man pages are API references only
> (in isolation to the larger set of APIs).
> 
> One should be able to `info libc mbstowcs` and get accurate information.
> 
> Given that you raised the issue and I wanted to be helpful, I tried to
> add the docs and a test case (because we have zero test cases for this):
> 
> tst-mbstowcs.c: In function ‘do_test’:
> tst-mbstowcs.c:44:12: error: argument 1 is null but the corresponding size argument 3 value is 24 [-Werror=nonnull]
>     44 |   result = mbstowcs (NULL, string, len);
>        |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> In file included from ../include/stdlib.h:15,
>                   from tst-mbstowcs.c:19:
> ../stdlib/stdlib.h:933:15: note: in a call to function ‘mbstowcs’ declared with attribute ‘write_only (1, 3)’
>    933 | extern size_t mbstowcs (wchar_t *__restrict  __pwcs,
>        |               ^~~~~~~~
> cc1: all warnings being treated as errors
> 
> Which is expected given the markup. Because of the ability for pwcs to
> be NULL, what can we do here? Remove the markup for the first argument?

Yes, it needs to be removed for now.

A report of the warning above is what prompted my question.  I had
checked the C and in some cases also POSIX standard as well as
the Glibc manual when adding the attribute.  C doesn't have this
extension, Glibc doesn't document it, and I missed it in POSIX (or
more likely didn't think to look there in this case).  What I was
looking for with the question is an acknowledgment of what I had
suspected, namely that the Linux pages can be trusted to accurately
document this and other Glibc extensions.

> 
>> The two examples that prompted the questions show the Glibc manual
>> is less complete and less accurate than the Linux man pages (or than
>> POSIX that the man pages are derived from).
> 
> Yes, for mbstowcs the manual doesn't say anything about the first
> argument.
> 
> All of the following functions behave the same way:
> * mbstowcs (XSI extension)
> * mbsrtowcs
> * mbsnrtowcs
> * wcsrtombs
> * wcsnrtombs

The Glibc mbsnrtowcs and wcsnrtombs documentation is also missing
the detail about the first pointer being null.  The Linux man pages,
on the other hand, do mention it.

> 
> The last 4 functions are all worded in ISO C11 to operate correctly with
> a NULL dst, but not mbstowcs, which is what I assume this discussion is
> really about, that glibc doesn't quite meet what ISO C11 requires? All of
> the "if dst is not null" language is missing from mbstowcs in C11.
> 
> In mbsrtowcs we have this note from 2001 by Ulrich:
> 
> We have 0 tests for mbstowcs that use NULL as the first argument in
> the implementation. So we have no regression test for breaking this
> behaviour AFAIK.
> 
> In 1996 when Roland implemented mbsrtowcs (the precursor for the _l
> variant and the backend to this function) we had this code:
> 
> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 32)   size_t result = 0;
> ...
> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 45)       if (dst != NULL)
> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 46)           dst[result] = (wchar_t) **src;
> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 47)       ++result;
> ...
> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 62)   return result;
> 
> So the very first implementation already supported this behaviour.
> 
> Following XSI conformance we set _XOPEN_UNIX 1 and _XOPEN_VERSION 700 so
> we do indicate this behaviour to applications looking to check for such
> semantics.
> 
>> So in reality it cannot
>> realistically be taken as authoritative (at least in these two cases).
> 
> I don't see how that follows. Could you expand on this please?

Because it doesn't cover the implementation-defined details as
completely or accurately as the Linux man pages cover them.

> 
> The fact that the manual says nothing means there is no authoritative
> answer for this. You can rely on external documentation to help.

That's not how it's supposed to work.  For one, standards require
implementations to document the choices the standards leave to them.
The documentation is (obviously) expected to be provided with
the implementation, not by some unknown third party or parties.

But from a simple usability point of view, it's unhelpful to tell
people to consider the union of the Glibc manual and all external
documentation (or some subset of it).  Not just because it's
impractical to read everything, but also because not everything
is correct or up to date.  How are they/we supposed to resolve conflicts?

> 
> My preference would be for us to improve the project documentation.
> 
> You can say that the linux man pages represent general user expectations
> and you would be right, and that is one factor we take into consideration
> when making changes to the API and ABI.

That's not all I'm saying: from what I've seen they document
the implementation in more detail, more accurately, and more
completely than the Glibc manual does.  And that's okay.

>> My impression is that this will hold in general.  There are functions
>> the manual doesn't even mention (e.g., readlink's cousin readlinkat).
> 
> Correct, please see manual/filesys.texi:

That doesn't help your argument.

> 
> @c FIXME these are undocumented:
> @c faccessat
> @c fchmodat
> @c fchownat
> @c futimesat
> @c fstatat (there's a commented-out safety assessment for this one)
> @c statx
> @c mkdirat
> @c mkfifoat
> @c name_to_handle_at
> @c openat
> @c open_by_handle_at
> @c readlinkat
> @c renameat
> @c renameat2
> @c scandirat
> @c symlinkat
> @c unlinkat
> @c utimensat
> @c mknodat
> 
> I just reviewed Florian's patches for several *at functions:
> 
> https://patchwork.sourceware.org/project/glibc/patch/87d07p9v73.fsf@mid.deneb.enyo.de/
> 
> I think we're at v2 for that patch.
> 
>> I realize these are just omissions that could be fixed.  But until
>> they are, the manual cannot very well be taken as authoritative.
> 
> I look forward to the day when all the APIs are covered.
> 
> However, until that day, it does not follow that because the manual is
> incomplete it is not the authoritative reference.
> 
> Missing information does not make it less authoritative, just incomplete.

The Cambridge Dictionary defines authoritative as:

   containing complete and accurate information, and therefore respected

But that's just semantics.  I'm sorry if my questioning the authority
of the Glibc manual struck a nerve.  All I wanted to know is where I'm
more likely to find complete and accurate documentation of Glibc
implementation details.  I've got my answer.

Thanks
Martin

> 
> You can come to the project and ask for clarification, and that act in
> and of itself helps us prioritize what to work on next.
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22  0:20     ` Rich Felker
@ 2020-05-22 10:30       ` Michael Kerrisk
  2020-05-22 12:21         ` Carlos O'Donell
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk @ 2020-05-22 10:30 UTC (permalink / raw)
  To: Rich Felker; +Cc: Martin Sebor, GNU C Library, Michael Kerrisk-manpages

> > I realize these are just omissions that could be fixed.  But until
> > they are, the manual cannot very well be taken as authoritative.
>
> Less complete does not mean less accurate; rather the opposite, at
> least in general. Unofficial documentation that documents specific
> observed behaviors that are not officially documented contracts is
> inherently *less accurate* because the information it's providing is
> potentially subject to change.

The last few words there could be taken to sound like a version of "we
didn't document a contract, therefore we can change the behavior".
Repeated experience has shown that doesn't fly. When there is no
documentation of the behavior, users one way or another discover it
for themselves and encode that understanding into their behavior and
applications, thus forming an implicit contract which typically is
hard to break. (I'm sure you know this; I think it's just worth
emphasizing the point.)

Cheers,

Michael

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22 10:30       ` Michael Kerrisk
@ 2020-05-22 12:21         ` Carlos O'Donell
  0 siblings, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-22 12:21 UTC (permalink / raw)
  To: Michael Kerrisk, Rich Felker; +Cc: GNU C Library

On 5/22/20 6:30 AM, Michael Kerrisk via Libc-alpha wrote:
>>> I realize these are just omissions that could be fixed.  But until
>>> they are, the manual cannot very well be taken as authoritative.
>>
>> Less complete does not mean less accurate; rather the opposite, at
>> least in general. Unofficial documentation that documents specific
>> observed behaviors that are not officially documented contracts is
>> inherently *less accurate* because the information it's providing is
>> potentially subject to change.
> 
> The last few words there could be taken to sound like a version of "we
> didn't document a contract, therefore we can change the behavior".
> Repeated experience has shown that doesn't fly. When there is no
> documentation of the behavior, users one way or another discover it
> for themselves and encode that understanding into their behavior and
> applications, thus forming an implicit contract which typically is
> hard to break. (I'm sure you know this; I think it's just worth
> emphasizing the point.)

Absolutely. Missing information makes it difficult to set expectations
with users. Incomplete authoritative sources may be less useful than 
complete non-authoritative sources depending on the question you're 
asking, but it makes them no less authoritative.

Michael, I also want to say that I am incredibly appreciative
of the work you do on the linux man pages, and I try to show that by
being a regular contributor. For example documenting the exacting
details of how the API changed over time, and exactly what each
version did, and which versions had bugs is amazing. That is something
I don't think we need to capture in the glibc manual, but I think the
linux man pages should do that, and our users expect that of the man
pages.

Please keep up the good work, and ask for me if you think I can be
helpful with a review.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22  0:22       ` Martin Sebor
@ 2020-05-22 12:35         ` Carlos O'Donell
  2020-05-22 21:02           ` Joseph Myers
  2020-05-25  8:58           ` Michael Kerrisk
  0 siblings, 2 replies; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-22 12:35 UTC (permalink / raw)
  To: Martin Sebor, GNU C Library

On 5/21/20 8:22 PM, Martin Sebor wrote:
> On 5/21/20 4:11 PM, Carlos O'Donell wrote:
>> On 5/21/20 1:46 PM, Martin Sebor wrote:
>>> They do.  Unfortunately, the answers are wrong or at least unhelpful.
>>
>> Which answer is wrong? I'd be happy to correct anything that is wrong.
> 
> There's a difference between saying "we want the Glibc manual to be
> the authoritative reference on the implementation" versus "the manual
> is the authoritative reference."

No there is not.

The glibc manual is the authoritative reference.

> Clearly, on the details I was interested in (the readlink and mbstowcs
> examples), the manual is not as good a reference as the Linux man pages
> because it doesn't cover them.  (I would call that authoritative but
> I don't insist on using that word.)

The glibc manual is still authoritative.

The linux man pages do have more information, and I contribute to
them regularly because I deeply respect Michael's vision and the mission
of the project which is to have excellent API reference documentation.

>> As a contributor to the GNU Toolchain I encourage you to help us add
>> to the manual where we think we need to document such interfaces.
>>
>> I also suggest a tactical approach, add only the interface you're
>> interested in improving.
> 
> Thanks, but no.  I wasn't looking for a project, just for an answer
> to what I thought was a simple question :)  

It *is* a simple question.

The glibc manual is authoritative over what glibc promises from the
implementation.

The glibc manual is also incomplete, which is deeply disappointing
for me (that's the nerve you touch).

> It would be a considerable
> effort to bring the manual up to par with the Linux man pages.  I see
> little point in investing it into duplicating what already exists
> elsewhere and what according to documentation.html many of you are
> already contributing to.  My suggestion instead is to declare
> the Linux man pages the reference and treat the manual as a user
> guide.

I appreciate your suggestion.

I agree it would be a waste of time to duplicate what already exists.

The glibc manual should be a guide and it should contain complete and
authoritative information about the APIs it implements. This does
not include version-by-version changes, bugs, raw syscalls, etc.

I think having the linux man pages as a non-authoritative reference
is very good, and I contribute to the linux man pages.

I think having WG14 work on ISO C is also good, so I contribute where
I can to SC22 in Canada for that reason.

That makes 4 documents covering the same APIs!

* ISO C (the standard)
* POSIX (the extended standard)
* glibc manual (the authoritative manual for glibc's implementation)
* Linux man pages (detailed syscall and API reference documentation)

FWIW they all serve different purposes.

The linux man pages are *great* they even document version to version
differences and if you're interested in targeting specific versions of
glibc you can use them accurately to write code that does exactly
what you would expect. That's awesome! Our manual does not have that as
a goal.

>> This is not a duplication of effort IMO. The manual and the linux man
>> pages solve different needs. The manual is task-oriented, covering
>> sections of the standard APIs and how they could and should be used
>> together, while the linux man pages are API references only
>> (in isolation to the larger set of APIs).
>>
>> One should be able to `info libc mbstowcs` and get accurate information.
>>
>> Given that you raised the issue and I wanted to be helpful, I tried to
>> add the docs and a test case (because we have zero test cases for this):
>>
>> tst-mbstowcs.c: In function ‘do_test’:
>> tst-mbstowcs.c:44:12: error: argument 1 is null but the corresponding size argument 3 value is 24 [-Werror=nonnull]
>>     44 |   result = mbstowcs (NULL, string, len);
>>        |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> In file included from ../include/stdlib.h:15,
>>                   from tst-mbstowcs.c:19:
>> ../stdlib/stdlib.h:933:15: note: in a call to function ‘mbstowcs’ declared with attribute ‘write_only (1, 3)’
>>    933 | extern size_t mbstowcs (wchar_t *__restrict  __pwcs,
>>        |               ^~~~~~~~
>> cc1: all warnings being treated as errors
>>
>> Which is expected given the markup. Because of the ability for pwcs to
>> be NULL, what can we do here? Remove the markup for the first argument?
> 
> Yes, it needs to be removed for now.

Thanks. I posted a patch to fix this up.

This is *my* fault, we should have had regression tests for this,
but we didn't. We barely have any tests for mbstowcs :-(

> A report of the warning above is what prompted my question.  I had
> checked the C and in some cases also POSIX standard as well as
> the Glibc manual when adding the attribute.  C doesn't have this
> extension, Glibc doesn't document it, and I missed it in POSIX (or
> more likely didn't think to look there in this case).  What I was
> looking for with the question is an acknowledgment of what I had
> suspected, namely that the Linux pages can be trusted to accurately
> document this and other Glibc extensions.

The linux man pages can *absolutely* be trusted to document every
change glibc makes, but that doesn't make them authoritative, it makes
them detailed. As a developer I love the detail in `man mbstowcs`.

>>> The two examples that prompted the questions show the Glibc manual
>>> is less complete and less accurate than the Linux man pages (or than
>>> POSIX that the man pages are derived from).
>>
>> Yes, for mbstowcs the manual doesn't say anything about the first
>> argument.
>>
>> All of the following functions behave the same way:
>> * mbstowcs (XSI extension)
>> * mbsrtowcs
>> * mbsnrtowcs
>> * wcsrtombs
>> * wcsnrtombs
> 
> The Glibc mbsnrtowcs and wcsnrtombs documentation is also missing
> the detail about the first pointer being null.  The Linux man pages,
> on the other hand, do mention it.

I added it in to the patch.

>> The last 4 functions are all worded in ISO C11 to operate correctly with
>> a NULL dst, but not mbstowcs, which is what I assume this discussion is
>> really about, that glibc doesn't quite meet what ISO C11 requires? All of
>> the "if dst is not null" language is missing from mbstowcs in C11.
>>
>> In mbsrtowcs we have this note from 2001 by Ulrich:
>>
>> We have 0 tests for mbstowcs that use NULL as the first argument in
>> the implementation. So we have no regression test for breaking this
>> behaviour AFAIK.
>>
>> In 1996 when Roland implemented mbsrtowcs (the pre-cursor for the _l
>> variant and the backend to this function) we had this code:
>>
>> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 32)   size_t result = 0;
>> ...
>> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 45)       if (dst != NULL)
>> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 46)           dst[result] = (wchar_t) **src;
>> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 47)       ++result;
>> ...
>> 30de3b18a52 (Roland McGrath 1996-04-02 13:27:17 +0000 62)   return result;
>>
>> So the very first implementation already supported this behaviour.
>>
>> Following XSI conformance we set _XOPEN_UNIX 1 and _XOPEN_VERSION 700 so
>> we do indicate this behaviour to applications looking to check for such
>> semantics.
>>
>>> So in reality it cannot
>>> realistically be taken as authoritative (at least in these two cases).
>>
>> I don't see how that follows. Could you expand on this please?
> 
> Because it doesn't cover the implementation-defined details as
> completely or accurately as the Linux man pages cover it.

I'm sorry, but that doesn't make something authoritative, it just makes
it more detailed, and those details can be wrong if implementation-defined
details should not be relied upon or are not guaranteed by the implementation.

The flip side of this is that the lack of documentation makes assessing
developer expectations *very* difficult.

>> The fact that the manual says nothing means there is no authoritative
>> answer for this. You can rely on external documentation to help.
> 
> That's not how it's supposed to work.  For one, standards require
> implementations to document the choices they let them make.

As a FOSS project we don't have resources to do that. I wish we did,
and I'm working every day to try to resolve that.

> The documentation is (obviously) expected to be provided with
> the implementation, not by some unknown third party or parties.

Correct! It is. We provide the glibc manual. It's not complete.

> But from a simple usability point of view, it's unhelpful to tell
> people to consider the union of the Glibc manual and all external
> documentation (or some subset of it).  Not just because it's
> impractical to read everything, but also because not everything
> is correct or up to date.  How are they/we supposed to resolve conflicts?

We resolve conflicts by writing things in the manual to cover such cases,
and we do so tactically to resolve problems as they arise.

Over the years the project has had a significant lack of engagement with
writing good documentation, and I can understand that. It's hard to
write clear and unambiguous English sentences to describe an interface
and how it dovetails into the rest of the APIs. Such writing is not as
much fun as writing code. We really need to engage with technical writers
and involve a broader set of industry skills in our projects.

Part of my responsibility as steward for the project is to try to turn that
around by reviewing manual patches. I haven't done a great job, but I
continue to try to move that forward.

Imagine one day if we could harmonize the 4 sources of information listed
above for the APIs? :-)

>>
>> My preference would be for us to improve the project documentation.
>>
>> You can say that the linux man pages represent general user expectations
>> and you would be right, and that is one factor we take into consideration
>> when making changes to the API and ABI.
> 
> That's not all I'm saying: from what I've seen they document
> the implementation in more detail, more accurately, and more
> completely than the Glibc manual does.  And that's okay.

Yes, it's OK, but they are not authoritative.

>> Missing information does not make it less authoritative, just incomplete.
> 
> The Cambridge Dictionary defines authoritative as:
> 
>   containing complete and accurate information, and therefore respected
> 
> But that's just semantics.  

Yes, and it's a logical fallacy i.e. argumentum ab auctoritate.

> I'm sorry if my questioning the authority
> of the Glibc manual struck a nerve.  All I wanted to know is where I'm
> more likely to find complete and accurate documentation of Glibc
> implementation details.  I've got my answer.

I am here to serve the users of glibc.

You asked if the manual was authoritative, it is.

The nerve you struck is that it's incomplete.

You ask for accurate API reference documentation that includes
bug behaviour (with version-by-version notes), and
implementation-specific details, and for that I suggest the linux man
pages.

I contribute to both projects actively, and eagerly, but they serve
different purposes.

Did I answer all of your questions?

Is there something that we could be doing better (other than write
documentation for all missing functions)?

--
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-21 15:53 ` The GNU C Library Manual - Authoritative or not? Carlos O'Donell
  2020-05-21 17:46   ` Martin Sebor
@ 2020-05-22 12:54   ` Florian Weimer
  2020-05-22 16:23     ` Carlos O'Donell
  2020-05-25  9:04     ` Michael Kerrisk
  1 sibling, 2 replies; 26+ messages in thread
From: Florian Weimer @ 2020-05-22 12:54 UTC (permalink / raw)
  To: Carlos O'Donell via Libc-alpha

* Carlos O'Donell via Libc-alpha:

> "What is the authoritative source for public glibc APIs?"
> https://sourceware.org/glibc/wiki/FAQ#What_is_the_authoritative_source_for_public_glibc_APIs.3F

Current text:

| The GNU C Library manual is the authoritative place for such
| information that is related to the implementation of functions in
| glibc.
|
| The Linux Man Pages are non-authoritative, but they are incredibly
| useful, easy to use, and often the first source of such information.
|
| The Linux Man Pages is generally authoritative on kernel syscalls, and
| we have worked hard in cases like futex to ensure the documentation is
| clear enough for all C libraries.
|
| We should all work together to keep both the manual (glibc manual) and
| the shorter form API references (linux man pages) up to date with the
| most accurate information we have.
|
| Where you find issues with the manual or the linux man pages please
| reach out to discuss the issue.

> "What other sources of documentation about glibc are available?"
> https://sourceware.org/glibc/wiki/FAQ#What_other_sources_of_documentation_about_glibc_are_available.3F

| The glibc manual is part of glibc, it is also available online.
|
| The Linux man-pages project documents the Linux kernel and the C
| library interfaces.
|
| The Open Group maintains the POSIX Standard which is the authoritative
| reference for the POSIX APIs.
|
| The ISO JTC1 SC22 WG14 maintains the C Standard which is the
| authoritative reference for the ISO C APIs.
|
| The official home page of glibc is at http://www.gnu.org/software/libc.
|
| The glibc wiki is at http://sourceware.org/glibc/wiki/HomePage.
|
| For bugs, the glibc project uses the sourceware bugzilla with
| component 'glibc'.

I don't think this is very helpful.  It paints a simple picture which
turns out to be rather misleading when inspected closely.  For example,
POSIX often claims that ISO C takes precedence, but then proceeds to
specify conflicting requirements with ISO C.  What does that even mean?
After all, it's not possible to have multiple authoritative sources for
the behavior of a single function.

Practically speaking, I see the following problems.

The GNU C Library manual is not often consulted by developers.  I don't
know why.  One reason may be that it is not readily installable from
package repositories on Debian or Ubuntu (at least not in current
versions from the main archive).  But our experience at Red Hat suggests
that our developers do not read the manual, either, although we do ship
it.  I base this on the paucity of bug reports against the manual
compared to what we receive for the man-pages package (which are often
misfiled initially against glibc).  In my opinion, a manual that is not
actually used by the people who benefit from the information in it has
at best a dubious claim to authority.

Reading our manual requires considerable skill.  You need to know the
history of the project, the lingering Linux vs Hurd conflict, the late
arrival of threading support in UNIX-like environments, the tension
between the POSIX and C standards, the lack of maintenance of both, and
so on.  Without such knowledge, it is often not possible to reach the
right conclusions.  Even senior developers can easily get confused.
(For a recent example, consider Kees Cook's misinterpretation of the
O_EXEC documentation in the manual.)  Part of the problem here is that
we do not have a team that combs through the manual from time to time
and keeps it up to date, as our understanding of the documented matter
evolves.

When it comes to Linux interfaces, any claim about authority of the
manual is very misleading.  It does not matter if the system call is
described by POSIX.  For example, if Linux developers change the signal
that waitpid reports after a failed execve, and our manual documents
something else, then the manual is now wrong.  And not the kernel.  If
the manual were authoritative, it would be the other way round.  (The
man-pages project is not authoritative in that sense, either—it did
document the SIGKILL signal and had to be updated.)

Many of these issues are beyond our control.  Some of the issues which
are in our area would need a tremendous amount of work to address.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22 12:54   ` Florian Weimer
@ 2020-05-22 16:23     ` Carlos O'Donell
  2020-05-25  9:04     ` Michael Kerrisk
  1 sibling, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-22 16:23 UTC (permalink / raw)
  To: Florian Weimer, Carlos O'Donell via Libc-alpha,
	Michael Kerrisk (man-pages)

On 5/22/20 8:54 AM, Florian Weimer wrote:
> * Carlos O'Donell via Libc-alpha:
> 
>> "What is the authoritative source for public glibc APIs?"
>> https://sourceware.org/glibc/wiki/FAQ#What_is_the_authoritative_source_for_public_glibc_APIs.3F
> 
> Current text:
> 
> | The GNU C Library manual is the authoritative place for such
> | information that is related to the implementation of functions in
> | glibc.
> |
> | The Linux Man Pages are non-authoritative, but they are incredibly
> | useful, easy to use, and often the first source of such information.
> |
> | The Linux Man Pages is generally authoritative on kernel syscalls, and
> | we have worked hard in cases like futex to ensure the documentation is
> | clear enough for all C libraries.
> |
> | We should all work together to keep both the manual (glibc manual) and
> | the shorter form API references (linux man pages) up to date with the
> | most accurate information we have.
> |
> | Where you find issues with the manual or the linux man pages please
> | reach out to discuss the issue.
> 
>> "What other sources of documentation about glibc are available?"
>> https://sourceware.org/glibc/wiki/FAQ#What_other_sources_of_documentation_about_glibc_are_available.3F
> 
> | The glibc manual is part of glibc, it is also available online.
> |
> | The Linux man-pages project documents the Linux kernel and the C
> | library interfaces.
> |
> | The Open Group maintains the POSIX Standard which is the authoritative
> | reference for the POSIX APIs.
> |
> | The ISO JTC1 SC22 WG14 maintains the C Standard which is the
> | authoritative reference for the ISO C APIs.
> |
> | The official home page of glibc is at http://www.gnu.org/software/libc.
> |
> | The glibc wiki is at http://sourceware.org/glibc/wiki/HomePage.
> |
> | For bugs, the glibc project uses the sourceware bugzilla with
> | component 'glibc'.
> 
> I don't think this is very helpful.  It paints a simple picture which
> turns out to be rather misleading when inspected closely.  For example,
> POSIX often claims that ISO C takes precedence, but then proceeds to
> specify conflicting requirements with ISO C.  What does that even mean?
> After all, it's not possible to have multiple authoritative sources for
> the behavior of a single function.

Good point.

How about this update for the two FAQ items:

"
The APIs provided by the glibc project are derived from ISO C, POSIX,
BSD, and many other sources. These APIs have in some cases been adjusted
for the underlying kernel, and where possible we try to document those
differences from the original standard sources.

The GNU C Library manual is the authoritative place for such
information that is related to the implementation of functions in
glibc.

The Linux Man Pages are non-authoritative with regards to the glibc APIs,
but they are incredibly useful, easy to use, and often the first source
of such information.

The Linux Man Pages is generally authoritative on kernel syscalls, and
we have worked hard in cases like futex to ensure the documentation is
clear enough for all C libraries.

We should all work together to keep both the manual (glibc manual) and
the shorter form API references (linux man pages) up to date with the
most accurate information we have.

Where you find issues with the manual or the linux man pages please
reach out to discuss the issue.
"
---
"
The glibc manual is part of glibc, it is also available online.

The Linux man-pages project documents the Linux kernel and the standard
library interfaces.

The Open Group maintains the POSIX Standard which is the authoritative
reference for the original POSIX APIs.

The ISO JTC1 SC22 WG14 maintains the C Standard which is the
authoritative reference for the original ISO C APIs.

The official home page of glibc is at http://www.gnu.org/software/libc.

The glibc wiki is at http://sourceware.org/glibc/wiki/HomePage.

For bugs, the glibc project uses the sourceware bugzilla with
component 'glibc', and you may find defects against the implementation
which have not yet been fixed and are under discussion.
"

At which point we're back to tactically filling out the manual where we have
such conflicts called out that need resolution.

Do you have any other suggestions?

> Practically speaking, I see the following problems.
> 
> The GNU C Library manual is not often consulted by developers.  I don't
> know why.  One reason may be that it is not readily installable from
> package repositories on Debian or Ubuntu (at least not in current
> versions from the main archive).  But our experience at Red Hat suggests
> that our developers do not read the manual, either, although we do ship
> it.  I base this on the paucity of bug reports against the manual
> compared to what we receive for the man-pages package (which are often
> misfiled initially against glibc).  In my opinion, a manual that is not
> actually used by the people who benefit from the information in it has
> at best a dubious claim to authority.

0) Claims on authority.

It does not follow logically to say that because people don't benefit
directly from something that such a thing is not authoritative.

Examples abound in other industries, with electrical standards dictating
what needs to be implemented in modern homes, but most DIY home owners
watch youtube videos on how to fix things. Such youtube videos are
certainly of benefit to home owners, but your jurisdiction's electrical codes
are what is authoritative. Likewise youtube videos have more detail about
how to practically *achieve* code compliance, with the code often being
an acerbic and detail-bereft document, and yet the code remains authoritative.

Likewise the ISO C standard has the last say in what is the ISO C language,
despite the hundreds of "Learn to program C" books on the market (some of
which probably have defects).

Thus the GNU C Library manual, despite being incomplete, is the authoritative
reference for the implementation, up until the day we transfer that
authority to another project as part of a consensus discussion.

1) Popularity of the GNU C Library manual.

I agree with your assessment about popularity.

Several issues conspire against us, and I want to call them out for posterity
so I can reference this post.

1.a)
* We have failed to lobby our patrons to consider technical writing a
  key skill that is required to deliver quality upstream.
- When I say patrons I include corporate sponsors e.g. Red Hat, SUSE,
  IBM, Arm, etc.

1.b)
* The manual is GFDL with invariant sections.
- This makes it essentially non-free and unused by Debian and Ubuntu.

1.c)
* Manual patches require copyright assignment to the FSF.
- This makes contributions difficult.

1.d)
* info is _unknown_ to developers.
* man is more popular.
- This makes `info libc mbstowcs` an unknown command to run.

1.e)
* The project neglected good documentation for a long time.

It was a project-level strategic mistake not to create man pages for all
of the API references. It meant the project has never had reference documentation.
We can fix this by working with the linux man pages project, see this thread:
https://sourceware.org/pipermail/libc-alpha/2020-May/114251.html

Instead a semi-task-oriented manual was written, and never again updated
after the initial effort, which was large.

The "low cost" approach was to reference the ISO C standard, and the POSIX
standard, and the linux man pages. This leads to, as you point out, difficult and
inconsistent developer understanding of the APIs. You get what you pay for.

Can we fix 1.a), 1.b) and 1.c)? Yes, we can. We work something out with
the linux man pages project which is doing something right since they
manage to maintain the API documentation better than we can (they don't
suffer from 1.b), 1.c), 1.d), or 1.e)).

> Reading our manual requires considerable skill.  You need to know the
> history of the project, the lingering Linux vs Hurd conflict, the late
> arrival of threading support in UNIX-like environments, the tension
> between the POSIX and C standards, the lack of maintenance of both, and
> so on.  Without such knowledge, it is often not possible to reach the
> right conclusions.  Even senior developers can easily get confused.
> (For a recent example, consider Kees Cook's misinterpretation of the
> O_EXEC documentation in the manual.)  Part of the problem here is that
> we do not have a team that combs through the manual from time to time
> and keeps it up to date, as our understanding of the documented matter
> evolves.

Defects are a normal part of life.

Misunderstandings are a normal part of life.

What you do with a defect or a misunderstanding is what is important.

Did we fix the O_EXEC documentation?

Did we clean up both the manual and the linux man page?

Did we decide to harmonize on one source of documentation to be authoritative? :-)

Yes, we could do with a team to comb through the manual, but I'd like
to discuss having one harmonized authoritative API reference, rather
than 4 of them in conflict (ISO C, POSIX, glibc manual, linux man pages).

If, in the hypothetical, we agreed to transition authority to the linux
man pages, we would just have to review those pages on a semi-regular
basis, and I think that's doable since they are discrete chunks.

Likewise task-oriented documentation can be tactically updated as users
and developers request we update those documents, or when new APIs are
added that dovetail into existing tasks.

Note: The task-oriented documentation can be spun off into another
project to avoid 1.b) and 1.d) and redone using modern html generators
e.g. Sphinx (my preference), and tied to the project release schedule.

> When it comes to Linux interfaces, any claim about authority of the
> manual is very misleading.  It does not matter if the system call is
> described by POSIX.  For example, if Linux developers change the signal
> that waitpid reports after a failed execve, and our manual documents
> something else, then the manual is now wrong.  And not the kernel.  If
> the manual were authoritative, it would be the other way round.  (The
> man-pages project is not authoritative in that sense, either—it did
> document the SIGKILL signal and had to be updated.)

I disagree that it is misleading.

I feel like we are conflating two things.

1) Authority.

2) Stability.

You can have 1) and not have 2)

The discussions and documentation of futex are the best example that I have
that leading kernel developers consider the linux man pages the place where
we will put authoritative documentation about the syscall interfaces.

Just because you are the authoritative reference does not mean that it
cannot change. It means you can change it, and that if you do, everyone
else needs to adjust.

The reality is, and you rightly point out, that we are in cooperation with
the Linux kernel project to implement a lot of the more complex semantics
with regards to process handling. My feeling is that the authority of the
kernel's designs is delegated indirectly to the linux man pages project
because senior kernel developers actively contribute to the linux man pages
for specific API descriptions e.g. Christian Brauner for clone
(4fe3acd9e1197554001a93d61d4ec65a0b19745e in man-pages).

> Many of these issues are beyond our control.  Some of the issues which
> are in our area would need a tremendous amount of work to address.

I think we can resolve some issues, particularly if we do work with
the linux man pages.

If we don't, or can't, work with existing documentation efforts then
we need to keep working incrementally on our own authoritative references
to the ISO C, POSIX, and BSD derived APIs.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22 12:35         ` Carlos O'Donell
@ 2020-05-22 21:02           ` Joseph Myers
  2020-05-23 20:24             ` Michael Kerrisk
  2020-05-25  8:58           ` Michael Kerrisk
  1 sibling, 1 reply; 26+ messages in thread
From: Joseph Myers @ 2020-05-22 21:02 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Martin Sebor, GNU C Library

On Fri, 22 May 2020, Carlos O'Donell via Libc-alpha wrote:

> The linux man pages can *absolutely* be trusted to document every
> change glibc makes, but that doesn't make them authoritative, it makes

But see e.g. the BUGS section in pow(3).  It still goes into details about 
bugs that were fixed in 2012, describing them as something current and 
architecture-independent rather than as something (a) mostly i386-specific 
and (b) fixed a very long time ago.  So, no, the man pages aren't updated 
for every change glibc makes.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22 21:02           ` Joseph Myers
@ 2020-05-23 20:24             ` Michael Kerrisk
  2020-05-25 16:15               ` Carlos O'Donell
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk @ 2020-05-23 20:24 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Carlos O'Donell, GNU C Library

Oh how I wish this were true Carlos:

> > The linux man pages can *absolutely* be trusted to document every
> > change glibc makes, but that doesn't make them authoritative, it makes
>
> But see e.g. the BUGS section in pow(3).  It still goes into details about
> bugs that were fixed in 2012, describing them as something current and
> architecture-independent rather than as something (a) mostly i386-specific
> and (b) fixed a very long time ago.  So, no, the man pages aren't updated
> for every change glibc makes.

But Joseph is right. (And his comment embarrassed me into checking a
lot of my old glibc math bug reports and making some updates to
various pages :-}.)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22 12:35         ` Carlos O'Donell
  2020-05-22 21:02           ` Joseph Myers
@ 2020-05-25  8:58           ` Michael Kerrisk
  2020-05-25 15:51             ` J William Piggott
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Kerrisk @ 2020-05-25  8:58 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Martin Sebor, GNU C Library, linux-man

[CC += linux-man@]
[Context: https://lwn.net/ml/libc-alpha/875300cf-92ca-c115-c42d-19dda5de5a4a@redhat.com/]

On Fri, May 22, 2020 at 3:07 PM Carlos O'Donell via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On 5/21/20 8:22 PM, Martin Sebor wrote:
> > On 5/21/20 4:11 PM, Carlos O'Donell wrote:
> >> On 5/21/20 1:46 PM, Martin Sebor wrote:

[...]

> > It would be a considerable
> > effort to bring the manual up to par with the Linux man pages.  I see
> > little point in investing it into duplicating what already exists
> > elsewhere and what according to documentation.html many of you are
> > already contributing to.  My suggestion instead is to declare
> > the Linux man pages the reference and treat the manual as a user
> > guide.

So, as an aside, I think the Linux man-pages are rather better than
the glibc manual in many areas, but there are some areas where the
reverse is true. And lately I see patches to the glibc manual from
Florian, and sometimes I realize: "oh that's not covered in Linux
man-pages". So, notwithstanding what the glibc developers may think
of the idea, from my perspective the notion of declaring man-pages as
the [authoritative] reference would not be without problems.

> I appreciate your suggestion.
>
> I agree it would be a waste of time to duplicate what already exists.
>
> The glibc manual should be a guide and it should contain complete and
> authoritative information about the APIs it implements. This does
> not include version-by-version changes, bugs, raw syscalls, etc.
>
> I think having the linux man pages as a non-authoritative reference
> is very good, and I contribute to the linux man pages.
>
> I think having WG14 work on ISO C is also good, so I contribute where
> I can to SC22 in Canada for that reason.
>
> That makes 4 documents covering the same APIs!
>
> * ISO C (the standard)
> * POSIX (the extended standard)
> * glibc manual (the authoritative manual for glibc's implementation)
> * Linux man pages (detailed syscall and API reference documentation)
>
> FWIW they all serve different purposes.
>
> The linux man pages are *great* they even document version to version
> differences and if you're interested in targeting specific versions of
> glibc you can use them accurately to write code that does exactly
> what you would expect. That's awesome! Our manual does not have that as
> a goal.
>
> >> This is not a duplication of effort IMO. The manual and the linux man
> >> pages solve different needs. The manual is task-oriented, covering
> >> sections of the standard APIs and how they could and should be used
> >> together,

And yes, in places this is where the glibc manual does really excel.

> >> while the linux man pages are API references only
> >> (in isolation to the larger set of APIs).

Broadly true, although in places, Linux man-pages tries also to draw
bigger pictures (e.g., various overview pages in Section 7).

[...]

> > A report of the warning above is what prompted my question.  I had
> > checked the C and in some cases also POSIX standard as well as
> > the Glibc manual when adding the attribute.  C doesn't have this
> > extension, Glibc doesn't document it, and I missed it in POSIX (or
> > more likely didn't think to look there in this case).  What I was
> > looking for with the question is an acknowledgment of what I had
> > suspected, namely that the Linux pages can be trusted to accurately
> > document this and other Glibc extensions.
>
> The linux man pages can *absolutely* be trusted to document every
> change glibc makes, but that doesn't make them authoritative, it makes
> them detailed. As a developer I love the detail in `man mbstowcs`.

I commented on this already. I wish the above were true, but Linux
man-pages does not manage to track all of the changes.
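
For readers who have not met the extension under discussion, here is a
minimal sketch of the idiom that mbstowcs(3) describes: a null
destination makes the call return only the number of wide characters
required. The helper names wide_length and to_wide are hypothetical
(they are not part of glibc), and the sketch assumes the behavior
described in the man page rather than anything ISO C guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of glibc): count the wide characters
   needed to hold the multibyte string SRC, excluding the terminating
   null.  Returns (size_t) -1 if SRC contains an invalid sequence.  */
static size_t
wide_length (const char *src)
{
  assert (src != NULL);
  /* Null destination: nothing is stored and the length argument is
     ignored; only the count is returned.  This is the extension the
     man page documents; ISO C requires a non-null destination.  */
  return mbstowcs (NULL, src, 0);
}

/* Hypothetical helper: convert SRC into a freshly allocated wide
   string.  The caller frees the result; returns NULL on error.  */
static wchar_t *
to_wide (const char *src)
{
  size_t n = wide_length (src);
  if (n == (size_t) -1)
    return NULL;

  wchar_t *dst = malloc ((n + 1) * sizeof *dst);
  if (dst == NULL)
    return NULL;

  /* Second pass performs the conversion, including the terminator.  */
  if (mbstowcs (dst, src, n + 1) == (size_t) -1)
    {
      free (dst);
      return NULL;
    }
  return dst;
}
```

A caller would use to_wide ("some text") and free the result. The
two-pass shape is why the null-destination form is convenient: without
it, the caller has to guess an upper bound for the buffer.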

[...]

> > But from a simple usability point of view, it's unhelpful to tell
> > people to consider the union of the Glibc manual and all external
> > documentation (or some subset of it).  Not just because it's
> > impractical to read everything, but also because not everything
> > is correct or up to date.  How are they/we supposed to resolve conflicts?
>
> We resolve conflicts by writing things in the manual to cover such cases,
> and we do so tactically to resolve problems as they arise.
>
> Over the years the project has had a significant lack of engagement with
> writing good documentation,

Yes, the manual seems to have started very well, but then the wheels
came off for many years. (Surprisingly, during that time I would
occasionally get a piece of very helpful input for man-pages from
Ulrich Drepper!)

> and I can understand that. It's hard to
> write clear and unambiguous English sentences to describe an interface
> and how it dovetails into the rest of the APIs.

Yes, best to leave that task to the Germans. (Sorry; I could not
resist the hat tip to Florian.)

> Such writing is not as
> much fun as writing code.

Yes, I never really understood that one. It's only by explaining (my)
code very clearly at least to myself, but probably to others, that I
can feel like it is/I have written good code. Writing good human
language expression is just as much an interesting challenge as
writing good programming language expression.

> We really need to engage with technical writers
> and involve a broader set of industry skills in our projects.

I want to add a note of caution here. It's great to have technical
writers (and, like good developers, companies should be paying them),
but they can't do the job on their own. A lot of developer input is
still required.

Thanks,

Michael

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-22 12:54   ` Florian Weimer
  2020-05-22 16:23     ` Carlos O'Donell
@ 2020-05-25  9:04     ` Michael Kerrisk
  2020-05-25 10:52       ` Florian Weimer
  2020-05-25 19:52       ` J William Piggott
  1 sibling, 2 replies; 26+ messages in thread
From: Michael Kerrisk @ 2020-05-25  9:04 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Carlos O'Donell via Libc-alpha, Michael Kerrisk-manpages,
	Martin Sebor, linux-man

[CC += linux-man@]
[Context: https://lwn.net/ml/libc-alpha/875300cf-92ca-c115-c42d-19dda5de5a4a@redhat.com/]

Hello Florian, Carlos, et al.

On Fri, May 22, 2020 at 3:22 PM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> * Carlos O'Donell via Libc-alpha:
>
> > "What is the authoritative source for public glibc APIs?"
> > https://sourceware.org/glibc/wiki/FAQ#What_is_the_authoritative_source_for_public_glibc_APIs.3F
>
> Current text:
>
> | The GNU C Library manual is the authoritative place for such
> | information that is related to the implementation of functions in
> | glibc.
> |
> | The Linux Man Pages are non-authoritative, but they are incredibly
> | useful, easy to use, and often the first source of such information.
> |
> | The Linux Man Pages is generally authoritative on kernel syscalls, and
> | we have worked hard in cases like futex to ensure the documentation is
> | clear enough for all C libraries.
> |
> | We should all work together to keep both the manual (glibc manual) and
> | the shorter form API references (linux man pages) up to date with the
> | most accurate information we have.
> |
> | Where you find issues with the manual or the linux man pages please
> | reach out to discuss the issue.
>
> > "What other sources of documentation about glibc are available?"
> > https://sourceware.org/glibc/wiki/FAQ#What_other_sources_of_documentation_about_glibc_are_available.3F
>
> | The glibc manual is part of glibc, it is also available online.
> |
> | The Linux man-pages project documents the Linux kernel and the C
> | library interfaces.
> |
> | The Open Group maintains the POSIX Standard which is the authoritative
> | reference for the POSIX APIs.
> |
> | The ISO JTC1 SC22 WG14 maintains the C Standard which is the
> | authoritative reference for the ISO C APIs.
> |
> | The official home page of glibc is at http://www.gnu.org/software/libc.
> |
> | The glibc wiki is at http://sourceware.org/glibc/wiki/HomePage.
> |
> | For bugs, the glibc project uses the sourceware bugzilla with
> | component 'glibc'.
>
> I don't think this is very helpful.  It paints a simple picture which is
> turns out to be rather misleading when inspected closely.  For example,
> POSIX often claims that ISO C takes precedence, but then proceeds to
> specify conflicting requirements with ISO C.  What does that even mean?
> After all, it's not possible to have multiple authoritative sources for
> the behavior of a single function.
>
> Practically speaking, I see the following problems.
>
> The GNU C Library manual is not often consulted by developers.  I don't
> know why.  One reason may be that it is not readily installable from
> package repositories on Debian or Ubuntu (at least not in current
> versions from the main archive).  But our experience at Red Hat suggests
> that our developers do not read the manual, either, although we do ship
> it.  I base this on the paucity of bug reports against the manual
> compared to what we receive for the man-pages package (which are often
> misfiled initially against glibc).  In my opinion, a manual that is not
> actually used by the people who benefit from the information in it has
> at best a dubious claim to authority.
>
> Reading our manual requires considerable skill.  You need to know the
> history of the project, the lingering Linux vs Hurd conflict, the late
> arrival of threading support in UNIX-like environments, the tension
> between the POSIX and C standards, the lack of maintenance of both, and
> so on.  Without such knowledge, it is often not possible to reach the
> right conclusions.  Even senior developers can easily get confused.
> (For a recent example, consider Kees Cook's misinterpretation of the
> O_EXEC documentation in the manual.)  Part of the problem here is that
> we do not have a team that combs through the manual from time to time
> and keeps it up to date, as our understanding of the documented matter
> evolves.
>
> When it comes to Linux interfaces, any claim about authority of the
> manual is very misleading.  It does not matter if the system call is
> described by POSIX.  For example, if Linux developers change the signal
> that waitpid reports after a failed execve, and our manual documents
> something else, then the manual is now wrong.  And not the kernel.  If
> the manual were authoritative, it would be the other way round.  (The
> man-pages project is not authoritative in that sense, either—it did
> document the SIGKILL signal and had to be updated.)

Thanks. I find a lot of wisdom in what you say and do not disagree
with any of it. I just want to add a few thoughts and observations.

"The GNU C Library manual is not often consulted by developers"

Each year I come into contact with quite a large number of developers
(some few hundred each year) in many locations in my day job (or, at
least what used to be my day job until COVID-19 landed), and I think
*very* few of them are aware of the existence of the glibc manual.
Most are aware of manual pages. (However, they mostly aren't aware
that there is a project entity called "Linux man-pages"; rather, they
just know that they get a pile of manual pages on their systems, and
many of them consult those pages.)

And then there is the "info" thing. As a complete document (i.e.,
PDF), the glibc manual is quite a handsome document with a lot of good
information, but not the thing one wants to reach for when facing a
specific API problem. What is one then left with? "info". I think in
all of the years that I have been around Linux, I cannot recall
meeting anyone who had a kind word to say about this format/interface.
People generally don't understand how to navigate in "info", and I
think the whole idea of hyperlinking in a textual UI is one that
doesn't work well from a usability perspective. https://xkcd.com/912/
is funny for a reason.

"I base this on the paucity of bug reports against the manual compared
to what we receive for the man-pages package"

I want to add some further perspective here. Linux man-pages roughly
follows the release frequency of the kernel [1], thus a new release
every 10 weeks or so. The next release will feature changes resulting
from input from more than *70* people (email bug reporters, patch
submitters, reviewers, people who sent me random email that gave me
ideas, bugzilla reporters).

I put that high number of contributors [2] down to many factors:

* As you (Florian) observe, I think it's true that (many) more people know
about manual pages (than the glibc manual/"info").

* I try to make it easy for people to know how to report bugs.
Notably, since 2007, each manual page in the released set has a
COLOPHON [3] that has some minimal information about the origin of the
page *and how to report bugs*.

* That there is someone who actually responds to the documentation bug
reports. And here I paint myself in a good and bad light. When I am
very active, I do notice more bug reports start coming in. When I am
less active (e.g. in the last couple of years), there is a noticeable
fall in bug reports and contributions. (These observations are
subjective/anecdotal, but I think there is a real trend underneath,
since I've been through this cycle a few times.)

* I try to make it easy for people to contribute. There's a website
with a fair amount of information about how people should send patches
[4], and I get a surprising number of random patches that actually
follow the guidelines [5]. By contrast, even among those who are aware
of the glibc manual, I estimate that few are aware of how to
contribute to it.

And on that last point, I circle back to an issue that I've banged on
about before. The CLA. It's just a huge barrier to contribution (and,
I remain convinced, A Bad Thing [TM], even if its motivation is well
intentioned [6]). Just in the last day or two, there's someone doing
what seems natural to so many in this (FOSS) world:

https://lwn.net/ml/libc-alpha/20200523191809.19663-1-aurelien.aptel%40gmail.com/

I presume this patch submitter has no idea of the existence of the
CLA. Once that person learns of it, will they bother doing the
paperwork, or will they just never bother submitting another patch? I
know which way I would bet my money.

> Many of these issues are beyond our control.  Some of the issues which
> are in our area would need a tremendous amount of work to address.

Yes. Writing good documentation is a lot of work. I know for sure that
man-pages could be a full-time (and even enjoyable!) job for me (or
someone else)--there is that much work that *could* be done--but
something else must pay the bills.

Cheers,

Michael

[1] This is a somewhat arbitrary decision that I made a few years ago,
simply to avoid the "big minor version numbers" problem, which was
roughly the problem that Linus Torvalds was also trying to avoid with
the kernel, although he came to that decision a few years earlier than
me.

[2] The high contribution rate is something of a local maximum in the
last couple of years. See my comments in the other thread about
contribution by others in relation to my own activity elsewhere.
(Turns out that lockdown has some positive effects not just for the
environment, but also for man-pages.)

[3] Example: https://man7.org/linux/man-pages/man7/user_namespaces.7.html#COLOPHON

[4] https://www.kernel.org/doc/man-pages/patches.html

[5] I don't want to paint too rosy a picture here. I still write the
majority of the commits that go into manual pages, and I remain the
reviewer of last resort for the majority of patches, which clearly
does not scale well. And far too many things fall on the floor when my
time is limited. (@Carlos, you probably have too rosy a picture of my
efficiency, because I am usually fairly responsive to input from you
and Florian. But that's because I know from past experience that I can
be fairly confident that what I receive from you will be of a quality
that is typically effortless for me to process.)

[6] https://lwn.net/Articles/529522/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-25  9:04     ` Michael Kerrisk
@ 2020-05-25 10:52       ` Florian Weimer
  2020-05-25 19:52       ` J William Piggott
  1 sibling, 0 replies; 26+ messages in thread
From: Florian Weimer @ 2020-05-25 10:52 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Carlos O'Donell via Libc-alpha, Martin Sebor, linux-man

* Michael Kerrisk:

> Each year I come into contact with quite a large number of developers
> (some few hundred each year) in many locations in my day job (or, at
> least what used to be my day job until COVID-19 landed), and I think
> *very* few of them are aware of the existence of the glibc manual.
> Most are aware of manual pages. (However, they mostly aren't aware
> that there is a project entity called "Linux man-pages"; rather, they
> just know that they get a pile of manual pages on their systems, and
> many of them consult those pages.)

Thanks for sharing your perspective.

> And then there is the "info" thing. As a complete document (i.e.,
> PDF), the glibc manual is quite a handsome document with a lot of good
> information, but not the thing one wants to reach for when facing a
> specific API problem. What is one then left with? "info". I think in
> all of the years that I have been around Linux, I cannot recall
> meeting anyone who had a kind word to say about this format/interface.
> People generally don't understand how to navigate in "info", and I
> think the whole idea of hyperlinking in a textual UI is one that
> doesn't work well from a usability perspective. https://xkcd.com/912/
> is funny for a reason.

Even for Emacs users, I suspect that many more are aware of “M-x man RET
RET” than those that are aware of “C-h S”, which jumps right to the
function documentation in the glibc manual.  I have not figured out how
this actually works in practice.  I suspect it uses the Texinfo function
index.  Unfortunately, quite a bit of useful information in the Texinfo
sources is not visible in rendered versions.

One could try to get something similar to “C-h S” into Visual Studio
Code and other IDEs.  But I'm not convinced this is a good use of
resources.  Even if I can remember the Emacs command, I usually need the
manual pages because I'm more interested in the system call
documentation.

> And on that last point, I circle back to an issue that I've banged on
> about before. The CLA. It's just a huge barrier to contribution (and,
> I remain convinced, A Bad Thing [TM], even if its motivation is well
> intentioned [6]). Just in the last day or two, there's someone doing
> what seems natural to so many in this (FOSS) world:
>
> https://lwn.net/ml/libc-alpha/20200523191809.19663-1-aurelien.aptel%40gmail.com/
>
> I presume this patch submitter has no idea of the existence of the
> CLA. Once that person learns of it, will they bother doing the
> paperwork, or will they just never bother submitting another patch? I
> know which way I would bet my money.

I still haven't given up hope entirely for relicensing the manual under
a license that is compatible with Debian's requirements, and also making
it easier to move code and documentation between the manual and the
implementation itself.  The current copyright assignment procedure means
that there is no legal or technical obstacle to relicensing, one has
simply to convince the single copyright owner.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-25  8:58           ` Michael Kerrisk
@ 2020-05-25 15:51             ` J William Piggott
  2020-05-25 16:21               ` Carlos O'Donell
  0 siblings, 1 reply; 26+ messages in thread
From: J William Piggott @ 2020-05-25 15:51 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: Carlos O'Donell, linux-man, GNU C Library



On Mon, 25 May 2020, Michael Kerrisk via Libc-alpha wrote:

... >8
>
>> We really need to engage with technical writers
>> and involve a broader set of industry skills in our projects.
>
> I want to add a note of caution here. It's great to have technical
> writers (and like good developers companies should be paying them),
> but they can't do the job on their own. A lot of developer input is
> still required.

Another caution: many HR departments hire 'technical writers' that in
reality are copy editors, whose knowledge base is grammar/writing/language.
In my experience, they tend to make things a lot worse. Wordsmiths like
to use words, lots of words. They want to create novels. The complete
opposite of what technical writing should be.

>
> Thanks,
>
> Michael
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-23 20:24             ` Michael Kerrisk
@ 2020-05-25 16:15               ` Carlos O'Donell
  0 siblings, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-25 16:15 UTC (permalink / raw)
  To: Michael Kerrisk, Joseph Myers; +Cc: GNU C Library

On 5/23/20 4:24 PM, Michael Kerrisk wrote:
> Oh how I wish this were true Carlos:
> 
>>> The linux man pages can *absolutely* be trusted to document every
>>> change glibc makes, but that doesn't make them authoritative, it makes
>>
>> But see e.g. the BUGS section in pow(3).  It still goes into details about
>> bugs that were fixed in 2012, describing them as something current and
>> architecture-independent rather than as something (a) mostly i386-specific
>> and (b) fixed a very long time ago.  So, no, the man pages aren't updated
>> for every change glibc makes.
> 
> But Joseph is right. (And his comment embarrassed me into checking a
> lot of my old glibc math bug reports and making some updates to
> various pages :-}.)

I admit to being hyperbolic in that sentence, but in general linux man pages
tracks a lot of changes.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-25 15:51             ` J William Piggott
@ 2020-05-25 16:21               ` Carlos O'Donell
  0 siblings, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2020-05-25 16:21 UTC (permalink / raw)
  To: J William Piggott, Michael Kerrisk; +Cc: linux-man, GNU C Library

On 5/25/20 11:51 AM, J William Piggott wrote:
> 
> 
> On Mon, 25 May 2020, Michael Kerrisk via Libc-alpha wrote:
> 
> ... >8
>>
>>> We really need to engage with technical writers
>>> and involve a broader set of industry skills in our projects.
>>
>> I want to add a note of caution here. It's great to have technical
>> writers (and like good developers companies should be paying them),
>> but they can't do the job on their own. A lot of developer input is
>> still required.
> 
> Another caution: many HR departments hire 'technical writers' that in
> reality are copy editors, whose knowledge base is grammar/writing/language.
> In my experience, they tend to make things a lot worse. Wordsmiths like
> to use words, lots of words. They want to create novels. The complete
> opposite of what technical writing should be.

Agreed. We don't need copy editors. We need true technical writers that
understand C and C++. Sadly, I've rarely met such people, and I agree with
Michael, that I also find describing my code to be an illuminating part of
design. Even though others don't share my interests, the linux man pages as
a project shows there are enough of those people that the project can thrive
and keep all the man pages well enough updated that they are more useful for
API reference than the glibc manual.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: The GNU C Library Manual - Authoritative or not?
  2020-05-25  9:04     ` Michael Kerrisk
  2020-05-25 10:52       ` Florian Weimer
@ 2020-05-25 19:52       ` J William Piggott
  1 sibling, 0 replies; 26+ messages in thread
From: J William Piggott @ 2020-05-25 19:52 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Florian Weimer, linux-man, Carlos O'Donell via Libc-alpha



On Mon, 25 May 2020, Michael Kerrisk via Libc-alpha wrote:

  ... >8

> * I try to make it easy for people to contribute.

Yes, the barrier to entry is pretty high; especially for a simple manual
fix. I speak from experience, I had a list of corrections to make; it was
relatively easy for the Linux man-pages. I believe, after getting two
accepted for The Manual I gave up.

Perhaps a separate mailing list dedicated to The Manual accepting
patches with relaxed rules?

As to discovery, that is, The Manual being unknown. For years my go to
tool for information was apropos(1). Of course you cannot discover
info(1) pages that way. A script could convert The Manual into a man
page. It'd be huge and probably ugly, but people could find it. Actually,
I already use The Manual in a similar way. I cat and format it into a
monolithic text file. I use the pager's search to find what I need. I am
used to the search patterns that, for example, find x-refs, nodes, etc.
It works for me (better than info(1) does).

In the beginning I found the fragmentation of Linux docs frustrating.
Not just info and man pages, but also html, pdf, text, howtos, kernel
docs, etc. I'm used to it now, and I'm thankful that we have as much
information as we do. There seems to be a lot of negative response to
The Manual; I'd like to say that it is a very useful body of work for
me. Michael's project is too. So a big thank you to all that put time
and effort into documentation!



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-05-25 19:52 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-20 16:20 question about Glibc extensions Martin Sebor
2020-05-20 16:38 ` Florian Weimer
2020-05-20 17:30   ` Martin Sebor
2020-05-20 17:32 ` Andreas Schwab
2020-05-20 19:35   ` Martin Sebor
2020-05-20 20:22     ` Andreas Schwab
2020-05-21  1:00       ` Rich Felker
2020-05-21 15:53 ` The GNU C Library Manual - Authoritative or not? Carlos O'Donell
2020-05-21 17:46   ` Martin Sebor
2020-05-21 22:11     ` Carlos O'Donell
2020-05-22  0:22       ` Martin Sebor
2020-05-22 12:35         ` Carlos O'Donell
2020-05-22 21:02           ` Joseph Myers
2020-05-23 20:24             ` Michael Kerrisk
2020-05-25 16:15               ` Carlos O'Donell
2020-05-25  8:58           ` Michael Kerrisk
2020-05-25 15:51             ` J William Piggott
2020-05-25 16:21               ` Carlos O'Donell
2020-05-22  0:20     ` Rich Felker
2020-05-22 10:30       ` Michael Kerrisk
2020-05-22 12:21         ` Carlos O'Donell
2020-05-22 12:54   ` Florian Weimer
2020-05-22 16:23     ` Carlos O'Donell
2020-05-25  9:04     ` Michael Kerrisk
2020-05-25 10:52       ` Florian Weimer
2020-05-25 19:52       ` J William Piggott
