public inbox for gcc@gcc.gnu.org
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
       [not found]             ` <491a930d-47eb-7c86-c0c4-25eef4ac0be0@gmail.com>
@ 2022-09-02 21:57               ` Alejandro Colomar
  2022-09-03 12:47                 ` Martin Uecker
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-09-02 21:57 UTC (permalink / raw)
  To: JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man, gcc



Hi JeanHeyd,

> Subject:     Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in 
> function parameters
> Date:     Fri, 2 Sep 2022 16:56:00 -0400
> From:     JeanHeyd Meneide <wg14@soasis.org>
> To:     Alejandro Colomar <alx.manpages@gmail.com>
> CC:     Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
> 
> 
> 
> Hi Alejandro and Ingo,
> 
>        Just chiming in from a Standards perspective, here. We discussed, 
> briefly, a way to allow Variable-Length function parameter declarations 
> like the ones shown in this thread (e.g., char *getcwd(char buf[size], 
> size_t size );).
> 
>        In GCC, there is a GNU extension that allows explicitly 
> forward-declaring the prototype. Using the above example, it would look 
> like so:

I added the GCC list to the thread, so that they can intervene if they 
consider it necessary.

> 
> char *getcwd(size_t size; char buf[size], size_t size);

I read about that, although I don't like it very much, and never used it.

> 
> (Live Example [1])
> 
> (Note the `;` after the first "size" declaration). This was brought 
> before the Committee to vote on for C23 in the form of N2780 [2], around 
> the January 2022 timeframe. The paper did not pass, and it was seen as a 
> "failed extension". After that vote failed, we discussed whether there 
> was some appetite to allow "forward parsing" for this sort of case. 
> That is, could we simply allow:
> 
> char *getcwd(char buf[size], size_t size);
> 
> to work as expected. The vote for this did not gain full consensus 
> either, but there were a lot of abstentions [3]. While I personally 
> voted in favor of allowing such for C, there was distinct worry that 
> this would produce issues for weaker C implementations that did not want 
> to commit to delayed parsing or forward parsing of the entirety of the 
> argument list before resolving types. There are enough abstentions 
> during voting that a working implementation with a writeup of complexity 
> would sway the Committee one way or the other.

I like that this got less hate than the GNU extension.  It's nicer to my 
eyes.

> 
> This is not to dissuade Alejandro's position, or to bolster Ingo's 
> point; I'm mostly just reporting the Committee's response here. This is 
> an unsolved problem for the Committee, and also a larger holdover from 
> the removal of K&R declarations from C23, which COULD solve this problem:
> 
> // decl
> char *getcwd();
> 
> // impl
> char* getcwd(buf, size)
> char buf[size];
>        size_t size;
> {
>        /* impl here */
> }

I won't miss them ;)

My regex-based parser[1] that finds declarations and definitions in C 
code bases goes nuts with K&R functions.  They are dead for good :)

[1]: <http://www.alejandro-colomar.es/src/alx/alx/grepc.git/>

> 
>        There is room for innovation here, or perhaps bolstering of the 
> GCC original extension. As it stands right now, compilers only very 
> recently started taking Variably-Modified Type parameters and Static 
> Extent parameters seriously after carefully separating them out of 
> Variable-Length Arrays, warning where they can when static or other 
> array parameters do not match buffer lengths and so-on.
> 
>        Not just to the folks in this thread, but to the broader 
> community for anyone who is paying attention: WG14 would actively like 
> to solve this problem. If someone can:
> - prove out a way to do delayed parsing that is not implementation-costly,
> - revive the considered-dead GCC extension, or
> - provide a 3rd or 4th way to support the goals,
> 
> I am certain WG14 would look favorably upon such a thing eventually, 
> brought before the Committee in inclusion for C2y/C3a.
> 
>        Whether or not you feel like the manpages are the best place to 
> start that, I'll leave up to you!

I'll try to defend the reasons to start this in the man-pages.

This feature is mostly for documentation purposes, not being meaningful 
for code at all (for some meaning of meaningful), since it won't change 
the function definition in any way, nor the calls to it.  At least not 
by itself; static analysis may get some benefits, though.

Also, new code can be designed from the beginning so that sizes go 
before their corresponding arrays, so that new code won't typically be 
affected by the lack of this feature in the language.
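
A minimal sketch of that size-first ordering, in standard C99 syntax.
The function name copy_name is made up for illustration and is not a
libc interface:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the size parameter comes first, so the array
 * parameter can refer to it with plain C99 syntax. */
size_t copy_name(size_t size, char buf[size], const char *name)
{
    assert(buf != NULL && name != NULL);
    if (size == 0)
        return 0;

    size_t len = strlen(name);
    if (len >= size)
        len = size - 1;        /* truncate, leaving room for the NUL */
    memcpy(buf, name, len);
    buf[len] = '\0';
    return len;                /* number of characters stored */
}
```

A caller would write something like copy_name(sizeof(b), b, "getcwd")
for a local array b.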

This leaves us with legacy code, especially libc, which just works, and 
doesn't have any urgent needs to change their prototypes in this regard 
(they could, to improve static analysis, but not what we'd call urgent).

And since most people don't go around reading libc headers searching for 
function declarations (especially since there are manual pages that show 
them nicely), it's not like the documentation of the code depends on how 
the function is _actually_ declared in code (that's why I also defended 
documenting restrict even if glibc wouldn't have cared to declare it), 
but it depends basically on what the manual pages say about the 
function.  If the manual pages say a function gets 'restrict' params, it 
means it gets 'restrict' params, no matter what the code says, and if it 
doesn't, the function accepts overlapping pointers, at least for most of 
the public (modulo manual page bugs, that is).

So this extension could very well be added by the manual pages, as a 
form of documentation, and then maybe picked up by compilers that have 
enough resources to implement it.


Considering that this feature is mostly about documentation (and a bit 
of static analysis too), the documentation should be something appealing 
to the reader.


Let's take an example:


        int getnameinfo(const struct sockaddr *restrict addr,
                        socklen_t addrlen,
                        char *restrict host, socklen_t hostlen,
                        char *restrict serv, socklen_t servlen,
                        int flags);

and some transformations:


        int getnameinfo(const struct sockaddr *restrict addr,
                        socklen_t addrlen,
                        char host[restrict hostlen], socklen_t hostlen,
                        char serv[restrict servlen], socklen_t servlen,
                        int flags);


        int getnameinfo(socklen_t hostlen;
                        socklen_t servlen;
                        const struct sockaddr *restrict addr,
                        socklen_t addrlen,
                        char host[restrict hostlen], socklen_t hostlen,
                        char serv[restrict servlen], socklen_t servlen,
                        int flags);

(I'm not sure if I used correct GNU syntax, since I never used that 
extension myself.)
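
For reference, a small sketch of the GNU parameter forward declaration
syntax with two sizes, following the description in the GCC manual:
the forward declarations end with a semicolon, and the real parameter
list comes after it.  clear_both is a hypothetical function.  This is
a GNU C extension, so it should only be expected to build with GCC,
and -pedantic is expected to complain about it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "size_t hostlen; size_t servlen;" only forward-declare the sizes;
 * the real parameters are host, hostlen, serv, servlen, in that order. */
size_t clear_both(size_t hostlen; size_t servlen;
                  char host[hostlen], size_t hostlen,
                  char serv[servlen], size_t servlen)
{
    assert(host != NULL && serv != NULL);
    memset(host, 0, hostlen);
    memset(serv, 0, servlen);
    return hostlen + servlen;  /* total number of bytes cleared */
}
```

Callers pass four arguments, exactly as with the traditional
prototype; the forward declarations do not add parameters.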

The first transformation above is non-ambiguous, as concise as possible, 
and its only issue is that it might complicate the implementation a bit 
too much.  I don't think forward-using a parameter's size would be too 
much of a parsing problem for human readers.

The second one is unnecessarily long and verbose, and semicolons are not 
very distinguishable from commas, for human readers, which may be very 
confusing.

        int foo(int a; int b[a], int a);
        int foo(int a, int b[a], int o);

Those two are very different to the compiler, and yet very similar to 
the human eye.  I don't like it.  The fact that it allows for simpler 
compilers isn't enough to overcome the readability issues.

I think I'd prefer having the forward-using syntax as a non-standard 
extension --or a standard but optional language feature-- to avoid 
forcing small compilers to implement it, rather than having the GNU 
extension standardized in all compilers.

Having this extension in any single compiler would even make it more 
appealing to manual pages, which could use the syntax more freely 
without fear of confusing readers.  Even if the standard wouldn't accept it.

Let's see if GCC likes the feature and helps me attempt to use it a 
little bit! :-)

Cheers,

Alex


-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-02 21:57               ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters Alejandro Colomar
@ 2022-09-03 12:47                 ` Martin Uecker
  2022-09-03 13:29                   ` Ingo Schwarze
  2022-09-03 13:41                   ` Alejandro Colomar
  0 siblings, 2 replies; 50+ messages in thread
From: Martin Uecker @ 2022-09-03 12:47 UTC (permalink / raw)
  To: Alejandro Colomar, JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man, gcc

...
> > 
> >        Whether or not you feel like the manpages are the best place to 
> > start that, I'll leave up to you!
> 
> I'll try to defend the reasons to start this in the man-pages.
> 
> This feature is mostly for documentation purposes, not being meaningful 
> for code at all (for some meaning of meaningful), since it won't change 
> the function definition in any way, nor the calls to it.  At least not 
> by itself; static analysis may get some benefits, though.


GCC will warn if the bound is specified inconsistently between
declarations and also emit warnings if it can see that a buffer
which is passed is too small:

https://godbolt.org/z/PsjPG1nv7
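
A small sketch of the kind of code those warnings apply to.  The names
fill and fill_small are made up here, and the option names in the
comments (-Wvla-parameter, -Wstringop-overflow) are what I believe
recent GCC versions use; check the documentation for your version:

```c
#include <assert.h>
#include <string.h>

/* Declaration and definition agree on the bound.  A redeclaration
 * with a different bound, e.g. "char buf[n + 1]", is the kind of
 * mismatch -Wvla-parameter is meant to report. */
void fill(int n, char buf[n]);

void fill(int n, char buf[n])
{
    assert(n >= 0 && buf != NULL);
    memset(buf, 'x', (size_t)n);
}

int fill_small(void)
{
    char small[4];

    fill(4, small);        /* fine: the bound matches the buffer */
    /* fill(10, small);       too small for the bound: a case where GCC
     *                        can diagnose (-Wstringop-overflow) */
    return small[3];
}
```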


BTW: If you declare pointers to arrays (not first elements) you
can get run-time bounds checking with UBSan:

https://godbolt.org/z/TvMo89WfP
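
A minimal sketch of the pointer-to-array form, with a made-up function
name.  Because the bound is part of the pointed-to type, an access
such as (*buf)[n] is something a bounds sanitizer (for instance
-fsanitize=bounds) can check at run time:

```c
#include <assert.h>
#include <stddef.h>

/* buf points to a whole array of n chars, not to its first element. */
int last_elem(int n, char (*buf)[n])
{
    assert(n > 0 && buf != NULL);
    return (*buf)[n - 1];  /* (*buf)[n] would be one past the end */
}
```

The caller passes the address of the array, e.g. last_elem(3, &a) for
char a[3].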


> 
> Also, new code can be designed from the beginning so that sizes go 
> before their corresponding arrays, so that new code won't typically be 
> affected by the lack of this feature in the language.
> 
> This leaves us with legacy code, especially libc, which just works, and 
> doesn't have any urgent needs to change their prototypes in this regard 
> (they could, to improve static analysis, but not what we'd call urgent).

It would be a useful step towards finding out-of-bounds problems in
applications using libc.


> And since most people don't go around reading libc headers searching for 
> function declarations (especially since there are manual pages that show 
> them nicely), it's not like the documentation of the code depends on how 
> the function is _actually_ declared in code (that's why I also defended 
> documenting restrict even if glibc wouldn't have cared to declare it), 
> but it depends basically on what the manual pages say about the 
> function.  If the manual pages say a function gets 'restrict' params, it 
> means it gets 'restrict' params, no matter what the code says, and if it 
> doesn't, the function accepts overlapping pointers, at least for most of 
> the public (modulo manual page bugs, that is).
> 
> So this extension could very well be added by the manual pages, as a 
> form of documentation, and then maybe picked up by compilers that have 
> enough resources to implement it.
> 
> 
> Considering that this feature is mostly about documentation (and a bit 
> of static analysis too), the documentation should be something appealing 
> to the reader.
> 
> 
> Let's take an example:
> 
> 
>         int getnameinfo(const struct sockaddr *restrict addr,
>                         socklen_t addrlen,
>                         char *restrict host, socklen_t hostlen,
>                         char *restrict serv, socklen_t servlen,
>                         int flags);
> 
> and some transformations:
> 
> 
>         int getnameinfo(const struct sockaddr *restrict addr,
>                         socklen_t addrlen,
>                         char host[restrict hostlen], socklen_t hostlen,
>                         char serv[restrict servlen], socklen_t servlen,
>                         int flags);
> 
> 
>         int getnameinfo(socklen_t hostlen;
>                         socklen_t servlen;
>                         const struct sockaddr *restrict addr,
>                         socklen_t addrlen,
>                         char host[restrict hostlen], socklen_t hostlen,
>                         char serv[restrict servlen], socklen_t servlen,
>                         int flags);
> 
> (I'm not sure if I used correct GNU syntax, since I never used that 
> extension myself.)
> 
> The first transformation above is non-ambiguous, as concise as possible, 
> and its only issue is that it might complicate the implementation a bit 
> too much.  I don't think forward-using a parameter's size would be too 
> much of a parsing problem for human readers.


I personally find the second form not terrible.  Being
able to read code left-to-right, top-down is helpful in more
complicated examples.



> The second one is unnecessarily long and verbose, and semicolons are not 
> very distinguishable from commas, for human readers, which may be very 
> confusing.
> 
>         int foo(int a; int b[a], int a);
>         int foo(int a, int b[a], int o);
> 
> Those two are very different to the compiler, and yet very similar to 
> the human eye.  I don't like it.  The fact that it allows for simpler 
> compilers isn't enough to overcome the readability issues.

This is true, I would probably use it with a comma and/or
syntax highlighting.


> I think I'd prefer having the forward-using syntax as a non-standard 
> extension --or a standard but optional language feature-- to avoid 
> forcing small compilers to implement it, rather than having the GNU 
> extension standardized in all compilers.

The problems with the second form are:

- it is not 100% backwards compatible (which may be OK though) as
the semantics of the following code changes:

int n;
int foo(int a[n], int n); // refers to different n!

Code written for new compilers could then be misunderstood
by old compilers when a variable named 'n' is in scope.
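
A sketch of what current compilers do with such code.  It uses a
pointer to an array so that the bound is observable through sizeof;
the function name elems is made up:

```c
#include <assert.h>
#include <stddef.h>

int n = 3;                 /* file-scope n */

/* Under today's rules the n inside [n] is the file-scope variable,
 * because the parameter n has not been declared yet at that point. */
size_t elems(int (*a)[n], int n)
{
    (void)n;               /* the parameter is unrelated to the bound */
    return sizeof(*a) / sizeof((*a)[0]);
}
```

With the file-scope n equal to 3, elems(&arr, 7) for int arr[3] yields
3, not 7.  Under forward-referencing rules, the same source would mean
something else.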


- it would generally be fundamentally new to C to have
backwards references and parsers might need to be changed
to allow this


- a compiler or tool then has to deal also with ugly
corner cases such as mutual references:

int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);



We could consider new syntax such as

int foo(char buf[.n], int n);


Personally, I would prefer the conceptual simplicity of forward
declarations and the fact that these exist already in GCC
over any alternative.  I would also not mind new syntax, but
then one has to define the rules more precisely to avoid the
aforementioned problems. 


Martin






* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 12:47                 ` Martin Uecker
@ 2022-09-03 13:29                   ` Ingo Schwarze
  2022-09-03 15:08                     ` Alejandro Colomar
  2022-09-03 13:41                   ` Alejandro Colomar
  1 sibling, 1 reply; 50+ messages in thread
From: Ingo Schwarze @ 2022-09-03 13:29 UTC (permalink / raw)
  To: alx.manpages
  Cc: Martin Uecker, Alejandro Colomar, JeanHeyd Meneide, linux-man, gcc

Hi,

the only point i strongly care about is this one:

Manual pages should not use
 * non-standard syntax
 * non-portable syntax
 * ambiguous syntax (i.e. syntax that might have different meanings
   with different compilers or in different contexts)
 * syntax that might be invalid or dangerous with some widely
   used compiler collections like GCC or LLVM

Regarding the discussions about standardization and extensions,
all proposals i have seen look seriously ugly and awkward to me,
and i'm not yet convinced such ugliness is sufficiently offset by
the relatively minor benefit that is apparent to me right now.

Yours,
  Ingo

-- 
Ingo Schwarze             <schwarze@usta.de>
http://www.openbsd.org/   <schwarze@openbsd.org>
http://mandoc.bsd.lv/     <schwarze@mandoc.bsd.lv>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 12:47                 ` Martin Uecker
  2022-09-03 13:29                   ` Ingo Schwarze
@ 2022-09-03 13:41                   ` Alejandro Colomar
  2022-09-03 14:35                     ` Martin Uecker
  1 sibling, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-09-03 13:41 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc



Hi Martin,

On 9/3/22 14:47, Martin Uecker wrote:
[...]

> GCC will warn if the bound is specified inconsistently between
> declarations and also emit warnings if it can see that a buffer
> which is passed is too small:
> 
> https://godbolt.org/z/PsjPG1nv7

That's very good news!

BTW, it's nice to see that GCC doesn't need 'static' for array 
parameters.  I never understood what the static keyword adds there. 
There's no way one can specify an array size and mean anything other than 
requiring that, for a non-null pointer, the array should have at least 
that size.

> 
> 
> BTW: If you declare pointers to arrays (not first elements) you
> can get run-time bounds checking with UBSan:
> 
> https://godbolt.org/z/TvMo89WfP

Couldn't that be caught at compile time?  n is certainly out of bounds 
always for such an array, since the last element is n-1.

> 
> 
>>
>> Also, new code can be designed from the beginning so that sizes go
>> before their corresponding arrays, so that new code won't typically be
>> affected by the lack of this feature in the language.
>>
>> This leaves us with legacy code, especially libc, which just works, and
>> doesn't have any urgent needs to change their prototypes in this regard
>> (they could, to improve static analysis, but not what we'd call urgent).
> 
> It would be a useful step towards finding out-of-bounds problems in
> applications using libc.

Yep, it would be very useful for that.  Not urgent, but yes, very useful.


>> Let's take an example:
>>
>>
>>          int getnameinfo(const struct sockaddr *restrict addr,
>>                          socklen_t addrlen,
>>                          char *restrict host, socklen_t hostlen,
>>                          char *restrict serv, socklen_t servlen,
>>                          int flags);
>>
>> and some transformations:
>>
>>
>>          int getnameinfo(const struct sockaddr *restrict addr,
>>                          socklen_t addrlen,
>>                          char host[restrict hostlen], socklen_t hostlen,
>>                          char serv[restrict servlen], socklen_t servlen,
>>                          int flags);
>>
>>
>>          int getnameinfo(socklen_t hostlen;
>>                          socklen_t servlen;
>>                          const struct sockaddr *restrict addr,
>>                          socklen_t addrlen,
>>                          char host[restrict hostlen], socklen_t hostlen,
>>                          char serv[restrict servlen], socklen_t servlen,
>>                          int flags);
>>
>> (I'm not sure if I used correct GNU syntax, since I never used that
>> extension myself.)
>>
>> The first transformation above is non-ambiguous, as concise as possible,
>> and its only issue is that it might complicate the implementation a bit
>> too much.  I don't think forward-using a parameter's size would be too
>> much of a parsing problem for human readers.
> 
> 
> I personally find the second form not terrible.  Being
> able to read code left-to-right, top-down is helpful in more
> complicated examples.
> 
> 
> 
>> The second one is unnecessarily long and verbose, and semicolons are not
>> very distinguishable from commas, for human readers, which may be very
>> confusing.
>>
>>          int foo(int a; int b[a], int a);
>>          int foo(int a, int b[a], int o);
>>
>> Those two are very different to the compiler, and yet very similar to
>> the human eye.  I don't like it.  The fact that it allows for simpler
>> compilers isn't enough to overcome the readability issues.
> 
> This is true, I would probably use it with a comma and/or
> syntax highlighting.
> 
> 
>> I think I'd prefer having the forward-using syntax as a non-standard
>> extension --or a standard but optional language feature-- to avoid
>> forcing small compilers to implement it, rather than having the GNU
>> extension standardized in all compilers.
> 
> The problems with the second form are:
> 
> - it is not 100% backwards compatible (which may be OK though) as
> the semantics of the following code changes:
> 
> int n;
> int foo(int a[n], int n); // refers to different n!
> 
> Code written for new compilers could then be misunderstood
> by old compilers when a variable named 'n' is in scope.
> 
> 

Hmmm, this one is serious.  I can't seem to solve it with that syntax.

> - it would generally be fundamentally new to C to have
> backwards references and parsers might need to be changed
> to allow this
> 
> 
> - a compiler or tool then has to deal also with ugly
> corner cases such as mutual references:
> 
> int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
> 
> 
> 
> We could consider new syntax such as
> 
> int foo(char buf[.n], int n);
> 
> 
> Personally, I would prefer the conceptual simplicity of forward
> declarations and the fact that these exist already in GCC
> over any alternative.  I would also not mind new syntax, but
> then one has to define the rules more precisely to avoid the
> aforementioned problems.

What about taking something from K&R functions for this?:

int foo(q; w; int a[q], int q, int s[w], int w);

By not specifying the types, the syntax is again short.
This is left-to-right, so no problems with global variables, and no need 
for complex parsers.
Also, by not specifying types, now it's more obvious to the naked eye 
that there's a difference:


           int foo(a; int b[a], int a);
           int foo(int a, int b[a], int o);


What do you think about this syntax?


Thanks,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 13:41                   ` Alejandro Colomar
@ 2022-09-03 14:35                     ` Martin Uecker
  2022-09-03 14:59                       ` Alejandro Colomar
  0 siblings, 1 reply; 50+ messages in thread
From: Martin Uecker @ 2022-09-03 14:35 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
> Hi Martin,
> 
> On 9/3/22 14:47, Martin Uecker wrote:
> [...]
> 
> > GCC will warn if the bound is specified inconsistently between
> > declarations and also emit warnings if it can see that a buffer
> > which is passed is too small:
> > 
> > https://godbolt.org/z/PsjPG1nv7
> 
> That's very good news!
> 
> BTW, it's nice to see that GCC doesn't need 'static' for array 
> parameters.  I never understood what the static keyword adds there. 
> There's no way one can specify an array size and mean anything other than 
> requiring that, for a non-null pointer, the array should have at least 
> that size.

From the C standard's point of view,

void foo(int n, char buf[n]);

is semantically equivalent to

void foo(int, char *buf);

and without 'static' the 'n' has no further meaning
(this is different for pointers to arrays).
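
A short sketch of that equivalence, with made-up names.  The array
parameter is adjusted to a pointer, so the function can be used
through a pointer to the "char *" form of its type, while for a
pointer to an array the bound stays part of the type:

```c
#include <assert.h>
#include <stddef.h>

int first_char(int n, char buf[n])
{
    assert(n > 0 && buf != NULL);
    return buf[0];
}

/* Same type as far as the standard is concerned: no cast is needed. */
int (*const as_pointer_form)(int, char *) = first_char;

/* For a pointer to an array, the bound is part of the pointed-to type. */
size_t bound_of(int n, char (*buf)[n])
{
    assert(n > 0 && buf != NULL);
    return sizeof(*buf);   /* n * sizeof(char), i.e. n */
}
```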

The static keyword implies that the pointer must be valid and
non-zero and that there must be at least 'n' elements
accessible, so in some sense it is stronger (it implies 
valid non-zero pointers), but at the same time it does not
imply a bound.

But I agree that 'n' without 'static' should simply imply
a bound and I think we should use it this way even when
the standard currently does not attach a meaning to it.
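
For comparison, a minimal sketch of the 'static' form in standard C99
syntax; the function name is made up:

```c
#include <assert.h>
#include <stddef.h>

/* [static n]: the caller promises a non-null pointer to at least n
 * ints.  The array may be larger, so n is a minimum, not a bound. */
int sum_at_least(size_t n, const int a[static n])
{
    int sum = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Passing a larger array, e.g. sum_at_least(2, v) for int v[3], is
within the contract; only the first two elements are read.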

> > 
> > BTW: If you declare pointers to arrays (not first elements) you
> > can get run-time bounds checking with UBSan:
> > 
> > https://godbolt.org/z/TvMo89WfP
> 
> Couldn't that be caught at compile time?  n is certainly out of bounds 
> always for such an array, since the last element is n-1.

Yes, in this example it could (and ideally should) be
detected at compile time.

But this notation already today allows passing of a bound
across API boundaries and thus enables run-time detection of
out-of-bounds accesses even in scenarios where it could
not be found at compile time.

> > 
> > > Also, new code can be designed from the beginning so that sizes go
> > > before their corresponding arrays, so that new code won't typically be
> > > affected by the lack of this feature in the language.
> > > 
> > > This leaves us with legacy code, especially libc, which just works, and
> > > doesn't have any urgent needs to change their prototypes in this regard
> > > (they could, to improve static analysis, but not what we'd call urgent).
> > 
> > It would be a useful step towards finding out-of-bounds problems in
> > applications using libc.
> 
> Yep, it would be very useful for that.  Not urgent, but yes, very useful.
> 
> 
> > > Let's take an example:
> > > 
> > > 
> > >          int getnameinfo(const struct sockaddr *restrict addr,
> > >                          socklen_t addrlen,
> > >                          char *restrict host, socklen_t hostlen,
> > >                          char *restrict serv, socklen_t servlen,
> > >                          int flags);
> > > 
> > > and some transformations:
> > > 
> > > 
> > >          int getnameinfo(const struct sockaddr *restrict addr,
> > >                          socklen_t addrlen,
> > >                          char host[restrict hostlen], socklen_t hostlen,
> > >                          char serv[restrict servlen], socklen_t servlen,
> > >                          int flags);
> > > 
> > > 
> > >          int getnameinfo(socklen_t hostlen;
> > >                          socklen_t servlen;
> > >                          const struct sockaddr *restrict addr,
> > >                          socklen_t addrlen,
> > >                          char host[restrict hostlen], socklen_t hostlen,
> > >                          char serv[restrict servlen], socklen_t servlen,
> > >                          int flags);
> > > 
> > > (I'm not sure if I used correct GNU syntax, since I never used that
> > > extension myself.)
> > > 
> > > The first transformation above is non-ambiguous, as concise as possible,
> > > and its only issue is that it might complicate the implementation a bit
> > > too much.  I don't think forward-using a parameter's size would be too
> > > much of a parsing problem for human readers.
> > 
> > I personally find the second form not terrible.  Being
> > able to read code left-to-right, top-down is helpful in more
> > complicated examples.
> > 
> > 
> > 
> > > The second one is unnecessarily long and verbose, and semicolons are not
> > > very distinguishable from commas, for human readers, which may be very
> > > confusing.
> > > 
> > >          int foo(int a; int b[a], int a);
> > >          int foo(int a, int b[a], int o);
> > > 
> > > Those two are very different to the compiler, and yet very similar to
> > > the human eye.  I don't like it.  The fact that it allows for simpler
> > > compilers isn't enough to overcome the readability issues.
> > 
> > This is true, I would probably use it with a comma and/or
> > syntax highlighting.
> > 
> > 
> > > I think I'd prefer having the forward-using syntax as a non-standard
> > > extension --or a standard but optional language feature-- to avoid
> > > forcing small compilers to implement it, rather than having the GNU
> > > extension standardized in all compilers.
> > 
> > The problems with the second form are:
> > 
> > - it is not 100% backwards compatible (which may be OK though) as
> > the semantics of the following code changes:
> > 
> > int n;
> > int foo(int a[n], int n); // refers to different n!
> > 
> > Code written for new compilers could then be misunderstood
> > by old compilers when a variable named 'n' is in scope.
> > 
> > 
> 
> Hmmm, this one is serious.  I can't seem to solve it with that syntax.
> 
> > - it would generally be fundamentally new to C to have
> > backwards references and parsers might need to be changed
> > to allow this
> > 
> > 
> > - a compiler or tool then has to deal also with ugly
> > corner cases such as mutual references:
> > 
> > int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
> > 
> > 
> > 
> > We could consider new syntax such as
> > 
> > int foo(char buf[.n], int n);
> > 
> > 
> > Personally, I would prefer the conceptual simplicity of forward
> > declarations and the fact that these exist already in GCC
> > over any alternative.  I would also not mind new syntax, but
> > then one has to define the rules more precisely to avoid the
> > aforementioned problems.
> 
> What about taking something from K&R functions for this?:
> 
> int foo(q; w; int a[q], int q, int s[w], int w);
> 
> By not specifying the types, the syntax is again short.
> This is left-to-right, so no problems with global variables, and no need 
> for complex parsers.
> Also, by not specifying types, now it's more obvious to the naked eye 
> that there's a difference:

I am ok with the syntax, but I am not sure how this would
work. If the type is determined only later you would still
have to change parsers (some C compilers do type
checking and folding during parsing, so they need the types
to be known during parsing) and you also still have the
problem with the mutual dependencies.

We thought about using this syntax

int foo(char buf[.n], int n);

because it is new syntax which means we can restrict the
size to be the name of a parameter instead of allowing
arbitrary expressions, which then makes forward references
less problematic.  It is also consistent with designators in
initializers and could also be extended to annotate
flexible array members or for storing pointers to arrays
in structures:

struct {
  int n;
  char buf[.n];
};

struct {
  int n;
  char (*buf)[.n];
};


Martin


> 
>            int foo(a; int b[a], int a);
>            int foo(int a, int b[a], int o);
> 
> 
> What do you think about this syntax?





* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 14:35                     ` Martin Uecker
@ 2022-09-03 14:59                       ` Alejandro Colomar
  2022-09-03 15:31                         ` Martin Uecker
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-09-03 14:59 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc



Hi Martin,

On 9/3/22 16:35, Martin Uecker wrote:
> Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
>> Hi Martin,
>>
>> On 9/3/22 14:47, Martin Uecker wrote:
>> [...]
>>
>>> GCC will warn if the bound is specified inconsistently between
>>> declarations and also emit warnings if it can see that a buffer
>>> which is passed is too small:
>>>
>>> https://godbolt.org/z/PsjPG1nv7
>>
>> That's very good news!
>>
>> BTW, it's nice to see that GCC doesn't need 'static' for array
>> parameters.  I never understood what the static keyword adds there.
>> There's no way one can specify an array size and mean anything other than
>> requiring that, for a non-null pointer, the array should have at least
>> that size.
> 
>  From the C standard's point of view,
> 
> void foo(int n, char buf[n]);
> 
> is semantically equivalent to
> 
> void foo(int, char *buf);
> 
> and without 'static' the 'n' has no further meaning
> (this is different for pointers to arrays).

I know.  I just don't understand the rationale for that decision. :/

> 
> The static keyword implies that the pointer is valid and
> non-zero and that there must be at least 'n' elements
> accessible, so in some sense it is stronger (it implies
> valid non-zero pointers), but at the same time it does not
> imply a bound.

That stronger meaning is, I think, a mistake in the standard.
Basically, [static n] means the same as [n] combined with [[gnu::nonnull]].
What the standard should have done would be to keep those two things 
separate, since one may want to declare non-null non-array pointers, or 
possibly-null array ones.  So the standard should have standardized some 
form of nonnull for that.  But the recent discussion about presenting 
nonnull pointers as [static 1] is horrible.  But let's wait till the 
future hopefully fixes this.
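
A minimal sketch of the distinction discussed above, assuming GCC or Clang for the `__attribute__((nonnull))` spelling.  The function names are hypothetical and only serve to put the three declarator forms side by side: a plain bound, `[static n]`, and a plain bound combined with a separate non-null annotation.

```c
#include <stddef.h>

/* Shared helper: count the zero bytes in the first n bytes of buf. */
static size_t count_zero_bytes(size_t n, const char *buf)
{
    size_t zeros = 0;

    for (size_t i = 0; i < n; i++)
        zeros += (buf[i] == '\0');
    return zeros;
}

/* Bound only: to the standard this parameter is just `const char *buf`,
 * so a null pointer is permitted and has to be handled. */
size_t count_zeros(size_t n, const char buf[n])
{
    if (buf == NULL)
        return 0;
    return count_zero_bytes(n, buf);
}

/* [static n]: the caller promises a non-null pointer to at least n
 * elements; the bound and the non-null promise are tied together. */
size_t count_zeros_static(size_t n, const char buf[static n])
{
    return count_zero_bytes(n, buf);
}

/* The same promise spelled as two separate pieces: a bound in the
 * declarator plus the GNU nonnull attribute on parameter 2. */
__attribute__((nonnull(2)))
size_t count_zeros_nonnull(size_t n, const char buf[n])
{
    return count_zero_bytes(n, buf);
}
```

All three accept the same valid calls; they differ only in what a compiler or analyzer is entitled to assume about `buf`.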

> 
> But I agree that 'n' without 'static' should simply imply
> a bound and I think we should use it this way even when
> the standard currently does not attach a meaning to it.

Yep.

[...]

>> What about taking something from K&R functions for this?:
>>
>> int foo(q; w; int a[q], int q, int s[w], int w);
>>
>> By not specifying the types, the syntax is again short.
>> This is left-to-right, so no problems with global variables, and no need
>> for complex parsers.
>> Also, by not specifying types, now it's more obvious to the naked eye
>> that there's a difference:
> 
> I am ok with the syntax, but I am not sure how this would
> work. If the type is determined only later you would still
> have to change parsers (some C compilers do type
> checking  and folding during parsing, so need the types
> to be known during parsing) and you also still have the
> problem with the mutual dependencies.

This syntax closely resembles K&R syntax.  Any C compiler that supports 
K&R syntax (and I guess most compilers out there do) should be easily 
convertible to support this one (at least more easily than other 
alternatives).  But this is just a guess.

> 
> We thought about using this syntax
> 
> int foo(char buf[.n], int n);
> 
> because it is new syntax which means we can restrict the
> size to be the name of a parameter instead of allowing
> arbitrary expressions, which then makes forward references
> less problematic.  It is also consistent with designators in
> initializers and could also be extended to annotate
> flexible array members or for storing pointers to arrays
> in structures:

It's not crazy.  I don't have much to argue against it.

> 
> struct {
>    int n;
>    char buf[.n];
> };
> 
> struct {
>    int n;
>    char (*buf)[.n];
> };

Perhaps some doubts about how this would work for nested structures, but 
not unreasonable.

Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 13:29                   ` Ingo Schwarze
@ 2022-09-03 15:08                     ` Alejandro Colomar
  0 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-09-03 15:08 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: Martin Uecker, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1460 bytes --]

Hi Ingo,

On 9/3/22 15:29, Ingo Schwarze wrote:
> the only point i strongly care about is this one:
> 
> Manual pages should not use
>   * non-standard syntax
>   * non-portable syntax
>   * ambiguous syntax (i.e. syntax that might have different meanings
>     with different compilers or in different contexts)
>   * syntax that might be invalid or dangerous with some widely
>     used compiler collections like GCC or LLVM

The first two are good guidelines, but not strict IMHO if there's a good 
reason.

The third and fourth are strong requirements.

For now I won't be applying this patch.

> 
> Regarding the discussions about standardization and extensions,
> all proposals i have seen look seriously ugly and awkward to me,
> and i'm not yet convinced such ugliness is sufficiently offset by
> the relatively minor benefit that is apparent to me right now.

I hope we come up with something not ugly from that discussion.

The static analysis / compiler warning capabilities of using VLA syntax 
seem like strong reasons to me.  They help avoid stupid bugs, even for 
careless programmers (well, only if those careless programmers care just 
enough to enable -Wall, and then to read the warnings).  It's not something 
that will fix an incorrect algorithm, but it can stop some typos, or other 
stupid mistakes that we all make from time to time.
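
A minimal sketch of the kind of mistake meant here, assuming a recent GCC with -Wall; the function names are hypothetical.  The bound in the declarator documents how many bytes the callee touches, which gives the compiler a chance to compare it with the size of the buffer actually passed.

```c
#include <stddef.h>
#include <string.h>

/* The bound says: this function accesses n bytes starting at buf. */
void zero_fill(size_t n, char buf[n])
{
    memset(buf, 0, n);
}

/* Returns 1 if every byte of the local buffer ended up zero. */
int demo_zero_fill(void)
{
    char small[4] = "abc";

    zero_fill(sizeof(small), small);   /* size matches the bound */

    /*
     * A typo such as
     *     zero_fill(8, small);
     * passes a 4-byte buffer with a bound of 8.  This is the kind of
     * call that a compiler can diagnose when it sees both the bound
     * and the size of the object.
     */

    for (size_t i = 0; i < sizeof(small); i++) {
        if (small[i] != '\0')
            return 0;
    }
    return 1;
}
```

Whether and how the commented-out call is diagnosed depends on the compiler version and the warning flags in use.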

Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 14:59                       ` Alejandro Colomar
@ 2022-09-03 15:31                         ` Martin Uecker
  2022-09-03 20:02                           ` Alejandro Colomar
  2022-11-10  0:06                           ` Alejandro Colomar
  0 siblings, 2 replies; 50+ messages in thread
From: Martin Uecker @ 2022-09-03 15:31 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Hi Alejandro,

Am Samstag, den 03.09.2022, 16:59 +0200 schrieb Alejandro Colomar:
> Hi Martin,
> 
> On 9/3/22 16:35, Martin Uecker wrote:
> > Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
> > > Hi Martin,
> > > 
> > > On 9/3/22 14:47, Martin Uecker wrote:
> > > [...]
> > > 
> > > > GCC will warn if the bound is specified inconsistently between
> > > > declarations and also emit warnings if it can see that a buffer
> > > > which is passed is too small:
> > > > 
> > > > https://godbolt.org/z/PsjPG1nv7
> > > 
> > > That's very good news!
> > > 
> > > BTW, it's nice to see that GCC doesn't need 'static' for array
> > > parameters.  I never understood what the static keyword adds there.
> > > There's no way one can specify an array size and mean anything other than
> > > requiring that, for a non-null pointer, the array should have at least
> > > that size.
> > 
> >  From the C standard's point of view,
> > 
> > void foo(int n, char buf[n]);
> > 
> > is semantically equivalent to
> > 
> > void foo(int, char *buf);
> > 
> > and without 'static' the 'n' has no further meaning
> > (this is different for pointers to arrays).
> 
> I know.  I just don't understand the rationale for that decision. :/

I guess it made sense in the past, but is simply not
what we need today.

> > The static keyword implies that the pointer is valid and
> > non-zero and that there must be at least 'n' elements
> > accessible, so in some sense it is stronger (it implies
> > valid non-zero pointers), but at the same time it does not
> > imply a bound.
> 
> That stronger meaning, I think is a mistake by the standard.
> Basically, [static n] means the same as [n] combined with [[gnu::nonnull]].
> What the standard should have done would be to keep those two things 
> separate, since one may want to declare non-null non-array pointers, or 
> possibly-null array ones.  So the standard should have standardized some 
> form of nonnull for that.  

I agree the situation is not good.  

> But the recent discussion about presenting 
> nonnull pointers as [static 1] is horrible.  But let's wait till the 
> future hopefully fixes this.

yes, [static 1] is problematic because then the number
can not be used as a bound anymore. 

My experience is that if one wants to see something fixed,
one has to push for it.  Standardization is meant
to standardize existing practice, so if we want to see
this improved, we can not wait for this.

> > But I agree that 'n' without 'static' should simply imply
> > a bound and I think we should use it this way even when
> > the standard currently does not attach a meaning to it.
> 
> Yep.
> 
> [...]
> 
> > > What about taking something from K&R functions for this?:
> > > 
> > > int foo(q; w; int a[q], int q, int s[w], int w);
> > > 
> > > By not specifying the types, the syntax is again short.
> > > This is left-to-right, so no problems with global variables, and no need
> > > for complex parsers.
> > > Also, by not specifying types, now it's more obvious to the naked eye
> > > that there's a difference:
> > 
> > I am ok with the syntax, but I am not sure how this would
> > work. If the type is determined only later you would still
> > have to change parsers (some C compilers do type
> > checking  and folding during parsing, so need the types
> > to be known during parsing) and you also still have the
> > problem with the mutual dependencies.
> 
> This syntax resembles a lot K&R syntax.  Any C compiler that supports 
> them (and I guess most compilers out there do) should be easily 
> convertible to support this syntax (at least more easily than other 
> alternatives).  But this is just a guess.

In K&R syntax this worked for definition:

void foo(y, n)
 int n;
 int y[n];
{ ...

But this worked because you could reorder the
declarations so that later declarations could
refer to previous ones.

So one could do

int foo(int n, char buf[n];  buf, n);

where the second part defines the order of
the parameters, or

int foo(buf, n; int n, char buf[n]);

where the first part defines the order,
but the declarations need to have the size
first. But then you need to specify each
parameter twice...


> > We thought about using this syntax
> > 
> > int foo(char buf[.n], int n);
> > 
> > because it is new syntax which means we can restrict the
> > size to be the name of a parameter instead of allowing
> > arbitrary expressions, which then makes forward references
> > less problematic.  It is also consistent with designators in
> > initializers and could also be extended to annotate
> > flexible array members or for storing pointers to arrays
> > in structures:
> 
> It's not crazy.  I don't have much to argue against it.
> 
> > struct {
> >    int n;
> >    char buf[.n];
> > };
> > 
> > struct {
> >    int n;
> >    char (*buf)[.n];
> > };
> 
> Perhaps some doubts about how this would work for nested structures, but 
> not unreasonable.

It is not implemented though...

Martin


> Cheers,
> 
> Alex
> 
> -- 
> Alejandro Colomar
> <http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 15:31                         ` Martin Uecker
@ 2022-09-03 20:02                           ` Alejandro Colomar
  2022-09-05 14:31                             ` Alejandro Colomar
  2022-11-10  0:06                           ` Alejandro Colomar
  1 sibling, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-09-03 20:02 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 2782 bytes --]

Hi Martin,

On 9/3/22 17:31, Martin Uecker wrote:
[...]

>> But the recent discussion about presenting
>> nonnull pointers as [static 1] is horrible.  But let's wait till the
>> future hopefully fixes this.
> 
> yes, [static 1] is problematic because then the number
> can not be used as a bound anymore.
> 
> My experience is that if one wants to see something fixed,
> one has to push for it.  Standardization is meant
> to standardize existing practice, so if we want to see
> this improved, we can not wait for this.
> 

Yeah, I'm not just waiting to see if it gets fixed on its own.  I've been 
discussing adding nonnull to the standard, or improving it in the 
compilers, but so far no compiler has something convincing.  GCC's 
attribute is problematic due to UB issues, and Clang's _Nonnull keyword 
is useless as of now:

<https://github.com/llvm/llvm-project/issues/57546>

Maybe GCC could add Clang's _Nonnull (and maybe _Nullable and the 
pragmas, but definitely not _Null_unspecified), and add some good warnings.

Only then would it make sense to try to standardize the feature.

[...]

> In K&R syntax this worked for definition:
> 
> void foo(y, n)
>   int n;
>   int y[n];
> { ...
> 
> But this worked because you could reorder the
> declarations so that later declarations could
> refer to previous ones.
> 
> So one could do
> 
> int foo(int n, char buf[n];  buf, n);
> 
> where the second part defines the order of
> the parameter or
> 
> int foo(buf, n; int n, char buf[n]);
> 
> where the first part defines the order,
> but the declarations need to have the size
> first. But then you need to specify each
> parameter twice...

Hmm, yeah, maybe the [.n] notation makes more sense.

> 
> 
>>> We thought about using this syntax
>>>
>>> int foo(char buf[.n], int n);
>>>
>>> because it is new syntax which means we can restrict the
>>> size to be the name of a parameter instead of allowing
>>> arbitrary expressions, which then makes forward references
>>> less problematic.  It is also consistent with designators in
>>> initializers and could also be extended to annotate
>>> flexible array members or for storing pointers to arrays
>>> in structures:
>>
>> It's not crazy.  I don't have much to argue against it.
>>
>>> struct {
>>>     int n;
>>>     char buf[.n];
>>> };
>>>
>>> struct {
>>>     int n;
>>>     char (*buf)[.n];
>>> };
>>
>> Perhaps some doubts about how this would work for nested structures, but
>> not unreasonable.
> 
> It is not implemented though...

Well, are you planning to implement it?
If you do, I'm very interested in using it in the documentation ;)


Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 20:02                           ` Alejandro Colomar
@ 2022-09-05 14:31                             ` Alejandro Colomar
  0 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-09-05 14:31 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 730 bytes --]

Hi Martin,

On 9/3/22 22:02, Alejandro Colomar wrote:
>>>> We thought about using this syntax
>>>>
>>>> int foo(char buf[.n], int n);

BTW, it would be useful if this syntax was accepted for void * too, 
especially since GNU C allows pointer arithmetic on void *.

     void *memmove(void dest[.n], const void src[.n], size_t n);

I understand that a void array doesn't make sense, so defining a VLA of 
type void is an error elsewhere, but since array parameters are not 
really arrays, and instead pointers, this could be reasonable.

In the same way, these "arrays" can have zero sizes, or even negative ones 
in some weird cases.
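
A minimal sketch of the GNU C behaviour this relies on; the function names are hypothetical, and the `[.n]` notation itself is not implemented anywhere, so the bounds are only described in comments.  In GNU C, arithmetic on `void *` advances in units of one byte, which is why a byte count is a natural "array size" for a `void` parameter.

```c
#include <stddef.h>
#include <string.h>

/* Today's spelling of what `void dest[.n], const void src[.n]` would
 * document: both pointers refer to at least n accessible bytes. */
void *copy_bytes(void *dest, const void *src, size_t n)
{
    return memmove(dest, src, n);
}

/* Address of the byte i positions past p. */
const void *byte_at(const void *p, size_t i)
{
#if defined(__GNUC__)
    return p + i;                 /* GNU C extension: steps by 1 byte */
#else
    return (const char *)p + i;   /* portable ISO C equivalent */
#endif
}
```

Both branches of `byte_at` compute the same address; the first one only compiles where `void *` arithmetic is accepted as an extension.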

Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-09-03 15:31                         ` Martin Uecker
  2022-09-03 20:02                           ` Alejandro Colomar
@ 2022-11-10  0:06                           ` Alejandro Colomar
  2022-11-10  0:09                             ` Alejandro Colomar
                                               ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-10  0:06 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1792 bytes --]

Hi Martin,

On 9/3/22 17:31, Martin Uecker wrote:
> My experience is that if one wants to see something fixed,
> one has to push for it.  Standardization is meant
> to standardize existing practice, so if we want to see
> this improved, we can not wait for this.

I fully agree with you.  I've been ruminating on these patches for some time, 
to have more time to think about them.  Now I like them enough to push. 
So, after fixing a few minor cosmetic issues detected by some linters, I've pushed the 
changes to document all of man2 and man3 with hypothetical VLA syntax.

Now, I've released man-pages-6.01 very recently (just a few weeks ago), and I 
don't plan to release again in a year or two, so there's time to do the 
implementation in GCC.  From my side, please consider this an ACK or even 
somewhat of a push to get things done in the compiler side of things :)

I'll show here an excerpt of what kind of syntax has been pushed.  Of course, 
there's room for improving/fixing, since it's not seen an official release, but 
for now, this is what's up there:


        int strncmp(const char s1[.n], const char s2[.n], size_t n);

        long mbind(void addr[.len], unsigned long len, int mode,
                   const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
                                                / ULONG_WIDTH],
                   unsigned long maxnode, unsigned int flags);

        int cacheflush(void addr[.nbytes], int nbytes, int cache);


I've shown the three kinds of prototypes that have been changed:

-  Normal VLA; nothing fancy except for the '.'.
-  Complex size expressions.
-  'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
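
The size expression in the mbind() prototype above is a round-up division: the number of unsigned long words needed to hold maxnode bits.  A minimal sketch of that arithmetic follows; the helper name and the WORD_BITS macro are hypothetical, and WORD_BITS stands in for ULONG_WIDTH, which is only available from C23 on.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in an unsigned long (C23 spells this ULONG_WIDTH). */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Words needed to store maxnode bits, rounding up. */
size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + WORD_BITS - 1) / WORD_BITS;
}
```

With a 64-bit unsigned long, a maxnode of 1 through 64 needs one word and 65 needs two.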


Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  0:06                           ` Alejandro Colomar
@ 2022-11-10  0:09                             ` Alejandro Colomar
  2022-11-10  1:33                             ` Joseph Myers
  2022-11-10  9:40                             ` G. Branden Robinson
  2 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-10  0:09 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 2058 bytes --]



On 11/10/22 01:06, Alejandro Colomar wrote:
> Hi Martin,
> 
> On 9/3/22 17:31, Martin Uecker wrote:
>> My experience is that if one wants to see something fixed,
>> one has to push for it.  Standardization is meant
>> to standardize existing practice, so if we want to see
>> this improved, we can not wait for this.
> 
> I fully agree with you.  I've been ruminating these patches for some time, for 
> having some more time to think about them.  Now, I like them enough to push. So, 
> after a few minor cosmetic issues detected by some linters, I've pushed the 
> changes to document all of man2 and man3 with hypothetical VLA syntax.
> 
> Now, I've released man-pages-6.01 very recently (just a few weeks ago), and I 
> don't plan to release again in a year or two, so there's time to do the 
> implementation in GCC.  From my side, please consider this an ACK or even 
> somewhat of a push to get things done in the compiler side of things :)
> 
> I'll show here an excerpt of what kind of syntax has been pushed.  Of course, 
> there's room for improving/fixing, since it's not seen an official release, but 
> for now, this is what's up there:
> 
> 
>         int strncmp(const char s1[.n], const char s2[.n], size_t n);
> 
>         long mbind(void addr[.len], unsigned long len, int mode,
>                    const unsigned long nodemask[(.maxnode + ULONG_WIDTH ‐ 1)
>                                                 / ULONG_WIDTH],
>                    unsigned long maxnode, unsigned int flags);
> 
>         int cacheflush(void addr[.nbytes], int nbytes, int cache);
> 
> 
> I've shown the three kinds of prototypes that have been changed:
> 
> -  Normal VLA; nothing fancy except for the '.'.
> -  Complex size expressions.
> -  'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).

Oops: sizeof(void)==1
> 
> 
> Cheers,
> 
> Alex
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  0:06                           ` Alejandro Colomar
  2022-11-10  0:09                             ` Alejandro Colomar
@ 2022-11-10  1:33                             ` Joseph Myers
  2022-11-10  1:39                               ` Joseph Myers
  2022-11-10  9:40                             ` G. Branden Robinson
  2 siblings, 1 reply; 50+ messages in thread
From: Joseph Myers @ 2022-11-10  1:33 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:

> I've shown the three kinds of prototypes that have been changed:
> 
> -  Normal VLA; nothing fancy except for the '.'.
> -  Complex size expressions.
> -  'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).

That doesn't cover any of the tricky issues with such proposals, such as 
the choice of which entity is referred to by the parameter name when there 
are multiple nested parameter lists that use the same parameter name, or 
when the identifier is visible from an outer scope (including in 
particular the case where it's declared as a typedef name in an outer 
scope).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  1:33                             ` Joseph Myers
@ 2022-11-10  1:39                               ` Joseph Myers
  2022-11-10  6:21                                 ` Martin Uecker
  0 siblings, 1 reply; 50+ messages in thread
From: Joseph Myers @ 2022-11-10  1:39 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Thu, 10 Nov 2022, Joseph Myers wrote:

> On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
> 
> > I've shown the three kinds of prototypes that have been changed:
> > 
> > -  Normal VLA; nothing fancy except for the '.'.
> > -  Complex size expressions.
> > -  'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
> 
> That doesn't cover any of the tricky issues with such proposals, such as 
> the choice of which entity is referred to by the parameter name when there 
> are multiple nested parameter lists that use the same parameter name, or 
> when the identifier is visible from an outer scope (including in 
> particular the case where it's declared as a typedef name in an outer 
> scope).

In fact I can't tell from these examples whether you mean for a '.' token 
after '[' to have special semantics, or whether you mean to have a special 
'. identifier' form of expression valid in certain context (each of which 
introduces its own complications; for the former, typedef names from outer 
scopes are problematic; for the latter, it's designated initializers where 
you get complications, for example).  Designing new syntax that doesn't 
cause ambiguity is generally tricky, and this sort of language extension 
is the kind of thing where you'd expect to go through at least five 
iterations of a WG14 paper before you have something like a sound 
specification.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  1:39                               ` Joseph Myers
@ 2022-11-10  6:21                                 ` Martin Uecker
  2022-11-10 10:09                                   ` Alejandro Colomar
  2022-11-10 23:19                                   ` Joseph Myers
  0 siblings, 2 replies; 50+ messages in thread
From: Martin Uecker @ 2022-11-10  6:21 UTC (permalink / raw)
  To: Joseph Myers, Alejandro Colomar
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Donnerstag, den 10.11.2022, 01:39 +0000 schrieb Joseph Myers:
> On Thu, 10 Nov 2022, Joseph Myers wrote:
> 
> > On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
> > 
> > > I've shown the three kinds of prototypes that have been changed:
> > > 
> > > -  Normal VLA; nothing fancy except for the '.'.
> > > -  Complex size expressions.
> > > -  'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
> > 
> > That doesn't cover any of the tricky issues with such proposals, such as 
> > the choice of which entity is referred to by the parameter name when there 
> > are multiple nested parameter lists that use the same parameter name, or 
> > when the identifier is visible from an outer scope (including in 
> > particular the case where it's declared as a typedef name in an outer 
> > scope).
> 
> In fact I can't tell from these examples whether you mean for a '.' token 
> after '[' to have special semantics, or whether you mean to have a special 
> '. identifier' form of expression valid in certain context (each of which 
> introduces its own complications; for the former, typedef names from outer 
> scopes are problematic; for the latter, it's designated initializers where 
> you get complications, for example).  Designing new syntax that doesn't 
> cause ambiguity is generally tricky, and this sort of language extension 
> is the kind of thing where you'd expect to go through at least five 
> iterations of a WG14 paper before you have something like a sound 
> specification.

I am not sure what Alejandro has in mind exactly, but my idea of using
a new notation [.identifier] would be to limit it to accessing other
parameter names in the same parameter list only, so that there is 

1) no ambiguity what is referred to  and  
2) one can access parameters which come later 

If we want to specify something like this, I think we should also
restrict what kind of expressions one allows, e.g. it has to
be side-effect free.  But maybe we want to make this even more
restrictive (at least initially).

One problem with WG14 papers is that people put in too much,
because the overhead is so high and the standard is not updated
very often.  It would be better to build such feature more
incrementally, which could be done more easily with a compiler
extension.  One could start supporting just [.x] but not more
complicated expressions.

Later WG14 can still accept or reject or modify this proposal
based on the experience we get.

(I would also be happy with using GNU forward declarations, and
I am not sure why people dislike them so much.) 
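
For reference, a minimal sketch of the GNU forward parameter declaration mentioned above.  The function name is hypothetical, and the extension is assumed to be available only in GCC, so other compilers get a plain pointer declaration with the same parameter order.

```c
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU C: the `size_t n;` before the semicolon forward-declares the
 * parameter, so the bound of buf can name it although n comes later. */
size_t fill_buf(size_t n; char buf[n], size_t n)
#else
/* Elsewhere: the same function without the bound annotation. */
size_t fill_buf(char *buf, size_t n)
#endif
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 'x';
    return n;
}
```

The caller sees the same (buf, n) order in both cases; only the GCC branch carries the bound in the type of the parameter.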


Martin





* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  0:06                           ` Alejandro Colomar
  2022-11-10  0:09                             ` Alejandro Colomar
  2022-11-10  1:33                             ` Joseph Myers
@ 2022-11-10  9:40                             ` G. Branden Robinson
  2022-11-10 10:59                               ` Alejandro Colomar
  2 siblings, 1 reply; 50+ messages in thread
From: G. Branden Robinson @ 2022-11-10  9:40 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

Hi Alex,

At 2022-11-10T01:06:31+0100, Alejandro Colomar wrote:
> Now, I've released man-pages-6.01 very recently (just a few weeks
> ago), and I don't plan to release again in a year or two, so there's
> time to do the implementation in GCC.  From my side, please consider
> this an ACK or even somewhat of a push to get things done in the
> compiler side of things :)

Do you mean you _don't_ plan to release again for a year or two?

You know what Moltke said about plans and contact with the enemy.  For
one thing, I think the Linux kernel will move too fast to permit such a
leisurely cadence.

Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
you will, shortly thereafter, migrate to the new `MR` macro.

<tents fingers, laughs villainously>

Regards,
Branden

[1] Only 6 RC bugs left!

    https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&set=custom&report_id=225&status_id=1&plan_release_id=103


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  6:21                                 ` Martin Uecker
@ 2022-11-10 10:09                                   ` Alejandro Colomar
  2022-11-10 23:19                                   ` Joseph Myers
  1 sibling, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-10 10:09 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 6277 bytes --]

Hi Joseph and Martin!

On 11/10/22 07:21, Martin Uecker wrote:
> Am Donnerstag, den 10.11.2022, 01:39 +0000 schrieb Joseph Myers:
>> On Thu, 10 Nov 2022, Joseph Myers wrote:
>>
>>> On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
>>>
>>>> I've shown the three kinds of prototypes that have been changed:
>>>>
>>>> -  Normal VLA; nothing fancy except for the '.'.
>>>> -  Complex size expressions.
>>>> -  'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
>>>
>>> That doesn't cover any of the tricky issues with such proposals, such as
>>> the choice of which entity is referred to by the parameter name when there
>>> are multiple nested parameter lists that use the same parameter name, or
>>> when the identifier is visible from an outer scope (including in
>>> particular the case where it's declared as a typedef name in an outer
>>> scope).
>>
>> In fact I can't tell from these examples whether you mean for a '.' token
>> after '[' to have special semantics, or whether you mean to have a special
>> '. identifier' form of expression valid in certain context (each of which
>> introduces its own complications; for the former, typedef names from outer
>> scopes are problematic; for the latter, it's designated initializers where
>> you get complications, for example).  Designing new syntax that doesn't
>> cause ambiguity is generally tricky, and this sort of language extension
>> is the kind of thing where you'd expect to go through at least five
>> iterations of a WG14 paper before you have something like a sound
>> specification.
> 
> I am not sure what Alejandro has in mind exactly, but my idea of using
> a new notation [.identifier] would be to limit it to accessing other
> parameter names in the same parameter list only, so that there is
> 
> 1) no ambiguity what is referred to  and
> 2) one can access parameters which come later

Yes, I implemented your idea.  As always, I thought I had linked to it in the 
commit message, but I didn't.  Quite a bad thing for the commit that implements 
a completely new feature to not point to the documentation/idea at all.

So, the documentation followed by these 3 patches is Martin's email:
<https://lore.kernel.org/linux-man/601680ae-30d7-1481-e152-034083f6dde1@gmail.com/T/#med2bdfcc31a3d0b3bc6c48b229c8d8dd5088935e>

It was sound in my head, and I couldn't see any inconsistencies.

-  I implemented it with '.' as being restricted to refer to parameters of the 
function being prototyped (commit 1).

-  I also allowed complex expressions in the prototypes (commit 2), since it's 
something that can be quite useful (that was already foreseen by Martin's idea, 
IIRC).  The most useful example that I have in my mind is a patch that I'm 
developing for shadow-utils:
 
<https://github.com/shadow-maint/shadow/pull/569/files#diff-12b560bab6b4fb8f7f3a16f01aaa994de539a8bed3058c976be0daebe16405c1>

    The gist of it is a function that gets a fixed-width non-NUL-terminated 
string, and copies it into a NUL-terminated string in a buffer that of course 
has to be one byte larger than the input string:

	void buf2str(char dst[restrict .n+1], const char src[restrict .n],
	             size_t n);

-  I extended the idea to apply to void[] (commit 3).  Something not yet allowed 
by GCC, but very useful IMO, especially for the mem...(3) functions.  Since GNU 
C consistently treats sizeof(void)==1, it makes sense to allow VLA syntax in 
that way.  This is not at all about allowing true VLAs of type void[]; that's 
forbidden, and should continue to be forbidden.  But since parameters are just 
pointers, I don't see any issue with allowing false void[] VLAs in parameters 
that really are void* in disguise.
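
As a point of comparison, here is a minimal sketch of how the buf2str() 
prototype above has to be spelled in standard C today, where the size must be 
declared before the array declarators that use it.  The name 
buf2str_sized_first and the function body are assumptions made for 
illustration; this is not the shadow-utils code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical standard-C spelling of buf2str(): 'n' comes first so that
 * the array declarators can refer to it.  With the proposed notation the
 * size could stay last: dst[restrict .n+1], src[restrict .n], size_t n.
 */
void buf2str_sized_first(size_t n, char dst[restrict n + 1],
                         const char src[restrict n])
{
	memcpy(dst, src, n);	/* src is fixed-width, maybe not NUL-terminated */
	dst[n] = '\0';		/* dst has room for exactly one extra byte */
}
```

The only difference the new notation would make here is the parameter order; 
the relation between the two buffer sizes is documented the same way.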


The 3 commits are here (last 3 commits in that log):
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?id=c64cd13e002561c6802c6a1a1a8a640f034fea70>


Martin, please check if I implemented your idea faithfully.  The 3 example 
prototypes I showed are good representatives of what I added, so if you don't 
understand man(7) source you could just read them and see if they make sense to 
you; the rest of the changes are of the same kind.  Or you could install the man 
pages from the repo :)



> 
> If we want to specify something like this, I think we should also
> restrict what kind of expressions one allows, e.g. it has to
> be side-effect free.

Well, yes, there should be no side effects; it would not make sense in a 
prototype.  I'd put it as simply as with _Generic(3) and similar stuff, where 
the controlling expression is not evaluated for side effects.  I can never 
remember whether sizeof() or typeof() evaluate their operands; I always need to 
look it up.  I'll be documenting that in the man-pages soon.
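
For reference, a small sketch of the existing rules as I understand them from 
ISO C: sizeof does not evaluate its operand unless the operand has a variable 
length array type, and _Generic never evaluates its controlling expression.  
The function names are made up for this example.

```c
#include <stddef.h>

/* Operand is not a VLA type: 'i++' is not evaluated, so this returns 0. */
int after_sizeof_plain(void)
{
	int i = 0;
	size_t s = sizeof(i++);

	(void) s;
	return i;
}

/* Operand is a VLA type: the size expression is evaluated; returns 2. */
int after_sizeof_vla(void)
{
	int i = 1;
	size_t s = sizeof(int [i++]);

	(void) s;
	return i;
}

/* The controlling expression of _Generic is not evaluated; returns 0. */
int after_generic(void)
{
	int i = 0;
	int is_int = _Generic(i++, int: 1, default: 0);

	(void) is_int;
	return i;
}
```

A "no side effects" rule for [.n] would therefore be closer to _Generic than to 
sizeof, which has the VLA exception.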

>  But maybe we want to make this even more
> restrictive (at least initially).

Yeah, you could go for an initial implementation that only supports my commit 1; 
that would be the simplest.  That would cover already the vast majority of 
cases.  But please consider commits 2 and 3 afterwards, since I believe they are 
also of great importance.

> 
> One problem with WG14 papers is that people put in too much,
> because the overhead is so high and the standard is not updated
> very often.  It would be better to build such feature more
> incrementally, which could be done more easily with a compiler
> extension.  One could start supporting just [.x] but not more
> complicated expressions.
> 
> Later WG14 can still accept or reject or modify this proposal
> based on the experience we get.

Yeah, and I also think any WG14 papers with features as important as this one 
without prior experience in a real compiler should be rejected.  I don't think 
it makes sense to standardize something just from theoretical discussions, and 
force everyone to implement it afterwards.  No matter how good the reviewers are.

> 
> (I would also be happy with using GNU forward declarations, and
> I am not sure why people dislike them so much.)

For me, it's how easy it is to confuse a comma with a semicolon.  Also, 
unnecessarily long lines.

> 
> Martin
Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  9:40                             ` G. Branden Robinson
@ 2022-11-10 10:59                               ` Alejandro Colomar
  2022-11-10 22:25                                 ` G. Branden Robinson
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-10 10:59 UTC (permalink / raw)
  To: G. Branden Robinson
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1807 bytes --]

Hi Branden!

On 11/10/22 10:40, G. Branden Robinson wrote:
> Hi Alex,
> 
> At 2022-11-10T01:06:31+0100, Alejandro Colomar wrote:
>> Now, I've released man-pages-6.01 very recently (just a few weeks
>> ago), and I don't plan to release again in a year or two, so there's
>> time to do the implementation in GCC.  From my side, please consider
>> this an ACK or even somewhat of a push to get things done in the
>> compiler side of things :)
> 
> Do you mean you _don't_ plan to release again for a year or two?
> 
> You know what Moltke said about plans and contact with the enemy.  For
> one thing, I think the Linux kernel will move too fast to permit such a
> leisurely cadence.

Heh, at this point, I burnt my ships, by using enhanced VLA syntax.  If I 
release that before GCC, I'm expecting to see an avalanche of reports about it 
(and I also expect that GCC and forums will receive a similar amount).  So yes, 
I expect to wait some longish time.

> 
> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
> you will, shortly thereafter, migrate to the new `MR` macro.

Not as soon as it gets released, because I expect (at least a decent amount of) 
contributors to be able to read the pages to which they contribute, but as 
soon as it makes it into Debian stable, yes, that's in my plans.  So, if you 
make it before the freeze, that means around a couple of months from now.

> 
> <tents fingers, laughs villainously>

<also tents fingers, laughs villainously>

> 
> Regards,
> Branden
> 
> [1] Only 6 RC bugs left!

Looks good!

Cheers,

Alex

> 
>      https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&set=custom&report_id=225&status_id=1&plan_release_id=103

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10 10:59                               ` Alejandro Colomar
@ 2022-11-10 22:25                                 ` G. Branden Robinson
  0 siblings, 0 replies; 50+ messages in thread
From: G. Branden Robinson @ 2022-11-10 22:25 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

Hi Alex,

At 2022-11-10T11:59:02+0100, Alejandro Colomar wrote:
> > You know what Moltke said about plans and contact with the enemy.
> > For one thing, I think the Linux kernel will move too fast to permit
> > such a leisurely cadence.
> 
> Heh, at this point, I burnt my ships, by using enhanced VLA syntax.
> If I release that before GCC, I'm expecting to see an avalanche of
> reports about it (and I also expect that GCC and forums will receive a
> similar amount).  So yes, I expect to wait some longish time.

Hah, you rebutted my Moltke with your namesake.  You understand that I'm
obligated to spring a reference to the Battle of Lepanto or something on
you at some point.

> > Also, as soon as Bertrand and I can get groff 1.23 out[1], I am
> > hoping you will, shortly thereafter, migrate to the new `MR` macro.
> 
> Not as soon as it gets released, because I expect (at least a decent
> amount of) contributors to be able to read the pages to which they
> contribute,

Laggardly adopters can always put this in man.local.

.if !d MR \{\
.  de MR
.    IR \\$1 (\\$2)\\$3
.  .
.\}

> but as soon as it makes it into Debian stable, yes, that's in my
> plans.  So, if you make it before the freeze, that means around a
> couple of months from now.

Yes.  It is a major personal goal to get groff 1.23 into Debian
bookworm.

> > <tents fingers, laughs villainously>
> 
> <also tents fingers, laughs villainously>

https://www.youtube.com/watch?v=VhH2egTLohM

Regards,
Branden


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10  6:21                                 ` Martin Uecker
  2022-11-10 10:09                                   ` Alejandro Colomar
@ 2022-11-10 23:19                                   ` Joseph Myers
  2022-11-10 23:28                                     ` Alejandro Colomar
                                                       ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Joseph Myers @ 2022-11-10 23:19 UTC (permalink / raw)
  To: Martin Uecker
  Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:

> One problem with WG14 papers is that people put in too much,
> because the overhead is so high and the standard is not updated
> very often.  It would be better to build such feature more
> incrementally, which could be done more easily with a compiler
> extension.  One could start supporting just [.x] but not more
> complicated expressions.

Even a compiler extension requires the level of detail of specification 
that you get with a WG14 paper (and the level of work on finding bugs in 
that specification), to avoid the problem we've had before with too many 
features added in GCC 2.x days where a poorly defined feature is "whatever 
the compiler accepts".

If you use .x as the notation but don't limit it to [.x], you have a 
completely new ambiguity between ordinary identifiers and member names

struct s { int a; };
void f(int a, int b[((struct s) { .a = 1 }).a]);

where it's newly ambiguous whether ".a = 1" is an assignment to the 
expression ".a" or a use of a designated initializer.

(I think that if you add any syntax for this, GNU VLA forward declarations 
are clearly to be preferred to inventing something new like [.x] which 
introduces its own problems.)

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10 23:19                                   ` Joseph Myers
@ 2022-11-10 23:28                                     ` Alejandro Colomar
  2022-11-11 19:52                                     ` Martin Uecker
  2022-11-12 12:34                                     ` Alejandro Colomar
  2 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-10 23:28 UTC (permalink / raw)
  To: Joseph Myers, Martin Uecker
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 2085 bytes --]

Hi Joseph,

On 11/11/22 00:19, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
> 
>> One problem with WG14 papers is that people put in too much,
>> because the overhead is so high and the standard is not updated
>> very often.  It would be better to build such feature more
>> incrementally, which could be done more easily with a compiler
>> extension.  One could start supporting just [.x] but not more
>> complicated expressions.
> 
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
> 
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
> 
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
> 
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
> 
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)

Yeah, I think limiting it to [.n] initially, and only moving forward, step by 
step, if it's perfectly clear that it's doable seems very reasonable.

Re: GNU VLA fwd decl:

This example is what I'm worried about:

         int foo(int a; int b[a], int a);
         int foo(int a, int b[a], int o);

Okay, parameters should have more readable names...  But still, it allows for a 
high chance of wtf moments.  However, I can think of a syntax very similar to 
GNU's, that would make it a bit better in terms of readability: not declaring 
the type in the fwd decl:


         int foo(a; int b[a], int a);
         int foo(int a, int b[a], int o);
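
For readers who have not used it, this is a minimal sketch of what the existing 
GNU forward declaration looks like in a function definition.  It assumes GCC; 
the guard excludes Clang, which as far as I know does not accept this 
extension, and the fallback branch simply drops the bound.  The name 
sum_size_last is hypothetical.

```c
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/*
 * GNU C parameter forward declaration: "size_t n;" announces the
 * parameter, so a[n] can refer to it although n is passed last.
 * Note that only a ';' distinguishes it from a real parameter.
 */
long sum_size_last(size_t n; const int a[n], size_t n)
#else
/* Fallback for other compilers: same call signature, no documented bound. */
long sum_size_last(const int *a, size_t n)
#endif
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		total += a[i];
	return total;
}
```

Callers use it as sum_size_last(array, count) in both cases; only the 
declaration differs.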

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10 23:19                                   ` Joseph Myers
  2022-11-10 23:28                                     ` Alejandro Colomar
@ 2022-11-11 19:52                                     ` Martin Uecker
  2022-11-12  1:09                                       ` Joseph Myers
  2022-11-12 12:34                                     ` Alejandro Colomar
  2 siblings, 1 reply; 50+ messages in thread
From: Martin Uecker @ 2022-11-11 19:52 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Donnerstag, den 10.11.2022, 23:19 +0000 schrieb Joseph Myers:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
> 
> > One problem with WG14 papers is that people put in too much,
> > because the overhead is so high and the standard is not updated
> > very often.  It would be better to build such feature more
> > incrementally, which could be done more easily with a compiler
> > extension.  One could start supporting just [.x] but not more
> > complicated expressions.
> 
> Even a compiler extension requires the level of detail of specification 
> that you get with a WG14 paper (and the level of work on finding bugs in 
> that specification), to avoid the problem we've had before with too many 
> features added in GCC 2.x days where a poorly defined feature is "whatever 
> the compiler accepts".

I think the effort needed to specify the feature correctly
can be minimized by making the first version of the feature
as simple as possible.  

> If you use .x as the notation but don't limit it to [.x], you have a 
> completely new ambiguity between ordinary identifiers and member names
> 
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
> 
> where it's newly ambiguous whether ".a = 1" is an assignment to the 
> expression ".a" or a use of a designated initializer.

If we only allowed [ . a ] then this example would not be allowed.

If we need more flexibility, we could incrementally extend it.

> (I think that if you add any syntax for this, GNU VLA forward declarations 
> are clearly to be preferred to inventing something new like [.x] which 
> introduces its own problems.)

I also prefer this.

I proposed forward declarations but WG14 and also people in this
discussion did not like them.  If we would actually start using
them, we could propose them again for the next revision.

Martin





* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-11 19:52                                     ` Martin Uecker
@ 2022-11-12  1:09                                       ` Joseph Myers
  2022-11-12  7:24                                         ` Martin Uecker
  0 siblings, 1 reply; 50+ messages in thread
From: Joseph Myers @ 2022-11-12  1:09 UTC (permalink / raw)
  To: Martin Uecker
  Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Fri, 11 Nov 2022, Martin Uecker via Gcc wrote:

> > Even a compiler extension requires the level of detail of specification 
> > that you get with a WG14 paper (and the level of work on finding bugs in 
> > that specification), to avoid the problem we've had before with too many 
> > features added in GCC 2.x days where a poorly defined feature is "whatever 
> > the compiler accepts".
> 
> I think the effort needed to specify the feature correctly
> can be minimized by making the first version of the feature
> as simple as possible.  

The version of constexpr in the current C2x working draft is more or less 
as simple as possible.  It also went through lots of revisions to get 
there.  I'm currently testing an implementation of C2x constexpr for GCC 
13, and there are still several issues with the specification I found in 
the implementation process, beyond those raised in WG14 discussions, for 
which I'll need to raise NB comments to clarify things.

I think that illustrates that you need the several iterations on the 
specification process, *and* making it as simple as possible, *and* 
getting implementation experience, *and* the implementation experience 
being with a close eye to what it implies for all the details in the 
specification rather than just getting something vaguely functional but 
not clearly specified.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12  1:09                                       ` Joseph Myers
@ 2022-11-12  7:24                                         ` Martin Uecker
  0 siblings, 0 replies; 50+ messages in thread
From: Martin Uecker @ 2022-11-12  7:24 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Samstag, den 12.11.2022, 01:09 +0000 schrieb Joseph Myers:
> On Fri, 11 Nov 2022, Martin Uecker via Gcc wrote:
> 
> > > Even a compiler extension requires the level of detail of specification 
> > > that you get with a WG14 paper (and the level of work on finding bugs in 
> > > that specification), to avoid the problem we've had before with too many 
> > > features added in GCC 2.x days where a poorly defined feature is "whatever 
> > > the compiler accepts".
> > 
> > I think the effort needed to specify the feature correctly
> > can be minimized by making the first version of the feature
> > as simple as possible.  
> 
> The version of constexpr in the current C2x working draft is more or less 
> as simple as possible.  It also went through lots of revisions to get 
> there.  I'm currently testing an implementation of C2x constexpr for GCC 
> 13, and there are still several issues with the specification I found in 
> the implementation process, beyond those raised in WG14 discussions, for 
> which I'll need to raise NB comments to clarify things.

constexpr had no implementation experience in C at all, and I always
suspected that the idea that C++ experience should somehow count is
not really justified.

> I think that illustrates that you need the several iterations on the 
> specification process, *and* making it as simple as possible, *and* 
> getting implementation experience, *and* the implementation experience 
> being with a close eye to what it implies for all the details in the 
> specification rather than just getting something vaguely functional but 
> not clearly specified.

I agree. We should work on specification and on prototyping
new features in parallel.

Martin




* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-10 23:19                                   ` Joseph Myers
  2022-11-10 23:28                                     ` Alejandro Colomar
  2022-11-11 19:52                                     ` Martin Uecker
@ 2022-11-12 12:34                                     ` Alejandro Colomar
  2022-11-12 12:46                                       ` Alejandro Colomar
  2022-11-12 13:03                                       ` Joseph Myers
  2 siblings, 2 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-12 12:34 UTC (permalink / raw)
  To: Joseph Myers, Martin Uecker
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 2166 bytes --]

Hi Joseph,

On 11/11/22 00:19, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
> 
>> One problem with WG14 papers is that people put in too much,
>> because the overhead is so high and the standard is not updated
>> very often.  It would be better to build such feature more
>> incrementally, which could be done more easily with a compiler
>> extension.  One could start supporting just [.x] but not more
>> complicated expressions.
> 
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
> 
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
> 
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);

Is it really ambiguous?  Let's show some currently-valid code:


struct s {
	int a;
};

struct t {
	struct s s;
	int a;
};

void f(void)
{
	struct t x = {
		.a = 1,
		.s = {
			.a = ((struct s) {.a = 1}).a,
		},
	};
}


It is ambiguous to a human reader, but that's a subjective thing, and of course 
shadowing should be avoided by programmers.  However, for a compiler, scoping 
and syntax rules should be unambiguous, I think.  In your code example, I 
believe it is unambiguous that both '.a' refer to the struct member.

But maybe we're not considering more complex situations that might really be 
ambiguous to the compiler, so a first round of supporting only [.a] would be a 
good first implementation.

> 
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
> 
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
> 

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 12:34                                     ` Alejandro Colomar
@ 2022-11-12 12:46                                       ` Alejandro Colomar
  2022-11-12 13:03                                       ` Joseph Myers
  1 sibling, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-12 12:46 UTC (permalink / raw)
  To: Joseph Myers, Martin Uecker
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1004 bytes --]

On 11/12/22 13:34, Alejandro Colomar wrote:
> struct s {
>      int a;
> };
> 
> struct t {
>      struct s s;
>      int a;
> };
> 
> void f(void)
> {
>      struct t x = {
>          .a = 1,
>          .s = {
>              .a = ((struct s) {.a = 1}).a,
>          },
>      };
> }

Building on that example, what I understood from Martin's email is that 
there's also an idea of allowing the following:


struct s {
     int a;
     int b;
};

struct t {
     struct s s;
     int a;
     int b;
};

void f(void)
{
     struct t x = {
         .a = 1,
         .s = {
             // In the following line, .b=.a is assigning 2
             .a = ((struct s) {.a = 2, .b = .a}).b,
             // The previous line assigned 2, since the compound had 2 in .b
         },
         // In the following line, .b=.a is assigning 1
         .b = .a,
     };
}

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 12:34                                     ` Alejandro Colomar
  2022-11-12 12:46                                       ` Alejandro Colomar
@ 2022-11-12 13:03                                       ` Joseph Myers
  2022-11-12 13:40                                         ` Alejandro Colomar
  1 sibling, 1 reply; 50+ messages in thread
From: Joseph Myers @ 2022-11-12 13:03 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:

> > struct s { int a; };
> > void f(int a, int b[((struct s) { .a = 1 }).a]);
> 
> Is it really ambiguous?  Let's show some currently-valid code:

Well, I still don't know what the syntax addition you propose is.  Is it

postfix-expression : . identifier

(with a special rule about how the identifier is interpreted, different 
from the normal scope rules)?  If so, then ".a = 1" could either match 
assignment-expression directly (assigning to the postfix-expression ".a").  
Or it could match designation[opt] initializer, where ".a" is a 
designator.  And as I've noted many times in discussions of C2x proposals 
on the WG14 reflector, if some sequence of tokens can match the syntax in 
more than one way, there always needs to be explicit normative text to 
disambiguate the intended parse - it's not enough that one parse might 
lead later to a violation of some other constraint (not that either parse 
leads to a constraint violation in this case).

Or is the syntax

array-declarator : direct-declarator [ . assignment-expression ]

(with appropriate variants with static and type-qualifier-list and for 
array-abstract-declarator as well, and with different identifier 
interpretation rules inside the assignment-expression)?  If so, then there 
are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name 
in an outer scope, because the appropriate parse would depend on whether 
'a' is shadowed by a parameter - unless of course you add appropriate 
wording like that present in some places about not being able to use this 
syntax to shadow a typedef name.

Or is it just

array-declarator : direct-declarator [ . identifier ]

which might avoid some of these problems at the expense of being less 
expressive?

If you're proposing a C syntax addition, you always need to be clear about 
exactly what the new cases in the syntax would be, and how you resolve 
ambiguities with any other existing part of the syntax, how you interact 
with rules on scopes, namespaces and linkage of identifiers, etc.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 13:03                                       ` Joseph Myers
@ 2022-11-12 13:40                                         ` Alejandro Colomar
  2022-11-12 13:58                                           ` Alejandro Colomar
  2022-11-12 14:54                                           ` Joseph Myers
  0 siblings, 2 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-12 13:40 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 3876 bytes --]

Hi Joseph,

On 11/12/22 14:03, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> 
>>> struct s { int a; };
>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>
>> Is it really ambiguous?  Let's show some currently-valid code:
> 
> Well, I still don't know what the syntax addition you propose is.  Is it
> 
> postfix-expression : . identifier

I'll try to explain it in standardeese, but I'm not sure if I'll get it right, 
so I'll accompany it with plain English.

Maybe Martin can help.

Since it's to be used as an rvalue, not as an lvalue, I guess a 
postfix-expression wouldn't be the right one.

> 
> (with a special rule about how the identifier is interpreted, different
> from the normal scope rules)?  If so, then ".a = 1" could either match
> assignment-expression directly (assigning to the postfix-expression ".a").

No, assigning to a function parameter from within another parameter declaration 
wouldn't make sense.  They should be readonly.  Side effects should be 
forbidden, I think.

> Or it could match designation[opt] initializer, where ".a" is a
> designator.  And as I've noted many times in discussions of C2x proposals
> on the WG14 reflector, if some sequence of tokens can match the syntax in
> more than one way, there always needs to be explicit normative text to
> disambiguate the intended parse - it's not enough that one parse might
> lead later to a violation of some other constraint (not that either parse
> leads to a constraint violation in this case).
> 
> Or is the syntax
> 
> array-declarator : direct-declarator [ . assignment-expression ]

Not good either.  The '.' should prefix the identifier, not the expression.  So, 
for example, you would have:

        void *bsearch(const void key[.size], const void base[.size * .nmemb],
                      size_t nmemb, size_t size,
                      int (*compar)(const void [.size], const void [.size]));

That's taken from the current manual page from git HEAD.  See 'base', which gets 
its size from the multiplication of 'size' and 'nmemb'.
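
To show what the same information costs in standard C, here is a rough sketch 
of a wrapper with the sizes moved to the front.  Since arrays of void are not 
allowed, it uses unsigned char, which forces casts on the caller.  Both 
bsearch_sized and cmp_int are names invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper around bsearch(3): nmemb and size come first so
 * that the array declarators can use them, and unsigned char stands in
 * for void, because 'const void base[size * nmemb]' is not valid C.
 */
void *bsearch_sized(size_t nmemb, size_t size,
                    const unsigned char key[size],
                    const unsigned char base[size * nmemb],
                    int (*compar)(const void *, const void *))
{
	return bsearch(key, base, nmemb, size, compar);
}

/* Example comparison callback for int elements. */
int cmp_int(const void *a, const void *b)
{
	int x = *(const int *) a;
	int y = *(const int *) b;

	return (x > y) - (x < y);
}
```

A call then looks like bsearch_sized(n, sizeof(v[0]), (const unsigned char *) 
&k, (const unsigned char *) v, cmp_int), which is part of why I'd like the 
void[] form to be allowed in parameters.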

> 
> (with appropriate variants with static and type-qualifier-list and for
> array-abstract-declarator as well, and with different identifier
> interpretation rules inside the assignment-expression)?  If so, then there
> are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
> in an outer scope, because the appropriate parse would depend on whether
> 'a' is shadowed by a parameter - unless of course you add appropriate
> wording like that present in some places about not being able to use this
> syntax to shadow a typedef name.
> 
> Or is it just
> 
> array-declarator : direct-declarator [ . identifier ]

For the initial implementation, it would be, I think.

> 
> which might avoid some of these problems at the expense of being less
> expressive?

Yes.

> 
> If you're proposing a C syntax addition, you always need to be clear about
> exactly what the new cases in the syntax would be, and how you resolve
> ambiguities with any other existing part of the syntax, how you interact
> with rules on scopes, namespaces and linkage of identifiers, etc.

Yeah, I'll try.

I think that the complete feature would allow 'designator' to be used within 
unary-expression:

unary-expression: designator

Since sizeof(foo) is a unary-expression and you can't assign to it, I'm guessing 
that similar rules could be used for '.size'.


That would have the effect of allowing both features suggested by Martin: being 
able to use designators in both structures (as demonstrated in my last email) 
and function prototypes (as in the thing we're discussing).

I hope I got it right.  I'm not used to lexical grammar so much.

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 13:40                                         ` Alejandro Colomar
@ 2022-11-12 13:58                                           ` Alejandro Colomar
  2022-11-12 14:54                                           ` Joseph Myers
  1 sibling, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-12 13:58 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 4271 bytes --]



On 11/12/22 14:40, Alejandro Colomar wrote:
> Hi Joseph,
> 
> On 11/12/22 14:03, Joseph Myers wrote:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>>> struct s { int a; };
>>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>
>>> Is it really ambiguous?  Let's show some currently-valid code:
>>
>> Well, I still don't know what the syntax addition you propose is.  Is it
>>
>> postfix-expression : . identifier
> 
> I'll try to explain it in standardeese, but I'm not sure if I'll get it right, 
> so I'll accompany it with plain English.
> 
> Maybe Martin can help.
> 
> Since it's to be used as an rvalue, not as a lvalue, I guess a 
> postfix-expression wouldn't be the right one.
> 
>>
>> (with a special rule about how the identifier is interpreted, different
>> from the normal scope rules)?  If so, then ".a = 1" could either match
>> assignment-expression directly (assigning to the postfix-expression ".a").
> 
> No, assigning to a function parameter from within another parameter declaration 
> wouldn't make sense.  They should be readonly.  Side effects should be 
> forbidden, I think.
> 
>> Or it could match designation[opt] initializer, where ".a" is a
>> designator.  And as I've noted many times in discussions of C2x proposals
>> on the WG14 reflector, if some sequence of tokens can match the syntax in
>> more than one way, there always needs to be explicit normative text to
>> disambiguate the intended parse - it's not enough that one parse might
>> lead later to a violation of some other constraint (not that either parse
>> leads to a constraint violation in this case).
>>
>> Or is the syntax
>>
>> array-declarator : direct-declarator [ . assignment-expression ]
> 
> Not good either.  The '.' should prefix the identifier, not the expression.  So, 
> for example, you would have:
> 
>         void *bsearch(const void key[.size], const void base[.size * .nmemb],
>                       size_t nmemb, size_t size,
>                       int (*compar)(const void [.size], const void [.size]));
> 
> That's taken from the current manual page from git HEAD.  See 'base', which gets 
> its size from the multiplication of 'size' and 'nmemb'.
> 
>>
>> (with appropriate variants with static and type-qualifier-list and for
>> array-abstract-declarator as well, and with different identifier
>> interpretation rules inside the assignment-expression)?  If so, then there
>> are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
>> in an outer scope, because the appropriate parse would depend on whether
>> 'a' is shadowed by a parameter - unless of course you add appropriate
>> wording like that present in some places about not being able to use this
>> syntax to shadow a typedef name.
>>
>> Or is it just
>>
>> array-declarator : direct-declarator [ . identifier ]
> 
> For the initial implementation, it would be, I think.
> 
>>
>> which might avoid some of these problems at the expense of being less
>> expressive?
> 
> Yes.
> 
>>
>> If you're proposing a C syntax addition, you always need to be clear about
>> exactly what the new cases in the syntax would be, and how you resolve
>> ambiguities with any other existing part of the syntax, how you interact
>> with rules on scopes, namespaces and linkage of identifiers, etc.
> 
> Yeah, I'll try.
> 
> I think that the complete feature would allow 'designator' to be used within 
> unary-expression:
> 
> unary-expression: designator

I made a mistake:  since array designators ([ constant-expression ]) don't make 
sense in this feature, it should only be:

unary-expression: . identifier

> 
> Since sizeof(foo) is a unary-expression and you can't assign to it, I'm guessing 
> that similar rules could be used for '.size'.
> 
> 
> That would have the effect of allowing both features suggested by Martin: being 
> able to use designators in both structures (as demonstrated in my last email) 
> and function prototypes (as in the thing we're discussing).
> 
> I hope I got it right.  I'm not used to lexical grammar so much.
> 
> Cheers,
> 
> Alex
> 
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 13:40                                         ` Alejandro Colomar
  2022-11-12 13:58                                           ` Alejandro Colomar
@ 2022-11-12 14:54                                           ` Joseph Myers
  2022-11-12 15:35                                             ` Alejandro Colomar
  2022-11-12 15:56                                             ` Martin Uecker
  1 sibling, 2 replies; 50+ messages in thread
From: Joseph Myers @ 2022-11-12 14:54 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:

> Since it's to be used as an rvalue, not as a lvalue, I guess a
> postfix-expression wouldn't be the right one.

Several forms of postfix-expression are only rvalues.

> > (with a special rule about how the identifier is interpreted, different
> > from the normal scope rules)?  If so, then ".a = 1" could either match
> > assignment-expression directly (assigning to the postfix-expression ".a").
> 
> No, assigning to a function parameter from within another parameter
> declaration wouldn't make sense.  They should be readonly.  Side effects
> should be forbidden, I think.

Such assignments are already allowed.  In a function definition, the side 
effects (including in size expressions for array parameters adjusted to 
pointers) take place before entry to the function body.
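
A minimal sketch of that point, using only syntax that is valid today.  The
function names are made up for illustration, and the comments describe the
behaviour I would expect from GCC and Clang rather than a quotation of the
standard.  The size expression of the second parameter assigns to the first
one, and that assignment has already happened when the body starts running:

```c
#include <assert.h>
#include <stddef.h>

/* 'rows' is a pointer to a variable-length array, so its size expression
 * 'n = 4' is evaluated on entry to the function, before the body runs. */
size_t row_bytes(int n, char (*rows)[n = 4])
{
	/* The side effect in the size expression is already visible here. */
	assert(n == 4);
	return sizeof(*rows);
}

/* Same idea, but returning the parameter that was assigned to. */
int n_in_body(int n, char (*rows)[n = 4])
{
	(void) rows;
	return n;
}
```

Calling either function with a 'char m[2][4]' argument and any value for 'n'
is expected to yield 4, whatever 'n' the caller passed.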

And, in any case, if you did have a constraint disallowing such 
assignments, it wouldn't suffice for syntactic disambiguation (see the 
previous point I made about that; I have some rough notes towards a WG14 
paper on syntactic disambiguation, but haven't converted them into a 
coherent paper).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 14:54                                           ` Joseph Myers
@ 2022-11-12 15:35                                             ` Alejandro Colomar
  2022-11-12 17:02                                               ` Joseph Myers
  2022-11-12 15:56                                             ` Martin Uecker
  1 sibling, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-12 15:35 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1768 bytes --]

Hi Joseph,

On 11/12/22 15:54, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> 
>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>> postfix-expression wouldn't be the right one.
> 
> Several forms of postfix-expression are only rvalues.
> 
>>> (with a special rule about how the identifier is interpreted, different
>>> from the normal scope rules)?  If so, then ".a = 1" could either match
>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>
>> No, assigning to a function parameter from within another parameter
>> declaration wouldn't make sense.  They should be readonly.  Side effects
>> should be forbidden, I think.
> 
> Such assignments are already allowed.  In a function definition, the side
> effects (including in size expressions for array parameters adjusted to
> pointers) take place before entry to the function body.

Then, I'm guessing that rules need to change in a way that .initializer cannot 
appear as the left operand of an assignment-expression.

That is, for the following current definition of the assignment-expression (as 
of N3054):

assignment-expression:
     conditional-expression
     unary-expression assignment-operator assignment-expression

The unary-expression cannot be formed by a .initializer.

Would that be doable and sufficient?

Cheers,

Alex

> 
> And, in any case, if you did have a constraint disallowing such
> assignments, it wouldn't suffice for syntactic disambiguation (see the
> previous point I made about that; I have some rough notes towards a WG14
> paper on syntactic disambiguation, but haven't converted them into a
> coherent paper).
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 14:54                                           ` Joseph Myers
  2022-11-12 15:35                                             ` Alejandro Colomar
@ 2022-11-12 15:56                                             ` Martin Uecker
  2022-11-13 13:19                                               ` Alejandro Colomar
  1 sibling, 1 reply; 50+ messages in thread
From: Martin Uecker @ 2022-11-12 15:56 UTC (permalink / raw)
  To: Joseph Myers, Alejandro Colomar
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> 
> > Since it's to be used as an rvalue, not as a lvalue, I guess a
> > postfix-expression wouldn't be the right one.
> 
> Several forms of postfix-expression are only rvalues.
> 
> > > (with a special rule about how the identifier is interpreted, different
> > > from the normal scope rules)?  If so, then ".a = 1" could either match
> > > assignment-expression directly (assigning to the postfix-expression ".a").
> > 
> > No, assigning to a function parameter from within another parameter
> > declaration wouldn't make sense.  They should be readonly.  Side effects
> > should be forbidden, I think.
> 
> Such assignments are already allowed.  In a function definition, the side 
> effects (including in size expressions for array parameters adjusted to 
> pointers) take place before entry to the function body.
> 
> And, in any case, if you did have a constraint disallowing such 
> assignments, it wouldn't suffice for syntactic disambiguation (see the 
> previous point I made about that; I have some rough notes towards a WG14 
> paper on syntactic disambiguation, but haven't converted them into a 
> coherent paper).

My idea was to only allow

array-declarator : direct-declarator [ . identifier ]

and only for parameters (not nested inside structs declared
in the parameter list) as a first step, because it seems this 
would exclude all difficult cases.

But if we need to allow more complicated expressions, then
it starts getting more complicated.

One could allow more generic expressions, and
specify that the .identifier refers to a parameter in
the nearest lexically enclosing parameter list or
struct/union.

Then

void foo(struct bar { int x; char c[.x] } a, int x);

would not be allowed (which is good because then we
could later use the syntax also inside structs). If
we apply scoping rules, the following would work:

struct bar { int y; };
void foo(char p[((struct bar){ .y = .x }).y], int x);

But not:

struct bar { int y; };
void foo(char p[((struct bar){ .y = .y }).y], int y);


But there are not only syntactical problems, because
also the type of the parameter might become relevant
and then you can get circular dependencies:

void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);

I am not sure what would be the best way to fix it. One
could specify that parameters referred to by 
the .identifier syntax must be of some integer type and
that the sub-expression .identifier is always
converted to a 'size_t'. 

Maybe one should also add a constraint that all new
type length expressions, i.e. those using the new syntax,
cannot have side effects. Or even that they follow
all the rules of integer constant expressions with
the fictitious assumption that all .identifier 
sub-expressions are integer constant expressions.
The rationale being that this would facilitate
compile-time reasoning about length expressions.
 

Martin






^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 15:35                                             ` Alejandro Colomar
@ 2022-11-12 17:02                                               ` Joseph Myers
  2022-11-12 17:08                                                 ` Alejandro Colomar
  0 siblings, 1 reply; 50+ messages in thread
From: Joseph Myers @ 2022-11-12 17:02 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:

> > > No, assigning to a function parameter from within another parameter
> > > declaration wouldn't make sense.  They should be readonly.  Side effects
> > > should be forbidden, I think.
> > 
> > Such assignments are already allowed.  In a function definition, the side
> > effects (including in size expressions for array parameters adjusted to
> > pointers) take place before entry to the function body.
> 
> Then, I'm guessing that rules need to change in a way that .initializer cannot
> appear as the left operand of an assignment-expression.

I think needing such a very special case rule tends to indicate that some 
alternative syntax, not needing such a rule, would be better.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 17:02                                               ` Joseph Myers
@ 2022-11-12 17:08                                                 ` Alejandro Colomar
  0 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-12 17:08 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1244 bytes --]



On 11/12/22 18:02, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> 
>>>> No, assigning to a function parameter from within another parameter
>>>> declaration wouldn't make sense.  They should be readonly.  Side effects
>>>> should be forbidden, I think.
>>>
>>> Such assignments are already allowed.  In a function definition, the side
>>> effects (including in size expressions for array parameters adjusted to
>>> pointers) take place before entry to the function body.
>>
>> Then, I'm guessing that rules need to change in a way that .initializer cannot
>> appear as the left operand of an assignment-expression.
> 
> I think needing such a very special case rule tends to indicate that some
> alternative syntax, not needing such a rule, would be better.

Well, by not being an lvalue, it can't be assigned to.  That would be somewhat 
like sizeof(identifier), which is also a unary-expression, so it's not so much 
of a special case, is it?

void f(size_t s, int a[sizeof(1) = 1]);  // constraint violation
void g(size_t s, int a[.s = 1]);         // Also constraint violation
void h(size_t s, int a[s = 1]);          // This is fine



-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-12 15:56                                             ` Martin Uecker
@ 2022-11-13 13:19                                               ` Alejandro Colomar
  2022-11-13 13:33                                                 ` Alejandro Colomar
  2022-11-14 17:52                                                 ` Joseph Myers
  0 siblings, 2 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 13:19 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 6674 bytes --]

Hi Martin!

On 11/12/22 16:56, Martin Uecker wrote:
> Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>>> postfix-expression wouldn't be the right one.
>>
>> Several forms of postfix-expression are only rvalues.
>>
>>>> (with a special rule about how the identifier is interpreted, different
>>>> from the normal scope rules)?  If so, then ".a = 1" could either match
>>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>>
>>> No, assigning to a function parameter from within another parameter
>>> declaration wouldn't make sense.  They should be readonly.  Side effects
>>> should be forbidden, I think.
>>
>> Such assignments are already allowed.  In a function definition, the side
>> effects (including in size expressions for array parameters adjusted to
>> pointers) take place before entry to the function body.
>>
>> And, in any case, if you did have a constraint disallowing such
>> assignments, it wouldn't suffice for syntactic disambiguation (see the
>> previous point I made about that; I have some rough notes towards a WG14
>> paper on syntactic disambiguation, but haven't converted them into a
>> coherent paper).
> 
> My idea was to only allow
> 
> array-declarator : direct-declarator [ . identifier ]
> 
> and only for parameters (not nested inside structs declared
> in the parameter list) as a first step, because it seems this
> would exclude all difficult cases.
> 
> But if we need to allow more complicated expressions, then
> it starts getting more complicated.

Ahh, I guess my work in documenting the man-pages prototypes got me thinking of 
those extensions to the idea.  I don't remember all the details :)

> 
> One could allow more generic expressions, and
> specify that the .identifier refers to a parameter in
> the nearest lexically enclosing parameter list or
> struct/union.
> 
> Then
> 
> void foo(struct bar { int x; char c[.x] } a, int x);
> 
> would not be allowed (which is good because then we
> could later use the syntax also inside structs). If
> we apply scoping rules, the following would work:
> 
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .x }).y], int x);

Makes sense.

> 
> But not:
> 
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .y }).y], int y);

Although it clearly is nonsense, I'm not sure I'd make it a constraint 
violation, but rather Undefined Behavior.  How is it different from this?

$ cat foo.c
int main(void)
{
	int i = i;
	return i;
}


$ gcc --version | head -n1
gcc (Debian 12.2.0-9) 12.2.0
$ gcc -Wall -Wextra -Werror foo.c
$

$ clang --version | head -n1
Debian clang version 14.0.6
$ clang -Wall -Wextra -Werror foo.c
foo.c:3:10: error: variable 'i' is uninitialized when used within its own 
initialization [-Werror,-Wuninitialized]
         int i = i;
             ~   ^
1 error generated.


BTW, I just freaked out that GCC can't catch this trivial bug.  Should I open a 
bug report?

> 
> 
> But there are not only syntactical problems, because
> also the type of the parameter might become relevant
> and then you can get circular dependencies:
> 
> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);

This seems to be a difficult stone in the road.

> 
> I am not sure what would be the best way to fix it. One
> could specify that parameters referred to by
> the .identifier syntax must be of some integer type and
> that the sub-expression .identifier is always
> converted to a 'size_t'.

That makes sense, but then overnight some quite useful thing came to my mind 
that would not be possible with this limitation:


<https://software.codidact.com/posts/285946>

char *
stpecpy(char dst[.end - .dst], char *src, char end[1])
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';

#if !defined(NDEBUG)
	/* Consume the rest of the input string. */
	while (*src++) {};
#endif

	return end + 1;
}


stpecpy() is a function similar to strlcat(3) that gets a pointer to the end of 
the array instead of the size of the buffer.  This allows chaining without 
having performance issues[1].

[1]: <https://en.wikichip.org/wiki/schlemiel_the_painter%27s_algorithm>
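
To make the chaining concrete, here is a rough sketch in today's C (plain
pointers, no .identifier syntax).  stpecpy() below is a simplified adaptation
of the function above, without the NDEBUG part, and join3() is a made-up
helper that exists only for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into [dst, end], where 'end' points to the last byte of the
 * buffer.  Returns a pointer to the terminating NUL, or end + 1 if the
 * string was truncated (including truncation in an earlier call). */
char *stpecpy(char *dst, const char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';
	return end + 1;
}

/* Hypothetical helper: concatenate three strings into buf.
 * Returns 1 if the result was truncated, 0 otherwise. */
int join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
	char *end, *p;

	assert(size > 0);
	end = buf + size - 1;

	/* Each call continues where the previous one stopped, so no call
	 * has to rescan what has already been written. */
	p = buf;
	p = stpecpy(p, a, end);
	p = stpecpy(p, b, end);
	p = stpecpy(p, c, end);

	return p == end + 1;
}
```

A single comparison against end + 1 after the last call is enough to detect
truncation anywhere in the chain, since a truncated result is passed through
unchanged by the following calls.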


Maybe allowing integral types and pointers would be enough.  However, foreseeing 
that the _Lengthof() proposal (BTW, which paper was it?) will succeed, and 
combining it with this one, _Lengthof(pointer) would ideally give the length of 
the array, so allowing pointers would conflict.

My solution is to disallow sizeof() and _Lengthof() on .identifier.  That could 
be done simply by saying that variably-modified types (VMT) are incomplete types 
until immediately after the comma that follows the parameter declaration. 
Therefore it would be allowed only in the same way as it is allowed right now 
with the normal syntax (i.e., after the parameter has been seen).

BTW, what was the number of the latest paper for _Lengthof() and what happened 
to it?  I guess it's likely to be added to C3x, isn't it?

And another BTW:  there's some kind of consistency in (some) projects for naming 
sizes, and I have pending a review of the Linux man-pages to make it consistent 
there too.

See the following table of usual conventions:

Operator/macro:                 variable names;    Description.
------------------------------|------------------|---------------------
strlen(3):                      length, len, l;    String length.
sizeof():                       size, sz, nbytes;  Identifier size in bytes.
nitems(), nelems():             n, nelem, nitems;  Array number of elements.
sizeof_array(), array_bytes():  size, sz, nbytes;  Array size in bytes.


Naming _Lengthof() the operator that gets the number of elements in an array 
would create naming confusion, since then length can mean two different things. 
I suggest _Nitemsof().


> 
> Maybe one should also add a constraint that all new
> type length expressions, i.e. those using the new syntax,
> cannot have side effects. Or even that they follow
> all the rules of integer constant expressions with
> the fictitious assumption that all .identifier
> sub-expressions are integer constant expressions.
> The rationale being that this would facilitate
> compile-time reasoning about length expressions.
>   
> 
> Martin
> 

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 13:19                                               ` Alejandro Colomar
@ 2022-11-13 13:33                                                 ` Alejandro Colomar
  2022-11-13 14:02                                                   ` Alejandro Colomar
  2022-11-14 17:52                                                 ` Joseph Myers
  1 sibling, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 13:33 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 2080 bytes --]

Hi Martin,

On 11/13/22 14:19, Alejandro Colomar wrote:
>> But there are not only syntactical problems, because
>> also the type of the parameter might become relevant
>> and then you can get circular dependencies:
>>
>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> 
> This seems to be a difficult stone in the road.
> 
>>
>> I am not sure what would be the best way to fix it. One
>> could specify that parameters referred to by
>> the .identifier syntax must be of some integer type and
>> that the sub-expression .identifier is always
>> converted to a 'size_t'.
> 
> That makes sense, but then overnight some quite useful thing came to my mind 
> that would not be possible with this limitation:
> 
> 
> <https://software.codidact.com/posts/285946>
> 
> char *
> stpecpy(char dst[.end - .dst], char *src, char end[1])
> {
>      for (/* void */; dst <= end; dst++) {
>          *dst = *src++;
>          if (*dst == '\0')
>              return dst;
>      }
>      /* Truncation detected */
>      *end = '\0';
> 
> #if !defined(NDEBUG)
>      /* Consume the rest of the input string. */
>      while (*src++) {};
> #endif
> 
>      return end + 1;
> }

And I forgot to say it:  Default promotions rank high (probably the highest) in 
my list of most hated features^Wbugs in C.  I wouldn't convert it to size_t, but 
rather follow normal promotion rules.

Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an 
array (which took me some time to understand), I'd also allow the same here. 
So, the type of the expression between [] could perfectly be signed or unsigned.

So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to 
allow negative numbers.  In the function above, since dst can be a pointer to 
one-past-the-end (it represents a previous truncation; that's why the test 
dst<=end), forcing a size_t conversion would disallow that syntax.
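
As a small illustration of the signed/unsigned point (the function names are
only for this example): the subscript operator accepts any integer type, and
a negative index is fine as long as the resulting element is still inside the
same array.

```c
#include <assert.h>
#include <stddef.h>

/* Signed index: step 'back' elements backwards from p.
 * Valid only if p - back still points into the same array. */
int elem_before(const int *p, ptrdiff_t back)
{
	assert(back >= 0);
	return p[-back];
}

/* Unsigned index: the usual forward access from the start of the array. */
int elem_at(const int *base, size_t i)
{
	return base[i];
}
```

With 'int a[4] = {10, 20, 30, 40};', elem_before(&a[3], 2) reads a[1] and
elem_at(a, 3) reads a[3].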

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 13:33                                                 ` Alejandro Colomar
@ 2022-11-13 14:02                                                   ` Alejandro Colomar
  2022-11-13 14:58                                                     ` Martin Uecker
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 14:02 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 2405 bytes --]



On 11/13/22 14:33, Alejandro Colomar wrote:
> Hi Martin,
> 
> On 11/13/22 14:19, Alejandro Colomar wrote:
>>> But there are not only syntactical problems, because
>>> also the type of the parameter might become relevant
>>> and then you can get circular dependencies:
>>>
>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>
>> This seems to be a difficult stone in the road.
>>
>>>
>>> I am not sure what would be the best way to fix it. One
>>> could specify that parameters referred to by
>>> the .identifier syntax must be of some integer type and
>>> that the sub-expression .identifier is always
>>> converted to a 'size_t'.
>>
>> That makes sense, but then overnight some quite useful thing came to my mind 
>> that would not be possible with this limitation:
>>
>>
>> <https://software.codidact.com/posts/285946>
>>
>> char *
>> stpecpy(char dst[.end - .dst], char *src, char end[1])

Heh, I got an off-by-one error.  It should be dst[.end - .dst + 1], of course, 
and then the result of the whole expression would be 0, which is fine as size_t.

So, never mind.

>> {
>>      for (/* void */; dst <= end; dst++) {
>>          *dst = *src++;
>>          if (*dst == '\0')
>>              return dst;
>>      }
>>      /* Truncation detected */
>>      *end = '\0';
>>
>> #if !defined(NDEBUG)
>>      /* Consume the rest of the input string. */
>>      while (*src++) {};
>> #endif
>>
>>      return end + 1;
>> }

> 
> And I forgot to say it:  Default promotions rank high (probably the highest) in 
> my list of most hated features^Wbugs in C.  I wouldn't convert it to size_t, but 
> rather follow normal promotion rules.
> 
> Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an 
> array (which took me some time to understand), I'd also allow the same here. So, 
> the type of the expression between [] could perfectly be signed or unsigned.
> 
> So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to 
> allow negative numbers.  In the function above, since dst can be a pointer to 
> one-past-the-end (it represents a previous truncation; that's why the test 
> dst<=end), forcing a size_t conversion would disallow that syntax.
> 
> Cheers,
> 
> Alex
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 14:02                                                   ` Alejandro Colomar
@ 2022-11-13 14:58                                                     ` Martin Uecker
  2022-11-13 15:15                                                       ` Alejandro Colomar
  0 siblings, 1 reply; 50+ messages in thread
From: Martin Uecker @ 2022-11-13 14:58 UTC (permalink / raw)
  To: Alejandro Colomar, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
> 
> On 11/13/22 14:33, Alejandro Colomar wrote:
> > Hi Martin,
> > 
> > On 11/13/22 14:19, Alejandro Colomar wrote:
> > > > But there are not only syntactical problems, because
> > > > also the type of the parameter might become relevant
> > > > and then you can get circular dependencies:
> > > > 
> > > > void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> > > 
> > > This seems to be a difficult stone in the road.

But note that GNU forward declarations solve this nicely.
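
For reference, a sketch of what the GNU extension looks like in practice.
The function name is made up, and the preprocessor guard is only an
assumption so that the example also builds with compilers that lack the
extension.  Since the forward declaration has to be complete before any
later parameter uses it, a cycle like the one above cannot be written:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU extension: 'size_t n;' forward-declares the parameter 'n', so the
 * array parameter can refer to it although 'n' comes later in the list. */
size_t zero_fill(size_t n; char buf[n], size_t n)
#else
/* Fallback for compilers without forward parameter declarations. */
size_t zero_fill(char *buf, size_t n)
#endif
{
	assert(buf != NULL);
	memset(buf, 0, n);
	return n;
}
```

Callers see the same interface either way: zero_fill(buf, sizeof(buf)).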

> > > 
> > > > I am not sure what would be the best way to fix it. One
> > > > could specify that parameters referred to by
> > > > the .identifier syntax must be of some integer type and
> > > > that the sub-expression .identifier is always
> > > > converted to a 'size_t'.
> > > 
> > > That makes sense, but then overnight some quite useful thing came to my mind 
> > > that would not be possible with this limitation:
> > > 
> > > 
> > > <https://software.codidact.com/posts/285946>
> > > 
> > > char *
> > > stpecpy(char dst[.end - .dst], char *src, char end[1])
> 
> Heh, I got an off-by-one error.  It should be dst[.end - .dst + 1], of course, 
> and then the result of the whole expression would be 0, which is fine as size_t.
> 
> So, never mind.

.end and .dst would have pointer size though.

> > > {
> > >      for (/* void */; dst <= end; dst++) {
> > >          *dst = *src++;
> > >          if (*dst == '\0')
> > >              return dst;
> > >      }
> > >      /* Truncation detected */
> > >      *end = '\0';
> > > 
> > > #if !defined(NDEBUG)
> > >      /* Consume the rest of the input string. */
> > >      while (*src++) {};
> > > #endif
> > > 
> > >      return end + 1;
> > > }
> > And I forgot to say it:  Default promotions rank high (probably the highest) in 
> > my list of most hated features^Wbugs in C. 

If you replaced them with explicit conversions that you then have
to add by hand all the time, I am pretty sure most people
would hate this more (and it could also hide bugs).

> > I wouldn't convert it to size_t, but 
> > rather follow normal promotion rules.

The point of making it size_t is that you then
do not need to know the type of the parameter to make
sense of the expression. If the type matters, then you get
mutual dependencies as in the example above. 

> > Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an 
> > array (which took me some time to understand), I'd also allow the same here. So, 
> > the type of the expression between [] could perfectly be signed or unsigned.
> > 
> > So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to 
> > allow negative numbers.  In the function above, since dst can be a pointer to 
> > one-past-the-end (it represents a previous truncation; that's why the test 
> > dst<=end), forcing a size_t conversion would disallow that syntax.

Yes, this then does not work.
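
For reference, the stpecpy() quoted above can be tried with today's compilers if the proposed [.end - .dst + 1] annotation is dropped.  A minimal sketch, assuming plain pointer parameters and a const-qualified src (both are changes made for illustration, not part of the original), and leaving out the NDEBUG block:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src into [dst, end], where end points to the last byte of the
 * buffer.  Returns a pointer to the terminating NUL on success, or
 * end + 1 if the string was truncated.
 */
char *
stpecpy(char *dst, const char *src, char *end)
{
    assert(dst != NULL && src != NULL && end != NULL);

    /* After an earlier truncation, dst == end + 1 and the loop is skipped. */
    for (/* void */; dst <= end; dst++) {
        *dst = *src++;
        if (*dst == '\0')
            return dst;
    }
    /* Truncation detected */
    *end = '\0';

    return end + 1;
}
```

Calls are chained by passing the return value as the next dst; a result equal to end + 1 reports truncation, and any further call leaves the buffer contents unchanged.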

Martin


> > Cheers,
> > 
> > Alex
> > 
> 
> -- 
> <http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 14:58                                                     ` Martin Uecker
@ 2022-11-13 15:15                                                       ` Alejandro Colomar
  2022-11-13 15:32                                                         ` Martin Uecker
  2022-11-13 16:28                                                         ` Alejandro Colomar
  0 siblings, 2 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 15:15 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc



Hi Martin,

On 11/13/22 15:58, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>
>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>>
>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>> But there are not only syntactical problems, because
>>>>> also the type of the parameter might become relevant
>>>>> and then you can get circular dependencies:
>>>>>
>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> This seems to be a difficult stone in the road.
> 
> But note that GNU forward declarations solve this nicely.

How would the above be solved with GNU forward declarations?  I'm guessing that
it can't.  How do you forward-declare incomplete VMTs?

> 
>>>>
>>>>> I am not sure what would be the best way to fix it. One
>>>>> could specify that parameters referred to by
>>>>> the .identifier syntax must be of some integer type and
>>>>> that the sub-expression .identifier is always
>>>>> converted to a 'size_t'.
>>>>
>>>> That makes sense, but then overnight some quite useful thing came to my mind
>>>> that would not be possible with this limitation:
>>>>
>>>>
>>>> <https://software.codidact.com/posts/285946>
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst], char *src, char end[1])
>>
>> Heh, I got an off-by-one error.  It should be dst[.end - .dst + 1], of course,
>> and then the result of the whole expression would be 0, which is fine as size_t.
>>
>> So, never mind.
> 
> .end and .dst would have pointer size though.
> 
>>>> {
>>>>       for (/* void */; dst <= end; dst++) {
>>>>           *dst = *src++;
>>>>           if (*dst == '\0')
>>>>               return dst;
>>>>       }
>>>>       /* Truncation detected */
>>>>       *end = '\0';
>>>>
>>>> #if !defined(NDEBUG)
>>>>       /* Consume the rest of the input string. */
>>>>       while (*src++) {};
>>>> #endif
>>>>
>>>>       return end + 1;
>>>> }
>>> And I forgot to say it:  Default promotions rank high (probably the highest) in
>>> my list of most hated features^Wbugs in C.
> 
> If you replaced them with explicit conversion you then have
> to add by hand all the time, I am pretty sure most people
> would hate this more. (and it could also hide bugs)

Yeah, casts are also in my top 3 list of things to avoid (although in this case 
there's no bug); maybe a bit over default promotions :)

I didn't mean that all promotions are bad.  Just the gratuitous ones, like 
promoting everything to int before even needing it.  That makes uint16_t a 
theoretical type, because whenever you try to use it, you end up with a signed 
32-bit type; fun heh? :P  _BitInt() solves that for me.

But sure, in (1u + 1l), promotions are fine to get a common type.
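
To make the uint16_t complaint concrete, here is a small sketch.  It assumes a platform where int is wider than 16 bits (otherwise uint16_t promotes to unsigned int instead), and the function names are made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if (x + x) has type int, i.e. uint16_t was promoted to int. */
int
sum_is_int(uint16_t x)
{
    return _Generic(x + x, int: 1, default: 0);
}

/* Without a cast, the addition is done in int, so 0xFFFF + 1 is 65536. */
long
add_one_promoted(uint16_t x)
{
    assert(INT_MAX > UINT16_MAX);  /* assumption: int is wider than 16 bits */
    return x + 1;
}

/* The cast converts the int result back to 16 bits, so 0xFFFF + 1 wraps to 0. */
uint16_t
add_one_wrapped(uint16_t x)
{
    return (uint16_t)(x + 1);
}
```

The cast in the last function is the kind of thing that piles up when every intermediate result has to be brought back to 16 bits by hand.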

> 
>>> I wouldn't convert it to size_t, but
>>> rather follow normal promotion rules.
> 
> The point of making it size_t is that you then
> do not need to know the type of the parameter to make
> sense of the expression. If the type matters, then you get
> mutual dependencies as in the example above.

Except if you treat incomplete types as... incomplete types (for which sizeof() 
is disallowed by the standard).  And the issue we're having is that the types 
are not yet complete at the time we're using them, aren't they?

Kind of like the initialization order fiasco, but since we're in a limited 
scope, we can detect it.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 15:15                                                       ` Alejandro Colomar
@ 2022-11-13 15:32                                                         ` Martin Uecker
  2022-11-13 16:25                                                           ` Alejandro Colomar
  2022-11-13 16:28                                                         ` Alejandro Colomar
  1 sibling, 1 reply; 50+ messages in thread
From: Martin Uecker @ 2022-11-13 15:32 UTC (permalink / raw)
  To: Alejandro Colomar, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

Am Sonntag, den 13.11.2022, 16:15 +0100 schrieb Alejandro Colomar:
> Hi Martin,
> 
> On 11/13/22 15:58, Martin Uecker wrote:
> > Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
> > > On 11/13/22 14:33, Alejandro Colomar wrote:
> > > > On 11/13/22 14:19, Alejandro Colomar wrote:
> > > > > > But there are not only syntactical problems, because
> > > > > > also the type of the parameter might become relevant
> > > > > > and then you can get circular dependencies:
> > > > > > 
> > > > > > void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> > > > > 
> > > > > This seems to be a difficult stone in the road.
> > 
> > But note that GNU forward declarations solve this nicely.
> 
> How would that above be solved with GNU fwd decl?  I'm guessing that it can't. 
> How do you forward declare incomplete VMTs?.

You can't express it. This was my point: it is impossible
to create circular dependencies.
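
As a sketch of what the GNU extension looks like in practice: each forward declaration has to be complete before it is used, so two parameters cannot each depend on the size of the other.  The name fill() is made up, and the fallback branch is only there on the assumption that Clang does not implement forward parameter declarations:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * GNU extension: "size_t n;" forward-declares the parameter so that
 * buf[n] can refer to it before its real declaration appears.
 * Compilers without the extension get a plain pointer parameter.
 */
#if defined(__GNUC__) && !defined(__clang__)
size_t
fill(size_t n; char buf[n], size_t n)
#else
size_t
fill(char *buf, size_t n)
#endif
{
    assert(buf != NULL);

    memset(buf, 'x', n);
    return n;
}
```

Callers see the same argument order either way: the buffer first, then its size.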

...

> > > > > {
> > > > >       for (/* void */; dst <= end; dst++) {
> > > > >           *dst = *src++;
> > > > >           if (*dst == '\0')
> > > > >               return dst;
> > > > >       }
> > > > >       /* Truncation detected */
> > > > >       *end = '\0';
> > > > > 
> > > > > #if !defined(NDEBUG)
> > > > >       /* Consume the rest of the input string. */
> > > > >       while (*src++) {};
> > > > > #endif
> > > > > 
> > > > >       return end + 1;
> > > > > }
> > > > And I forgot to say it:  Default promotions rank high (probably the highest) in
> > > > my list of most hated features^Wbugs in C.
> > 
> > If you replaced them with explicit conversion you then have
> > to add by hand all the time, I am pretty sure most people
> > would hate this more. (and it could also hide bugs)
> 
> Yeah, casts are also in my top 3 list of things to avoid (although in this case 
> there's no bug); maybe a bit over default promotions :)
> 
> I didn't mean that all promotions are bad.  Just the gratuitous ones, like 
> promoting everything to int before even needing it.  That makes uint16_t a 
> theoretical type, because whenever you try to use it, you end up with a signed 
> 32-bit type; fun heh? :P  _BitInt() solves that for me.

uint16_t is for storing data.  My expectation is that people
will find _BitInt() difficult and error-prone to use for
small sizes.  But maybe I am wrong...

> But sure, in (1u + 1l), promotions are fine to get a common type.
> 
> > > > I wouldn't convert it to size_t, but
> > > > rather follow normal promotion rules.
> > 
> > The point of making it size_t is that you then
> > do not need to know the type of the parameter to make
> > sense of the expression. If the type matters, then you get
> > mutual dependencies as in the example above.
> 
> Except if you treat incomplete types as... incomplete types (for which sizeof() 
> is disallowed by the standard).  And the issue we're having is that the types 
> are not yet complete at the time we're using them, aren't they?

It is not an incomplete type. When parsing, if we do not yet have
a declaration, we know nothing about it (not just not the size).
If we assume we know the type (by looking ahead) we get mutual
dependencies.

Also the capability to parse, fold, and do type checking
in one go is something worth preserving in my opinion. 

Martin


> Kind of like the initialization order fiasco, but since we're in a limited 
> scope, we can detect it.






* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 15:32                                                         ` Martin Uecker
@ 2022-11-13 16:25                                                           ` Alejandro Colomar
  0 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:25 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc



Hi Martin,

On 11/13/22 16:32, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 16:15 +0100 schrieb Alejandro Colomar:
>> Hi Martin,
>>
>> On 11/13/22 15:58, Martin Uecker wrote:
>>> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>>>> But there are not only syntactical problems, because
>>>>>>> also the type of the parameter might become relevant
>>>>>>> and then you can get circular dependencies:
>>>>>>>
>>>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>>>
>>>>>> This seems to be a difficult stone in the road.
>>>
>>> But note that GNU forward declarations solve this nicely.
>>
>> How would that above be solved with GNU fwd decl?  I'm guessing that it can't.
>> How do you forward declare incomplete VMTs?.
> 
> You can't express it. This was my point: it is impossible
> to create circular dependencies.
> 
> ...
> 
>>>>>> {
>>>>>>        for (/* void */; dst <= end; dst++) {
>>>>>>            *dst = *src++;
>>>>>>            if (*dst == '\0')
>>>>>>                return dst;
>>>>>>        }
>>>>>>        /* Truncation detected */
>>>>>>        *end = '\0';
>>>>>>
>>>>>> #if !defined(NDEBUG)
>>>>>>        /* Consume the rest of the input string. */
>>>>>>        while (*src++) {};
>>>>>> #endif
>>>>>>
>>>>>>        return end + 1;
>>>>>> }
>>>>> And I forgot to say it:  Default promotions rank high (probably the highest) in
>>>>> my list of most hated features^Wbugs in C.
>>>
>>> If you replaced them with explicit conversion you then have
>>> to add by hand all the time, I am pretty sure most people
>>> would hate this more. (and it could also hide bugs)
>>
>> Yeah, casts are also in my top 3 list of things to avoid (although in this case
>> there's no bug); maybe a bit over default promotions :)
>>
>> I didn't mean that all promotions are bad.  Just the gratuitous ones, like
>> promoting everything to int before even needing it.  That makes uint16_t a
>> theoretical type, because whenever you try to use it, you end up with a signed
>> 32-bit type; fun heh? :P  _BitInt() solves that for me.
> 
> uint16_t is for storing data.  My expectation is that people
> will find _BitInt() difficult and error-prone to use for
> small sizes.  But maybe I am wrong...

I'm a bit concerned about the suffix to create literals.  I'd have preferred a 
suffix that allowed creating a specific size (instead of the minimum one), i.e., 
1u16 or something like that.  But otherwise I think it can be better.  I don't 
have in mind a big issue I had a year ago with uint16_t, but it required 3 casts 
in a line.  With _BitInt() I think none (or maybe one, for giving 1 the 
appropriate size) would have been needed.  But we'll see how it works out.


> 
>> But sure, in (1u + 1l), promotions are fine to get a common type.
>>
>>>>> I wouldn't convert it to size_t, but
>>>>> rather follow normal promotion rules.
>>>
>>> The point of making it size_t is that you then
>>> do not need to know the type of the parameter to make
>>> sense of the expression. If the type matters, then you get
>>> mutual dependencies as in the example above.
>>
>> Except if you treat incomplete types as... incomplete types (for which sizeof()
>> is disallowed by the standard).  And the issue we're having is that the types
>> are not yet complete at the time we're using them, aren't they?
> 
> It is not an incomplete type. When doing parsing and do not have
> a declaration we know nothing about it (not just not the size).
> If we assume we know the type (by looking ahead) we get mutual
> dependencies.

Then I'd do the following:  .identifier always has an incomplete type.

I'm preparing a complete description of what I think of the feature.  I'll add that.

> 
> Also the capability to parse, fold, and do type checking
> in one go is something worth preserving in my opinion.

Makes sense.

Thanks for all the help, both!

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 15:15                                                       ` Alejandro Colomar
  2022-11-13 15:32                                                         ` Martin Uecker
@ 2022-11-13 16:28                                                         ` Alejandro Colomar
  2022-11-13 16:31                                                           ` Alejandro Colomar
  2022-11-14 18:13                                                           ` Joseph Myers
  1 sibling, 2 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:28 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc



SYNOPSIS:

unary-operator:  . identifier


DESCRIPTION:

-  It is not an lvalue.

    -  This means sizeof() and _Lengthof() cannot be applied to them.
    -  This prevents ambiguity with a designator in an initializer-list within a 
nested braced-initializer.

-  The type of a .identifier is always an incomplete type.

    -  This prevents circular dependencies involving sizeof() or _Lengthof().

-  Shadowing rules apply.

    -  This prevents ambiguity.


EXAMPLES:


-  Valid examples (libc):

        int
        strncmp(const char s1[.n],
                const char s2[.n],
                size_t n);

        int
        cacheflush(void addr[.nbytes],
                   int nbytes,
                   int cache);

        long
        mbind(void addr[.len],
              unsigned long len,
              int mode,
              const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
                                           / ULONG_WIDTH],
              unsigned long maxnode, unsigned int flags);

        void *
        bsearch(const void key[.size],
                const void base[.size * .nmemb],
                size_t nmemb,
                size_t size,
                int (*compar)(const void [.size], const void [.size]));

-  Valid examples (my own):

        void
        ustr2str(char dst[restrict .len + 1],
                 const char src[restrict .len],
                 size_t len);

        char *
        stpecpy(char dst[.end - .dst + 1],
                char *restrict src,
                char end[1]);

-  Valid examples (from this thread):

    -
        struct s { int a; };
        void f(int a, int b[((struct s) { .a = 1 }).a]);

        Explanation:
        -  Because of shadowing rules, .a=1 refers to the struct member.
           -  Also, if .a referred to the parameter, it would be an rvalue, so 
it wouldn't be valid to assign to it.
        -  (...).a refers to the struct member too, since otherwise an rvalue is 
not expected there.

    -
        void foo(struct bar { int x; char c[.x] } a, int x);

        Explanation:
        -  Because of shadowing rules, [.x] refers to the struct member.

    -
        struct bar { int y; };
        void foo(char p[((struct bar){ .y = .x }).y], int x);

        Explanation:
        -  .x unambiguously refers to the parameter.

-  Undefined behavior:

    -
        struct bar { int y; };
        void foo(char p[((struct bar){ .y = .y }).y], int y);

        Explanation:
        -  Because of shadowing rules, =.y refers to the struct member.
        -  .y=.y means initialize the member with itself (uninitialized use).
        -  (...).y refers to the struct member, since otherwise an rvalue is not 
expected there.

-  Constraint violations:

    -
        void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);

        Explanation:
        -  sizeof(*.b): Cannot get size of incomplete type.
        -  sizeof(*.a): Cannot get size of incomplete type.

    -
        void f(size_t s, int a[sizeof(1) = 1]);

        Explanation:
        -  Cannot assign to rvalue.

    -
        void f(size_t s, int a[.s = 1]);

        Explanation:
        -  Cannot assign to rvalue.

    -
        void f(size_t s, int a[sizeof(.s)]);

        Explanation:
        -  sizeof(.s): Cannot get size of incomplete type.
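
For comparison, what can already be written in standard C (C99 and later, on implementations that support variably modified parameter types) requires the size to come before the arrays that use it.  A sketch with a made-up function name, not taken from any libc:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Standard C today: n must be declared before s1[n] and s2[n], so the
 * parameter order differs from the usual (s1, s2, n) of strncmp().
 * Compares the first n bytes and returns <0, 0, or >0.
 */
int
cmp_n(size_t n, const unsigned char s1[n], const unsigned char s2[n])
{
    assert(s1 != NULL && s2 != NULL);

    for (size_t i = 0; i < n; i++) {
        if (s1[i] != s2[i])
            return (s1[i] < s2[i]) ? -1 : 1;
    }
    return 0;
}
```

The .identifier syntax above would allow the same documentation of sizes while keeping the established parameter order of existing interfaces.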


Does this idea make sense to you?


Cheers,
Alex
-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 16:28                                                         ` Alejandro Colomar
@ 2022-11-13 16:31                                                           ` Alejandro Colomar
  2022-11-13 16:34                                                             ` Alejandro Colomar
  2022-11-14 18:13                                                           ` Joseph Myers
  1 sibling, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:31 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc





On 11/13/22 17:28, Alejandro Colomar wrote:
> SYNOPSIS:
> 
> unary-operator:  . identifier
> 
> 
> DESCRIPTION:
> 
> -  It is not an lvalue.
> 
>     -  This means sizeof() and _Lengthof() cannot be applied to them.

Sorry, the above is a thinko.

I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.

>     -  This prevents ambiguity with a designator in an initializer-list within a 
> nested braced-initializer.
> 
> -  The type of a .identifier is always an incomplete type.
> 
>     -  This prevents circular dependencies involving sizeof() or _Lengthof().
> 
> -  Shadowing rules apply.
> 
>     -  This prevents ambiguity.
> 
> 
> EXAMPLES:
> 
> 
> -  Valid examples (libc):
> 
>         int
>         strncmp(const char s1[.n],
>                 const char s2[.n],
>                 size_t n);
> 
>         int
>         cacheflush(void addr[.nbytes],
>                    int nbytes,
>                    int cache);
> 
>         long
>         mbind(void addr[.len],
>               unsigned long len,
>               int mode,
>               const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>                                            / ULONG_WIDTH],
>               unsigned long maxnode, unsigned int flags);
> 
>         void *
>         bsearch(const void key[.size],
>                 const void base[.size * .nmemb],
>                 size_t nmemb,
>                 size_t size,
>                 int (*compar)(const void [.size], const void [.size]));
> 
> -  Valid examples (my own):
> 
>         void
>         ustr2str(char dst[restrict .len + 1],
>                  const char src[restrict .len],
>                  size_t len);
> 
>         char *
>         stpecpy(char dst[.end - .dst + 1],
>                 char *restrict src,
>                 char end[1]);
> 
> -  Valid examples (from this thread):
> 
>     -
>         struct s { int a; };
>         void f(int a, int b[((struct s) { .a = 1 }).a]);
> 
>         Explanation:
>         -  Because of shadowing rules, .a=1 refers to the struct member.
>            -  Also, if .a referred to the parameter, it would be an rvalue, so 
> it wouldn't be valid to assign to it.
>         -  (...).a refers to the struct member too, since otherwise an rvalue is 
> not expected there.
> 
>     -
>         void foo(struct bar { int x; char c[.x] } a, int x);
> 
>         Explanation:
>         -  Because of shadowing rules, [.x] refers to the struct member.
> 
>     -
>         struct bar { int y; };
>         void foo(char p[((struct bar){ .y = .x }).y], int x);
> 
>         Explanation:
>         -  .x unambiguously refers to the parameter.
> 
> -  Undefined behavior:
> 
>     -
>         struct bar { int y; };
>         void foo(char p[((struct bar){ .y = .y }).y], int y);
> 
>         Explanation:
>         -  Because of shadowing rules, =.y refers to the struct member.
>         -  .y=.y means initialize the member with itself (uninitialized use).
>         -  (...).y refers to the struct member, since otherwise an rvalue is not 
> expected there.
> 
> -  Constraint violations:
> 
>     -
>         void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> 
>         Explanation:
>         -  sizeof(*.b): Cannot get size of incomplete type.
>         -  sizeof(*.a): Cannot get size of incomplete type.
> 
>     -
>         void f(size_t s, int a[sizeof(1) = 1]);
> 
>         Explanation:
>         -  Cannot assign to rvalue.
> 
>     -
>         void f(size_t s, int a[.s = 1]);
> 
>         Explanation:
>         -  Cannot assign to rvalue.
> 
>     -
>         void f(size_t s, int a[sizeof(.s)]);
> 
>         Explanation:
>         -  sizeof(.s): Cannot get size of incomplete type.
> 
> 
> Does this idea make sense to you?
> 
> 
> Cheers,
> Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 16:31                                                           ` Alejandro Colomar
@ 2022-11-13 16:34                                                             ` Alejandro Colomar
  2022-11-13 16:56                                                               ` Alejandro Colomar
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:34 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc





On 11/13/22 17:31, Alejandro Colomar wrote:
> 
> 
> On 11/13/22 17:28, Alejandro Colomar wrote:
>> SYNOPSIS:
>>
>> unary-operator:  . identifier
>>
>>
>> DESCRIPTION:
>>
>> -  It is not an lvalue.
>>
>>     -  This means sizeof() and _Lengthof() cannot be applied to them.
> 
> Sorry, the above is a thinko.
> 
> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
> 
>>     -  This prevents ambiguity with a designator in an initializer-list within 
>> a nested braced-initializer.
>>
>> -  The type of a .identifier is always an incomplete type.

Or rather, more easily prohibit explicitly using typeof(), sizeof(), and 
_Lengthof() to it.

>>
>>     -  This prevents circular dependencies involving sizeof() or _Lengthof().
>>
>> -  Shadowing rules apply.
>>
>>     -  This prevents ambiguity.
>>
>>
>> EXAMPLES:
>>
>>
>> -  Valid examples (libc):
>>
>>         int
>>         strncmp(const char s1[.n],
>>                 const char s2[.n],
>>                 size_t n);
>>
>>         int
>>         cacheflush(void addr[.nbytes],
>>                    int nbytes,
>>                    int cache);
>>
>>         long
>>         mbind(void addr[.len],
>>               unsigned long len,
>>               int mode,
>>               const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>                                            / ULONG_WIDTH],
>>               unsigned long maxnode, unsigned int flags);
>>
>>         void *
>>         bsearch(const void key[.size],
>>                 const void base[.size * .nmemb],
>>                 size_t nmemb,
>>                 size_t size,
>>                 int (*compar)(const void [.size], const void [.size]));
>>
>> -  Valid examples (my own):
>>
>>         void
>>         ustr2str(char dst[restrict .len + 1],
>>                  const char src[restrict .len],
>>                  size_t len);
>>
>>         char *
>>         stpecpy(char dst[.end - .dst + 1],
>>                 char *restrict src,
>>                 char end[1]);
>>
>> -  Valid examples (from this thread):
>>
>>     -
>>         struct s { int a; };
>>         void f(int a, int b[((struct s) { .a = 1 }).a]);
>>
>>         Explanation:
>>         -  Because of shadowing rules, .a=1 refers to the struct member.
>>            -  Also, if .a referred to the parameter, it would be an rvalue, so 
>> it wouldn't be valid to assign to it.
>>         -  (...).a refers to the struct member too, since otherwise an rvalue 
>> is not expected there.
>>
>>     -
>>         void foo(struct bar { int x; char c[.x] } a, int x);
>>
>>         Explanation:
>>         -  Because of shadowing rules, [.x] refers to the struct member.
>>
>>     -
>>         struct bar { int y; };
>>         void foo(char p[((struct bar){ .y = .x }).y], int x);
>>
>>         Explanation:
>>         -  .x unambiguously refers to the parameter.
>>
>> -  Undefined behavior:
>>
>>     -
>>         struct bar { int y; };
>>         void foo(char p[((struct bar){ .y = .y }).y], int y);
>>
>>         Explanation:
>>         -  Because of shadowing rules, =.y refers to the struct member.
>>         -  .y=.y means initialize the member with itself (uninitialized use).
>>         -  (...).y refers to the struct member, since otherwise an rvalue is 
>> not expected there.
>>
>> -  Constraint violations:
>>
>>     -
>>         void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>
>>         Explanation:
>>         -  sizeof(*.b): Cannot get size of incomplete type.
>>         -  sizeof(*.a): Cannot get size of incomplete type.
>>
>>     -
>>         void f(size_t s, int a[sizeof(1) = 1]);
>>
>>         Explanation:
>>         -  Cannot assign to rvalue.
>>
>>     -
>>         void f(size_t s, int a[.s = 1]);
>>
>>         Explanation:
>>         -  Cannot assign to rvalue.
>>
>>     -
>>         void f(size_t s, int a[sizeof(.s)]);
>>
>>         Explanation:
>>         -  sizeof(.s): Cannot get size of incomplete type.
>>
>>
>> Does this idea make sense to you?
>>
>>
>> Cheers,
>> Alex
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 16:34                                                             ` Alejandro Colomar
@ 2022-11-13 16:56                                                               ` Alejandro Colomar
  2022-11-13 19:05                                                                 ` Alejandro Colomar
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:56 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc





On 11/13/22 17:34, Alejandro Colomar wrote:
> 
> 
> On 11/13/22 17:31, Alejandro Colomar wrote:
>>
>>
>> On 11/13/22 17:28, Alejandro Colomar wrote:
>>> SYNOPSIS:
>>>
>>> unary-operator:  . identifier
>>>
>>>
>>> DESCRIPTION:
>>>
>>> -  It is not an lvalue.
>>>
>>>     -  This means sizeof() and _Lengthof() cannot be applied to them.
>>
>> Sorry, the above is a thinko.
>>
>> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>>
>>>     -  This prevents ambiguity with a designator in an initializer-list 
>>> within a nested braced-initializer.
>>>
>>> -  The type of a .identifier is always an incomplete type.
> 
> Or rather, more easily prohibit explicitly using typeof(), sizeof(), and 
> _Lengthof() to it.

Hmm, this is not enough.  Pointer arithmetics are interesting, and for that, you 
need to implicitly know the sizeof(*.p).

How about allowing only integral types or pointers to integral types?

> 
>>>
>>>     -  This prevents circular dependencies involving sizeof() or _Lengthof().
>>>
>>> -  Shadowing rules apply.
>>>
>>>     -  This prevents ambiguity.
>>>
>>>
>>> EXAMPLES:
>>>
>>>
>>> -  Valid examples (libc):
>>>
>>>         int
>>>         strncmp(const char s1[.n],
>>>                 const char s2[.n],
>>>                 size_t n);
>>>
>>>         int
>>>         cacheflush(void addr[.nbytes],
>>>                    int nbytes,
>>>                    int cache);
>>>
>>>         long
>>>         mbind(void addr[.len],
>>>               unsigned long len,
>>>               int mode,
>>>               const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>>                                            / ULONG_WIDTH],
>>>               unsigned long maxnode, unsigned int flags);
>>>
>>>         void *
>>>         bsearch(const void key[.size],
>>>                 const void base[.size * .nmemb],
>>>                 size_t nmemb,
>>>                 size_t size,
>>>                 int (*compar)(const void [.size], const void [.size]));
>>>
>>> -  Valid examples (my own):
>>>
>>>         void
>>>         ustr2str(char dst[restrict .len + 1],
>>>                  const char src[restrict .len],
>>>                  size_t len);
>>>
>>>         char *
>>>         stpecpy(char dst[.end - .dst + 1],
>>>                 char *restrict src,
>>>                 char end[1]);
>>>
>>> -  Valid examples (from this thread):
>>>
>>>     -
>>>         struct s { int a; };
>>>         void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>
>>>         Explanation:
>>>         -  Because of shadowing rules, .a=1 refers to the struct member.
>>>            -  Also, if .a referred to the parameter, it would be an rvalue, 
>>> so it wouldn't be valid to assign to it.
>>>         -  (...).a refers to the struct member too, since otherwise an rvalue 
>>> is not expected there.
>>>
>>>     -
>>>         void foo(struct bar { int x; char c[.x] } a, int x);
>>>
>>>         Explanation:
>>>         -  Because of shadowing rules, [.x] refers to the struct member.
>>>
>>>     -
>>>         struct bar { int y; };
>>>         void foo(char p[((struct bar){ .y = .x }).y], int x);
>>>
>>>         Explanation:
>>>         -  .x unambiguously refers to the parameter.
>>>
>>> -  Undefined behavior:
>>>
>>>     -
>>>         struct bar { int y; };
>>>         void foo(char p[((struct bar){ .y = .y }).y], int y);
>>>
>>>         Explanation:
>>>         -  Because of shadowing rules, =.y refers to the struct member.
>>>         -  .y=.y means initialize the member with itself (uninitialized use).
>>>         -  (...).y refers to the struct member, since otherwise an rvalue is
>>>            not expected there.
>>>
>>> -  Constraint violations:
>>>
>>>     -
>>>         void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>
>>>         Explanation:
>>>         -  sizeof(*.b): Cannot get size of incomplete type.
>>>         -  sizeof(*.a): Cannot get size of incomplete type.
>>>
>>>     -
>>>         void f(size_t s, int a[sizeof(1) = 1]);
>>>
>>>         Explanation:
>>>         -  Cannot assign to rvalue.
>>>
>>>     -
>>>         void f(size_t s, int a[.s = 1]);
>>>
>>>         Explanation:
>>>         -  Cannot assign to rvalue.
>>>
>>>     -
>>>         void f(size_t s, int a[sizeof(.s)]);
>>>
>>>         Explanation:
>>>         -  sizeof(.s): Cannot get size of incomplete type.
>>>
>>>
>>> Does this idea make sense to you?
>>>
>>>
>>> Cheers,
>>> Alex
>>
> 

-- 
<http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 16:56                                                               ` Alejandro Colomar
@ 2022-11-13 19:05                                                                 ` Alejandro Colomar
  0 siblings, 0 replies; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-13 19:05 UTC (permalink / raw)
  To: Martin Uecker, Joseph Myers
  Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 6113 bytes --]

On 11/13/22 17:56, Alejandro Colomar wrote:
>>> On 11/13/22 17:28, Alejandro Colomar wrote:
>>>> SYNOPSIS:
>>>>
>>>> unary-operator:  . identifier
>>>>
>>>>
>>>> DESCRIPTION:
>>>>
>>>> -  It is not an lvalue.
>>>>
>>>>     -  This means sizeof() and _Lengthof() cannot be applied to them.
>>>
>>> Sorry, the above is a thinko.
>>>
>>> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>>>
>>>>     -  This prevents ambiguity with a designator in an initializer-list
>>>>        within a nested braced-initializer.
>>>>
>>>> -  The type of a .identifier is always an incomplete type.
>>
>> Or rather, more easily prohibit explicitly using typeof(), sizeof(), and 
>> _Lengthof() to it.
> 
> Hmm, this is not enough.  Pointer arithmetic is interesting, and for that, you 
> need to implicitly know the sizeof(*.p).
> 
> How about allowing only integral types or pointers to integral types?

I've been thinking about keeping the number of passes as low as possible, while 
allowing most useful expressions:

Maybe forcing some ordering can help:

-  The type of a .identifier is complete after the opening parenthesis of the 
function-declarator (if it refers to a parameter) or after the opening brace of 
a braced-initializer (if it refers to a struct/union member), except when the 
type is a variably-modified type, which will be complete after the closing 
parenthesis or brace respectively.

I'm not sure I got the wording precisely right, or whether I covered all cases 
(like types that cannot be completed for other reasons, even after the closing 
')' or '}').

> 
>>
>>>>
>>>>     -  This prevents circular dependencies involving sizeof() or _Lengthof().
>>>>
>>>> -  Shadowing rules apply.
>>>>
>>>>     -  This prevents ambiguity.
>>>>
>>>>
>>>> EXAMPLES:
>>>>
>>>>
>>>> -  Valid examples (libc):
>>>>
>>>>         int
>>>>         strncmp(const char s1[.n],
>>>>                 const char s2[.n],
>>>>                 size_t n);
>>>>
>>>>         int
>>>>         cacheflush(void addr[.nbytes],
>>>>                    int nbytes,
>>>>                    int cache);
>>>>
>>>>         long
>>>>         mbind(void addr[.len],
>>>>               unsigned long len,
>>>>               int mode,
>>>>               const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>>>                                            / ULONG_WIDTH],
>>>>               unsigned long maxnode, unsigned int flags);
>>>>
>>>>         void *
>>>>         bsearch(const void key[.size],
>>>>                 const void base[.size * .nmemb],
>>>>                 size_t nmemb,
>>>>                 size_t size,
>>>>                 int (*compar)(const void [.size], const void [.size]));
>>>>
>>>> -  Valid examples (my own):
>>>>
>>>>         void
>>>>         ustr2str(char dst[restrict .len + 1],
>>>>                  const char src[restrict .len],
>>>>                  size_t len);
>>>>
>>>>         char *
>>>>         stpecpy(char dst[.end - .dst + 1],
>>>>                 char *restrict src,
>>>>                 char end[1]);
>>>>
>>>> -  Valid examples (from this thread):
>>>>
>>>>     -
>>>>         struct s { int a; };
>>>>         void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>>
>>>>         Explanation:
>>>>         -  Because of shadowing rules, .a=1 refers to the struct member.
>>>>            -  Also, if .a referred to the parameter, it would be an rvalue,
>>>>               so it wouldn't be valid to assign to it.
>>>>         -  (...).a refers to the struct member too, since otherwise an
>>>>            rvalue is not expected there.
>>>>
>>>>     -
>>>>         void foo(struct bar { int x; char c[.x] } a, int x);
>>>>
>>>>         Explanation:
>>>>         -  Because of shadowing rules, [.x] refers to the struct member.
>>>>
>>>>     -
>>>>         struct bar { int y; };
>>>>         void foo(char p[((struct bar){ .y = .x }).y], int x);
>>>>
>>>>         Explanation:
>>>>         -  .x unambiguously refers to the parameter.
>>>>
>>>> -  Undefined behavior:
>>>>
>>>>     -
>>>>         struct bar { int y; };
>>>>         void foo(char p[((struct bar){ .y = .y }).y], int y);
>>>>
>>>>         Explanation:
>>>>         -  Because of shadowing rules, =.y refers to the struct member.
>>>>         -  .y=.y means initialize the member with itself (uninitialized use).
>>>>         -  (...).y refers to the struct member, since otherwise an rvalue is
>>>>            not expected there.
>>>>
>>>> -  Constraint violations:
>>>>
>>>>     -
>>>>         void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>>         Explanation:
>>>>         -  sizeof(*.b): Cannot get size of incomplete type.
>>>>         -  sizeof(*.a): Cannot get size of incomplete type.
>>>>
>>>>     -
>>>>         void f(size_t s, int a[sizeof(1) = 1]);
>>>>
>>>>         Explanation:
>>>>         -  Cannot assign to rvalue.
>>>>
>>>>     -
>>>>         void f(size_t s, int a[.s = 1]);
>>>>
>>>>         Explanation:
>>>>         -  Cannot assign to rvalue.
>>>>
>>>>     -
>>>>         void f(size_t s, int a[sizeof(.s)]);

This should actually be valid.

>>>>
>>>>         Explanation:
>>>>         -  sizeof(.s): Cannot get size of incomplete type.
>>>>
>>>>
>>>> Does this idea make sense to you?
>>>>
>>>>
>>>> Cheers,
>>>> Alex
>>>
>>
> 

-- 
<http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 13:19                                               ` Alejandro Colomar
  2022-11-13 13:33                                                 ` Alejandro Colomar
@ 2022-11-14 17:52                                                 ` Joseph Myers
  2022-11-14 17:57                                                   ` Alejandro Colomar
  1 sibling, 1 reply; 50+ messages in thread
From: Joseph Myers @ 2022-11-14 17:52 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:

> Maybe allowing integral types and pointers would be enough.  However,
> foreseeing that the _Lengthof() proposal (BTW, which paper was it?) will
> succeed, and combining it with this one, _Lengthof(pointer) would ideally give
> the length of the array, so allowing pointers would conflict.

Do you mean N2529 Romero, New pointer-proof keyword to determine array 
length?  To quote the convenor in WG14 reflector message 18575 (17 Nov 
2020) when I asked about its status, "The author asked me not to put those 
on the agenda.  He will supply updated versions later.".
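
For context on why a "pointer-proof" length operator is wanted, here is a 
minimal sketch of the usual sizeof-based idiom in today's C.  The macro and 
function names are hypothetical and are not taken from N2529 or from this 
thread.

```c
#include <stddef.h>

/* Hypothetical macro name; the classic sizeof-based element count. */
#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

size_t count_local_array(void)
{
    int v[5] = { 0 };

    /* v is a real array here, so the division yields its length. */
    return NELEMS(v);
}

/*
 * If v were a pointer instead (for example an array parameter, which is
 * adjusted to a pointer), NELEMS(v) would still compile, but it would
 * compute sizeof(int *) / sizeof(int) rather than the element count.
 */
```

The idiom gives no diagnostic of its own in the pointer case, which is the 
gap a dedicated keyword would presumably close.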

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-14 17:52                                                 ` Joseph Myers
@ 2022-11-14 17:57                                                   ` Alejandro Colomar
  2022-11-14 18:26                                                     ` Joseph Myers
  0 siblings, 1 reply; 50+ messages in thread
From: Alejandro Colomar @ 2022-11-14 17:57 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1139 bytes --]

Hi Joseph!

On 11/14/22 18:52, Joseph Myers wrote:
> On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
> 
>> Maybe allowing integral types and pointers would be enough.  However,
>> foreseeing that the _Lengthof() proposal (BTW, which paper was it?) will
>> succeed, and combining it with this one, _Lengthof(pointer) would ideally give
>> the length of the array, so allowing pointers would conflict.
> 
> Do you mean N2529 Romero, New pointer-proof keyword to determine array
> length?

Yes, that's it!  Thanks.

> To quote the convenor in WG14 reflector message 18575 (17 Nov
> 2020) when I asked about its status, "The author asked me not to put those
> on the agenda.  He will supply updated versions later.".

Since his email is not in the paper, would you mind forwarding him this 
suggestion of mine of renaming it to avoid confusion with string lengths?  Or 
maybe point him to the mailing list discussion[1]?

[1]: 
<https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>



* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-13 16:28                                                         ` Alejandro Colomar
  2022-11-13 16:31                                                           ` Alejandro Colomar
@ 2022-11-14 18:13                                                           ` Joseph Myers
  1 sibling, 0 replies; 50+ messages in thread
From: Joseph Myers @ 2022-11-14 18:13 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:

> SYNOPSIS:
> 
> unary-operator:  . identifier

That's not what you mean.  See the standard syntax.

unary-expression:
  [other alternatives]
  unary-operator cast-expression

unary-operator: one of
  & * + - ~ !

> -  It is not an lvalue.
> 
>    -  This means sizeof() and _Lengthof() cannot be applied to them.

sizeof can be applied to non-lvalues.
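
A small illustration in standard C, with hypothetical function names: each 
operand below is a non-lvalue expression, and sizeof accepts all of them 
without evaluating them.

```c
#include <stddef.h>

/* Hypothetical helper: it returns by value, so a call is not an lvalue. */
long make_long(void)
{
    return 42L;
}

/* Returns 1 if sizeof gave the expected size for each non-lvalue operand. */
int sizeof_accepts_rvalues(void)
{
    size_t a = sizeof(1 + 2);        /* arithmetic result, type int */
    size_t b = sizeof(make_long());  /* call result, type long; not called */
    size_t c = sizeof((char)0);      /* cast result, type char */

    return a == sizeof(int) && b == sizeof(long) && c == 1;
}
```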

>    -  This prevents ambiguity with a designator in an initializer-list within
> a nested braced-initializer.

No, it doesn't.  See my previous points about syntactic disambiguation 
being a separate matter from "one parse would result in a constraint 
violation, so choose another parse that doesn't" (necessarily, because the 
constraint violation that results could in general be at an arbitrary 
distance from the point where a choice of parse has to be made).  Or see 
e.g. the disambiguation rule about enum type specifiers: there is an 
explicit rule "If an enum type specifier is present, then the longest 
possible sequence of tokens that can be interpreted as a specifier 
qualifier list is interpreted as part of the enum type specifier." that 
ensures that "enum e : long int;" interprets "long int" as the enum type 
specifier, rather than "long" as the enum type specifier and "int" as 
another type specifier in the sequence of declaration specifiers, even 
though the latter parse would result in a constraint violation later.

Also, requiring unbounded lookahead to determine what kind of construct is 
being parsed may be considered questionable for C.  (If you have an 
initializer starting .a.b.c.d.e, possibly with array element access as 
well, those could all be designators or .a might be a reference to a 
parameter of struct or union type and .b.c.d.e a sequence of references to 
members within it and disambiguation under your rule would depend on 
whether an '=' follows such an unbounded sequence.)
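
To make the lookahead concern concrete, here is a sketch in today's C (the 
struct and function names are invented for illustration).  At present a 
leading '.' inside a braced initializer always starts a designator, so the 
parser never has to scan ahead for the '=' to classify the tokens.

```c
struct inner { int c; };
struct mid   { struct inner b; };
struct outer { struct mid a; int n; };

int nested_designator_value(void)
{
    /* ".a.b.c" is a designator list here; it can never be an expression. */
    struct outer o = { .a.b.c = 7, .n = 1 };

    return o.a.b.c;
}
```

Under the proposed rule, the same token sequence without a following '=' 
would have to be read as an expression starting from a parameter named a.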

> -  The type of a .identifier is always an incomplete type.
> 
>    -  This prevents circular dependencies involving sizeof() or _Lengthof().

We have typeof as well, which can be applied to expressions with 
incomplete type.
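
As a sketch of that point, the following uses the GNU spelling __typeof__ 
(typeof in C23) on an object whose array type is still incomplete.  The 
identifiers are hypothetical, and the example assumes GCC or a compatible 
compiler.

```c
/* arr has an incomplete array type here: its length is not yet known. */
extern int arr[];

/*
 * __typeof__ accepts the incomplete type, so arr_ptr has type int (*)[].
 * sizeof(arr) at this point would be rejected.
 */
__typeof__(arr) *arr_ptr = &arr;

/* The array type is completed only by this definition. */
int arr[4] = { 10, 20, 30, 40 };

int third_element(void)
{
    /* *arr_ptr decays to int *, so indexing works without a known length. */
    return (*arr_ptr)[2];
}
```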

> -  Shadowing rules apply.
> 
>    -  This prevents ambiguity.

"Shadowing rules apply" isn't much of a specification.  You need detailed 
wording that would be added to 6.2.1 Scopes of identifiers (or equivalent 
elsewhere) to make it clear exactly what scopes apply for identifiers 
looked up using this construct.

>    -
>        void foo(struct bar { int x; char c[.x] } a, int x);
> 
>        Explanation:
>        -  Because of shadowing rules, [.x] refers to the struct member.

I really don't think standardizing VLAs-in-structures would be a good 
idea.  Certainly it would be a massive pain to specify meaningful 
semantics for them and this outline doesn't even attempt to work through 
the consequences of removing the rule that "If an identifier is declared 
as having a variably modified type, it shall be an ordinary identifier (as 
defined in 6.2.3), have no linkage, and have either block scope or 
function prototype scope.".
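
For comparison, this is what the quoted rule permits today: a variably 
modified parameter in function prototype scope, with the size parameter 
declared first.  The function name is hypothetical, and the sketch assumes 
a compiler with VLA support.

```c
#include <stddef.h>

/* n must be declared before it is used in the array declarator. */
long sum(size_t n, const int a[n])
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/*
 * Not valid standard C: a member with a variably modified type.
 *
 *     struct bar { int x; char c[x]; };
 */
```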

The idea that .x as an expression might refer to either a member or a 
parameter is also a massive change to the namespace rules, where at 
present those are in completely different namespaces and so in any given 
context a name only needs looking up as one or the other.
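
A short sketch of the current name space separation (hypothetical names): 
the same spelling x is looked up as a member in one position and as an 
ordinary identifier in the other, and the syntactic context alone decides 
which.

```c
struct point { int x; };

int member_and_ordinary(void)
{
    int x = 5;                    /* ordinary identifier */
    struct point p = { .x = x };  /* left x: member; right x: the local */

    return p.x + x;
}
```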

Again, proposals should be *minimal*.  And even when they are, many issues 
may well arise in practice (see the long list of constexpr issues in my 
commit message for that C2x feature, for example, which I expect to turn 
into multiple NB comments and at least two accompanying documents).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
  2022-11-14 17:57                                                   ` Alejandro Colomar
@ 2022-11-14 18:26                                                     ` Joseph Myers
  0 siblings, 0 replies; 50+ messages in thread
From: Joseph Myers @ 2022-11-14 18:26 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc

On Mon, 14 Nov 2022, Alejandro Colomar via Gcc wrote:

> > To quote the convenor in WG14 reflector message 18575 (17 Nov
> > 2020) when I asked about its status, "The author asked me not to put those
> > on the agenda.  He will supply updated versions later.".
> 
> Since his email is not in the paper, would you mind forwarding him this
> suggestion of mine of renaming it to avoid confusion with string lengths?  Or
> maybe point him to the mailing list discussion[1]?
> 
> [1]:
> <https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>

I don't have his email address (I don't see any emails from him on the 
reflector since I joined it in 2001).

-- 
Joseph S. Myers
joseph@codesourcery.com


end of thread, other threads:[~2022-11-14 18:26 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <20220826210710.35237-1-alx.manpages@gmail.com>
     [not found] ` <Ywn7jMtB5ppSW0PB@asta-kit.de>
     [not found]   ` <89d79095-d1cd-ab2b-00e4-caa31126751e@gmail.com>
     [not found]     ` <YwoXTGD8ljB8Gg6s@asta-kit.de>
     [not found]       ` <e29de088-ae10-bbc8-0bfd-90bbb63aaf06@gmail.com>
     [not found]         ` <5ba53bad-019e-8a94-d61e-85b2f13223a9@gmail.com>
     [not found]           ` <CACqA6+mfaj6Viw+LVOG=nE350gQhCwVKXRzycVru5Oi4EJzgTg@mail.gmail.com>
     [not found]             ` <491a930d-47eb-7c86-c0c4-25eef4ac0be0@gmail.com>
2022-09-02 21:57               ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters Alejandro Colomar
2022-09-03 12:47                 ` Martin Uecker
2022-09-03 13:29                   ` Ingo Schwarze
2022-09-03 15:08                     ` Alejandro Colomar
2022-09-03 13:41                   ` Alejandro Colomar
2022-09-03 14:35                     ` Martin Uecker
2022-09-03 14:59                       ` Alejandro Colomar
2022-09-03 15:31                         ` Martin Uecker
2022-09-03 20:02                           ` Alejandro Colomar
2022-09-05 14:31                             ` Alejandro Colomar
2022-11-10  0:06                           ` Alejandro Colomar
2022-11-10  0:09                             ` Alejandro Colomar
2022-11-10  1:33                             ` Joseph Myers
2022-11-10  1:39                               ` Joseph Myers
2022-11-10  6:21                                 ` Martin Uecker
2022-11-10 10:09                                   ` Alejandro Colomar
2022-11-10 23:19                                   ` Joseph Myers
2022-11-10 23:28                                     ` Alejandro Colomar
2022-11-11 19:52                                     ` Martin Uecker
2022-11-12  1:09                                       ` Joseph Myers
2022-11-12  7:24                                         ` Martin Uecker
2022-11-12 12:34                                     ` Alejandro Colomar
2022-11-12 12:46                                       ` Alejandro Colomar
2022-11-12 13:03                                       ` Joseph Myers
2022-11-12 13:40                                         ` Alejandro Colomar
2022-11-12 13:58                                           ` Alejandro Colomar
2022-11-12 14:54                                           ` Joseph Myers
2022-11-12 15:35                                             ` Alejandro Colomar
2022-11-12 17:02                                               ` Joseph Myers
2022-11-12 17:08                                                 ` Alejandro Colomar
2022-11-12 15:56                                             ` Martin Uecker
2022-11-13 13:19                                               ` Alejandro Colomar
2022-11-13 13:33                                                 ` Alejandro Colomar
2022-11-13 14:02                                                   ` Alejandro Colomar
2022-11-13 14:58                                                     ` Martin Uecker
2022-11-13 15:15                                                       ` Alejandro Colomar
2022-11-13 15:32                                                         ` Martin Uecker
2022-11-13 16:25                                                           ` Alejandro Colomar
2022-11-13 16:28                                                         ` Alejandro Colomar
2022-11-13 16:31                                                           ` Alejandro Colomar
2022-11-13 16:34                                                             ` Alejandro Colomar
2022-11-13 16:56                                                               ` Alejandro Colomar
2022-11-13 19:05                                                                 ` Alejandro Colomar
2022-11-14 18:13                                                           ` Joseph Myers
2022-11-14 17:52                                                 ` Joseph Myers
2022-11-14 17:57                                                   ` Alejandro Colomar
2022-11-14 18:26                                                     ` Joseph Myers
2022-11-10  9:40                             ` G. Branden Robinson
2022-11-10 10:59                               ` Alejandro Colomar
2022-11-10 22:25                                 ` G. Branden Robinson
