* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
[not found] ` <491a930d-47eb-7c86-c0c4-25eef4ac0be0@gmail.com>
@ 2022-09-02 21:57 ` Alejandro Colomar
2022-09-03 12:47 ` Martin Uecker
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-09-02 21:57 UTC (permalink / raw)
To: JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man, gcc
Hi JeanHeyd,
> Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in
> function parameters
> Date: Fri, 2 Sep 2022 16:56:00 -0400
> From: JeanHeyd Meneide <wg14@soasis.org>
> To: Alejandro Colomar <alx.manpages@gmail.com>
> CC: Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
>
>
>
> Hi Alejandro and Ingo,
>
> Just chiming in from a Standards perspective, here. We discussed,
> briefly, a way to allow Variable-Length function parameter declarations
> like the ones shown in this thread (e.g., char *getcwd(char buf[size],
> size_t size );).
>
> In GCC, there is a GNU extension that allows explicitly
> forward-declaring the prototype. Using the above example, it would look
> like so:
I added the GCC list to the thread, so that they can intervene if they
consider it necessary.
>
> char *getcwd(size_t size; char buf[size], size_t size);
I read about that, although I don't like it very much, and never used it.
>
> (Live Example [1])
>
> (Note the `;` after the first "size" declaration). This was brought
> before the Committee to vote on for C23 in the form of N2780 [2], around
> the January 2022 timeframe. The paper did not pass, and it was seen as a
> "failed extension". After the vote on that failed, we talked about other
> ways of allowing places whether there was some appetite to allow
> "forward parsing" for this sort of case. That is, could we simply allow:
>
> char *getcwd(char buf[size], size_t size);
>
> to work as expected. The vote for this did not gain full consensus
> either, but there were a lot of abstentions [3]. While I personally
> voted in favor of allowing such for C, there was distinct worry that
> this would produce issues for weaker C implementations that did not want
> to commit to delayed parsing or forward parsing of the entirety of the
> argument list before resolving types. There are enough abstentions
> during voting that a working implementation with a writeup of complexity
> would sway the Committee one way or the other.
I like that this got less hate than the GNU extension. It's nicer to my
eyes.
>
> This is not to dissuade Alejandro's position, or to bolster Ingo's
> point; I'm mostly just reporting the Committee's response here. This is
> an unsolved problem for the Committee, and also a larger holdover from
> the removal of K&R declarations from C23, which COULD solve this problem:
>
> // decl
> char *getcwd();
>
> // impl
> char* getcwd(buf, size)
> char buf[size];
> size_t size;
> {
> /* impl here */
> }
I won't miss them ;)
My regex-based parser[1] that finds declarations and definitions in C
code bases goes nuts with K&R functions. They are dead for good :)
[1]: <http://www.alejandro-colomar.es/src/alx/alx/grepc.git/>
>
> There is room for innovation here, or perhaps bolstering of the
> GCC original extension. As it stands right now, compilers only very
> recently started taking Variably-Modified Type parameters and Static
> Extent parameters seriously after carefully separating them out of
> Variable-Length Arrays, warning where they can when static or other
> array parameters do not match buffer lengths and so-on.
>
> Not just to the folks in this thread, but to the broader
> community for anyone who is paying attention: WG14 would actively like
> to solve this problem. If someone can:
> - prove out a way to do delayed parsing that is not implementation-costly,
> - revive the considered-dead GCC extension, or
> - provide a 3rd or 4th way to support the goals,
>
> I am certain WG14 would look favorably upon such a thing eventually,
> brought before the Committee in inclusion for C2y/C3a.
>
> Whether or not you feel like the manpages are the best place to
> start that, I'll leave up to you!
I'll try to defend the reasons to start this in the man-pages.
This feature is mostly for documentation purposes, not being meaningful
for code at all (for some meaning of meaningful), since it won't change
the function definition in any way, nor the calls to it. At least not
by itself; static analysis may get some benefits, though.
Also, new code can be designed from the beginning so that sizes go
before their corresponding arrays, so that new code won't typically be
affected by the lack of this feature in the language.
This leaves us with legacy code, especially libc, which just works, and
doesn't have any urgent needs to change their prototypes in this regard
(they could, to improve static analysis, but not what we'd call urgent).
And since most people don't go around reading libc headers searching for
function declarations (especially since there are manual pages that show
them nicely), it's not like the documentation of the code depends on how
the function is _actually_ declared in code (that's why I also defended
documenting restrict even if glibc wouldn't have cared to declare it),
but it depends basically on what the manual pages say about the
function. If the manual pages say a function gets 'restrict' params, it
means it gets 'restrict' params, no matter what the code says, and if it
doesn't, the function accepts overlapping pointers, at least for most of
the public (modulo manual page bugs, that is).
So this extension could very well be added by the manual pages, as a
form of documentation, and then maybe picked up by compilers that have
enough resources to implement it.
Considering that this feature is mostly about documentation (and a bit
of static analysis too), the documentation should be something appealing
to the reader.
Let's take an example:
    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char *restrict host, socklen_t hostlen,
                    char *restrict serv, socklen_t servlen,
                    int flags);

and some transformations:

    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);

    int getnameinfo(socklen_t hostlen;
                    socklen_t servlen;
                    const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);
(I'm not sure if I used correct GNU syntax, since I never used that
extension myself.)
The first transformation above is non-ambiguous, as concise as possible,
and its only issue is that it might complicate the implementation a bit
too much. I don't think forward-using a parameter's size would be too
much of a parsing problem for human readers.
The second one is unnecessarily long and verbose, and semicolons are not
very distinguishable from commas, for human readers, which may be very
confusing.
int foo(int a; int b[a], int a);
int foo(int a, int b[a], int o);
Those two are very different to the compiler, and yet very similar to
the human eye. I don't like it. The fact that it allows for simpler
compilers isn't enough to overcome the readability issues.
I think I'd prefer having the forward-using syntax as a non-standard
extension --or a standard but optional language feature-- to avoid
forcing small compilers to implement it, rather than having the GNU
extension standardized in all compilers.
Having this extension in any single compiler would even make it more
appealing to manual pages, which could use the syntax more freely
without fear of confusing readers. Even if the standard wouldn't accept it.
Let's see if GCC likes the feature and helps me attempt to use it a
little bit! :-)
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-02 21:57 ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters Alejandro Colomar
@ 2022-09-03 12:47 ` Martin Uecker
2022-09-03 13:29 ` Ingo Schwarze
2022-09-03 13:41 ` Alejandro Colomar
0 siblings, 2 replies; 69+ messages in thread
From: Martin Uecker @ 2022-09-03 12:47 UTC (permalink / raw)
To: Alejandro Colomar, JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man, gcc
...
> >
> > Whether or not you feel like the manpages are the best place to
> > start that, I'll leave up to you!
>
> I'll try to defend the reasons to start this in the man-pages.
>
> This feature is mostly for documentation purposes, not being meaningful
> for code at all (for some meaning of meaningful), since it won't change
> the function definition in any way, nor the calls to it. At least not
> by itself; static analysis may get some benefits, though.
GCC will warn if the bound is specified inconsistently between
declarations and also emit warnings if it can see that a buffer
which is passed is too small:
https://godbolt.org/z/PsjPG1nv7
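A minimal sketch of the kind of code these diagnostics apply to, assuming a recent GCC. The warning names in the comment are my understanding of GCC's options and are not taken from the linked example; sum() is a hypothetical function.

```c
#include <stddef.h>

/* The bound of v is expressed through the parameter n that precedes it. */
int sum(size_t n, const int v[n]);

int sum(size_t n, const int v[n])
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Redeclaring the function with a different bound, e.g.
 *     int sum(size_t n, const int v[n + 1]);
 * is the "inconsistent bound" case (assumed: -Wvla-parameter), and
 *     int small[2]; sum(4, small);
 * is the "buffer too small" case (assumed: the -Wstringop-overflow /
 * -Wstringop-overread family of warnings).
 */
```

The function itself compiles to the same code as the pointer form; only the diagnostics differ.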
BTW: If you declare pointers to arrays (not first elements) you
can get run-time bounds checking with UBSan:
https://godbolt.org/z/TvMo89WfP
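A minimal sketch of the pointer-to-array form, assuming a compiler with variably modified types. bytes() and last() are hypothetical functions; the point is that the bound n is part of the pointed-to type, so it survives the call.

```c
#include <stddef.h>

/* a points to the whole array, so its type carries the bound n. */
size_t bytes(size_t n, int (*a)[n])
{
    return sizeof(*a);   /* n * sizeof(int), computed at run time */
}

int last(size_t n, int (*a)[n])
{
    /* (*a)[n] would be one past the end; with -fsanitize=bounds that is
     * the kind of access UBSan can report at run time. */
    return (*a)[n - 1];
}
```

A caller passes the address of the array, for example last(5, &arr) for an int arr[5].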
>
> Also, new code can be designed from the beginning so that sizes go
> before their corresponding arrays, so that new code won't typically be
> affected by the lack of this feature in the language.
>
> This leaves us with legacy code, especially libc, which just works, and
> doesn't have any urgent needs to change their prototypes in this regard
> (they could, to improve static analysis, but not what we'd call urgent).
It would be a useful step towards finding out-of-bounds problems in
applications using libc.
> And since most people don't go around reading libc headers searching for
> function declarations (especially since there are manual pages that show
> them nicely), it's not like the documentation of the code depends on how
> the function is _actually_ declared in code (that's why I also defended
> documenting restrict even if glibc wouldn't have cared to declare it),
> but it depends basically on what the manual pages say about the
> function. If the manual pages say a function gets 'restrict' params, it
> means it gets 'restrict' params, no matter what the code says, and if it
> doesn't, the function accepts overlapping pointers, at least for most of
> the public (modulo manual page bugs, that is).
>
> So this extension could very well be added by the manual pages, as a
> form of documentation, and then maybe picked up by compilers that have
> enough resources to implement it.
>
>
> Considering that this feature is mostly about documentation (and a bit
> of static analysis too), the documentation should be something appealing
> to the reader.
>
>
> Let's take an example:
>
>
> int getnameinfo(const struct sockaddr *restrict addr,
> socklen_t addrlen,
> char *restrict host, socklen_t hostlen,
> char *restrict serv, socklen_t servlen,
> int flags);
>
> and some transformations:
>
>
> int getnameinfo(const struct sockaddr *restrict addr,
> socklen_t addrlen,
> char host[restrict hostlen], socklen_t hostlen,
> char serv[restrict servlen], socklen_t servlen,
> int flags);
>
>
> int getnameinfo(socklen_t hostlen;
> socklen_t servlen;
> const struct sockaddr *restrict addr,
> socklen_t addrlen,
> char host[restrict hostlen], socklen_t hostlen,
> char serv[restrict servlen], socklen_t servlen,
> int flags);
>
> (I'm not sure if I used correct GNU syntax, since I never used that
> extension myself.)
>
> The first transformation above is non-ambiguous, as concise as possible,
> and its only issue is that it might complicate the implementation a bit
> too much. I don't think forward-using a parameter's size would be too
> much of a parsing problem for human readers.
I personally find the second form not terrible. Being
able to read code left-to-right, top-down is helpful in more
complicated examples.
> The second one is unnecessarily long and verbose, and semicolons are not
> very distinguishable from commas, for human readers, which may be very
> confusing.
>
> int foo(int a; int b[a], int a);
> int foo(int a, int b[a], int o);
>
> Those two are very different to the compiler, and yet very similar to
> the human eye. I don't like it. The fact that it allows for simpler
> compilers isn't enough to overcome the readability issues.
This is true, I would probably use it with a comma and/or
syntax highlighting.
> I think I'd prefer having the forward-using syntax as a non-standard
> extension --or a standard but optional language feature-- to avoid
> forcing small compilers to implement it, rather than having the GNU
> extension standardized in all compilers.
The problems with the second form are:
- it is not 100% backwards compatible (which may be OK though), as
  the semantics of the following code change:
      int n;
      int foo(int a[n], int n); // refers to different n!
  Code written for new compilers could then be misunderstood
  by old compilers when a variable named 'n' is in scope.
- it would generally be fundamentally new to C to have
  backwards references, and parsers might need to be changed
  to allow this
- a compiler or tool then also has to deal with ugly
  corner cases such as mutual references:
      int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
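The first problem in the list above can be observed with today's compilers. This is a minimal sketch using a pointer to an array so that the bound is visible through sizeof; bound_bytes() is a hypothetical function.

```c
#include <stddef.h>

int n = 4;   /* file-scope object that happens to be named n */

/* Under current rules, the n in the array bound is the global above,
 * because the parameter n has not been declared yet at that point. */
size_t bound_bytes(int (*a)[n], int n)
{
    (void)n;             /* the parameter plays no part in the type of a */
    return sizeof(*a);   /* global n * sizeof(int) */
}
```

With an int arr[4], bound_bytes(&arr, 100) yields 4 * sizeof(int) today. If the bound were resolved against the later parameter instead, the same source would mean 100 * sizeof(int), which is the silent change in semantics being described.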
We could consider new syntax such as
int foo(char buf[.n], int n);
Personally, I would prefer the conceptual simplicity of forward
declarations and the fact that these exist already in GCC
over any alternative. I would also not mind new syntax, but
then one has to define the rules more precisely to avoid the
aforementioned problems.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 12:47 ` Martin Uecker
@ 2022-09-03 13:29 ` Ingo Schwarze
2022-09-03 15:08 ` Alejandro Colomar
2022-09-03 13:41 ` Alejandro Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Ingo Schwarze @ 2022-09-03 13:29 UTC (permalink / raw)
To: alx.manpages
Cc: Martin Uecker, Alejandro Colomar, JeanHeyd Meneide, linux-man, gcc
Hi,
the only point i strongly care about is this one:
Manual pages should not use
* non-standard syntax
* non-portable syntax
* ambiguous syntax (i.e. syntax that might have different meanings
with different compilers or in different contexts)
* syntax that might be invalid or dangerous with some widely
used compiler collections like GCC or LLVM
Regarding the discussions about standardization and extensions,
all proposals i have seen look seriously ugly and awkward to me,
and i'm not yet convinced such ugliness is sufficiently offset by
the relatively minor benefit that is apparent to me right now.
Yours,
Ingo
--
Ingo Schwarze <schwarze@usta.de>
http://www.openbsd.org/ <schwarze@openbsd.org>
http://mandoc.bsd.lv/ <schwarze@mandoc.bsd.lv>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 12:47 ` Martin Uecker
2022-09-03 13:29 ` Ingo Schwarze
@ 2022-09-03 13:41 ` Alejandro Colomar
2022-09-03 14:35 ` Martin Uecker
1 sibling, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-09-03 13:41 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 9/3/22 14:47, Martin Uecker wrote:
[...]
> GCC will warn if the bound is specified inconsistently between
> declarations and also emit warnings if it can see that a buffer
> which is passed is too small:
>
> https://godbolt.org/z/PsjPG1nv7
That's very good news!
BTW, it's nice to see that GCC doesn't need 'static' for array
parameters. I never understood what the static keyword adds there.
There's no way one can specify an array size and mean anything other than
requiring that, for a non-null pointer, the array should have at least
that size.
>
>
> BTW: If you declare pointers to arrays (not first elements) you
> can get run-time bounds checking with UBSan:
>
> https://godbolt.org/z/TvMo89WfP
Couldn't that be caught at compile time? n is certainly out of bounds
always for such an array, since the last element is n-1.
>
>
>>
>> Also, new code can be designed from the beginning so that sizes go
>> before their corresponding arrays, so that new code won't typically be
>> affected by the lack of this feature in the language.
>>
>> This leaves us with legacy code, especially libc, which just works, and
>> doesn't have any urgent needs to change their prototypes in this regard
>> (they could, to improve static analysis, but not what we'd call urgent).
>
> It would be a useful step towards finding out-of-bounds problems in
> applications using libc.
Yep, it would be very useful for that. Not urgent, but yes, very useful.
>> Let's take an example:
>>
>>
>> int getnameinfo(const struct sockaddr *restrict addr,
>> socklen_t addrlen,
>> char *restrict host, socklen_t hostlen,
>> char *restrict serv, socklen_t servlen,
>> int flags);
>>
>> and some transformations:
>>
>>
>> int getnameinfo(const struct sockaddr *restrict addr,
>> socklen_t addrlen,
>> char host[restrict hostlen], socklen_t hostlen,
>> char serv[restrict servlen], socklen_t servlen,
>> int flags);
>>
>>
>> int getnameinfo(socklen_t hostlen;
>> socklen_t servlen;
>> const struct sockaddr *restrict addr,
>> socklen_t addrlen,
>> char host[restrict hostlen], socklen_t hostlen,
>> char serv[restrict servlen], socklen_t servlen,
>> int flags);
>>
>> (I'm not sure if I used correct GNU syntax, since I never used that
>> extension myself.)
>>
>> The first transformation above is non-ambiguous, as concise as possible,
>> and its only issue is that it might complicate the implementation a bit
>> too much. I don't think forward-using a parameter's size would be too
>> much of a parsing problem for human readers.
>
>
> I personally find the second form not terrible. Being
> able to read code left-to-right, top-down is helpful in more
> complicated examples.
>
>
>
>> The second one is unnecessarily long and verbose, and semicolons are not
>> very distinguishable from commas, for human readers, which may be very
>> confusing.
>>
>> int foo(int a; int b[a], int a);
>> int foo(int a, int b[a], int o);
>>
>> Those two are very different to the compiler, and yet very similar to
>> the human eye. I don't like it. The fact that it allows for simpler
>> compilers isn't enough to overcome the readability issues.
>
> This is true, I would probably use it with a comma and/or
> syntax highlighting.
>
>
>> I think I'd prefer having the forward-using syntax as a non-standard
>> extension --or a standard but optional language feature-- to avoid
>> forcing small compilers to implement it, rather than having the GNU
>> extension standardized in all compilers.
>
> The problems with the second form are:
>
> - it is not 100% backwards compatible (which maybe ok though) as
> the semantics of the following code changes:
>
> int n;
> int foo(int a[n], int n); // refers to different n!
>
> Code written for new compilers could then be misunderstood
> by old compilers when a variable with 'n' is in scope.
>
>
Hmmm, this one is serious. I can't seem to solve it with that syntax.
> - it would generally be fundamentally new to C to have
> backwards references, and parsers might need to be changed
> to allow this
>
>
> - a compiler or tool then has to deal also with ugly
> corner cases such as mutual references:
>
> int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
>
>
>
> We could consider new syntax such as
>
> int foo(char buf[.n], int n);
>
>
> Personally, I would prefer the conceptual simplicity of forward
> declarations and the fact that these exist already in GCC
> over any alternative. I would also not mind new syntax, but
> then one has to define the rules more precisely to avoid the
> aforementioned problems.
What about taking something from K&R functions for this?:
int foo(q; w; int a[q], int q, int s[w], int w);
By not specifying the types, the syntax is again short.
This is left-to-right, so no problems with global variables, and no need
for complex parsers.
Also, by not specifying types, now it's more obvious to the naked eye
that there's a difference:
int foo(a; int b[a], int a);
int foo(int a, int b[a], int o);
What do you think about this syntax?
Thanks,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 13:41 ` Alejandro Colomar
@ 2022-09-03 14:35 ` Martin Uecker
2022-09-03 14:59 ` Alejandro Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Martin Uecker @ 2022-09-03 14:35 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
> Hi Martin,
>
> On 9/3/22 14:47, Martin Uecker wrote:
> [...]
>
> > GCC will warn if the bound is specified inconsistently between
> > declarations and also emit warnings if it can see that a buffer
> > which is passed is too small:
> >
> > https://godbolt.org/z/PsjPG1nv7
>
> That's very good news!
>
> BTW, it's nice to see that GCC doesn't need 'static' for array
> parameters. I never understood what the static keyword adds there.
> There's no way one can specify an array size and mean anything other than
> requiring that, for a non-null pointer, the array should have at least
> that size.
From the C standard's point of view,
    void foo(int n, char buf[n]);
is semantically equivalent to
    void foo(int, char *buf);
and without 'static' the 'n' has no further meaning
(this is different for pointers to arrays).
The static keyword implies that the pointer is valid and
non-zero and that there must be at least 'n' elements
accessible, so in some sense it is stronger (it implies
valid non-zero pointers), but at the same time it does not
imply a bound.
But I agree that 'n' without 'static' should simply imply
a bound and I think we should use it this way even when
the standard currently does not attach a meaning to it.
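A minimal sketch of that equivalence, assuming a C99 or later compiler. count_a() and count_a_static() are hypothetical functions; the two count_a() declarations are accepted as redeclarations of the same function, which shows that the bound is not part of its type.

```c
#include <stddef.h>

/* These two declarations are compatible: the array parameter is adjusted
 * to a pointer, so the bound is not part of the function's type. */
size_t count_a(int n, const char buf[n]);
size_t count_a(int n, const char *buf);

/* With static, the caller promises at least n accessible elements and a
 * non-null pointer. */
size_t count_a_static(int n, const char buf[static n]);

size_t count_a(int n, const char *buf)
{
    size_t c = 0;

    for (int i = 0; i < n; i++)
        c += (buf[i] == 'a');
    return c;
}

size_t count_a_static(int n, const char buf[static n])
{
    return count_a(n, buf);
}
```

Any extra meaning attached to the plain [n] form therefore comes from compiler diagnostics and convention, not from the type system.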
> >
> > BTW: If you declare pointers to arrays (not first elements) you
> > can get run-time bounds checking with UBSan:
> >
> > https://godbolt.org/z/TvMo89WfP
>
> Couldn't that be caught at compile time? n is certainly out of bounds
> always for such an array, since the last element is n-1.
Yes, in this example it could (and ideally should) be
detected at compile time.
But this notation already today allows passing of a bound
across API boundaries and thus enables run-time detection of
out-of-bounds accesses even in scenarios where it could
not be found at compile time.
> >
> > > Also, new code can be designed from the beginning so that sizes go
> > > before their corresponding arrays, so that new code won't typically be
> > > affected by the lack of this feature in the language.
> > >
> > > This leaves us with legacy code, especially libc, which just works, and
> > > doesn't have any urgent needs to change their prototypes in this regard
> > > (they could, to improve static analysis, but not what we'd call urgent).
> >
> > It would be a useful step towards finding out-of-bounds problems in
> > applications using libc.
>
> Yep, it would be very useful for that. Not urgent, but yes, very useful.
>
>
> > > Let's take an example:
> > >
> > >
> > > int getnameinfo(const struct sockaddr *restrict addr,
> > > socklen_t addrlen,
> > > char *restrict host, socklen_t hostlen,
> > > char *restrict serv, socklen_t servlen,
> > > int flags);
> > >
> > > and some transformations:
> > >
> > >
> > > int getnameinfo(const struct sockaddr *restrict addr,
> > > socklen_t addrlen,
> > > char host[restrict hostlen], socklen_t hostlen,
> > > char serv[restrict servlen], socklen_t servlen,
> > > int flags);
> > >
> > >
> > > int getnameinfo(socklen_t hostlen;
> > > socklen_t servlen;
> > > const struct sockaddr *restrict addr,
> > > socklen_t addrlen,
> > > char host[restrict hostlen], socklen_t hostlen,
> > > char serv[restrict servlen], socklen_t servlen,
> > > int flags);
> > >
> > > (I'm not sure if I used correct GNU syntax, since I never used that
> > > extension myself.)
> > >
> > > The first transformation above is non-ambiguous, as concise as possible,
> > > and its only issue is that it might complicate the implementation a bit
> > > too much. I don't think forward-using a parameter's size would be too
> > > much of a parsing problem for human readers.
> >
> > I personally find the second form not terrible. Being
> > able to read code left-to-right, top-down is helpful in more
> > complicated examples.
> >
> >
> >
> > > The second one is unnecessarily long and verbose, and semicolons are not
> > > very distinguishable from commas, for human readers, which may be very
> > > confusing.
> > >
> > > int foo(int a; int b[a], int a);
> > > int foo(int a, int b[a], int o);
> > >
> > > Those two are very different to the compiler, and yet very similar to
> > > the human eye. I don't like it. The fact that it allows for simpler
> > > compilers isn't enough to overcome the readability issues.
> >
> > This is true, I would probably use it with a comma and/or
> > syntax highlighting.
> >
> >
> > > I think I'd prefer having the forward-using syntax as a non-standard
> > > extension --or a standard but optional language feature-- to avoid
> > > forcing small compilers to implement it, rather than having the GNU
> > > extension standardized in all compilers.
> >
> > The problems with the second form are:
> >
> > - it is not 100% backwards compatible (which maybe ok though) as
> > the semantics of the following code changes:
> >
> > int n;
> > int foo(int a[n], int n); // refers to different n!
> >
> > Code written for new compilers could then be misunderstood
> > by old compilers when a variable with 'n' is in scope.
> >
> >
>
> Hmmm, this one is serious. I can't seem to solve it with that syntax.
>
> > - it would generally be fundamentally new to C to have
> > backwards references, and parsers might need to be changed
> > to allow this
> >
> >
> > - a compiler or tool then has to deal also with ugly
> > corner cases such as mutual references:
> >
> > int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
> >
> >
> >
> > We could consider new syntax such as
> >
> > int foo(char buf[.n], int n);
> >
> >
> > Personally, I would prefer the conceptual simplicity of forward
> > declarations and the fact that these exist already in GCC
> > over any alternative. I would also not mind new syntax, but
> > then one has to define the rules more precisely to avoid the
> > aforementioned problems.
>
> What about taking something from K&R functions for this?:
>
> int foo(q; w; int a[q], int q, int s[w], int w);
>
> By not specifying the types, the syntax is again short.
> This is left-to-right, so no problems with global variables, and no need
> for complex parsers.
> Also, by not specifying types, now it's more obvious to the naked eye
> that there's a difference:
I am ok with the syntax, but I am not sure how this would
work. If the type is determined only later you would still
have to change parsers (some C compilers do type
checking and folding during parsing, so need the types
to be known during parsing) and you also still have the
problem with the mutual dependencies.
We thought about using this syntax
    int foo(char buf[.n], int n);
because it is new syntax, which means we can restrict the
size to be the name of a parameter instead of allowing
arbitrary expressions, which then makes forward references
less problematic. It is also consistent with designators in
initializers and could also be extended to annotate
flexible array members or for storing pointers to arrays
in structures:
    struct {
        int n;
        char buf[.n];
    };
    struct {
        int n;
        char (*buf)[.n];
    };
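The [.n] notation is only a proposal, so it cannot be compiled today. As a minimal sketch of what the first struct would annotate, here is a standard flexible array member whose length is tracked by hand in n. The names struct counted_buf and counted_buf_new() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct counted_buf {
    int  n;       /* number of elements in buf */
    char buf[];   /* the proposal would spell this char buf[.n] */
};

/* Allocate a zero-filled buffer of n chars; returns NULL on failure. */
struct counted_buf *counted_buf_new(int n)
{
    struct counted_buf *p;

    if (n < 0)
        return NULL;
    p = malloc(sizeof(*p) + (size_t)n);
    if (p == NULL)
        return NULL;
    p->n = n;
    memset(p->buf, 0, (size_t)n);
    return p;
}
```

Today nothing ties n to buf except the comment; the annotation would make that relationship visible to the compiler and to readers.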
Martin
>
> int foo(a; int b[a], int a);
> int foo(int a, int b[a], int o);
>
>
> What do you think about this syntax?
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 14:35 ` Martin Uecker
@ 2022-09-03 14:59 ` Alejandro Colomar
2022-09-03 15:31 ` Martin Uecker
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-09-03 14:59 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 9/3/22 16:35, Martin Uecker wrote:
> Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
>> Hi Martin,
>>
>> On 9/3/22 14:47, Martin Uecker wrote:
>> [...]
>>
>>> GCC will warn if the bound is specified inconsistently between
>>> declarations and also emit warnings if it can see that a buffer
>>> which is passed is too small:
>>>
>>> https://godbolt.org/z/PsjPG1nv7
>>
>> That's very good news!
>>
>> BTW, it's nice to see that GCC doesn't need 'static' for array
>> parameters. I never understood what the static keyword adds there.
>> There's no way one can specify an array size and mean anything other than
>> requiring that, for a non-null pointer, the array should have at least
>> that size.
>
> From the C standard's point of view,
>
> void foo(int n, char buf[n]);
>
> is semantically equivalent to
>
> void foo(int, char *buf);
>
> and without 'static' the 'n' has no further meaning
> (this is different for pointers to arrays).
I know. I just don't understand the rationale for that decision. :/
>
> The static keyword implies that the pointer is valid and
> non-zero and that there must be at least 'n' elements
> accessible, so in some sense it is stronger (it implies
> valid non-zero pointers), but at the same time it does not
> imply a bound.
That stronger meaning, I think is a mistake by the standard.
Basically, [static n] means the same as [n] combined with [[gnu::nonnull]].
What the standard should have done would be to keep those two things
separate, since one may want to declare non-null non-array pointers, or
possibly-null array ones. So the standard should have standardized some
form of nonnull for that. But the recent discussion about presenting
nonnull pointers as [static 1] is horrible. But let's wait till the
future hopefully fixes this.
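A minimal sketch of keeping the two properties separate, assuming a compiler that supports the GNU nonnull attribute (spelled here with __attribute__ rather than [[gnu::nonnull]] so that it also compiles before C23). first_elem() and sum_static() are hypothetical functions.

```c
#include <stddef.h>

/* GNU attribute: the pointer must not be null; no array bound is implied. */
__attribute__((nonnull(1)))
int first_elem(const int *p)
{
    return *p;
}

/* Standard C: at least n accessible elements and, as a consequence,
 * a non-null pointer. Both properties come bundled together. */
int sum_static(int n, const int v[static n])
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

What cannot be expressed with [static n] is the remaining combination: an array bound on a pointer that is allowed to be null.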
>
> But I agree that 'n' without 'static' should simply imply
> a bound and I think we should use it this way even when
> the standard currently does not attach a meaning to it.
Yep.
[...]
>> What about taking something from K&R functions for this?:
>>
>> int foo(q; w; int a[q], int q, int s[w], int w);
>>
>> By not specifying the types, the syntax is again short.
>> This is left-to-right, so no problems with global variables, and no need
>> for complex parsers.
>> Also, by not specifying types, now it's more obvious to the naked eye
>> that there's a difference:
>
> I am ok with the syntax, but I am not sure how this would
> work. If the type is determined only later you would still
> have to change parsers (some C compilers do type
> checking and folding during parsing, so need the types
> to be known during parsing) and you also still have the
> problem with the mutual dependencies.
This syntax closely resembles K&R syntax. Any C compiler that supports
K&R definitions (and I guess most compilers out there do) should be
easy to adapt to this syntax (at least easier than for the other
alternatives). But this is just a guess.
>
> We thought about using this syntax
>
> int foo(char buf[.n], int n);
>
> because it is new syntax which means we can restrict the
> size to be the name of a parameter instead of allowing
> arbitrary expressions, which then makes forward references
> less problematic. It is also consistent with designators in
> initializers and could also be extended to annotate
> flexible array members or for storing pointers to arrays
> in structures:
It's not crazy. I don't have much to argue against it.
>
> struct {
> int n;
> char buf[.n];
> };
>
> struct {
> int n;
> char (*buf)[.n];
> };
Perhaps some doubts about how this would work for nested structures, but
not unreasonable.
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 13:29 ` Ingo Schwarze
@ 2022-09-03 15:08 ` Alejandro Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-09-03 15:08 UTC (permalink / raw)
To: Ingo Schwarze; +Cc: Martin Uecker, JeanHeyd Meneide, linux-man, gcc
Hi Ingo,
On 9/3/22 15:29, Ingo Schwarze wrote:
> the only point i strongly care about is this one:
>
> Manual pages should not use
> * non-standard syntax
> * non-portable syntax
> * ambiguous syntax (i.e. syntax that might have different meanings
> with different compilers or in different contexts)
> * syntax that might be invalid or dangerous with some widely
> used compiler collections like GCC or LLVM
The first two are good guidelines, but IMHO not strict ones if there's a good
reason to deviate.
The third and fourth are strong requirements.
For now I won't be applying this patch.
>
> Regarding the discussions about standardization and extensions,
> all proposals i have seen look seriously ugly and awkward to me,
> and i'm not yet convinced such ugliness is sufficiently offset by
> the relatively minor benefit that is apparent to me right now.
I hope we come up with something not ugly from that discussion.
The static analysis / compiler warning capabilities of using VLA syntax
seem strong reasons to me. They help avoid stupid bugs, even for
careless programmers (well, only if those careless programmers care just
enough to enable -Wall, and then to read the warnings). Not something
that will fix an incorrect algorithm, but can stop some typos, or other
stupid mistakes that we all make from time to time.
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 14:59 ` Alejandro Colomar
@ 2022-09-03 15:31 ` Martin Uecker
2022-09-03 20:02 ` Alejandro Colomar
2022-11-10 0:06 ` Alejandro Colomar
0 siblings, 2 replies; 69+ messages in thread
From: Martin Uecker @ 2022-09-03 15:31 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Alejandro,
Am Samstag, den 03.09.2022, 16:59 +0200 schrieb Alejandro Colomar:
> Hi Martin,
>
> On 9/3/22 16:35, Martin Uecker wrote:
> > Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
> > > Hi Martin,
> > >
> > > On 9/3/22 14:47, Martin Uecker wrote:
> > > [...]
> > >
> > > > GCC will warn if the bound is specified inconsistently between
> > > > declarations and also emit warnings if it can see that a buffer
> > > > which is passed is too small:
> > > >
> > > > https://godbolt.org/z/PsjPG1nv7
> > >
> > > That's very good news!
> > >
> > > BTW, it's nice to see that GCC doesn't need 'static' for array
> > > parameters. I never understood what the static keyword adds there.
> > > There's no way one can specify an array size and mean anything other than
> > > requiring that, for a non-null pointer, the array should have at least
> > > that size.
> >
> > From the C standard's point of view,
> >
> > void foo(int n, char buf[n]);
> >
> > is semantically equivalent to
> >
> > void foo(int, char *buf);
> >
> > and without 'static' the 'n' has no further meaning
> > (this is different for pointers to arrays).
>
> I know. I just don't understand the rationale for that decision. :/
I guess it made sense in the past, but is simply not
what we need today.
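A small sketch of that equivalence, with made-up names: the function
below is defined with a VLA-style parameter, yet it can be assigned to
a function pointer whose parameter is a plain char *, because both
spellings name the same function type.

```c
#include <string.h>

/* Defined with the bound spelled out in the parameter... */
void fill(int n, char buf[n])
{
    memset(buf, 'x', (size_t)n);
}

/* ...but its type is just void(int, char *), so this initialization
 * needs no cast and produces no diagnostic. */
void (*fill_ptr)(int, char *) = fill;
```

Calling through fill_ptr behaves exactly like calling fill directly; the
'n' in the declarator carries no meaning as far as the type is concerned.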
> > The static keyword implies that the pointer is valid and
> > non-zero and that there must be at least 'n' elements
> > accessible, so in some sense it is stronger (it implies
> > valid non-zero pointers), but at the same time it does not
> > imply a bound.
>
> That stronger meaning, I think, is a mistake by the standard.
> Basically, [static n] means the same as [n] combined with [[gnu::nonnull]].
> What the standard should have done would be to keep those two things
> separate, since one may want to declare non-null non-array pointers, or
> possibly-null array ones. So the standard should have standardized some
> form of nonnull for that.
I agree the situation is not good.
> But the recent discussion about presenting
> nonnull pointers as [static 1] is horrible. But let's wait till the
> future hopefully fixes this.
yes, [static 1] is problematic because then the number
can not be used as a bound anymore.
My experience is that if one wants to see something fixed,
one has to push for it. Standardization is meant
to standardize existing practice, so if we want to see
this improved, we can not wait for this.
> > But I agree that 'n' without 'static' should simply imply
> > a bound and I think we should use it this way even when
> > the standard currently does not attach a meaning to it.
>
> Yep.
>
> [...]
>
> > > What about taking something from K&R functions for this?:
> > >
> > > int foo(q; w; int a[q], int q, int s[w], int w);
> > >
> > > By not specifying the types, the syntax is again short.
> > > This is left-to-right, so no problems with global variables, and no need
> > > for complex parsers.
> > > Also, by not specifying types, now it's more obvious to the naked eye
> > > that there's a difference:
> >
> > I am ok with the syntax, but I am not sure how this would
> > work. If the type is determined only later you would still
> > have to change parsers (some C compilers do type
> > checking and folding during parsing, so need the types
> > to be known during parsing) and you also still have the
> > problem with the mutual dependencies.
>
> This syntax closely resembles K&R syntax. Any C compiler that supports
> it (and I guess most compilers out there do) should be easily
> convertible to support this syntax (at least more easily than other
> alternatives). But this is just a guess.
In K&R syntax this worked for definitions:
void foo(y, n)
int n;
int y[n];
{ ...
But this worked because you could reorder the
declarations so that later declarations could
refer to previous ones.
So one could do
int foo(int n, char buf[n]; buf, n);
where the second part defines the order of
the parameters, or
int foo(buf, n; int n, char buf[n]);
where the first part defines the order,
but the declarations need to have the size
first. But then you need to specify each
parameter twice...
> > We thought about using this syntax
> >
> > int foo(char buf[.n], int n);
> >
> > because it is new syntax which means we can restrict the
> > size to be the name of a parameter instead of allowing
> > arbitrary expressions, which then makes forward references
> > less problematic. It is also consistent with designators in
> > initializers and could also be extended to annotate
> > flexible array members or for storing pointers to arrays
> > in structures:
>
> It's not crazy. I don't have much to argue against it.
>
> > struct {
> > int n;
> > char buf[.n];
> > };
> >
> > struct {
> > int n;
> > char (*buf)[.n];
> > };
>
> Perhaps some doubts about how this would work for nested structures, but
> not unreasonable.
It is not implemented though...
Martin
> Cheers,
>
> Alex
>
> --
> Alejandro Colomar
> <http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 15:31 ` Martin Uecker
@ 2022-09-03 20:02 ` Alejandro Colomar
2022-09-05 14:31 ` Alejandro Colomar
2022-11-10 0:06 ` Alejandro Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-09-03 20:02 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 9/3/22 17:31, Martin Uecker wrote:
[...]
>> But the recent discussion about presenting
>> nonnull pointers as [static 1] is horrible. But let's wait till the
>> future hopefully fixes this.
>
> yes, [static 1] is problematic because then the number
> can not be used as a bound anymore.
>
> My experience is that if one wants to see something fixed,
> one has to push for it. Standardization is meant
> to standardize existing practice, so if we want to see
> this improved, we can not wait for this.
>
Yeah, I'm not just waiting to see if it gets fixed on its own. I've been
discussing getting nonnull added to the standard, or improved in the
compilers, but so far no compiler has something convincing. GCC's
attribute is problematic due to UB issues, and Clang's _Nonnull keyword
is useless as of now:
<https://github.com/llvm/llvm-project/issues/57546>
Maybe GCC could add Clang's _Nonnull (and maybe _Nullable and the
pragmas, but definitely not _Null_unspecified), and add some good warnings.
Only then it would make sense to try to standardize the feature.
[...]
> In K&R syntax this worked for definitions:
>
> void foo(y, n)
> int n;
> int y[n];
> { ...
>
> But this worked because you could reorder the
> declarations so that later declarations could
> refer to previous ones.
>
> So one could do
>
> int foo(int n, char buf[n]; buf, n);
>
> where the second part defines the order of
> the parameters, or
>
> int foo(buf, n; int n, char buf[n]);
>
> where the first part defines the order,
> but the declarations need to have the size
> first. But then you need to specify each
> parameter twice...
Hmm, yeah, maybe the [.n] notation makes more sense.
>
>
>>> We thought about using this syntax
>>>
>>> int foo(char buf[.n], int n);
>>>
>>> because it is new syntax which means we can restrict the
>>> size to be the name of a parameter instead of allowing
>>> arbitrary expressions, which then makes forward references
>>> less problematic. It is also consistent with designators in
>>> initializers and could also be extended to annotate
>>> flexible array members or for storing pointers to arrays
>>> in structures:
>>
>> It's not crazy. I don't have much to argue against it.
>>
>>> struct {
>>> int n;
>>> char buf[.n];
>>> };
>>>
>>> struct {
>>> int n;
>>> char (*buf)[.n];
>>> };
>>
>> Perhaps some doubts about how this would work for nested structures, but
>> not unreasonable.
>
> It is not implemented though...
Well, are you planning to implement it?
If you do, I'm very interested in using it in the documentation ;)
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 20:02 ` Alejandro Colomar
@ 2022-09-05 14:31 ` Alejandro Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-09-05 14:31 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 9/3/22 22:02, Alejandro Colomar wrote:
>>>> We thought about using this syntax
>>>>
>>>> int foo(char buf[.n], int n);
BTW, it would be useful if this syntax was accepted for void * too,
especially since GNU C allows pointer arithmetic on void *.
void *memmove(void dest[.n], const void src[.n], size_t n);
I understand that a void array doesn't make sense, so defining a VLA of
type void is an error elsewhere, but since array parameters are not
really arrays, and instead pointers, this could be reasonable.
In the same way, these "arrays" can have zero sizes, or even negative ones
in some weird cases.
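For reference, a minimal sketch of the GNU behaviour I'm relying on,
with a made-up helper name. Under GNU C, arithmetic on void * steps in
units of one byte, so the extension branch and the portable cast below
compute the same address.

```c
#include <stddef.h>

/* Hypothetical helper: return the address n bytes past p. */
void *advance(void *p, size_t n)
{
#ifdef __GNUC__
    /* GNU extension: void is treated as if it had size 1. */
    return p + n;
#else
    /* ISO C has no arithmetic on void *, so go through char *. */
    return (char *)p + n;
#endif
}
```

That byte-sized step is what would make a bound like void dest[.n] mean
"n bytes" in a prototype.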
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 15:31 ` Martin Uecker
2022-09-03 20:02 ` Alejandro Colomar
@ 2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
` (2 more replies)
1 sibling, 3 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-10 0:06 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 9/3/22 17:31, Martin Uecker wrote:
> My experience is that if one wants to see something fixed,
> one has to push for it. Standardization is meant
> to standardize existing practice, so if we want to see
> this improved, we can not wait for this.
I fully agree with you. I've been ruminating on these patches for some time, to
have some more time to think about them. Now, I like them enough to push.
So, after a few minor cosmetic issues detected by some linters, I've pushed the
changes to document all of man2 and man3 with hypothetical VLA syntax.
Now, I've released man-pages-6.01 very recently (just a few weeks ago), and I
don't plan to release again in a year or two, so there's time to do the
implementation in GCC. From my side, please consider this an ACK or even
somewhat of a push to get things done in the compiler side of things :)
I'll show here an excerpt of what kind of syntax has been pushed. Of course,
there's room for improving/fixing, since it's not seen an official release, but
for now, this is what's up there:
int strncmp(const char s1[.n], const char s2[.n], size_t n);
long mbind(void addr[.len], unsigned long len, int mode,
const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
/ ULONG_WIDTH],
unsigned long maxnode, unsigned int flags);
int cacheflush(void addr[.nbytes], int nbytes, int cache);
I've shown the three kinds of prototypes that have been changed:
- Normal VLA; nothing fancy except for the '.'.
- Complex size expressions.
- 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
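The complex size expression in mbind() is just a ceiling division: the
number of unsigned long words needed to hold maxnode bits. A minimal
sketch of that arithmetic follows; the helper name and the ULONG_BITS
macro are mine, used instead of C23's ULONG_WIDTH so that it builds with
older compilers (it assumes unsigned long has no padding bits).

```c
#include <limits.h>
#include <stddef.h>

/* Bits in an unsigned long; stands in for C23's ULONG_WIDTH. */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long elements needed for a mask of maxnode bits,
 * i.e. maxnode / ULONG_BITS rounded up. */
size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + ULONG_BITS - 1) / ULONG_BITS;
}
```

So a mask of 1 bit needs one word, a mask of exactly ULONG_BITS bits
still needs one word, and one more bit pushes it to two.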
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 0:06 ` Alejandro Colomar
@ 2022-11-10 0:09 ` Alejandro Colomar
2022-11-10 1:33 ` Joseph Myers
2022-11-10 9:40 ` G. Branden Robinson
2 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-10 0:09 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/10/22 01:06, Alejandro Colomar wrote:
> Hi Martin,
>
> On 9/3/22 17:31, Martin Uecker wrote:
>> My experience is that if one wants to see something fixed,
>> one has to push for it. Standardization is meant
>> to standardize existing practice, so if we want to see
>> this improved, we can not wait for this.
>
> I fully agree with you. I've been ruminating on these patches for some time, to
> have some more time to think about them. Now, I like them enough to push. So,
> after a few minor cosmetic issues detected by some linters, I've pushed the
> changes to document all of man2 and man3 with hypothetical VLA syntax.
>
> Now, I've released man-pages-6.01 very recently (just a few weeks ago), and I
> don't plan to release again in a year or two, so there's time to do the
> implementation in GCC. From my side, please consider this an ACK or even
> somewhat of a push to get things done in the compiler side of things :)
>
> I'll show here an excerpt of what kind of syntax has been pushed. Of course,
> there's room for improving/fixing, since it's not seen an official release, but
> for now, this is what's up there:
>
>
> int strncmp(const char s1[.n], const char s2[.n], size_t n);
>
> long mbind(void addr[.len], unsigned long len, int mode,
> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
> / ULONG_WIDTH],
> unsigned long maxnode, unsigned int flags);
>
> int cacheflush(void addr[.nbytes], int nbytes, int cache);
>
>
> I've shown the three kinds of prototypes that have been changed:
>
> - Normal VLA; nothing fancy except for the '.'.
> - Complex size expressions.
> - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
Oops: sizeof(void)==1
>
>
> Cheers,
>
> Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
@ 2022-11-10 1:33 ` Joseph Myers
2022-11-10 1:39 ` Joseph Myers
2022-11-10 9:40 ` G. Branden Robinson
2 siblings, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-10 1:33 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
> I've shown the three kinds of prototypes that have been changed:
>
> - Normal VLA; nothing fancy except for the '.'.
> - Complex size expressions.
> - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
That doesn't cover any of the tricky issues with such proposals, such as
the choice of which entity is referred to by the parameter name when there
are multiple nested parameter lists that use the same parameter name, or
when the identifier is visible from an outer scope (including in
particular the case where it's declared as a typedef name in an outer
scope).
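A minimal sketch of the outer-scope case in today's C, with made-up
names: when an array bound mentions an identifier before the parameter
of that name has been declared, the bound silently binds to whatever is
visible from the enclosing scope.

```c
#include <stddef.h>

enum { n = 8 };  /* file-scope identifier that happens to be called n */

/* While the declarator of buf is parsed, the parameter n is not yet in
 * scope, so the bound is the enum constant 8, not the parameter. */
size_t row_size(char (*buf)[n], int n)
{
    (void)n;             /* the parameter, which shadows the constant */
    return sizeof *buf;  /* size of char[8], whatever the caller passed */
}
```

Any new notation has to say which of the two entities a bound refers to
in situations like this one.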
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 1:33 ` Joseph Myers
@ 2022-11-10 1:39 ` Joseph Myers
2022-11-10 6:21 ` Martin Uecker
0 siblings, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-10 1:39 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Thu, 10 Nov 2022, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
>
> > I've shown the three kinds of prototypes that have been changed:
> >
> > - Normal VLA; nothing fancy except for the '.'.
> > - Complex size expressions.
> > - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
>
> That doesn't cover any of the tricky issues with such proposals, such as
> the choice of which entity is referred to by the parameter name when there
> are multiple nested parameter lists that use the same parameter name, or
> when the identifier is visible from an outer scope (including in
> particular the case where it's declared as a typedef name in an outer
> scope).
In fact I can't tell from these examples whether you mean for a '.' token
after '[' to have special semantics, or whether you mean to have a special
'. identifier' form of expression valid in certain contexts (each of which
introduces its own complications; for the former, typedef names from outer
scopes are problematic; for the latter, it's designated initializers where
you get complications, for example). Designing new syntax that doesn't
cause ambiguity is generally tricky, and this sort of language extension
is the kind of thing where you'd expect to go through at least five
iterations of a WG14 paper before you have something like a sound
specification.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 1:39 ` Joseph Myers
@ 2022-11-10 6:21 ` Martin Uecker
2022-11-10 10:09 ` Alejandro Colomar
2022-11-10 23:19 ` Joseph Myers
0 siblings, 2 replies; 69+ messages in thread
From: Martin Uecker @ 2022-11-10 6:21 UTC (permalink / raw)
To: Joseph Myers, Alejandro Colomar
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Donnerstag, den 10.11.2022, 01:39 +0000 schrieb Joseph Myers:
> On Thu, 10 Nov 2022, Joseph Myers wrote:
>
> > On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
> >
> > > I've shown the three kinds of prototypes that have been changed:
> > >
> > > - Normal VLA; nothing fancy except for the '.'.
> > > - Complex size expressions.
> > > - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
> >
> > That doesn't cover any of the tricky issues with such proposals, such as
> > the choice of which entity is referred to by the parameter name when there
> > are multiple nested parameter lists that use the same parameter name, or
> > when the identifier is visible from an outer scope (including in
> > particular the case where it's declared as a typedef name in an outer
> > scope).
>
> In fact I can't tell from these examples whether you mean for a '.' token
> after '[' to have special semantics, or whether you mean to have a special
> '. identifier' form of expression valid in certain contexts (each of which
> introduces its own complications; for the former, typedef names from outer
> scopes are problematic; for the latter, it's designated initializers where
> you get complications, for example). Designing new syntax that doesn't
> cause ambiguity is generally tricky, and this sort of language extension
> is the kind of thing where you'd expect to go through at least five
> iterations of a WG14 paper before you have something like a sound
> specification.
I am not sure what Alejandro has in mind exactly, but my idea of using
a new notation [.identifier] would be to limit it to accessing other
parameter names in the same parameter list only, so that there is
1) no ambiguity what is referred to and
2) one can access parameters which come later
If we want to specify something like this, I think we should also
restrict what kind of expressions one allows, e.g. it has to
be side-effect free. But maybe we want to make this even more
restrictive (at least initially).
One problem with WG14 papers is that people put in too much,
because the overhead is so high and the standard is not updated
very often. It would be better to build such a feature more
incrementally, which could be done more easily with a compiler
extension. One could start supporting just [.x] but not more
complicated expressions.
Later WG14 can still accept or reject or modify this proposal
based on the experience we get.
(I would also be happy with using GNU forward declarations, and
I am not sure why people dislike them so much.)
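For readers who have not seen them, a minimal sketch of a GNU forward
parameter declaration. The function name and body are made up, and the
preprocessor guard is only there because I am assuming that compilers
other than GCC do not accept the syntax.

```c
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU extension: "size_t n;" announces n so that dst can use it as a
 * bound even though the real parameter n comes last. */
size_t copy_prefix(size_t n; char dst[n], const char *src, size_t n)
#else
/* Fallback without the bound for other compilers. */
size_t copy_prefix(char *dst, const char *src, size_t n)
#endif
{
    size_t i = 0;

    if (n == 0)
        return 0;
    /* Copy at most n - 1 characters and always terminate. */
    while (i + 1 < n && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

The caller sees the usual (dst, src, n) order; only the declaration
mentions n twice.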
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
2022-11-10 1:33 ` Joseph Myers
@ 2022-11-10 9:40 ` G. Branden Robinson
2022-11-10 10:59 ` Alejandro Colomar
2 siblings, 1 reply; 69+ messages in thread
From: G. Branden Robinson @ 2022-11-10 9:40 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Alex,
At 2022-11-10T01:06:31+0100, Alejandro Colomar wrote:
> Now, I've released man-pages-6.01 very recently (just a few weeks
> ago), and I don't plan to release again in a year or two, so there's
> time to do the implementation in GCC. From my side, please consider
> this an ACK or even somewhat of a push to get things done in the
> compiler side of things :)
Do you mean you _don't_ plan to release again for a year or two?
You know what Moltke said about plans and contact with the enemy. For
one thing, I think the Linux kernel will move too fast to permit such a
leisurely cadence.
Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
you will, shortly thereafter, migrate to the new `MR` macro.
<tents fingers, laughs villainously>
Regards,
Branden
[1] Only 6 RC bugs left!
https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&set=custom&report_id=225&status_id=1&plan_release_id=103
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 6:21 ` Martin Uecker
@ 2022-11-10 10:09 ` Alejandro Colomar
2022-11-10 23:19 ` Joseph Myers
1 sibling, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-10 10:09 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph and Martin!
On 11/10/22 07:21, Martin Uecker wrote:
> Am Donnerstag, den 10.11.2022, 01:39 +0000 schrieb Joseph Myers:
>> On Thu, 10 Nov 2022, Joseph Myers wrote:
>>
>>> On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
>>>
>>>> I've shown the three kinds of prototypes that have been changed:
>>>>
>>>> - Normal VLA; nothing fancy except for the '.'.
>>>> - Complex size expressions.
>>>> - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
>>>
>>> That doesn't cover any of the tricky issues with such proposals, such as
>>> the choice of which entity is referred to by the parameter name when there
>>> are multiple nested parameter lists that use the same parameter name, or
>>> when the identifier is visible from an outer scope (including in
>>> particular the case where it's declared as a typedef name in an outer
>>> scope).
>>
>> In fact I can't tell from these examples whether you mean for a '.' token
>> after '[' to have special semantics, or whether you mean to have a special
>> '. identifier' form of expression valid in certain contexts (each of which
>> introduces its own complications; for the former, typedef names from outer
>> scopes are problematic; for the latter, it's designated initializers where
>> you get complications, for example). Designing new syntax that doesn't
>> cause ambiguity is generally tricky, and this sort of language extension
>> is the kind of thing where you'd expect to go through at least five
>> iterations of a WG14 paper before you have something like a sound
>> specification.
>
> I am not sure what Alejandro has in mind exactly, but my idea of using
> a new notation [.identifier] would be to limit it to accessing other
> parameter names in the same parameter list only, so that there is
>
> 1) no ambiguity what is referred to and
> 2) one can access parameters which come later
Yes, I implemented your idea. As always, I thought I had linked to it in the
commit message, but I didn't. Quite a bad thing for the commit that implements
a completely new feature to not point to the documentation/idea at all.
So, the documentation followed by these 3 patches is Martin's email:
<https://lore.kernel.org/linux-man/601680ae-30d7-1481-e152-034083f6dde1@gmail.com/T/#med2bdfcc31a3d0b3bc6c48b229c8d8dd5088935e>
It was sound in my head, and I couldn't see any inconsistencies.
- I implemented it with '.' as being restricted to refer to parameters of the
function being prototyped (commit 1).
- I also allowed complex expressions in the prototypes (commit 2), since it's
something that can be quite useful (that was already foreseen by Martin's idea,
IIRC). The most useful example that I have in my mind is a patch that I'm
developing for shadow-utils:
<https://github.com/shadow-maint/shadow/pull/569/files#diff-12b560bab6b4fb8f7f3a16f01aaa994de539a8bed3058c976be0daebe16405c1>
The gist of it is a function that gets a fixed-width non-NUL-terminated
string, and copies it into a NUL-terminated string in a buffer that of course
has to be one byte larger than the input string:
void buf2str(char dst[restrict .n+1], const char src[restrict .n],
size_t n);
- I extended the idea to apply to void[] (commit 3). Something not yet allowed
by GCC, but very useful IMO, especially for the mem...(3) functions. Since GNU
C consistently treats sizeof(void)==1, it makes sense to allow VLA syntax in
that way. This is not at all about allowing true VLAs of type void[]; that's
forbidden, and should continue to be forbidden. But since parameters are just
pointers, I don't see any issue with allowing false void[] VLAs in parameters
that really are void* in disguise.
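As a point of comparison, here is a minimal sketch of how the buf2str()
example from commit 2 can be spelled in standard C today. It is not the
shadow-utils code: the body is my guess at the obvious implementation,
and the size parameter has to move to the front so that both bounds can
refer to it.

```c
#include <stddef.h>
#include <string.h>

/* Copy the n bytes of a fixed-width, possibly unterminated field into
 * dst and terminate it; dst must have room for n + 1 bytes. */
void buf2str(size_t n, char dst[restrict n + 1], const char src[restrict n])
{
    memcpy(dst, src, n);
    dst[n] = '\0';
}
```

The [.n] notation would allow the same bounds while keeping the
conventional (dst, src, n) parameter order.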
The 3 commits are here (last 3 commits in that log):
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?id=c64cd13e002561c6802c6a1a1a8a640f034fea70>
Martin, please check if I implemented your idea faithfully. The 3 example
prototypes I showed are good representatives of what I added, so if you don't
understand man(7) source you could just read them and see if they make sense to
you; the rest of the changes are of the same kind. Or you could install the man
pages from the repo :)
>
> If we want to specify something like this, I think we should also
> restrict what kind of expressions one allows, e.g. it has to
> be side-effect free.
Well, yes, there should be no side effects; it would not make sense in a
prototype. I'd put it as simply as with _Generic(3) and similar stuff, where
the controlling expression is not evaluated for side effects. I never remember
how sizeof() or typeof() behave: I always need to check whether they have side
effects or not. I'll be documenting that in the man-pages soon.
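A small sketch to check those rules with a compiler (helper names made
up): each function resets a counter, uses bump() in an operand, and
reports how many times bump() actually ran.

```c
static int calls;

/* Side effect we can observe: counts how often it is called. */
static int bump(void)
{
    return ++calls;
}

/* Ordinary sizeof operand: not evaluated, so this returns 0. */
int sizeof_evaluations(void)
{
    calls = 0;
    (void)sizeof(bump());
    return calls;
}

/* Operand of variably modified type: the size expression is
 * evaluated, so this returns 1. */
int vla_sizeof_evaluations(void)
{
    calls = 0;
    (void)sizeof(char[bump()]);
    return calls;
}

/* _Generic controlling expression: not evaluated, returns 0. */
int generic_evaluations(void)
{
    calls = 0;
    (void)_Generic(bump(), int: 1, default: 0);
    return calls;
}
```

The VLA case is the odd one out, and it is the relevant one when
deciding what a bound such as [.n + 1] is allowed to contain.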
> But maybe we want to make this even more
> restrictive (at least initially).
Yeah, you could go for an initial implementation that only supports my commit 1;
that would be the simplest. That would cover already the vast majority of
cases. But please consider commits 2 and 3 afterwards, since I believe they are
also of great importance.
>
> One problem with WG14 papers is that people put in too much,
> because the overhead is so high and the standard is not updated
> very often. It would be better to build such a feature more
> incrementally, which could be done more easily with a compiler
> extension. One could start supporting just [.x] but not more
> complicated expressions.
>
> Later WG14 can still accept or reject or modify this proposal
> based on the experience we get.
Yeah, and I also think any WG14 papers with features as important as this one
without prior experience in a real compiler should be rejected. I don't think
it makes sense to standardize something just from theoretical discussions, and
force everyone to implement it afterwards. No matter how good the reviewers are.
>
> (I would also be happy with using GNU forward declarations, and
> I am not sure why people dislike them so much.)
For me, it's how easy it is to confuse a comma with a semicolon. Also,
unnecessarily long lines.
>
> Martin
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 9:40 ` G. Branden Robinson
@ 2022-11-10 10:59 ` Alejandro Colomar
2022-11-10 22:25 ` G. Branden Robinson
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-10 10:59 UTC (permalink / raw)
To: G. Branden Robinson
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Branden!
On 11/10/22 10:40, G. Branden Robinson wrote:
> Hi Alex,
>
> At 2022-11-10T01:06:31+0100, Alejandro Colomar wrote:
>> Now, I've released man-pages-6.01 very recently (just a few weeks
>> ago), and I don't plan to release again in a year or two, so there's
>> time to do the implementation in GCC. From my side, please consider
>> this an ACK or even somewhat of a push to get things done in the
>> compiler side of things :)
>
> Do you mean you _don't_ plan to release again for a year or two?
>
> You know what Moltke said about plans and contact with the enemy. For
> one thing, I think the Linux kernel will move too fast to permit such a
> leisurely cadence.
Heh, at this point I've burnt my ships by using the enhanced VLA syntax. If I
release that before GCC supports it, I expect an avalanche of reports about it
(and I also expect GCC and the forums to receive a similar amount). So yes,
I expect to wait a longish time.
>
> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
> you will, shortly thereafter, migrate to the new `MR` macro.
Not as soon as it gets released, because I expect (at least a decent number of)
contributors to be able to read the pages to which they contribute; but as
soon as it makes it into Debian stable, yes, that's in my plans. So, if you
make it before the freeze, that means around a couple of months from now.
>
> <tents fingers, laughs villainously>
<also tents fingers, laughs villainously>
>
> Regards,
> Branden
>
> [1] Only 6 RC bugs left!
Looks good!
Cheers,
Alex
>
> https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&set=custom&report_id=225&status_id=1&plan_release_id=103
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 10:59 ` Alejandro Colomar
@ 2022-11-10 22:25 ` G. Branden Robinson
0 siblings, 0 replies; 69+ messages in thread
From: G. Branden Robinson @ 2022-11-10 22:25 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]
Hi Alex,
At 2022-11-10T11:59:02+0100, Alejandro Colomar wrote:
> > You know what Moltke said about plans and contact with the enemy.
> > For one thing, I think the Linux kernel will move too fast to permit
> > such a leisurely cadence.
>
> Heh, at this point, I burnt my ships, by using enhanced VLA syntax.
> If I release that before GCC, I'm expecting to see an avalanche of
> reports about it (and I also expect that GCC and forums will receive a
> similar amount). So yes, I expect to wait some longish time.
Hah, you rebutted my Moltke with your namesake. You understand that I'm
obligated to spring a reference to the Battle of Lepanto or something on
you at some point.
> > Also, as soon as Bertrand and I can get groff 1.23 out[1], I am
> > hoping you will, shortly thereafter, migrate to the new `MR` macro.
>
> Not as soon as it gets released, because I expect (at least a decent
> number of) contributors to be able to read the pages to which they
> contribute,
Laggardly adopters can always put this in man.local.
.if !d MR \{\
. de MR
. IR \\$1 (\\$2)\\$3
. .
.\}
> but as soon as it makes it into Debian stable, yes, that's in my
> plans. So, if you make it before the freeze, that means around a
> couple of months from now.
Yes. It is a major personal goal to get groff 1.23 into Debian
bookworm.
> > <tents fingers, laughs villainously>
>
> <also tents fingers, laughs villainously>
https://www.youtube.com/watch?v=VhH2egTLohM
Regards,
Branden
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 6:21 ` Martin Uecker
2022-11-10 10:09 ` Alejandro Colomar
@ 2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
` (2 more replies)
1 sibling, 3 replies; 69+ messages in thread
From: Joseph Myers @ 2022-11-10 23:19 UTC (permalink / raw)
To: Martin Uecker
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
> One problem with WG14 papers is that people put in too much,
> because the overhead is so high and the standard is not updated
> very often. It would be better to build such feature more
> incrementally, which could be done more easily with a compiler
> extension. One could start supporting just [.x] but not more
> complicated expressions.
Even a compiler extension requires the level of detail of specification
that you get with a WG14 paper (and the level of work on finding bugs in
that specification), to avoid the problem we've had before with too many
features added in GCC 2.x days where a poorly defined feature is "whatever
the compiler accepts".
If you use .x as the notation but don't limit it to [.x], you have a
completely new ambiguity between ordinary identifiers and member names
struct s { int a; };
void f(int a, int b[((struct s) { .a = 1 }).a]);
where it's newly ambiguous whether ".a = 1" is an assignment to the
expression ".a" or a use of a designated initializer.
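For context, a minimal sketch of how the inner expression parses in current ISO C, where no ".a" expression exists and "{ .a = 1 }" can only be a designated initializer inside a compound literal. The function name size_from_literal is made up for illustration and does not come from this thread.

```c
#include <assert.h>

struct s { int a; };

/* In C99 and later, "(struct s) { .a = 1 }" is a compound literal and
 * ".a = 1" is a designated initializer; the trailing ".a" is ordinary
 * member access on the resulting object. */
int size_from_literal(void)
{
    int n = ((struct s) { .a = 1 }).a;

    assert(n > 0);  /* must be positive to serve as an array size */
    return n;
}
```

With a new ".a" expression form, the same tokens inside the braces could also be read as an assignment to that expression, which is the second parse described above.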
(I think that if you add any syntax for this, GNU VLA forward declarations
are clearly to be preferred to inventing something new like [.x] which
introduces its own problems.)
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 23:19 ` Joseph Myers
@ 2022-11-10 23:28 ` Alejandro Colomar
2022-11-11 19:52 ` Martin Uecker
2022-11-12 12:34 ` Alejandro Colomar
2 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-10 23:28 UTC (permalink / raw)
To: Joseph Myers, Martin Uecker
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 2085 bytes --]
Hi Joseph,
On 11/11/22 00:19, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
>
>> One problem with WG14 papers is that people put in too much,
>> because the overhead is so high and the standard is not updated
>> very often. It would be better to build such feature more
>> incrementally, which could be done more easily with a compiler
>> extension. One could start supporting just [.x] but not more
>> complicated expressions.
>
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
>
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
>
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
>
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
Yeah, I think limiting it to [.n] initially, and only moving forward, step by
step, if it's perfectly clear that it's doable seems very reasonable.
Re: GNU VLA fwd decl:
This example is what I'm worried about:
int foo(int a; int b[a], int a);
int foo(int a, int b[a], int o);
Okay, parameters should have more readable names... But still, it allows for a
high chance of wtf moments. However, I can think of a syntax very similar to
GNU's, that would make it a bit better in terms of readability: not declaring
the type in the fwd decl:
int foo(a; int b[a], int a);
int foo(int a, int b[a], int o);
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
@ 2022-11-11 19:52 ` Martin Uecker
2022-11-12 1:09 ` Joseph Myers
2022-11-12 12:34 ` Alejandro Colomar
2 siblings, 1 reply; 69+ messages in thread
From: Martin Uecker @ 2022-11-11 19:52 UTC (permalink / raw)
To: Joseph Myers
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Donnerstag, den 10.11.2022, 23:19 +0000 schrieb Joseph Myers:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
>
> > One problem with WG14 papers is that people put in too much,
> > because the overhead is so high and the standard is not updated
> > very often. It would be better to build such feature more
> > incrementally, which could be done more easily with a compiler
> > extension. One could start supporting just [.x] but not more
> > complicated expressions.
>
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
I think the effort needed to specify the feature correctly
can be minimized by making the first version of the feature
as simple as possible.
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
>
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
If we only allowed [ . a ] then this example would not be allowed.
If we need more flexibility, we could incrementally extend it.
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
I also prefer this.
I proposed forward declarations, but WG14 and also people in this
discussion did not like them. If we actually started using
them, we could propose them again for the next revision.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-11 19:52 ` Martin Uecker
@ 2022-11-12 1:09 ` Joseph Myers
2022-11-12 7:24 ` Martin Uecker
0 siblings, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-12 1:09 UTC (permalink / raw)
To: Martin Uecker
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Fri, 11 Nov 2022, Martin Uecker via Gcc wrote:
> > Even a compiler extension requires the level of detail of specification
> > that you get with a WG14 paper (and the level of work on finding bugs in
> > that specification), to avoid the problem we've had before with too many
> > features added in GCC 2.x days where a poorly defined feature is "whatever
> > the compiler accepts".
>
> I think the effort needed to specify the feature correctly
> can be minimized by making the first version of the feature
> as simple as possible.
The version of constexpr in the current C2x working draft is more or less
as simple as possible. It also went through lots of revisions to get
there. I'm currently testing an implementation of C2x constexpr for GCC
13, and there are still several issues with the specification I found in
the implementation process, beyond those raised in WG14 discussions, for
which I'll need to raise NB comments to clarify things.
I think that illustrates that you need the several iterations on the
specification process, *and* making it as simple as possible, *and*
getting implementation experience, *and* the implementation experience
being with a close eye to what it implies for all the details in the
specification rather than just getting something vaguely functional but
not clearly specified.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 1:09 ` Joseph Myers
@ 2022-11-12 7:24 ` Martin Uecker
0 siblings, 0 replies; 69+ messages in thread
From: Martin Uecker @ 2022-11-12 7:24 UTC (permalink / raw)
To: Joseph Myers
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, den 12.11.2022, 01:09 +0000 schrieb Joseph Myers:
> On Fri, 11 Nov 2022, Martin Uecker via Gcc wrote:
>
> > > Even a compiler extension requires the level of detail of specification
> > > that you get with a WG14 paper (and the level of work on finding bugs in
> > > that specification), to avoid the problem we've had before with too many
> > > features added in GCC 2.x days where a poorly defined feature is "whatever
> > > the compiler accepts".
> >
> > I think the effort needed to specify the feature correctly
> > can be minimized by making the first version of the feature
> > as simple as possible.
>
> The version of constexpr in the current C2x working draft is more or less
> as simple as possible. It also went through lots of revisions to get
> there. I'm currently testing an implementation of C2x constexpr for GCC
> 13, and there are still several issues with the specification I found in
> the implementation process, beyond those raised in WG14 discussions, for
> which I'll need to raise NB comments to clarify things.
constexpr had no implementation experience in C at all, and I always
suspected that the assumption that C++ experience should somehow count
is not really justified.
> I think that illustrates that you need the several iterations on the
> specification process, *and* making it as simple as possible, *and*
> getting implementation experience, *and* the implementation experience
> being with a close eye to what it implies for all the details in the
> specification rather than just getting something vaguely functional but
> not clearly specified.
I agree. We should work on specification and on prototyping
new features in parallel.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
2022-11-11 19:52 ` Martin Uecker
@ 2022-11-12 12:34 ` Alejandro Colomar
2022-11-12 12:46 ` Alejandro Colomar
2022-11-12 13:03 ` Joseph Myers
2 siblings, 2 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-12 12:34 UTC (permalink / raw)
To: Joseph Myers, Martin Uecker
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 2166 bytes --]
Hi Joseph,
On 11/11/22 00:19, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
>
>> One problem with WG14 papers is that people put in too much,
>> because the overhead is so high and the standard is not updated
>> very often. It would be better to build such feature more
>> incrementally, which could be done more easily with a compiler
>> extension. One could start supporting just [.x] but not more
>> complicated expressions.
>
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
>
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
>
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
Is it really ambiguous? Let's show some currently-valid code:
    struct s {
        int a;
    };

    struct t {
        struct s s;
        int a;
    };

    void f(void)
    {
        struct t x = {
            .a = 1,
            .s = {
                .a = ((struct s) {.a = 1}).a,
            },
        };
    }
It is ambiguous to a human reader, but that's a subjective thing, and of course
shadowing should be avoided by programmers. However, for a compiler, scoping
and syntax rules should be unambiguous, I think. In your code example, I
believe it is unambiguous that both '.a' refer to the struct member.
But maybe we're not considering more complex situations that might really be
ambiguous to the compiler, so a first round of supporting only [.a] would be a
good first implementation.
>
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
>
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 12:34 ` Alejandro Colomar
@ 2022-11-12 12:46 ` Alejandro Colomar
2022-11-12 13:03 ` Joseph Myers
1 sibling, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-12 12:46 UTC (permalink / raw)
To: Joseph Myers, Martin Uecker
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1004 bytes --]
On 11/12/22 13:34, Alejandro Colomar wrote:
> struct s {
> int a;
> };
>
> struct t {
> struct s s;
> int a;
> };
>
> void f(void)
> {
> struct t x = {
> .a = 1,
> .s = {
> .a = ((struct s) {.a = 1}).a,
> },
> };
> }
Building on this, and as I understood from Martin's email, there's also an
idea of allowing the following:
    struct s {
        int a;
        int b;
    };

    struct t {
        struct s s;
        int a;
        int b;
    };

    void f(void)
    {
        struct t x = {
            .a = 1,
            .s = {
                // In the following line, .b=.a is assigning 2
                .a = ((struct s) {.a = 2, .b = .a}).b,
                // The previous line assigned 2, since the compound had 2 in .b
            },
            // In the following line, .b=.a is assigning 1
            .b = .a,
        };
    }
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 12:34 ` Alejandro Colomar
2022-11-12 12:46 ` Alejandro Colomar
@ 2022-11-12 13:03 ` Joseph Myers
2022-11-12 13:40 ` Alejandro Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-12 13:03 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> > struct s { int a; };
> > void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> Is it really ambiguous? Let's show some currently-valid code:
Well, I still don't know what the syntax addition you propose is. Is it
postfix-expression : . identifier
(with a special rule about how the identifier is interpreted, different
from the normal scope rules)? If so, then ".a = 1" could either match
assignment-expression directly (assigning to the postfix-expression ".a").
Or it could match designation[opt] initializer, where ".a" is a
designator. And as I've noted many times in discussions of C2x proposals
on the WG14 reflector, if some sequence of tokens can match the syntax in
more than one way, there always needs to be explicit normative text to
disambiguate the intended parse - it's not enough that one parse might
lead later to a violation of some other constraint (not that either parse
leads to a constraint violation in this case).
Or is the syntax
array-declarator : direct-declarator [ . assignment-expression ]
(with appropriate variants with static and type-qualifier-list and for
array-abstract-declarator as well, and with different identifier
interpretation rules inside the assignment-expression)? If so, then there
are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
in an outer scope, because the appropriate parse would depend on whether
'a' is shadowed by a parameter - unless of course you add appropriate
wording like that present in some places about not being able to use this
syntax to shadow a typedef name.
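The typedef dependence can already be seen in today's C. The sketch below uses made-up function names and assumes an 8-bit unsigned char; it shows the same tokens "( a ) + ( b )" parsing as a cast in one scope and as an addition in another.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned char a;  /* 'a' names a type at file scope */

/* Here 'a' is the typedef, so "(a) + (b)" is a cast of the unary
 * expression "+(b)" to unsigned char. */
int parsed_as_cast(int b)
{
    int r = (a) + (b);

    assert(r >= 0 && r <= UCHAR_MAX);  /* result was narrowed by the cast */
    return r;
}

/* Here the parameter 'a' shadows the typedef, so the very same tokens
 * are a binary addition of two ints. */
int parsed_as_sum(int a, int b)
{
    return (a) + (b);
}
```

In a function body the parser knows which meaning applies because the parameter is declared before the expression. With [ . ( a ) + ( b ) ] the parameter 'a' may only be declared later in the list, so the parser cannot yet know whether the typedef is shadowed.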
Or is it just
array-declarator : direct-declarator [ . identifier ]
which might avoid some of these problems at the expense of being less
expressive?
If you're proposing a C syntax addition, you always need to be clear about
exactly what the new cases in the syntax would be, and how you resolve
ambiguities with any other existing part of the syntax, how you interact
with rules on scopes, namespaces and linkage of identifiers, etc.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 13:03 ` Joseph Myers
@ 2022-11-12 13:40 ` Alejandro Colomar
2022-11-12 13:58 ` Alejandro Colomar
2022-11-12 14:54 ` Joseph Myers
0 siblings, 2 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-12 13:40 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 3876 bytes --]
Hi Joseph,
On 11/12/22 14:03, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>>> struct s { int a; };
>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>
>> Is it really ambiguous? Let's show some currently-valid code:
>
> Well, I still don't know what the syntax addition you propose is. Is it
>
> postfix-expression : . identifier
I'll try to explain it in standardese, but I'm not sure I'll get it right,
so I'll accompany it with plain English.
Maybe Martin can help.
Since it's to be used as an rvalue, not as an lvalue, I guess a
postfix-expression wouldn't be the right one.
>
> (with a special rule about how the identifier is interpreted, different
> from the normal scope rules)? If so, then ".a = 1" could either match
> assignment-expression directly (assigning to the postfix-expression ".a").
No, assigning to a function parameter from within another parameter declaration
wouldn't make sense. They should be readonly. Side effects should be
forbidden, I think.
> Or it could match designation[opt] initializer, where ".a" is a
> designator. And as I've noted many times in discussions of C2x proposals
> on the WG14 reflector, if some sequence of tokens can match the syntax in
> more than one way, there always needs to be explicit normative text to
> disambiguate the intended parse - it's not enough that one parse might
> lead later to a violation of some other constraint (not that either parse
> leads to a constraint violation in this case).
>
> Or is the syntax
>
> array-declarator : direct-declarator [ . assignment-expression ]
Not good either. The '.' should prefix the identifier, not the expression. So,
for example, you would have:
void *bsearch(const void key[.size], const void base[.size * .nmemb],
size_t nmemb, size_t size,
int (*compar)(const void [.size], const void [.size]));
That's taken from the current manual page from git HEAD. See 'base', which gets
its size from the multiplication of 'size' and 'nmemb'.
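For comparison, a sketch of what ISO C (C99 and later) already accepts when the size parameters are declared before the array that uses them. The function sum_bytes and its parameter order are hypothetical, and unsigned char stands in for void here because an array of void is not a valid type in C.

```c
#include <assert.h>
#include <stddef.h>

/* 'nmemb' and 'size' are in scope by the time 'base' is declared, so
 * the bound is an ordinary expression and needs no new syntax. */
unsigned long sum_bytes(size_t nmemb, size_t size,
                        const unsigned char base[nmemb * size])
{
    unsigned long sum = 0;

    assert(base != NULL);
    for (size_t i = 0; i < nmemb * size; i++)
        sum += base[i];
    return sum;
}
```

The '.' notation is meant to express the same relationship for existing interfaces such as bsearch(3), whose size parameters come after the arrays and cannot be reordered.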
>
> (with appropriate variants with static and type-qualifier-list and for
> array-abstract-declarator as well, and with different identifier
> interpretation rules inside the assignment-expression)? If so, then there
> are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
> in an outer scope, because the appropriate parse would depend on whether
> 'a' is shadowed by a parameter - unless of course you add appropriate
> wording like that present in some places about not being able to use this
> syntax to shadow a typedef name.
>
> Or is it just
>
> array-declarator : direct-declarator [ . identifier ]
For the initial implementation, it would be, I think.
>
> which might avoid some of these problems at the expense of being less
> expressive?
Yes.
>
> If you're proposing a C syntax addition, you always need to be clear about
> exactly what the new cases in the syntax would be, and how you resolve
> ambiguities with any other existing part of the syntax, how you interact
> with rules on scopes, namespaces and linkage of identifiers, etc.
Yeah, I'll try.
I think that the complete feature would allow 'designator' to be used within
unary-expression:
unary-expression: designator
Since sizeof(foo) is a unary-expression and you can't assign to it, I'm guessing
that similar rules could be used for '.size'.
That would have the effect of allowing both features suggested by Martin: being
able to use designators in both structure initializers (as demonstrated in my
last email) and function prototypes (as in the thing we're discussing).
I hope I got it right. I'm not that used to the formal grammar.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 13:40 ` Alejandro Colomar
@ 2022-11-12 13:58 ` Alejandro Colomar
2022-11-12 14:54 ` Joseph Myers
1 sibling, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-12 13:58 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 4271 bytes --]
On 11/12/22 14:40, Alejandro Colomar wrote:
> Hi Joseph,
>
> On 11/12/22 14:03, Joseph Myers wrote:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>>> struct s { int a; };
>>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>
>>> Is it really ambiguous? Let's show some currently-valid code:
>>
>> Well, I still don't know what the syntax addition you propose is. Is it
>>
>> postfix-expression : . identifier
>
> I'll try to explain it in standardeese, but I'm not sure if I'll get it right,
> so I'll accompany it with plain English.
>
> Maybe Martin can help.
>
> Since it's to be used as an rvalue, not as a lvalue, I guess a
> postfix-expression wouldn't be the right one.
>
>>
>> (with a special rule about how the identifier is interpreted, different
>> from the normal scope rules)? If so, then ".a = 1" could either match
>> assignment-expression directly (assigning to the postfix-expression ".a").
>
> No, assigning to a function parameter from within another parameter declaration
> wouldn't make sense. They should be readonly. Side effects should be
> forbidden, I think.
>
>> Or it could match designation[opt] initializer, where ".a" is a
>> designator. And as I've noted many times in discussions of C2x proposals
>> on the WG14 reflector, if some sequence of tokens can match the syntax in
>> more than one way, there always needs to be explicit normative text to
>> disambiguate the intended parse - it's not enough that one parse might
>> lead later to a violation of some other constraint (not that either parse
>> leads to a constraint violation in this case).
>>
>> Or is the syntax
>>
>> array-declarator : direct-declarator [ . assignment-expression ]
>
> Not good either. The '.' should prefix the identifier, not the expression. So,
> for example, you would have:
>
> void *bsearch(const void key[.size], const void base[.size * .nmemb],
> size_t nmemb, size_t size,
> int (*compar)(const void [.size], const void [.size]));
>
> That's taken from the current manual page from git HEAD. See 'base', which gets
> its size from the multiplication of 'size' and 'nmemb'.
>
>>
>> (with appropriate variants with static and type-qualifier-list and for
>> array-abstract-declarator as well, and with different identifier
>> interpretation rules inside the assignment-expression)? If so, then there
>> are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
>> in an outer scope, because the appropriate parse would depend on whether
>> 'a' is shadowed by a parameter - unless of course you add appropriate
>> wording like that present in some places about not being able to use this
>> syntax to shadow a typedef name.
>>
>> Or is it just
>>
>> array-declarator : direct-declarator [ . identifier ]
>
> For the initial implementation, it would be, I think.
>
>>
>> which might avoid some of these problems at the expense of being less
>> expressive?
>
> Yes.
>
>>
>> If you're proposing a C syntax addition, you always need to be clear about
>> exactly what the new cases in the syntax would be, and how you resolve
>> ambiguities with any other existing part of the syntax, how you interact
>> with rules on scopes, namespaces and linkage of identifiers, etc.
>
> Yeah, I'll try.
>
> I think that the complete feature would allow 'designator' to be used within
> unary-expression:
>
> unary-expression: designator
I made a mistake: since array-index designators ('[ constant-expression ]')
don't make sense in this feature, it should only be:
unary-expression: . identifier
>
> Since sizeof(foo) is a unary-expression and you can't assign to it, I'm guessing
> that similar rules could be used for '.size'.
>
>
> That would have the effect of allowing both features suggested by Martin: being
> able to used designators in both structures (as demonstrated in my last email)
> and function prototypes (as in the thing we're discussing).
>
> I hope I got it right. I'm not used to lexical grammar so much.
>
> Cheers,
>
> Alex
>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 13:40 ` Alejandro Colomar
2022-11-12 13:58 ` Alejandro Colomar
@ 2022-11-12 14:54 ` Joseph Myers
2022-11-12 15:35 ` Alejandro Colomar
2022-11-12 15:56 ` Martin Uecker
1 sibling, 2 replies; 69+ messages in thread
From: Joseph Myers @ 2022-11-12 14:54 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> Since it's to be used as an rvalue, not as a lvalue, I guess a
> postfix-expression wouldn't be the right one.
Several forms of postfix-expression are only rvalues.
> > (with a special rule about how the identifier is interpreted, different
> > from the normal scope rules)? If so, then ".a = 1" could either match
> > assignment-expression directly (assigning to the postfix-expression ".a").
>
> No, assigning to a function parameter from within another parameter
> declaration wouldn't make sense. They should be readonly. Side effects
> should be forbidden, I think.
Such assignments are already allowed. In a function definition, the side
effects (including in size expressions for array parameters adjusted to
pointers) take place before entry to the function body.
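A small sketch of that behaviour in current C, with hypothetical names (size_evals, first_of_last_row): the size expression of a variably modified parameter carries a side effect, and that side effect has happened by the time the body runs.

```c
#include <assert.h>

int size_evals;  /* counts evaluations of the size expression below */

/* 'rows' is adjusted to "int (*)[cols]", a variably modified type, so
 * its size expression, side effect included, is evaluated on entry to
 * the function. */
int first_of_last_row(int nrows, int cols,
                      int rows[nrows][(size_evals++, cols)])
{
    assert(nrows > 0 && cols > 0);
    return rows[nrows - 1][0];
}
```

Calling it with a 2x3 array leaves size_evals nonzero afterwards, so a rule that made parameters read-only inside size expressions would be a change to existing semantics rather than a clarification.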
And, in any case, if you did have a constraint disallowing such
assignments, it wouldn't suffice for syntactic disambiguation (see the
previous point I made about that; I have some rough notes towards a WG14
paper on syntactic disambiguation, but haven't converted them into a
coherent paper).
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 14:54 ` Joseph Myers
@ 2022-11-12 15:35 ` Alejandro Colomar
2022-11-12 17:02 ` Joseph Myers
2022-11-12 15:56 ` Martin Uecker
1 sibling, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-12 15:35 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1768 bytes --]
Hi Joseph,
On 11/12/22 15:54, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>> postfix-expression wouldn't be the right one.
>
> Several forms of postfix-expression are only rvalues.
>
>>> (with a special rule about how the identifier is interpreted, different
>>> from the normal scope rules)? If so, then ".a = 1" could either match
>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>
>> No, assigning to a function parameter from within another parameter
>> declaration wouldn't make sense. They should be readonly. Side effects
>> should be forbidden, I think.
>
> Such assignments are already allowed. In a function definition, the side
> effects (including in size expressions for array parameters adjusted to
> pointers) take place before entry to the function body.
Then, I'm guessing that the rules need to change in such a way that .initializer
cannot appear as the left operand of an assignment-expression.
That is, for the following current definition of the assignment-expression (as
of N3054):
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression
The unary-expression cannot be formed by a .initializer.
Would that be doable and sufficient?
Cheers,
Alex
>
> And, in any case, if you did have a constraint disallowing such
> assignments, it wouldn't suffice for syntactic disambiguation (see the
> previous point I made about that; I have some rough notes towards a WG14
> paper on syntactic disambiguation, but haven't converted them into a
> coherent paper).
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 14:54 ` Joseph Myers
2022-11-12 15:35 ` Alejandro Colomar
@ 2022-11-12 15:56 ` Martin Uecker
2022-11-13 13:19 ` Alejandro Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Martin Uecker @ 2022-11-12 15:56 UTC (permalink / raw)
To: Joseph Myers, Alejandro Colomar
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
> > Since it's to be used as an rvalue, not as a lvalue, I guess a
> > postfix-expression wouldn't be the right one.
>
> Several forms of postfix-expression are only rvalues.
>
> > > (with a special rule about how the identifier is interpreted, different
> > > from the normal scope rules)? If so, then ".a = 1" could either match
> > > assignment-expression directly (assigning to the postfix-expression ".a").
> >
> > No, assigning to a function parameter from within another parameter
> > declaration wouldn't make sense. They should be readonly. Side effects
> > should be forbidden, I think.
>
> Such assignments are already allowed. In a function definition, the side
> effects (including in size expressions for array parameters adjusted to
> pointers) take place before entry to the function body.
>
> And, in any case, if you did have a constraint disallowing such
> assignments, it wouldn't suffice for syntactic disambiguation (see the
> previous point I made about that; I have some rough notes towards a WG14
> paper on syntactic disambiguation, but haven't converted them into a
> coherent paper).
My idea was to only allow
array-declarator : direct-declarator [ . identifier ]
and only for parameters (not nested inside structs declared
in the parameter list) as a first step, because it seems this
would exclude all difficult cases.
But if we need to allow more complicated expressions, then
it starts getting more complicated.
One could allow more generic expressions, and
specify that the .identifier refers to a parameter in
the nearest lexically enclosing parameter list or
struct/union.
Then
void foo(struct bar { int x; char c[.x] } a, int x);
would not be allowed (which is good because then we
could later use the syntax also inside structs). If
we apply scoping rules, the following would work:
struct bar { int y; };
void foo(char p[((struct bar){ .y = .x }).y], int x);
But not:
struct bar { int y; };
void foo(char p[((struct bar){ .y = .y }).y], int y);
But there are not only syntactical problems, because
also the type of the parameter might become relevant
and then you can get circular dependencies:
void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
I am not sure what would be the best way to fix it. One
could specify that parameters referred to by
the .identifier syntax must be of some integer type and
that the sub-expression .identifier is always
converted to a 'size_t'.
Maybe one should also add a constraint that all new
type length expressions, i.e. those using the new syntax,
cannot have side effects. Or even that they follow
all the rules of integer constant expressions with
the fictitious assumption that all .identifier
sub-expressions are integer constant expressions.
The rationale being that this would facilitate
compile time reasoning about length expressions.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 15:35 ` Alejandro Colomar
@ 2022-11-12 17:02 ` Joseph Myers
2022-11-12 17:08 ` Alejandro Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-12 17:02 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> > > No, assigning to a function parameter from within another parameter
> > > declaration wouldn't make sense. They should be readonly. Side effects
> > > should be forbidden, I think.
> >
> > Such assignments are already allowed. In a function definition, the side
> > effects (including in size expressions for array parameters adjusted to
> > pointers) take place before entry to the function body.
>
> Then, I'm guessing that rules need to change in a way that .initializer cannot
> appear as the left operand of an assignment-expression.
I think needing such a very special case rule tends to indicate that some
alternative syntax, not needing such a rule, would be better.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 17:02 ` Joseph Myers
@ 2022-11-12 17:08 ` Alejandro Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-12 17:08 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/12/22 18:02, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>>>> No, assigning to a function parameter from within another parameter
>>>> declaration wouldn't make sense. They should be readonly. Side effects
>>>> should be forbidden, I think.
>>>
>>> Such assignments are already allowed. In a function definition, the side
>>> effects (including in size expressions for array parameters adjusted to
>>> pointers) take place before entry to the function body.
>>
>> Then, I'm guessing that rules need to change in a way that .initializer cannot
>> appear as the left operand of an assignment-expression.
>
> I think needing such a very special case rule tends to indicate that some
> alternative syntax, not needing such a rule, would be better.
Well, by not being an lvalue, it can't be assigned to. That would be somewhat
like sizeof(identifier), which is also a unary-expression, so it's not so much
of a special case, is it?
void f(size_t s, int a[sizeof(1) = 1]); // constraint violation
void g(size_t s, int a[.s = 1]); // Also constraint violation
void h(size_t s, int a[s = 1]); // This is fine
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 15:56 ` Martin Uecker
@ 2022-11-13 13:19 ` Alejandro Colomar
2022-11-13 13:33 ` Alejandro Colomar
2022-11-14 17:52 ` Joseph Myers
0 siblings, 2 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 13:19 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin!
On 11/12/22 16:56, Martin Uecker wrote:
> Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>>> postfix-expression wouldn't be the right one.
>>
>> Several forms of postfix-expression are only rvalues.
>>
>>>> (with a special rule about how the identifier is interpreted, different
>>>> from the normal scope rules)? If so, then ".a = 1" could either match
>>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>>
>>> No, assigning to a function parameter from within another parameter
>>> declaration wouldn't make sense. They should be readonly. Side effects
>>> should be forbidden, I think.
>>
>> Such assignments are already allowed. In a function definition, the side
>> effects (including in size expressions for array parameters adjusted to
>> pointers) take place before entry to the function body.
>>
>> And, in any case, if you did have a constraint disallowing such
>> assignments, it wouldn't suffice for syntactic disambiguation (see the
>> previous point I made about that; I have some rough notes towards a WG14
>> paper on syntactic disambiguation, but haven't converted them into a
>> coherent paper).
>
> My idea was to only allow
>
> array-declarator : direct-declarator [ . identifier ]
>
> and only for parameter (not nested inside structs declared
> in parameter list) as a first step because it seems this
> would exclude all difficult cases.
>
> But if we need to allow more complicated expressions, then
> it starts getting more complicated.
Ahh, I guess my work in documenting the man-pages prototypes got me thinking of
those extensions to the idea. I don't remember all the details :)
>
> One could could allow more generic expressions, and
> specify that the .identifier refers to a
> parameter in
> the nearest lexically enclosing parameter list or
> struct/union.
>
> Then
>
> void foo(struct bar { int x; char c[.x] } a, int x);
>
> would not be allowed (which is good because then we
> could later use the syntax also inside structs). If
> we apply scoping rules, the following would work:
>
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .x }).y], int x);
Makes sense.
>
> But not:
>
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .y }).y], int y);
Although it clearly is nonsense, I'm not sure I'd make it a constraint
violation, but rather Undefined Behavior. How is it different than this?:
$ cat foo.c
int main(void)
{
	int i = i;
	return i;
}
$ gcc --version | head -n1
gcc (Debian 12.2.0-9) 12.2.0
$ gcc -Wall -Wextra -Werror foo.c
$
$ clang --version | head -n1
Debian clang version 14.0.6
$ clang -Wall -Wextra -Werror foo.c
foo.c:3:10: error: variable 'i' is uninitialized when used within its own
initialization [-Werror,-Wuninitialized]
int i = i;
~ ^
1 error generated.
BTW, I just freaked out that GCC can't catch this trivial bug. Should I open a
bug report?
>
>
> But there are not only syntactical problems, because
> also the type of the parameter might become relevant
> and then you can get circular dependencies:
>
> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
This seems to be a difficult stone in the road.
>
> I am not sure what would the best way to fix it. One
> could specifiy that parameters referred to by
> the .identifer syntax must of some integer type and
> that the sub-expression .identifer is always
> converted to a 'size_t'.
That makes sense, but then overnight some quite useful thing came to my mind
that would not be possible with this limitation:
<https://software.codidact.com/posts/285946>
char *
stpecpy(char dst[.end - .dst], char *src, char end[1])
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';

#if !defined(NDEBUG)
	/* Consume the rest of the input string. */
	while (*src++) {};
#endif

	return end + 1;
}
stpecpy() is a function similar to strlcat(3) that gets a pointer to the end of
the array instead of the size of the buffer. This allows chaining without
having performance issues[1].
[1]: <https://en.wikichip.org/wiki/schlemiel_the_painter%27s_algorithm>
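A sketch of how the chaining could look with compilers available today.
This is only an approximation under stated assumptions: the proposed
.identifier syntax is replaced by plain pointer parameters, the debug-only
loop is left out, and join_demo() is a hypothetical helper added for
illustration. 'end' is assumed to point to the last element of the buffer.

```c
#include <stddef.h>

/*
 * Same copying logic as above, with ordinary pointer parameters.
 * Returns a pointer to the terminating null byte, or end + 1 if
 * the output was truncated.
 */
char *
stpecpy(char *dst, const char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';
	return end + 1;
}

/* Hypothetical helper: build "foobar" in buf by chaining two calls. */
char *
join_demo(char *buf, size_t size)
{
	char *end = buf + size - 1;  /* last element of the buffer */
	char *p = buf;

	p = stpecpy(p, "foo", end);  /* p points at the null byte */
	p = stpecpy(p, "bar", end);  /* continues where "foo" ended */
	return p;                    /* end + 1 signals truncation */
}
```

Each call resumes at the pointer returned by the previous one, so the
string is never rescanned from the start, and a single comparison of the
final result against end + 1 reports truncation for the whole chain.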
Maybe allowing integral types and pointers would be enough. However, foreseeing
that the _Lengthof() proposal (BTW, which paper was it?) will succeed, and
combining it with this one, _Lengthof(pointer) would ideally give the length of
the array, so allowing pointers would conflict.
My solution is to disallow sizeof() and _Lengthof() on .identifier. That could
be done simply by saying that variably-modified types (VMT) are incomplete types
until immediately after the comma that follows the parameter declaration.
Therefore it would be allowed only in the same way as it is allowed right now
with the normal syntax (i.e., after the parameter has been seen).
BTW, what was the number of the latest paper for _Lengthof() and what happened
to it? I guess it's likely to be added to C3x, isn't it?
And another BTW: there's some kind of consistency in (some) projects for naming
sizes, and I have pending a review of the Linux man-pages to make it consistent
there too.
See the following table of usual conventions:
Operator/macro                | Variable names   | Description
------------------------------|------------------|--------------------------
strlen(3)                     | length, len, l   | String length.
sizeof()                      | size, sz, nbytes | Identifier size in bytes.
nitems(), nelems()            | n, nelem, nitems | Array number of elements.
sizeof_array(), array_bytes() | size, sz, nbytes | Array size in bytes.
Naming _Lengthof() the operator that gets the number of elements in an array
would create naming confusion, since then length can mean two different things.
I suggest _Nitemsof().
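To make the distinction between the three quantities concrete, here is a
small sketch. NITEMS() is a hypothetical macro written in the style of the
nitems() macro that some systems provide; the array names and contents are
invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: number of elements of an array (not a pointer). */
#define NITEMS(arr)  (sizeof(arr) / sizeof((arr)[0]))

static const char greeting[16] = "hello";
static const int  primes[4] = {2, 3, 5, 7};

/* "length": bytes before the terminating null byte. */
size_t greeting_len(void)   { return strlen(greeting); }

/* "size": bytes occupied by the object. */
size_t greeting_size(void)  { return sizeof(greeting); }

/* "n" / "nitems": number of array elements. */
size_t primes_nitems(void)  { return NITEMS(primes); }

/* "size" / "nbytes": bytes occupied by the whole array. */
size_t primes_nbytes(void)  { return sizeof(primes); }
```

For greeting, the length is 5 while the size and the number of elements
are both 16; for primes, the number of elements is 4 and the size is
4 * sizeof(int).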
>
> Maybe one should also add a constraint that all new
> type length expressions, i.e. using the syntax,
> can not have side effects. Or even that they follow
> all the rules of integer constant expressions with
> the fictitious assumption that all . identifer
> sub-expressions are integer constant expressions.
> The rationale being that this would facilitate
> compile time reasoning about length expressions.
>
>
> Martin
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 13:19 ` Alejandro Colomar
@ 2022-11-13 13:33 ` Alejandro Colomar
2022-11-13 14:02 ` Alejandro Colomar
2022-11-14 17:52 ` Joseph Myers
1 sibling, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 13:33 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 14:19, Alejandro Colomar wrote:
>> But there are not only syntactical problems, because
>> also the type of the parameter might become relevant
>> and then you can get circular dependencies:
>>
>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>
> This seems to be a difficult stone in the road.
>
>>
>> I am not sure what would the best way to fix it. One
>> could specifiy that parameters referred to by
>> the .identifer syntax must of some integer type and
>> that the sub-expression .identifer is always
>> converted to a 'size_t'.
>
> That makes sense, but then overnight some quite useful thing came to my mind
> that would not be possible with this limitation:
>
>
> <https://software.codidact.com/posts/285946>
>
> char *
> stpecpy(char dst[.end - .dst], char *src, char end[1])
> {
> for (/* void */; dst <= end; dst++) {
> *dst = *src++;
> if (*dst == '\0')
> return dst;
> }
> /* Truncation detected */
> *end = '\0';
>
> #if !defined(NDEBUG)
> /* Consume the rest of the input string. */
> while (*src++) {};
> #endif
>
> return end + 1;
> }
And I forgot to say it: Default promotions rank high (probably the highest) in
my list of most hated features^Wbugs in C. I wouldn't convert it to size_t, but
rather follow normal promotion rules.
Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
array (which took me some time to understand), I'd also allow the same here.
So, the type of the expression between [] could perfectly be signed or unsigned.
So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
allow negative numbers. In the function above, since dst can be a pointer to
one-past-the-end (it represents a previous truncation; that's why the test
dst<=end), forcing a size_t conversion would disallow that syntax.
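As a small illustration of signed indices in existing C: a[i] is defined
as *(a + i), so the index may be of any integer type, signed or unsigned,
as long as the resulting pointer stays within the array. The helper name
below is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: read the element 'off' positions away from
 * 'mid'.  'off' may be negative, provided that mid + off still
 * points into the same array.
 */
int
elem_at(const int *mid, ptrdiff_t off)
{
	return mid[off];
}
```

With "int a[5] = {10, 20, 30, 40, 50}", elem_at(a + 2, -2) reads a[0] and
elem_at(a + 2, 2) reads a[4].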
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 13:33 ` Alejandro Colomar
@ 2022-11-13 14:02 ` Alejandro Colomar
2022-11-13 14:58 ` Martin Uecker
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 14:02 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/13/22 14:33, Alejandro Colomar wrote:
> Hi Martin,
>
> On 11/13/22 14:19, Alejandro Colomar wrote:
>>> But there are not only syntactical problems, because
>>> also the type of the parameter might become relevant
>>> and then you can get circular dependencies:
>>>
>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>
>> This seems to be a difficult stone in the road.
>>
>>>
>>> I am not sure what would the best way to fix it. One
>>> could specifiy that parameters referred to by
>>> the .identifer syntax must of some integer type and
>>> that the sub-expression .identifer is always
>>> converted to a 'size_t'.
>>
>> That makes sense, but then overnight some quite useful thing came to my mind
>> that would not be possible with this limitation:
>>
>>
>> <https://software.codidact.com/posts/285946>
>>
>> char *
>> stpecpy(char dst[.end - .dst], char *src, char end[1])
Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
and then the result of the whole expression would be 0, which is fine as size_t.
So, never mind.
>> {
>> for (/* void */; dst <= end; dst++) {
>> *dst = *src++;
>> if (*dst == '\0')
>> return dst;
>> }
>> /* Truncation detected */
>> *end = '\0';
>>
>> #if !defined(NDEBUG)
>> /* Consume the rest of the input string. */
>> while (*src++) {};
>> #endif
>>
>> return end + 1;
>> }
>
> And I forgot to say it: Default promotions rank high (probably the highest) in
> my list of most hated features^Wbugs in C. I wouldn't convert it to size_t, but
> rather follow normal promotion rules.
>
> Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
> array (which took me some time to understand), I'd also allow the same here. So,
> the type of the expression between [] could perfectly be signed or unsigned.
>
> So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
> allow negative numbers. In the function above, since dst can be a pointer to
> one-past-the-end (it represents a previous truncation; that's why the test
> dst<=end), forcing a size_t conversion would disallow that syntax.
>
> Cheers,
>
> Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 14:02 ` Alejandro Colomar
@ 2022-11-13 14:58 ` Martin Uecker
2022-11-13 15:15 ` Alejandro Colomar
2022-11-28 23:18 ` Alex Colomar
0 siblings, 2 replies; 69+ messages in thread
From: Martin Uecker @ 2022-11-13 14:58 UTC (permalink / raw)
To: Alejandro Colomar, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>
> On 11/13/22 14:33, Alejandro Colomar wrote:
> > Hi Martin,
> >
> > On 11/13/22 14:19, Alejandro Colomar wrote:
> > > > But there are not only syntactical problems, because
> > > > also the type of the parameter might become relevant
> > > > and then you can get circular dependencies:
> > > >
> > > > void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> > >
> > > This seems to be a difficult stone in the road.
But note that GNU forward declarations solve this nicely.
> > >
> > > > I am not sure what would the best way to fix it. One
> > > > could specifiy that parameters referred to by
> > > > the .identifer syntax must of some integer type and
> > > > that the sub-expression .identifer is always
> > > > converted to a 'size_t'.
> > >
> > > That makes sense, but then overnight some quite useful thing came to my mind
> > > that would not be possible with this limitation:
> > >
> > >
> > > <https://software.codidact.com/posts/285946>
> > >
> > > char *
> > > stpecpy(char dst[.end - .dst], char *src, char end[1])
>
> Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
> and then the result of the whole expression would be 0, which is fine as size_t.
>
> So, never mind.
.end and .dst would have pointer size though.
> > > {
> > > for (/* void */; dst <= end; dst++) {
> > > *dst = *src++;
> > > if (*dst == '\0')
> > > return dst;
> > > }
> > > /* Truncation detected */
> > > *end = '\0';
> > >
> > > #if !defined(NDEBUG)
> > > /* Consume the rest of the input string. */
> > > while (*src++) {};
> > > #endif
> > >
> > > return end + 1;
> > > }
> > And I forgot to say it: Default promotions rank high (probably the highest) in
> > my list of most hated features^Wbugs in C.
If you replaced them with explicit conversions that you then
have to add by hand all the time, I am pretty sure most people
would hate this more. (And it could also hide bugs.)
> > I wouldn't convert it to size_t, but
> > rather follow normal promotion rules.
The point of making it size_t is that you then
do not need to know the type of the parameter to make
sense of the expression. If the type matters, then you get
mutual dependencies as in the example above.
> > Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
> > array (which took me some time to understand), I'd also allow the same here. So,
> > the type of the expression between [] could perfectly be signed or unsigned.
> >
> > So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
> > allow negative numbers. In the function above, since dst can be a pointer to
> > one-past-the-end (it represents a previous truncation; that's why the test
> > dst<=end), forcing a size_t conversion would disallow that syntax.
Yes, this then does not work.
Martin
> > Cheers,
> >
> > Alex
> >
>
> --
> <http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 14:58 ` Martin Uecker
@ 2022-11-13 15:15 ` Alejandro Colomar
2022-11-13 15:32 ` Martin Uecker
2022-11-13 16:28 ` Alejandro Colomar
2022-11-28 23:18 ` Alex Colomar
1 sibling, 2 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 15:15 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 15:58, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>
>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>>
>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>> But there are not only syntactical problems, because
>>>>> also the type of the parameter might become relevant
>>>>> and then you can get circular dependencies:
>>>>>
>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> This seems to be a difficult stone in the road.
>
> But note that GNU forward declarations solve this nicely.
How would the above be solved with GNU fwd decl? I'm guessing that it can't.
How do you forward declare incomplete VMTs?
>
>>>>
>>>>> I am not sure what would the best way to fix it. One
>>>>> could specifiy that parameters referred to by
>>>>> the .identifer syntax must of some integer type and
>>>>> that the sub-expression .identifer is always
>>>>> converted to a 'size_t'.
>>>>
>>>> That makes sense, but then overnight some quite useful thing came to my mind
>>>> that would not be possible with this limitation:
>>>>
>>>>
>>>> <https://software.codidact.com/posts/285946>
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst], char *src, char end[1])
>>
>> Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
>> and then the result of the whole expression would be 0, which is fine as size_t.
>>
>> So, never mind.
>
> .end and .dst would have pointer size though.
>
>>>> {
>>>> for (/* void */; dst <= end; dst++) {
>>>> *dst = *src++;
>>>> if (*dst == '\0')
>>>> return dst;
>>>> }
>>>> /* Truncation detected */
>>>> *end = '\0';
>>>>
>>>> #if !defined(NDEBUG)
>>>> /* Consume the rest of the input string. */
>>>> while (*src++) {};
>>>> #endif
>>>>
>>>> return end + 1;
>>>> }
>>> And I forgot to say it: Default promotions rank high (probably the highest) in
>>> my list of most hated features^Wbugs in C.
>
> If you replaced them with explicit conversion you then have
> to add by hand all the time, I am pretty sure most people
> would hate this more. (and it could also hide bugs)
Yeah, casts are also in my top 3 list of things to avoid (although in this case
there's no bug); maybe a bit over default promotions :)
I didn't mean that all promotions are bad. Just the gratuitous ones, like
promoting everything to int before even needing it. That makes uint16_t a
theoretical type, because whenever you try to use it, you end up with a signed
32-bit type; fun heh? :P _BitInt() solves that for me.
But sure, in (1u + 1l), promotions are fine to get a common type.
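A short sketch of the uint16_t behaviour described above, assuming a
platform where int is wider than 16 bits and a C11 compiler (for
_Generic). The function names are made up for the example.

```c
#include <stdint.h>

/* Returns 1 if adding two uint16_t values yields an 'int'. */
int
u16_sum_is_int(void)
{
	uint16_t a = 1, b = 2;

	/* Both operands are promoted to int before the addition. */
	return _Generic(a + b, int: 1, default: 0);
}

/* Returns 1 if ~x on a uint16_t zero is negative. */
int
u16_not_is_negative(void)
{
	uint16_t x = 0;

	/* x is promoted to int first, so ~x is -1 rather than 0xFFFF. */
	return ~x < 0;
}
```

Under that assumption both functions return 1: the arithmetic happens in
a signed type even though every operand was declared unsigned.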
>
>>> I wouldn't convert it to size_t, but
>>> rather follow normal promotion rules.
>
> The point of making it size_t is that you then
> do need to know the type of the parameter to make
> sense of the expression. If the type matters, then you get
> mutual dependencies as in the example above.
Except if you treat incomplete types as... incomplete types (for which sizeof()
is disallowed by the standard). And the issue we're having is that the types
are not yet complete at the time we're using them, aren't they?
Kind of like the initialization order fiasco, but since we're in a limited
scope, we can detect it.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 15:15 ` Alejandro Colomar
@ 2022-11-13 15:32 ` Martin Uecker
2022-11-13 16:25 ` Alejandro Colomar
2022-11-13 16:28 ` Alejandro Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Martin Uecker @ 2022-11-13 15:32 UTC (permalink / raw)
To: Alejandro Colomar, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Sonntag, den 13.11.2022, 16:15 +0100 schrieb Alejandro Colomar:
> Hi Martin,
>
> On 11/13/22 15:58, Martin Uecker wrote:
> > Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
> > > On 11/13/22 14:33, Alejandro Colomar wrote:
> > > > On 11/13/22 14:19, Alejandro Colomar wrote:
> > > > > > But there are not only syntactical problems, because
> > > > > > also the type of the parameter might become relevant
> > > > > > and then you can get circular dependencies:
> > > > > >
> > > > > > void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> > > > >
> > > > > This seems to be a difficult stone in the road.
> >
> > But note that GNU forward declarations solve this nicely.
>
> How would that above be solved with GNU fwd decl? I'm guessing that it can't.
> How do you forward declare incomplete VMTs?.
You can't express it. This was my point: it is impossible
to create circular dependencies.
...
> > > > > {
> > > > > for (/* void */; dst <= end; dst++) {
> > > > > *dst = *src++;
> > > > > if (*dst == '\0')
> > > > > return dst;
> > > > > }
> > > > > /* Truncation detected */
> > > > > *end = '\0';
> > > > >
> > > > > #if !defined(NDEBUG)
> > > > > /* Consume the rest of the input string. */
> > > > > while (*src++) {};
> > > > > #endif
> > > > >
> > > > > return end + 1;
> > > > > }
> > > > And I forgot to say it: Default promotions rank high (probably the highest) in
> > > > my list of most hated features^Wbugs in C.
> >
> > If you replaced them with explicit conversion you then have
> > to add by hand all the time, I am pretty sure most people
> > would hate this more. (and it could also hide bugs)
>
> Yeah, casts are also in my top 3 list of things to avoid (although in this case
> there's no bug); maybe a bit over default promotions :)
>
> I didn't mean that all promotions are bad. Just the gratuitous ones, like
> promoting everything to int before even needing it. That makes uint16_t a
> theoretical type, because whenever you try to use it, you end up with a signed
> 32-bit type; fun heh? :P _BitInt() solves that for me.
uint16_t is for storing data. My expectation is that people
will find _BitInt() difficult and error-prone to use for
small sizes. But maybe I am wrong...
> But sure, in (1u + 1l), promotions are fine to get a common type.
>
> > > > I wouldn't convert it to size_t, but
> > > > rather follow normal promotion rules.
> >
> > The point of making it size_t is that you then
> > do need to know the type of the parameter to make
> > sense of the expression. If the type matters, then you get
> > mutual dependencies as in the example above.
>
> Except if you treat incomplete types as... incomplete types (for which sizeof()
> is disallowed by the standard). And the issue we're having is that the types
> are not yet complete at the time we're using them, aren't they?
It is not an incomplete type. When we are parsing and do not yet
have a declaration, we know nothing about it (not just the size).
If we assume we know the type (by looking ahead) we get mutual
dependencies.
Also the capability to parse, fold, and do type checking
in one go is something worth preserving in my opinion.
Martin
> Kind of like the initialization order fiasco, but since we're in a limited
> scope, we can detect it.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 15:32 ` Martin Uecker
@ 2022-11-13 16:25 ` Alejandro Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:25 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 16:32, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 16:15 +0100 schrieb Alejandro Colomar:
>> Hi Martin,
>>
>> On 11/13/22 15:58, Martin Uecker wrote:
>>> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>>>> But there are not only syntactical problems, because
>>>>>>> also the type of the parameter might become relevant
>>>>>>> and then you can get circular dependencies:
>>>>>>>
>>>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>>>
>>>>>> This seems to be a difficult stone in the road.
>>>
>>> But note that GNU forward declarations solve this nicely.
>>
>> How would that above be solved with GNU fwd decl? I'm guessing that it can't.
>> How do you forward declare incomplete VMTs?.
>
> You can't express it. This was my point: it is impossible
> to create circular dependencies.
>
> ...
>
>>>>>> {
>>>>>> for (/* void */; dst <= end; dst++) {
>>>>>> *dst = *src++;
>>>>>> if (*dst == '\0')
>>>>>> return dst;
>>>>>> }
>>>>>> /* Truncation detected */
>>>>>> *end = '\0';
>>>>>>
>>>>>> #if !defined(NDEBUG)
>>>>>> /* Consume the rest of the input string. */
>>>>>> while (*src++) {};
>>>>>> #endif
>>>>>>
>>>>>> return end + 1;
>>>>>> }
>>>>> And I forgot to say it: Default promotions rank high (probably the highest) in
>>>>> my list of most hated features^Wbugs in C.
>>>
>>> If you replaced them with explicit conversion you then have
>>> to add by hand all the time, I am pretty sure most people
>>> would hate this more. (and it could also hide bugs)
>>
>> Yeah, casts are also in my top 3 list of things to avoid (although in this case
>> there's no bug); maybe a bit over default promotions :)
>>
>> I didn't mean that all promotions are bad. Just the gratuitous ones, like
>> promoting everything to int before even needing it. That makes uint16_t a
>> theoretical type, because whenever you try to use it, you end up with a signed
>> 32-bit type; fun heh? :P _BitInt() solves that for me.
>
> uint16_t is for storing data. My expectation is that people
> will find _BitInt() difficult and error-prone to use for
> small sizes. But maybe I am wrong...
I'm a bit concerned about the suffix to create literals. I'd have preferred a
suffix that allowed creating a specific size (instead of the minimum one), e.g.,
1u16 or something like that. But otherwise I think it can be better. I don't
have in mind a big issue I had a year ago with uint16_t, but it required 3 casts
in a line. With _BitInt() I think none (or maybe one, for giving 1 the
appropriate size) would have been needed. But we'll see how it works out.
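To make the uint16_t complaint concrete, here is a minimal sketch. The helper
names are made up for illustration, and it assumes a platform where int is
wider than 16 bits (as on typical Linux targets). It shows that arithmetic on
two uint16_t operands is carried out in int:

```c
#include <stdint.h>

/* Returns 1 if adding two uint16_t values yields a (signed) int,
 * i.e., if the integer promotions were applied; 0 otherwise. */
static int
u16_sum_is_int(void)
{
	uint16_t a = 1, b = 2;

	return _Generic(a + b, int: 1, default: 0);
}

/* The subtraction itself is done in int (0 - 1 == -1); only the
 * conversion back to uint16_t wraps the result to 65535. */
static uint16_t
u16_wrapping_sub(uint16_t a, uint16_t b)
{
	return (uint16_t) (a - b);
}
```

With wider expressions, each intermediate result needs its own conversion back
to uint16_t, which is where the casts pile up.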
>
>> But sure, in (1u + 1l), promotions are fine to get a common type.
>>
>>>>> I wouldn't convert it to size_t, but
>>>>> rather follow normal promotion rules.
>>>
>>> The point of making it size_t is that you then
>>> do need to know the type of the parameter to make
>>> sense of the expression. If the type matters, then you get
>>> mutual dependencies as in the example above.
>>
>> Except if you treat incomplete types as... incomplete types (for which sizeof()
>> is disallowed by the standard). And the issue we're having is that the types
>> are not yet complete at the time we're using them, aren't they?
>
> It is not an incomplete type. When doing parsing and do not have
> a declaration we know nothing about it (not just not the size).
> If we assume we know the type (by looking ahead) we get mutual
> dependencies.
Then I'd do the following: .identifier always has an incomplete type.
I'm preparing a complete description of what I think of the feature. I'll add that.
>
> Also the capability to parse, fold, and do type checking
> in one go is something worth preserving in my opinion.
Makes sense.
Thanks for all the help, both!
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 15:15 ` Alejandro Colomar
2022-11-13 15:32 ` Martin Uecker
@ 2022-11-13 16:28 ` Alejandro Colomar
2022-11-13 16:31 ` Alejandro Colomar
2022-11-14 18:13 ` Joseph Myers
1 sibling, 2 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:28 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 3542 bytes --]
SYNOPSIS:
unary-operator: . identifier
DESCRIPTION:
- It is not an lvalue.
- This means sizeof() and _Lengthof() cannot be applied to them.
- This prevents ambiguity with a designator in an initializer-list within a
nested braced-initializer.
- The type of a .identifier is always an incomplete type.
- This prevents circular dependencies involving sizeof() or _Lengthof().
- Shadowing rules apply.
- This prevents ambiguity.
EXAMPLES:
- Valid examples (libc):
int
strncmp(const char s1[.n],
const char s2[.n],
size_t n);
int
cacheflush(void addr[.nbytes],
int nbytes,
int cache);
long
mbind(void addr[.len],
unsigned long len,
int mode,
const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
/ ULONG_WIDTH],
unsigned long maxnode, unsigned int flags);
void *
bsearch(const void key[.size],
const void base[.size * .nmemb],
size_t nmemb,
size_t size,
int (*compar)(const void [.size], const void [.size]));
- Valid examples (my own):
void
ustr2str(char dst[restrict .len + 1],
const char src[restrict .len],
size_t len);
char *
stpecpy(char dst[.end - .dst + 1],
char *restrict src,
char end[1]);
- Valid examples (from this thread):
-
struct s { int a; };
void f(int a, int b[((struct s) { .a = 1 }).a]);
Explanation:
- Because of shadowing rules, .a=1 refers to the struct member.
- Also, if .a referred to the parameter, it would be an rvalue, so
it wouldn't be valid to assign to it.
- (...).a refers to the struct member too, since otherwise an rvalue is
not expected there.
-
void foo(struct bar { int x; char c[.x] } a, int x);
Explanation:
- Because of shadowing rules, [.x] refers to the struct member.
-
struct bar { int y; };
void foo(char p[((struct bar){ .y = .x }).y], int x);
Explanation:
- .x unambiguously refers to the parameter.
- Undefined behavior:
-
struct bar { int y; };
void foo(char p[((struct bar){ .y = .y }).y], int y);
Explanation:
- Because of shadowing rules, =.y refers to the struct member.
- .y=.y means initialize the member with itself (uninitialized use).
- (...).y refers to the struct member, since otherwise an rvalue is not
expected there.
- Constraint violations:
-
void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
Explanation:
- sizeof(*.b): Cannot get size of incomplete type.
- sizeof(*.a): Cannot get size of incomplete type.
-
void f(size_t s, int a[sizeof(1) = 1]);
Explanation:
- Cannot assign to rvalue.
-
void f(size_t s, int a[.s = 1]);
Explanation:
- Cannot assign to rvalue.
-
void f(size_t s, int a[sizeof(.s)]);
Explanation:
- sizeof(.s): Cannot get size of incomplete type.
Does this idea make sense to you?
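For comparison, a minimal sketch of what already compiles today when the size
parameter comes first. The function is hypothetical (not a libc one) and uses
only standard C99 VLA parameter syntax, with no .identifier involved:

```c
#include <stddef.h>

/* Standard C: 'n' is declared before it is used in the array declarator.
 * The proposed '.n' syntax is meant to allow the same when 'n' comes later. */
static long
sum_ints(size_t n, const int a[n])
{
	long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += a[i];
	return sum;
}
```

The libc prototypes above cannot be written this way without reordering their
parameters, which is the motivation for the new syntax.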
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:28 ` Alejandro Colomar
@ 2022-11-13 16:31 ` Alejandro Colomar
2022-11-13 16:34 ` Alejandro Colomar
2022-11-14 18:13 ` Joseph Myers
1 sibling, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:31 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 4560 bytes --]
On 11/13/22 17:28, Alejandro Colomar wrote:
> SYNOPSIS:
>
> unary-operator: . identifier
>
>
> DESCRIPTION:
>
> - It is not an lvalue.
>
> - This means sizeof() and _Lengthof() cannot be applied to them.
Sorry, the above is a thinko.
I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
> - This prevents ambiguity with a designator in an initializer-list within a
> nested braced-initializer.
>
> - The type of a .identifier is always an incomplete type.
>
> - This prevents circular dependencies involving sizeof() or _Lengthof().
>
> - Shadowing rules apply.
>
> - This prevents ambiguity.
>
>
> EXAMPLES:
>
>
> - Valid examples (libc):
>
> int
> strncmp(const char s1[.n],
> const char s2[.n],
> size_t n);
>
> int
> cacheflush(void addr[.nbytes],
> int nbytes,
> int cache);
>
> long
> mbind(void addr[.len],
> unsigned long len,
> int mode,
> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
> / ULONG_WIDTH],
> unsigned long maxnode, unsigned int flags);
>
> void *
> bsearch(const void key[.size],
> const void base[.size * .nmemb],
> size_t nmemb,
> size_t size,
> int (*compar)(const void [.size], const void [.size]));
>
> - Valid examples (my own):
>
> void
> ustr2str(char dst[restrict .len + 1],
> const char src[restrict .len],
> size_t len);
>
> char *
> stpecpy(char dst[.end - .dst + 1],
> char *restrict src,
> char end[1]);
>
> - Valid examples (from this thread):
>
> -
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> Explanation:
> - Because of shadowing rules, .a=1 refers to the struct member.
> - Also, if .a referred to the parameter, it would be an rvalue, so
> it wouldn't be valid to assign to it.
> - (...).a refers to the struct member too, since otherwise an rvalue is
> not expected there.
>
> -
> void foo(struct bar { int x; char c[.x] } a, int x);
>
> Explanation:
> - Because of shadowing rules, [.x] refers to the struct member.
>
> -
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .x }).y], int x);
>
> Explanation:
> - .x unambiguously refers to the parameter.
>
> - Undefined behavior:
>
> -
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .y }).y], int y);
>
> Explanation:
> - Because of shadowing rules, =.y refers to the struct member.
> - .y=.y means initialize the member with itself (uninitialized use).
> - (...).y refers to the struct member, since otherwise an rvalue is not
> expected there.
>
> - Constraint violations:
>
> -
> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>
> Explanation:
> - sizeof(*.b): Cannot get size of incomplete type.
> - sizeof(*.a): Cannot get size of incomplete type.
>
> -
> void f(size_t s, int a[sizeof(1) = 1]);
>
> Explanation:
> - Cannot assign to rvalue.
>
> -
> void f(size_t s, int a[.s = 1]);
>
> Explanation:
> - Cannot assign to rvalue.
>
> -
> void f(size_t s, int a[sizeof(.s)]);
>
> Explanation:
> - sizeof(.s): Cannot get size of incomplete type.
>
>
> Does this idea make sense to you?
>
>
> Cheers,
> Alex
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:31 ` Alejandro Colomar
@ 2022-11-13 16:34 ` Alejandro Colomar
2022-11-13 16:56 ` Alejandro Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:34 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 4893 bytes --]
On 11/13/22 17:31, Alejandro Colomar wrote:
>
>
> On 11/13/22 17:28, Alejandro Colomar wrote:
>> SYNOPSIS:
>>
>> unary-operator: . identifier
>>
>>
>> DESCRIPTION:
>>
>> - It is not an lvalue.
>>
>> - This means sizeof() and _Lengthof() cannot be applied to them.
>
> Sorry, the above is a thinko.
>
> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>
>> - This prevents ambiguity with a designator in an initializer-list within
>> a nested braced-initializer.
>>
>> - The type of a .identifier is always an incomplete type.
Or rather, more simply: explicitly prohibit applying typeof(), sizeof(), and
_Lengthof() to it.
>>
>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>>
>> - Shadowing rules apply.
>>
>> - This prevents ambiguity.
>>
>>
>> EXAMPLES:
>>
>>
>> - Valid examples (libc):
>>
>> int
>> strncmp(const char s1[.n],
>> const char s2[.n],
>> size_t n);
>>
>> int
>> cacheflush(void addr[.nbytes],
>> int nbytes,
>> int cache);
>>
>> long
>> mbind(void addr[.len],
>> unsigned long len,
>> int mode,
>> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>> / ULONG_WIDTH],
>> unsigned long maxnode, unsigned int flags);
>>
>> void *
>> bsearch(const void key[.size],
>> const void base[.size * .nmemb],
>> size_t nmemb,
>> size_t size,
>> int (*compar)(const void [.size], const void [.size]));
>>
>> - Valid examples (my own):
>>
>> void
>> ustr2str(char dst[restrict .len + 1],
>> const char src[restrict .len],
>> size_t len);
>>
>> char *
>> stpecpy(char dst[.end - .dst + 1],
>> char *restrict src,
>> char end[1]);
>>
>> - Valid examples (from this thread):
>>
>> -
>> struct s { int a; };
>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>
>> Explanation:
>> - Because of shadowing rules, .a=1 refers to the struct member.
>> - Also, if .a referred to the parameter, it would be an rvalue, so
>> it wouldn't be valid to assign to it.
>> - (...).a refers to the struct member too, since otherwise an rvalue
>> is not expected there.
>>
>> -
>> void foo(struct bar { int x; char c[.x] } a, int x);
>>
>> Explanation:
>> - Because of shadowing rules, [.x] refers to the struct member.
>>
>> -
>> struct bar { int y; };
>> void foo(char p[((struct bar){ .y = .x }).y], int x);
>>
>> Explanation:
>> - .x unambiguously refers to the parameter.
>>
>> - Undefined behavior:
>>
>> -
>> struct bar { int y; };
>> void foo(char p[((struct bar){ .y = .y }).y], int y);
>>
>> Explanation:
>> - Because of shadowing rules, =.y refers to the struct member.
>> - .y=.y means initialize the member with itself (uninitialized use).
>> - (...).y refers to the struct member, since otherwise an rvalue is
>> not expected there.
>>
>> - Constraint violations:
>>
>> -
>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>
>> Explanation:
>> - sizeof(*.b): Cannot get size of incomplete type.
>> - sizeof(*.a): Cannot get size of incomplete type.
>>
>> -
>> void f(size_t s, int a[sizeof(1) = 1]);
>>
>> Explanation:
>> - Cannot assign to rvalue.
>>
>> -
>> void f(size_t s, int a[.s = 1]);
>>
>> Explanation:
>> - Cannot assign to rvalue.
>>
>> -
>> void f(size_t s, int a[sizeof(.s)]);
>>
>> Explanation:
>> - sizeof(.s): Cannot get size of incomplete type.
>>
>>
>> Does this idea make sense to you?
>>
>>
>> Cheers,
>> Alex
>
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:34 ` Alejandro Colomar
@ 2022-11-13 16:56 ` Alejandro Colomar
2022-11-13 19:05 ` Alejandro Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:56 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 5300 bytes --]
On 11/13/22 17:34, Alejandro Colomar wrote:
>
>
> On 11/13/22 17:31, Alejandro Colomar wrote:
>>
>>
>> On 11/13/22 17:28, Alejandro Colomar wrote:
>>> SYNOPSIS:
>>>
>>> unary-operator: . identifier
>>>
>>>
>>> DESCRIPTION:
>>>
>>> - It is not an lvalue.
>>>
>>> - This means sizeof() and _Lengthof() cannot be applied to them.
>>
>> Sorry, the above is a thinko.
>>
>> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>>
>>> - This prevents ambiguity with a designator in an initializer-list
>>> within a nested braced-initializer.
>>>
>>> - The type of a .identifier is always an incomplete type.
>
> Or rather, more easily prohibit explicitly using typeof(), sizeof(), and
> _Lengthof() to it.
Hmm, this is not enough. Pointer arithmetic is interesting, and for that, you
need to implicitly know the sizeof(*.p).
How about allowing only integral types or pointers to integral types?
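A minimal sketch of why pointer arithmetic implies knowing the pointee size
(helper names are hypothetical, for illustration only): subtracting two
pointers is scaled by the element size, so the compiler has to know
sizeof(*p) even though sizeof is never spelled out.

```c
#include <stddef.h>

/* Number of elements in [first, last): the subtraction is implicitly
 * divided by sizeof(*first). */
static ptrdiff_t
elem_distance(const long *first, const long *last)
{
	return last - first;
}

/* The same distance computed in bytes, for comparison. */
static ptrdiff_t
byte_distance(const long *first, const long *last)
{
	return (const char *) last - (const char *) first;
}
```

An expression such as .end - .dst in stpecpy() above has the same dependency
on the pointee type.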
>
>>>
>>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>>>
>>> - Shadowing rules apply.
>>>
>>> - This prevents ambiguity.
>>>
>>>
>>> EXAMPLES:
>>>
>>>
>>> - Valid examples (libc):
>>>
>>> int
>>> strncmp(const char s1[.n],
>>> const char s2[.n],
>>> size_t n);
>>>
>>> int
>>> cacheflush(void addr[.nbytes],
>>> int nbytes,
>>> int cache);
>>>
>>> long
>>> mbind(void addr[.len],
>>> unsigned long len,
>>> int mode,
>>> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>> / ULONG_WIDTH],
>>> unsigned long maxnode, unsigned int flags);
>>>
>>> void *
>>> bsearch(const void key[.size],
>>> const void base[.size * .nmemb],
>>> size_t nmemb,
>>> size_t size,
>>> int (*compar)(const void [.size], const void [.size]));
>>>
>>> - Valid examples (my own):
>>>
>>> void
>>> ustr2str(char dst[restrict .len + 1],
>>> const char src[restrict .len],
>>> size_t len);
>>>
>>> char *
>>> stpecpy(char dst[.end - .dst + 1],
>>> char *restrict src,
>>> char end[1]);
>>>
>>> - Valid examples (from this thread):
>>>
>>> -
>>> struct s { int a; };
>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>
>>> Explanation:
>>> - Because of shadowing rules, .a=1 refers to the struct member.
>>> - Also, if .a referred to the parameter, it would be an rvalue,
>>> so it wouldn't be valid to assign to it.
>>> - (...).a refers to the struct member too, since otherwise an rvalue
>>> is not expected there.
>>>
>>> -
>>> void foo(struct bar { int x; char c[.x] } a, int x);
>>>
>>> Explanation:
>>> - Because of shadowing rules, [.x] refers to the struct member.
>>>
>>> -
>>> struct bar { int y; };
>>> void foo(char p[((struct bar){ .y = .x }).y], int x);
>>>
>>> Explanation:
>>> - .x unambiguously refers to the parameter.
>>>
>>> - Undefined behavior:
>>>
>>> -
>>> struct bar { int y; };
>>> void foo(char p[((struct bar){ .y = .y }).y], int y);
>>>
>>> Explanation:
>>> - Because of shadowing rules, =.y refers to the struct member.
>>> - .y=.y means initialize the member with itself (uninitialized use).
>>> - (...).y refers to the struct member, since otherwise an rvalue is
>>> not expected there.
>>>
>>> - Constraint violations:
>>>
>>> -
>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>
>>> Explanation:
>>> - sizeof(*.b): Cannot get size of incomplete type.
>>> - sizeof(*.a): Cannot get size of incomplete type.
>>>
>>> -
>>> void f(size_t s, int a[sizeof(1) = 1]);
>>>
>>> Explanation:
>>> - Cannot assign to rvalue.
>>>
>>> -
>>> void f(size_t s, int a[.s = 1]);
>>>
>>> Explanation:
>>> - Cannot assign to rvalue.
>>>
>>> -
>>> void f(size_t s, int a[sizeof(.s)]);
>>>
>>> Explanation:
>>> - sizeof(.s): Cannot get size of incomplete type.
>>>
>>>
>>> Does this idea make sense to you?
>>>
>>>
>>> Cheers,
>>> Alex
>>
>
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:56 ` Alejandro Colomar
@ 2022-11-13 19:05 ` Alejandro Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-13 19:05 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 6113 bytes --]
On 11/13/22 17:56, Alejandro Colomar wrote:
>>> On 11/13/22 17:28, Alejandro Colomar wrote:
>>>> SYNOPSIS:
>>>>
>>>> unary-operator: . identifier
>>>>
>>>>
>>>> DESCRIPTION:
>>>>
>>>> - It is not an lvalue.
>>>>
>>>> - This means sizeof() and _Lengthof() cannot be applied to them.
>>>
>>> Sorry, the above is a thinko.
>>>
>>> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>>>
>>>> - This prevents ambiguity with a designator in an initializer-list
>>>> within a nested braced-initializer.
>>>>
>>>> - The type of a .identifier is always an incomplete type.
>>
>> Or rather, more easily prohibit explicitly using typeof(), sizeof(), and
>> _Lengthof() to it.
>
> Hmm, this is not enough. Pointer arithmetics are interesting, and for that, you
> need to implicitly know the sizeof(*.p).
>
> How about allowing only integral types or pointers to integral types?
I've been thinking about keeping the number of passes as low as possible, while
allowing most useful expressions:
Maybe forcing some ordering can help:
- The type of a .identifier is complete after the opening parenthesis of the
function-declarator (if it refers to a parameter) or after the opening brace of
a braced-initializer, if it refers to a struct/union member, except when the
type is a variably-modified type, which will be complete after the closing
parenthesis or brace respectively.
I'm not sure I got the wording precisely, or if I covered all cases (like types
that cannot be completed for other reasons, even after the closing ')' or '}').
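For reference, a minimal sketch of the ordering that already exists today
(hypothetical function, standard C VLA syntax only): each parameter is in
scope, with a known type, for the parameters declared after it, so a single
left-to-right pass is enough.

```c
#include <stddef.h>

/* 'n' is in scope for 'a', and 'a' is in scope (with its type known)
 * for 'b'.  Declaring the parameters in the opposite order would not
 * compile today. */
static size_t
row_size(int n, char (*a)[n], char (*b)[sizeof *a])
{
	(void) a;
	return sizeof *b;
}
```

The ordering rule suggested above tries to keep this one-directional property
while letting a size expression refer to a parameter that is declared later.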
>
>>
>>>>
>>>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>>>>
>>>> - Shadowing rules apply.
>>>>
>>>> - This prevents ambiguity.
>>>>
>>>>
>>>> EXAMPLES:
>>>>
>>>>
>>>> - Valid examples (libc):
>>>>
>>>> int
>>>> strncmp(const char s1[.n],
>>>> const char s2[.n],
>>>> size_t n);
>>>>
>>>> int
>>>> cacheflush(void addr[.nbytes],
>>>> int nbytes,
>>>> int cache);
>>>>
>>>> long
>>>> mbind(void addr[.len],
>>>> unsigned long len,
>>>> int mode,
>>>> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>>> / ULONG_WIDTH],
>>>> unsigned long maxnode, unsigned int flags);
>>>>
>>>> void *
>>>> bsearch(const void key[.size],
>>>> const void base[.size * .nmemb],
>>>> size_t nmemb,
>>>> size_t size,
>>>> int (*compar)(const void [.size], const void [.size]));
>>>>
>>>> - Valid examples (my own):
>>>>
>>>> void
>>>> ustr2str(char dst[restrict .len + 1],
>>>> const char src[restrict .len],
>>>> size_t len);
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst + 1],
>>>> char *restrict src,
>>>> char end[1]);
>>>>
>>>> - Valid examples (from this thread):
>>>>
>>>> -
>>>> struct s { int a; };
>>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>>
>>>> Explanation:
>>>> - Because of shadowing rules, .a=1 refers to the struct member.
>>>> - Also, if .a referred to the parameter, it would be an rvalue,
>>>> so it wouldn't be valid to assign to it.
>>>> - (...).a refers to the struct member too, since otherwise an
>>>> rvalue is not expected there.
>>>>
>>>> -
>>>> void foo(struct bar { int x; char c[.x] } a, int x);
>>>>
>>>> Explanation:
>>>> - Because of shadowing rules, [.x] refers to the struct member.
>>>>
>>>> -
>>>> struct bar { int y; };
>>>> void foo(char p[((struct bar){ .y = .x }).y], int x);
>>>>
>>>> Explanation:
>>>> - .x unambiguously refers to the parameter.
>>>>
>>>> - Undefined behavior:
>>>>
>>>> -
>>>> struct bar { int y; };
>>>> void foo(char p[((struct bar){ .y = .y }).y], int y);
>>>>
>>>> Explanation:
>>>> - Because of shadowing rules, =.y refers to the struct member.
>>>> - .y=.y means initialize the member with itself (uninitialized use).
>>>> - (...).y refers to the struct member, since otherwise an rvalue is
>>>> not expected there.
>>>>
>>>> - Constraint violations:
>>>>
>>>> -
>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> Explanation:
>>>> - sizeof(*.b): Cannot get size of incomplete type.
>>>> - sizeof(*.a): Cannot get size of incomplete type.
>>>>
>>>> -
>>>> void f(size_t s, int a[sizeof(1) = 1]);
>>>>
>>>> Explanation:
>>>> - Cannot assign to rvalue.
>>>>
>>>> -
>>>> void f(size_t s, int a[.s = 1]);
>>>>
>>>> Explanation:
>>>> - Cannot assign to rvalue.
>>>>
>>>> -
>>>> void f(size_t s, int a[sizeof(.s)]);
This should actually be valid.
>>>>
>>>> Explanation:
>>>> - sizeof(.s): Cannot get size of incomplete type.
>>>>
>>>>
>>>> Does this idea make sense to you?
>>>>
>>>>
>>>> Cheers,
>>>> Alex
>>>
>>
>
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 13:19 ` Alejandro Colomar
2022-11-13 13:33 ` Alejandro Colomar
@ 2022-11-14 17:52 ` Joseph Myers
2022-11-14 17:57 ` Alejandro Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-14 17:52 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
> Maybe allowing integral types and pointers would be enough. However,
> foreseeing that the _Lengthof() proposal (BTW, which paper was it?) will
> succeed, and combining it with this one, _Lengthof(pointer) would ideally give
> the length of the array, so allowing pointers would conflict.
Do you mean N2529 Romero, New pointer-proof keyword to determine array
length? To quote the convenor in WG14 reflector message 18575 (17 Nov
2020) when I asked about its status, "The author asked me not to put those
on the agenda. He will supply updated versions later.".
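For context, a minimal sketch of the status quo that such a keyword would
replace. NITEMS is a hypothetical name for the classic idiom; it silently
gives a wrong answer when handed a pointer instead of an array, which the
"pointer-proof" keyword in the paper's title is presumably meant to reject:

```c
#include <stddef.h>

/* Classic array-length idiom; correct only when 'a' is an actual array. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

static size_t
table_len(void)
{
	int table[7];

	return NITEMS(table);
}
```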
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 17:52 ` Joseph Myers
@ 2022-11-14 17:57 ` Alejandro Colomar
2022-11-14 18:26 ` Joseph Myers
0 siblings, 1 reply; 69+ messages in thread
From: Alejandro Colomar @ 2022-11-14 17:57 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1139 bytes --]
Hi Joseph!
On 11/14/22 18:52, Joseph Myers wrote:
> On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>> Maybe allowing integral types and pointers would be enough. However,
>> foreseeing that the _Lengthof() proposal (BTW, which paper was it?) will
>> succeed, and combining it with this one, _Lengthof(pointer) would ideally give
>> the length of the array, so allowing pointers would conflict.
>
> Do you mean N2529 Romero, New pointer-proof keyword to determine array
> length?
Yes, that's it! Thanks.
> To quote the convenor in WG14 reflector message 18575 (17 Nov
> 2020) when I asked about its status, "The author asked me not to put those
> on the agenda. He will supply updated versions later.".
Since his email is not in the paper, would you mind forwarding him this
suggestion of mine of renaming it to avoid confusion with string lengths? Or
maybe point him to the mailing list discussion[1]?
[1]:
<https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:28 ` Alejandro Colomar
2022-11-13 16:31 ` Alejandro Colomar
@ 2022-11-14 18:13 ` Joseph Myers
2022-11-28 22:59 ` Alex Colomar
1 sibling, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-14 18:13 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
> SYNOPSIS:
>
> unary-operator: . identifier
That's not what you mean. See the standard syntax.
unary-expression:
[other alternatives]
unary-operator cast-expression
unary-operator: one of
& * + - ~ !
> - It is not an lvalue.
>
> - This means sizeof() and _Lengthof() cannot be applied to them.
sizeof can be applied to non-lvalues.
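A minimal illustration (hypothetical helper name): the operand below is an
arithmetic expression, not an lvalue, and sizeof still accepts it.

```c
#include <stddef.h>

/* 'x + 1L' is not an lvalue, yet sizeof accepts it; the operand is not
 * evaluated, only its type (long, after the usual arithmetic
 * conversions) matters. */
static size_t
size_of_rvalue(int x)
{
	return sizeof(x + 1L);
}
```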
> - This prevents ambiguity with a designator in an initializer-list within
> a nested braced-initializer.
No, it doesn't. See my previous points about syntactic disambiguation
being a separate matter from "one parse would result in a constraint
violation, so choose another parse that doesn't" (necessarily, because the
constraint violation that results could in general be at an arbitrary
distance from the point where a choice of parse has to be made). Or see
e.g. the disambiguation rule about enum type specifiers: there is an
explicit rule "If an enum type specifier is present, then the longest
possible sequence of tokens that can be interpreted as a specifier
qualifier list is interpreted as part of the enum type specifier." that
ensures that "enum e : long int;" interprets "long int" as the enum type
specifier, rather than "long" as the enum type specifier and "int" as
another type specifier in the sequence of declaration specifiers, even
though the latter parse would result in a constraint violation later.
Also, requiring unbounded lookahead to determine what kind of construct is
being parsed may be considered questionable for C. (If you have an
initializer starting .a.b.c.d.e, possibly with array element access as
well, those could all be designators or .a might be a reference to a
parameter of struct or union type and .b.c.d.e a sequence of references to
members within it and disambiguation under your rule would depend on
whether an '=' follows such an unbounded sequence.)
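A minimal sketch of the existing syntax in question (the struct names are
made up for illustration). In current C, a chain such as .a.b.c at the start
of an initializer can only be a designator list, so the parser knows what it
is looking at as soon as it sees the leading '.':

```c
struct inner { int c; };
struct mid   { struct inner b; };
struct outer { struct mid a; int z; };

/* '.a.b.c' designates a nested member; no lookahead to the '=' is
 * needed to classify it, because no other parse exists today. */
static int
nested_designator(void)
{
	struct outer o = { .a.b.c = 42, .z = 1 };

	return o.a.b.c + o.z;
}
```

If .a could also start an expression naming a parameter, the same token
sequence would have two possible parses until the '=' (or its absence) is
reached.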
> - The type of a .identifier is always an incomplete type.
>
> - This prevents circular dependencies involving sizeof() or _Lengthof().
We have typeof as well, which can be applied to expressions with
incomplete type.
> - Shadowing rules apply.
>
> - This prevents ambiguity.
"Shadowing rules apply" isn't much of a specification. You need detailed
wording that would be added to 6.2.1 Scopes of identifiers (or equivalent
elsewhere) to make it clear exactly what scopes apply for identifiers
looked up using this construct.
> -
> void foo(struct bar { int x; char c[.x] } a, int x);
>
> Explanation:
> - Because of shadowing rules, [.x] refers to the struct member.
I really don't think standardizing VLAs-in-structures would be a good
idea. Certainly it would be a massive pain to specify meaningful
semantics for them and this outline doesn't even attempt to work through
the consequences of removing the rule that "If an identifier is declared
as having a variably modified type, it shall be an ordinary identifier (as
defined in 6.2.3), have no linkage, and have either block scope or
function prototype scope.".
The idea that .x as an expression might refer to either a member or a
parameter is also a massive change to the namespace rules, where at
present those are in completely different namespaces and so in any given
context a name only needs looking up as one or the other.
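A minimal sketch of the current name space separation (hypothetical names):
the same spelling can denote a member and an ordinary identifier without any
conflict, because the syntactic context alone decides which lookup applies.

```c
struct point { int x; };

/* 'x' the parameter (ordinary identifier) and 'x' the member live in
 * different name spaces: '.x' names the member, the bare 'x' names the
 * parameter. */
static int
member_vs_parameter(int x)
{
	struct point p = { .x = x + 1 };

	return p.x * 10 + x;
}
```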
Again, proposals should be *minimal*. And even when they are, many issues
may well arise in practice (see the long list of constexpr issues in my
commit message for that C2x feature, for example, which I expect to turn
into multiple NB comments and at least two accompanying documents).
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 17:57 ` Alejandro Colomar
@ 2022-11-14 18:26 ` Joseph Myers
2022-11-28 23:02 ` Alex Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-14 18:26 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Mon, 14 Nov 2022, Alejandro Colomar via Gcc wrote:
> > To quote the convenor in WG14 reflector message 18575 (17 Nov
> > 2020) when I asked about its status, "The author asked me not to put those
> > on the agenda. He will supply updated versions later.".
>
> Since his email is not in the paper, would you mind forwarding him this
> suggestion of mine of renaming it to avoid confusion with string lengths? Or
> maybe point him to the mailing list discussion[1]?
>
> [1]:
> <https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>
I don't have his email address (I don't see any emails from him on the
reflector since I joined it in 2001).
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 18:13 ` Joseph Myers
@ 2022-11-28 22:59 ` Alex Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alex Colomar @ 2022-11-28 22:59 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph,
On 11/14/22 19:13, Joseph Myers wrote:
> On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>> SYNOPSIS:
>>
>> unary-operator: . identifier
>
> That's not what you mean. See the standard syntax.
Yup; typo there.
>
> unary-expression:
> [other alternatives]
> unary-operator cast-expression
>
> unary-operator: one of
> & * + - ~ !
>
>> - It is not an lvalue.
>>
>> - This means sizeof() and _Lengthof() cannot be applied to them.
>
> sizeof can be applied to non-lvalues.
thinko there. I fixed it in a subsequent email.
>
>> - This prevents ambiguity with a designator in an initializer-list within
>> a nested braced-initializer.
>
> No, it doesn't. See my previous points about syntactic disambiguation
> being a separate matter from "one parse would result in a constraint
> violation, so choose another parse that doesn't" (necessarily, because the
> constraint violation that results could in general be at an arbitrary
> distance from the point where a choice of parse has to be made). Or see
> e.g. the disambiguation rule about enum type specifiers: there is an
> explicit rule "If an enum type specifier is present, then the longest
> possible sequence of tokens that can be interpreted as a specifier
> qualifier list is interpreted as part of the enum type specifier." that
> ensures that "enum e : long int;" interprets "long int" as the enum type
> specifier, rather than "long" as the enum type specifier and "int" as
> another type specifier in the sequence of declaration specifiers, even
> though the latter parse would result in a constraint violation later.
I get it. It's only unambiguous if there's lookahead.
>
> Also, requiring unbounded lookahead to determine what kind of construct is
> being parsed may be considered questionable for C. (If you have an
> initializer starting .a.b.c.d.e, possibly with array element access as
> well, those could all be designators or .a might be a reference to a
> parameter of struct or union type and .b.c.d.e a sequence of references to
> members within it and disambiguation under your rule would depend on
> whether an '=' follows such an unbounded sequence.)
I'm thinking of an idea for this.
>
>> - The type of a .identifier is always an incomplete type.
>>
>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>
> We have typeof as well, which can be applied to expressions with
> incomplete type.
Yes, but it would not be problematic in the two-pass parsing I have in mind.
>
>> - Shadowing rules apply.
>>
>> - This prevents ambiguity.
>
> "Shadowing rules apply" isn't much of a specification. You need detailed
> wording that would be added to 6.2.1 Scopes of identifiers (or equivalent
> elsewhere) to make it clear exactly what scopes apply for identifiers
> looked up using this construct.
Yeah, I guess. I'm being easy for this draft. I'll try to be more
precise for future revisions.
>
>> -
>> void foo(struct bar { int x; char c[.x] } a, int x);
>>
>> Explanation:
>> - Because of shadowing rules, [.x] refers to the struct member.
>
> I really don't think standardizing VLAs-in-structures would be a good
> idea. Certainly it would be a massive pain to specify meaningful
> semantics for them and this outline doesn't even attempt to work through
> the consequences of removing the rule that "If an identifier is declared
> as having a variably modified type, it shall be an ordinary identifier (as
> defined in 6.2.3), have no linkage, and have either block scope or
> function prototype scope.".
Maybe. I didn't have them in mind until Martin mentioned them. Now
that he mentioned them, I'd like at least to be careful so that any new
syntax doesn't do something that impedes adding them in the future, if
it is ever considered desirable.
>
> The idea that .x as an expression might refer to either a member or a
> parameter is also a massive change to the namespace rules, where at
> present those are in completely different namespaces and so in any given
> context a name only needs looking up as one or the other.
>
> Again, proposals should be *minimal*.
Yes. I only want to have a rough discussion about what the entire
feature would look like in an ideal future where everything is added.
Otherwise, adding a minimal feature without considering this future
might do something that prevents some part of it from being implemented
due to backwards compatibility.
So I'd like to discuss the whole idea before then going to a minimal
proposal that will be *much* smaller than this idea that I'm discussing.
I'm happy with the Linux man-pages implementing the whole idea (even if
it's impossible to implement it in C ever), and letting ISO C / GCC
implement initially (and possibly ever) only the minimal stuff.
> And even when they are, many issues
> may well arise in practice (see the long list of constexpr issues in my
> commit message for that C2x feature, for example, which I expect to turn
> into multiple NB comments and at least two accompanying documents).
Sure; I expect that.
Cheers,
Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 18:26 ` Joseph Myers
@ 2022-11-28 23:02 ` Alex Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alex Colomar @ 2022-11-28 23:02 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph,
On 11/14/22 19:26, Joseph Myers wrote:
> On Mon, 14 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>>> To quote the convenor in WG14 reflector message 18575 (17 Nov
>>> 2020) when I asked about its status, "The author asked me not to put those
>>> on the agenda. He will supply updated versions later.".
>>
>> Since his email is not in the paper, would you mind forwarding him this
>> suggestion of mine of renaming it to avoid confusion with string lengths? Or
>> maybe point him to the mailing list discussion[1]?
>>
>> [1]:
>> <https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>
>
> I don't have his email address (I don't see any emails from him on the
> reflector since I joined it in 2001).
Meh; thanks. Would you mind mentioning this issue to whoever defends
his document, whenever you talk about it?
Thanks,
Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 14:58 ` Martin Uecker
2022-11-13 15:15 ` Alejandro Colomar
@ 2022-11-28 23:18 ` Alex Colomar
2022-11-29 0:05 ` Joseph Myers
2022-11-29 14:58 ` Michael Matz
1 sibling, 2 replies; 69+ messages in thread
From: Alex Colomar @ 2022-11-28 23:18 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 15:58, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>
>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>> Hi Martin,
>>>
>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>> But there are not only syntactical problems, because
>>>>> also the type of the parameter might become relevant
>>>>> and then you can get circular dependencies:
>>>>>
>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> This seems to be a difficult stone in the road.
>
> But note that GNU forward declarations solve this nicely.
Okay, so GNU declarations basically work by duplicating (some of) the
declarations.
How about the compiler parsing the parameter list twice? The first pass
would get the declarations and their types, but would not resolve any
sizeof(), _Lengthof(), or typeof() whose operand contains .identifier
(or an expression containing it); in those cases, it would leave the
type incomplete, to be completed in the second pass. It would be as if
the programmer had specified the forward declarations, except that it's
the compiler that derives them automatically.
I guess asking the compiler to do two passes on the param list isn't as
bad as asking to do unbound lookahead. In this case it's bound: look
ahead till the end of the param list; get as much info as possible, and
then do it again to complete. Anything not yet clear after two passes
is not valid.
So, for
void foo(char (*a)[sizeof(*.b)], char (*b)[sizeof(*.a)]);
in the first pass, the compiler would read:
char (*a)[sizeof(*.b)]; // sizeof .identifier; incomplete type; continue parsing
char (*b)[sizeof(*.a)]; // sizeof .identifier; incomplete type; continue parsing
At the end of the first pass, the compiler only knows:
char (*a)[];
char (*b)[];
At the second pass, when evaluating sizeof(), since the types of the
operands are still incomplete, it can't be evaluated, and therefore
there's an error at the first sizeof(*.b): *.b has incomplete type.
---
Let's show a distinct case:
void foo(char (*a)[sizeof(*.b)], char (*b)[10]);
After the first pass, the compiler would know:
char (*a)[];
char (*b)[10];
At the second pass, sizeof(*.b) would be evaluated undoubtedly to
sizeof(char[10]), and the parameter list would then be fine.
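To make the two-pass idea a bit more concrete, here is a toy model in C.
It is only a sketch of the bookkeeping, not a parser, and every name in
it is hypothetical (struct param, resolve_two_pass(), and the convention
that dep holds the index of the parameter named by sizeof(*.x), or -1
when the bound is a plain constant):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one parameter of the form "char (*p)[bound]". */
struct param {
    int    dep;       /* index of the parameter in sizeof(*.x), or -1 */
    size_t bound;     /* constant bound, or the value resolved in pass 2 */
    int    pass1;     /* nonzero if the bound was known after pass 1 */
    int    complete;  /* nonzero once the bound is known */
};

/* Returns 0 if every bound is known after two passes, -1 otherwise. */
static int resolve_two_pass(struct param *p, size_t count)
{
    int status = 0;

    /* Pass 1: only constant bounds give a complete type. */
    for (size_t i = 0; i < count; i++) {
        assert(p[i].dep >= -1 && p[i].dep < (int) count);
        p[i].pass1 = (p[i].dep == -1);
        p[i].complete = p[i].pass1;
    }

    /* Pass 2: complete a type only from what pass 1 already knew. */
    for (size_t i = 0; i < count; i++) {
        if (p[i].complete)
            continue;
        if (p[p[i].dep].pass1) {
            /* sizeof(char[N]) is N, so the bound is copied over. */
            p[i].bound = p[p[i].dep].bound;
            p[i].complete = 1;
        } else {
            status = -1;  /* still incomplete: reject the list */
        }
    }
    return status;
}
```

With this model, the list (a depends on b, b has bound 10) resolves, and
the circular list (a depends on b, b depends on a) is rejected.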
Does this 2-pass parsing make sense to you? Did I miss any details?
>
>>>>
>>>>> I am not sure what would be the best way to fix it. One
>>>>> could specify that parameters referred to by
>>>>> the .identifier syntax must be of some integer type and
>>>>> that the sub-expression .identifier is always
>>>>> converted to a 'size_t'.
>>>>
>>>> That makes sense, but then overnight some quite useful thing came to my mind
>>>> that would not be possible with this limitation:
>>>>
>>>>
>>>> <https://software.codidact.com/posts/285946>
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst], char *src, char end[1])
>>
>> Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
>> and then the result of the whole expression would be 0, which is fine as size_t.
>>
>> So, never mind.
>
> .end and .dst would have pointer size though.
>
>>>> {
>>>> for (/* void */; dst <= end; dst++) {
>>>> *dst = *src++;
>>>> if (*dst == '\0')
>>>> return dst;
>>>> }
>>>> /* Truncation detected */
>>>> *end = '\0';
>>>>
>>>> #if !defined(NDEBUG)
>>>> /* Consume the rest of the input string. */
>>>> while (*src++) {};
>>>> #endif
>>>>
>>>> return end + 1;
>>>> }
>>> And I forgot to say it: Default promotions rank high (probably the highest) in
>>> my list of most hated features^Wbugs in C.
>
> If you replaced them with explicit conversion you then have
> to add by hand all the time, I am pretty sure most people
> would hate this more. (and it could also hide bugs)
>
>>> I wouldn't convert it to size_t, but
>>> rather follow normal promotion rules.
>
> The point of making it size_t is that you then
> do not need to know the type of the parameter to make
> sense of the expression. If the type matters, then you get
> mutual dependencies as in the example above.
>
>>> Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
>>> array (which took me some time to understand), I'd also allow the same here. So,
>>> the type of the expression between [] could perfectly be signed or unsigned.
>>>
>>> So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
>>> allow negative numbers. In the function above, since dst can be a pointer to
>>> one-past-the-end (it represents a previous truncation; that's why the test
>>> dst<=end), forcing a size_t conversion would disallow that syntax.
>
> Yes, this then does not work.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-28 23:18 ` Alex Colomar
@ 2022-11-29 0:05 ` Joseph Myers
2022-11-29 14:58 ` Michael Matz
1 sibling, 0 replies; 69+ messages in thread
From: Joseph Myers @ 2022-11-29 0:05 UTC (permalink / raw)
To: Alex Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:
> I guess asking the compiler to do two passes on the param list isn't as bad as
> asking to do unbound lookahead. In this case it's bound: look ahead till the
> end of the param list; get as much info as possible, and then do it again to
> complete. Anything not yet clear after two passes is not valid.
Unbounded here means an unbounded number of tokens, as opposed to e.g.
looking one token ahead after seeing an identifier in statement context to
determine if it's a label.
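As a rough illustration of what bounded, one-token lookahead means for
the label case, here is a small C sketch. The function name
starts_with_label() is made up for this example, and it works on raw
characters rather than real tokens, so it ignores keywords, comments,
and everything else a real front end has to handle:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the text starts with "identifier :", 0 otherwise.
 * After reading the identifier, exactly one more token is inspected. */
static int starts_with_label(const char *s)
{
    assert(s != NULL);

    while (isspace((unsigned char) *s))
        s++;
    if (!(isalpha((unsigned char) *s) || *s == '_'))
        return 0;                       /* not an identifier */
    while (isalnum((unsigned char) *s) || *s == '_')
        s++;                            /* consume the identifier */
    while (isspace((unsigned char) *s))
        s++;
    return *s == ':';                   /* the single lookahead token */
}
```

The decision is made after looking at one token past the identifier, no
matter how long the rest of the statement is. That is the contrast with
a designator-like sequence such as .a.b.c.d.e, where the deciding token
can be arbitrarily far away.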
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-28 23:18 ` Alex Colomar
2022-11-29 0:05 ` Joseph Myers
@ 2022-11-29 14:58 ` Michael Matz
2022-11-29 15:17 ` Uecker, Martin
2022-11-29 16:49 ` Joseph Myers
1 sibling, 2 replies; 69+ messages in thread
From: Michael Matz @ 2022-11-29 14:58 UTC (permalink / raw)
To: Alex Colomar
Cc: Martin Uecker, Joseph Myers, Ingo Schwarze, JeanHeyd Meneide,
linux-man, gcc
Hey,
On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:
> How about the compiler parsing the parameter list twice?
This _is_ unbounded look-ahead. You could avoid this by not using "." for
your new syntax. Use something unambiguous that can't be confused with
other syntactic elements, e.g. a different punctuator like '@' or the
like. But I'm generally doubtful of this whole feature within C itself.
It serves a purpose in documentation, so in man-pages it seems fine enough
(but then it could still use a different punctuator so as not to be
confusable with C syntax).
But within C it still can only serve a documentation purpose as no
checking could be performed without also changes in how e.g. arrays are
represented (they always would need to come with a size). It seems
doubtful to introduce completely new and ambiguous syntax with all the
problems Joseph lists just in order to be able to write documentation when
there's a perfectly fine method to do so: comments.
Ciao,
Michael.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 14:58 ` Michael Matz
@ 2022-11-29 15:17 ` Uecker, Martin
2022-11-29 15:44 ` Michael Matz
2022-11-29 16:49 ` Joseph Myers
1 sibling, 1 reply; 69+ messages in thread
From: Uecker, Martin @ 2022-11-29 15:17 UTC (permalink / raw)
To: alx.manpages, matz; +Cc: gcc, linux-man, joseph, schwarze, wg14
Am Dienstag, dem 29.11.2022 um 14:58 +0000 schrieb Michael Matz:
> Hey,
>
> On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:
>
> > How about the compiler parsing the parameter list twice?
>
> This _is_ unbounded look-ahead. You could avoid this by not using "."
> for your new syntax. Use something unambiguous that can't be confused
> with other syntactic elements, e.g. a different punctuator like '@'
> or the like. But I'm generally doubtful of this whole feature within
> C itself. It serves a purpose in documentation, so in man-pages it
> seems fine enough (but then it could still use a different punctuator
> so as not to be confusable with C syntax).
>
> But within C it still can only serve a documentation purpose as no
> checking could be performed without also changes in how e.g. arrays
> are represented (they always would need to come with a size).
It does not require any changes on how arrays are represented.
As part of VM-types the size becomes part of the type and this
can be used for static or dynamic analysis, e.g. you can
- today - get a run-time bounds violation with the sanitizer:
void foo(int n, char (*buf)[n])
{
    (*buf)[n] = 1;
}
int main()
{
    char buf[10];
    foo(10, &buf);
}
https://godbolt.org/z/WWEdeYchs
I personally find this already extremely useful.
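To illustrate that the bound really is part of the type here, a small
sketch in C (the helper names bound_of() and store_checked() are made up
for this example; it assumes a compiler with variably modified types,
as GCC and Clang have in their C modes):

```c
#include <assert.h>
#include <stddef.h>

/* sizeof(*buf) is evaluated at run time and yields n, because
 * buf points to the variably modified type char[n]. */
static size_t bound_of(size_t n, char (*buf)[n])
{
    assert(buf != NULL);
    return sizeof(*buf);
}

/* A hand-written bounds check that relies only on the type of buf.
 * Returns 0 on success, -1 if i is out of bounds. */
static int store_checked(size_t n, char (*buf)[n], size_t i, char c)
{
    if (i >= sizeof(*buf))
        return -1;
    (*buf)[i] = c;
    return 0;
}
```

A caller with char buf[10] would pass (10, &buf); the same information
that store_checked() uses by hand is what a sanitizer or a static
analyzer can use automatically.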
For
void foo(int n, char buf[n]);
it semantically has no meaning according to the C standard,
but a compiler could still warn.
It could also warn for
void foo(int n, char buf[n]);
int main()
{
char buf[9];
foo(buf);
}
if the passed buffer is too short. And here, GCC and Clang
already do this! (although - so far - only for static
bounds I think)
https://godbolt.org/z/afPhnxfzx
With "static"
void foo(int n, char buf[static n]);
this would also be UB according to C.
We miss some features in GCC to make this more useful (and
I filed bugs a while ago). For example, the UB sanitizer should detect
additional cases which are UB.
But in general: This feature is useful not only for documentation
but also for analysis. You can get bounds checking in C which
works today and with additional compiler features this would
be very useful!
Martin
> It seems
> doubtful to introduce completely new and ambiguous syntax with all
> the
> problems Joseph lists just in order to be able to write documentation
> when
> there's a perfectly fine method to do so: comments.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 15:17 ` Uecker, Martin
@ 2022-11-29 15:44 ` Michael Matz
2022-11-29 16:58 ` Uecker, Martin
0 siblings, 1 reply; 69+ messages in thread
From: Michael Matz @ 2022-11-29 15:44 UTC (permalink / raw)
To: Uecker, Martin; +Cc: alx.manpages, gcc, linux-man, joseph, schwarze, wg14
Hey,
On Tue, 29 Nov 2022, Uecker, Martin wrote:
> It does not require any changes on how arrays are represented.
>
> As part of VM-types the size becomes part of the type and this
> can be used for static or dynamic analysis, e.g. you can
> - today - get a run-time bounds violation with the sanitizer:
>
> void foo(int n, char (*buf)[n])
> {
> (*buf)[n] = 1;
> }
This can already be statically analyzed as being wrong, no need for dynamic
checking. What I mean is the checking of the claimed contract. Above you
assure for the function body that buf has n elements. This is also a
pre-condition for calling this function and _that_ can't be checked in all
cases because:
void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
void callfoo(char * buf) { foo(10, buf); }
buf doesn't have a known size. And a pre-condition that can't be checked
is no pre-condition at all, as only then it can become a guarantee for the
body.
The compiler has no choice but to trust the user that the pre-condition
for calling foo is fulfilled. I can see how being able to just check half
of the contract might be useful, but if it doesn't give full checking then
any proposal for syntax should be even more obviously orthogonal than the
current one.
> For
>
> void foo(int n, char buf[n]);
>
> it semantically has no meaning according to the C standard,
> but a compiler could still warn.
Hmm? Warn about what in this decl?
> It could also warn for
>
> void foo(int n, char buf[n]);
>
> int main()
> {
> char buf[9];
> foo(buf);
> }
You mean if you write 'foo(10,buf)' (the above, as is, is simply a syntax
error for non-matching number of args). Or was it a mispaste and you mean
the one from the godbolt link, i.e.:
void foo(char buf[10]){ buf[9] = 1; }
int main()
{
    char buf[9];
    foo(buf);
}
? If so, yeah, we warn already. I don't think this is an argument for
(or against) introducing new syntax.
...
> But in general: This feature is useful not only for documentation
> but also for analysis.
Which feature are we talking about now? The ones you used all work today,
as you demonstrated. I thought we would be talking about that ".whatever"
syntax to refer to arbitrary parameters, even following ones? I think a
disrupting syntax change like that should have a higher bar than "in some
cases, depending on circumstance, we might even be able to warn".
Ciao,
Michael.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 14:58 ` Michael Matz
2022-11-29 15:17 ` Uecker, Martin
@ 2022-11-29 16:49 ` Joseph Myers
2022-11-29 16:53 ` Jonathan Wakely
1 sibling, 1 reply; 69+ messages in thread
From: Joseph Myers @ 2022-11-29 16:49 UTC (permalink / raw)
To: Michael Matz
Cc: Alex Colomar, Martin Uecker, Ingo Schwarze, JeanHeyd Meneide,
linux-man, gcc
On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
> like. But I'm generally doubtful of this whole feature within C itself.
> It serves a purpose in documentation, so in man-pages it seems fine enough
> (but then still could use a different punctuator to not be confusable with
> C syntax).
In man-pages you don't need to invent syntax at all. You can write
int f(char buf[n], int n);
and in the context of a man page it will be clear to readers what is
meant, though such a syntax would be problematic in actual C source files
because of issues with circular dependencies between parameters and with n
already being declared in an outer scope.
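A small C sketch of the outer-scope problem (the names here are invented
for illustration, and a pointer to array is used instead of a plain
array parameter only so that the bound is observable through sizeof):

```c
#include <assert.h>
#include <stddef.h>

enum { n = 3 };  /* an unrelated n that happens to be in scope */

/* In the declarator of buf, n is the file-scope constant above,
 * because the parameter n has not been declared yet at that point. */
static size_t declared_bound(char (*buf)[n], int n)
{
    assert(buf != NULL);
    (void) n;            /* here n is the parameter, and it is unused */
    return sizeof(*buf); /* always 3, whatever the caller passes as n */
}
```

This compiles without any diagnostic being required, and the bound
silently has nothing to do with the parameter n.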
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 16:49 ` Joseph Myers
@ 2022-11-29 16:53 ` Jonathan Wakely
2022-11-29 17:00 ` Martin Uecker
0 siblings, 1 reply; 69+ messages in thread
From: Jonathan Wakely @ 2022-11-29 16:53 UTC (permalink / raw)
To: Joseph Myers
Cc: Michael Matz, Alex Colomar, Martin Uecker, Ingo Schwarze,
JeanHeyd Meneide, linux-man, gcc
On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
>
> On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
>
> > like. But I'm generally doubtful of this whole feature within C itself.
> > It serves a purpose in documentation, so in man-pages it seems fine enough
> > (but then still could use a different punctuator to not be confusable with
> > C syntax).
>
> In man-pages you don't need to invent syntax at all. You can write
>
> int f(char buf[n], int n);
>
> and in the context of a man page it will be clear to readers what is
> meant,
Considerably more clear than new invented syntax IMHO.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 15:44 ` Michael Matz
@ 2022-11-29 16:58 ` Uecker, Martin
2022-11-29 17:28 ` Alex Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Uecker, Martin @ 2022-11-29 16:58 UTC (permalink / raw)
To: matz; +Cc: gcc, alx.manpages, linux-man, joseph, schwarze, wg14
Hi,
Am Dienstag, dem 29.11.2022 um 15:44 +0000 schrieb Michael Matz:
> Hey,
>
> On Tue, 29 Nov 2022, Uecker, Martin wrote:
>
> > It does not require any changes on how arrays are represented.
> >
> > As part of VM-types the size becomes part of the type and this
> > can be used for static or dynamic analysis, e.g. you can
> > - today - get a run-time bounds violation with the sanitizer:
> >
> > void foo(int n, char (*buf)[n])
> > {
> > (*buf)[n] = 1;
> > }
>
> This can already statically analyzed as being wrong, no need for
> dynamic checking.
In this toy example, yes, but in general it can be checked
only at run-time by using the information about the
dynamic bound.
> What I mean is the checking of the claimed contract.
> Above you assure for the function body that buf has n elements.
Yes.
> This is also a pre-condition for calling this function and
> _that_ can't be checked in all cases because:
>
> void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
> void callfoo(char * buf) { foo(10, buf); }
>
> buf doesn't have a known size.
This does not type check.
> And a pre-condition that can't be checked
> is no pre-condition at all, as only then it can become a guarantee
> for the body.
The example above should look like:
void foo(int n, char (*buf)[n]);
void callfoo(char (*buf)[12]) { foo(10, buf); }
This could be checked by a UB sanitizer, as calling
the function with an argument of incompatible type
is UB (but we currently do not do this).
If you think about
void foo(int n, char buf[n]);
void callfoo(char *buf) { foo(10, buf); }
Then you are right that this can not be checked at this
time. But this does not mean it is useless because we
still can detect inconsistencies in other cases:
void callfoo(int n, char buf[n - 1]) { foo(n, buf); }
We could also - in the future - have a warning about all
situations where bound information is lost, making sure
that preconditions are always checked for people who
consistently use these annotations.
> The compiler has no choice than to trust the user that the pre-
> condition for calling foo is fulfilled. I can see how
> being able to just check half of the contract might be
> useful, but if it doesn't give full checking then
> any proposal for syntax should be even more obviously
> orthogonal than the current one.
Your argument is not clear to me.
> > For
> >
> > void foo(int n, char buf[n]);
> >
> > it semantically has no meaning according to the C standard,
> > but a compiler could still warn.
>
> Hmm? Warn about what in this decl?
I meant, we could warn about something like this
because it is likely an error:
void foo(int n, char buf[n])
{
    buf[n] = 1;
}
> > It could also warn for
> >
> > void foo(int n, char buf[n]);
> >
> > int main()
> > {
> > char buf[9];
> > foo(buf);
> > }
>
> You mean if you write 'foo(10,buf)' (the above, as is, is simply a
> syntax error for non-matching number of args). Or was it a mispaste
> and you mean the one from the godbolt link, i.e.:
I meant:
char buf[9];
foo(10, buf);
In fact, it turns out we warn already:
https://godbolt.org/z/qcvsv87Ev
> void foo(char buf[10]){ buf[9] = 1; }
> int main()
> {
> char buf[9];
> foo(buf);
> }
>
> ? If so, yeah, we warn already. I don't think this is an argument
> for (or against) introducing new syntax.
> ...
It is an argument for having this syntax, because we could
extend such warnings (those we already have and those we
could still add) to more common cases such as
void foo(char buf[.n], size_t n);
In my opinion, this would be a huge step forward for
safety of C programs as we already have a lot of
infrastructure for checking bounds.
Of course, the existing GNU extension would achieve
the same thing:
void foo(size_t n; char buf[n], size_t n);
> > But in general: This feature is useful not only for documentation
> > but also for analysis.
>
> Which feature we're talking about now? The ones you used all work
> today,
> as you demonstrated. I thought we would be talking about that
> ".whatever"
> syntax to refer to arbitrary parameters, even following ones? I
> think a
> disrupting syntax change like that should have a higher bar than "in
> some
> cases, depending on circumstance, we might even be able to warn".
We can use our existing features and then apply them
to cases where the bound is specified after the pointer,
which is more common in practice.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 16:53 ` Jonathan Wakely
@ 2022-11-29 17:00 ` Martin Uecker
2022-11-29 17:19 ` Alex Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Martin Uecker @ 2022-11-29 17:00 UTC (permalink / raw)
To: Jonathan Wakely, Joseph Myers
Cc: Michael Matz, Alex Colomar, Ingo Schwarze, JeanHeyd Meneide,
linux-man, gcc
Am Dienstag, dem 29.11.2022 um 16:53 +0000 schrieb Jonathan Wakely:
> On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
> >
> > On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
> >
> > > like. But I'm generally doubtful of this whole feature within C
> > > itself.
> > > It serves a purpose in documentation, so in man-pages it seems
> > > fine enough
> > > (but then still could use a different punctuator to not be
> > > confusable with
> > > C syntax).
> >
> > In man-pages you don't need to invent syntax at all. You can write
> >
> > int f(char buf[n], int n);
> >
> > and in the context of a man page it will be clear to readers what
> > is
> > meant,
>
> Considerably more clear than new invented syntax IMHO.
True, but I think it would be a mistake to use code in
man pages which then does not work as expected (or is even
subtly wrong) in actual code.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 17:00 ` Martin Uecker
@ 2022-11-29 17:19 ` Alex Colomar
2022-11-29 17:29 ` Alex Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Alex Colomar @ 2022-11-29 17:19 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers
Cc: Michael Matz, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin, Joseph,
On 11/29/22 18:00, Martin Uecker wrote:
> Am Dienstag, dem 29.11.2022 um 16:53 +0000 schrieb Jonathan Wakely:
>> On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
>>>
>>> On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
>>>
>>>> like. But I'm generally doubtful of this whole feature within C
>>>> itself.
>>>> It serves a purpose in documentation, so in man-pages it seems
>>>> fine enough
>>>> (but then still could use a different punctuator to not be
>>>> confusable with
>>>> C syntax).
>>>
>>> In man-pages you don't need to invent syntax at all. You can write
>>>
>>> int f(char buf[n], int n);
>>>
>>> and in the context of a man page it will be clear to readers what
>>> is
>>> meant,
>>
>> Considerably more clear than new invented syntax IMHO.
>
> True, but I think it would be a mistake to use code in
> man pages which then does not work as expected (or even
> is subtle wrong) in actual code.
Exactly. Using your proposed syntax (which was my first draft) would
probably have been a source of hidden bugs, since it might work (read:
compile) in some cases, but with wrong results.
I prefer this hypothetical syntax, which at most will cause compile errors.
Cheers,
Alex
>
> Martin
>
>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 16:58 ` Uecker, Martin
@ 2022-11-29 17:28 ` Alex Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alex Colomar @ 2022-11-29 17:28 UTC (permalink / raw)
To: Uecker, Martin, matz; +Cc: gcc, linux-man, joseph, schwarze, wg14
Hi Martin and Michael,
On 11/29/22 17:58, Uecker, Martin wrote:
>
> Hi,
>
> Am Dienstag, dem 29.11.2022 um 15:44 +0000 schrieb Michael Matz:
>> Hey,
>>
>> On Tue, 29 Nov 2022, Uecker, Martin wrote:
>>
>>> It does not require any changes on how arrays are represented.
>>>
>>> As part of VM-types the size becomes part of the type and this
>>> can be used for static or dynamic analysis, e.g. you can
>>> - today - get a run-time bounds violation with the sanitizer:
>>>
>>> void foo(int n, char (*buf)[n])
>>> {
>>> (*buf)[n] = 1;
>>> }
>>
>> This can already statically analyzed as being wrong, no need for
>> dynamic checking.
>
> In this toy example, but in general in can be checked
> only at run-time by using the information about the
> dynamic bound.
>
>> What I mean is the checking of the claimed contract.
>> Above you assure for the function body that buf has n elements.
>
> Yes.
>
>> This is also a pre-condition for calling this function and
>> _that_ can't be checked in all cases because:
>>
>> void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
>> void callfoo(char * buf) { foo(10, buf); }
>>
>> buf doesn't have a known size.
>
> This does not type check.
>
>> And a pre-condition that can't be checked
>> is no pre-condition at all, as only then it can become a guarantee
>> for the body.
>
> The example above should look like:
>
> void foo(int n, char (*buf)[n]);
>
> void callfoo(char (*buf)[12]) { foo(10, buf); }
>
> This could be checked by a UB sanitizer as calling
> the function with an argument of incompatible type
> is UB (but we currently do not do this)
>
>
> If you think about
>
> void foo(int n, char buf[n]);
>
> void callfoo(char *buf) { foo(10, buf); }
>
>
> Then you are right that this can not be checked at this
> time. But this does not mean it is useless because we
> still can detect inconsistencies in other cases:
>
> void callfoo(int n, char buf[n - 1]) { foo(n, buf); }
>
> We could also - in the future - have a warning about all
> situations where bound information is lost, making sure
> that preconditions are always checked for people who
> consistently use these annotations.
>
>
>> The compiler has no choice than to trust the user that the pre-
>> condition for calling foo is fulfilled. I can see how
>> being able to just check half of the contract might be
>> useful, but if it doesn't give full checking then
>> any proposal for syntax should be even more obviously
>> orthogonal than the current one.
>
> Your argument is not clear to me.
>
>
>>> For
>>>
>>> void foo(int n, char buf[n]);
>>>
>>> it semantically has no meaning according to the C standard,
>>> but a compiler could still warn.
>>
>> Hmm? Warn about what in this decl?
>
> I meant, we could warn about something like this
> because it is likely an error:
>
> void foo(int n, char buf[n])
> {
> buf[n] = 1;
> }
>
>
>>> It could also warn for
>>>
>>> void foo(int n, char buf[n]);
>>>
>>> int main()
>>> {
>>> char buf[9];
>>> foo(buf);
>>> }
>>
>> You mean if you write 'foo(10,buf)' (the above, as is, is simply a
>> syntax error for non-matching number of args). Or was it a mispaste
>> and you mean the one from the godbolt link, i.e.:
>
> I meant:
>
> char buf[9];
> foo(10, buf);
>
> In fact, it turns out we warn already:
>
> https://godbolt.org/z/qcvsv87Ev
>
>> void foo(char buf[10]){ buf[9] = 1; }
>> int main()
>> {
>> char buf[9];
>> foo(buf);
>> }
>>
>> ? If so, yeah, we warn already. I don't think this is an argument
>> for (or against) introducing new syntax.
>> ...
>
> It is an argument for having this syntax, because we could
> extend such warnings (those we already have and those we
> could still add) to more common cases such as
>
> void foo(char buf[.n], size_t n);
>
> In my opinion, this would be a huge step forward for
> safety of C programs as we already have a lot of
> infrastructure for checking bounds.
>
> Of course, the existing GNU extension would achieve
> the same thing:
>
> void foo(size_t n; char buf[n], size_t n);
>
>
>
>>> But in general: This feature is useful not only for documentation
>>> but also for analysis.
>>
>> Which feature are we talking about now? The ones you used all work today,
>> as you demonstrated. I thought we would be talking about that ".whatever"
>> syntax to refer to arbitrary parameters, even following ones? I think a
>> disruptive syntax change like that should have a higher bar than "in some
>> cases, depending on circumstance, we might even be able to warn".
>
> We can use our existing features and then apply them
> to cases where the bound is specified after the pointer,
> which is more common in practice.
Yep; basically adding some (not perfect, but some) static analysis to
many libc function calls.
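To make the point about bounds concrete, here is a minimal sketch of the two
forms that compile today: the standard one, where the bound precedes the array
parameter, and the GNU forward-declaration form mentioned above, where it
follows. The names fill_bound_first() and fill_bound_last() are hypothetical,
and the fallback branch is only an assumption to keep the example building on
compilers that do not accept forward parameter declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard C: the bound 'n' must be declared before the array parameter. */
void fill_bound_first(size_t n, char buf[n], char c)
{
    assert(buf != NULL);
    memset(buf, c, n);
}

#if defined(__GNUC__) && !defined(__clang__)
/* GNU extension: 'size_t n;' forward-declares the bound (note the ';'),
   so the real 'n' parameter may come after the array it describes. */
void fill_bound_last(size_t n; char buf[n], size_t n, char c)
{
    assert(buf != NULL);
    memset(buf, c, n);
}
#else
/* Fallback (assumption): plain pointer, no bound information in the type. */
void fill_bound_last(char *buf, size_t n, char c)
{
    assert(buf != NULL);
    memset(buf, c, n);
}
#endif
```

Passing a 9-element array together with a bound of 10 to fill_bound_first()
is the kind of mismatch GCC can already diagnose, as in the Compiler Explorer
link quoted above; whether a warning actually appears depends on the compiler
version and flags.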
Also, considering the issues with sizeof and arrays, and the lack of a
_Nitems() [proposed as _Lengthof()] operator, there's a lot of manual
work in array (read pointer) parameters.
However, a hypothetical _Nitems() operator could make use of this
syntactic sugar, and be more useful than just providing static analysis.
Using _Nitems() on a VMT (including pointer parameters) could be
specified to return the number of elements, so I foresee code like:
void foo(int arr[nmemb], size_t nmemb)
{
	// _Nitems() evaluates to nmemb
	for (size_t i = 0; i < _Nitems(arr); i++)
		arr[i] = i;
}
void bar(int arr[])
{
	// Constraint violation
	for (size_t i = 0; i < _Nitems(arr); i++)
		arr[i] = i;
}
This is probably the most useful part of this feature (but admittedly
it's not only about this feature, or even could be added without this
feature).
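For comparison, a minimal sketch of the manual work this refers to in
present-day C. NITEMS(), fill_indices() and param_is_pointer() are
hypothetical names chosen for illustration; the macro only works on true
arrays, and the last function merely checks that an array parameter has
been adjusted to a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; silently wrong if handed a pointer. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* Inside the callee 'arr' is a pointer, so the count is passed by hand. */
void fill_indices(size_t nmemb, int arr[nmemb])
{
    assert(arr != NULL || nmemb == 0);
    for (size_t i = 0; i < nmemb; i++)
        arr[i] = (int) i;
}

/* Returns 1 if an 'int arr[8]' parameter really has pointer type. */
int param_is_pointer(int arr[8])
{
    /* '&arr' is 'int **' because the parameter was adjusted to 'int *'. */
    return _Generic(&arr, int **: 1, default: 0);
}
```

The caller has to write fill_indices(NITEMS(v), v) and keep the two
arguments in sync by hand, which is the step an operator working on the
declared bound could take over.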
>
>
> Martin
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 17:19 ` Alex Colomar
@ 2022-11-29 17:29 ` Alex Colomar
2022-12-03 21:03 ` Alejandro Colomar
0 siblings, 1 reply; 69+ messages in thread
From: Alex Colomar @ 2022-11-29 17:29 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers
Cc: Michael Matz, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1452 bytes --]
On 11/29/22 18:19, Alex Colomar wrote:
> Hi Martin, Joseph,
>
> On 11/29/22 18:00, Martin Uecker wrote:
>> Am Dienstag, dem 29.11.2022 um 16:53 +0000 schrieb Jonathan Wakely:
>>> On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
>>>>
>>>> On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
>>>>
>>>>> like. But I'm generally doubtful of this whole feature within C itself.
>>>>> It serves a purpose in documentation, so in man-pages it seems fine enough
>>>>> (but then still could use a different punctuator to not be confusable with
>>>>> C syntax).
>>>>
>>>> In man-pages you don't need to invent syntax at all. You can write
>>>>
>>>> int f(char buf[n], int n);
>>>>
>>>> and in the context of a man page it will be clear to readers what is
>>>> meant,
>>>
>>> Considerably more clear than new invented syntax IMHO.
>>
>> True, but I think it would be a mistake to use code in
>> man pages which then does not work as expected (or even
>> is subtly wrong) in actual code.
>
> Exactly. Using your
s/your/Joseph's/
> proposed syntax (which was my first draft) would
> have probably been the source of hidden bugs, since it might work (read
> compile) in some cases, but with wrong results.
>
> I prefer this hypothetical syntax, which at most will cause compile errors.
>
> Cheers,
>
> Alex
>
>>
>> Martin
>>
>>
>>
>
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 17:29 ` Alex Colomar
@ 2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
` (2 more replies)
0 siblings, 3 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-12-03 21:03 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers, Michael Matz
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1641 bytes --]
Hi!
I'll probably have to release again before the Debian freeze of Bookworm.
That's something I didn't want to do, but there's some important bug that
affects downstream projects (translation pages), and I need to release. It's a
bit weird that the bug has been reported now, because it has always been there
(it's not a regression), but still, I want to address it before the next Debian.
And I don't want to start with stable releases, so I won't be releasing
man-pages-6.01.1. That means that all changes that I have in the project that I
didn't plan to release until 2024 will be released in a few weeks, notably
including the VLA syntax.
This means that while this syntax is still an invention, not something real that
can be used, I need to be careful about the future if I plan to make it public
so soon.
Since we've seen that using a '.' prefix seems to be problematic because of
lookahead, and recently Michael Matz proposed using a different punctuator (he
proposed '@') for differentiating parameters from struct members, I think going
in that direction may be a good idea.
How about '$'?
It's been used for function parameters since... forever? in sh(1). And it's
being added to the source character set in C23, so it seems to be a good choice.
It should also be intuitive what it means.
What do you think about it? I'm not asking for your opinion about adding it to
GCC, but rather for replacing the current '.' in the man-pages before I release
later this month. Do you think I should apply that change?
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:03 ` Alejandro Colomar
@ 2022-12-03 21:13 ` Andrew Pinski
2022-12-03 21:15 ` Martin Uecker
2022-12-06 2:08 ` Joseph Myers
2 siblings, 0 replies; 69+ messages in thread
From: Andrew Pinski @ 2022-12-03 21:13 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Jonathan Wakely, Joseph Myers, Michael Matz,
Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, Dec 3, 2022 at 1:05 PM Alejandro Colomar via Gcc
<gcc@gcc.gnu.org> wrote:
>
> Hi!
>
> I'll probably have to release again before the Debian freeze of Bookworm.
> That's something I didn't want to do, but there's some important bug that
> affects downstream projects (translation pages), and I need to release. It's a
> bit weird that the bug has been reported now, because it has always been there
> (it's not a regression), but still, I want to address it before the next Debian.
>
> And I don't want to start with stable releases, so I won't be releasing
> man-pages-6.01.1. That means that all changes that I have in the project that I
> didn't plan to release until 2024 will be released in a few weeks, notably
> including the VLA syntax.
>
> This means that while this syntax is still an invention, not something real that
> can be used, I need to be careful about the future if I plan to make it public
> so soon.
>
> Since we've seen that using a '.' prefix seems to be problematic because of
> lookahead, and recently Michael Matz proposed using a different punctuator (he
> proposed '@') for differentiating parameters from struct members, I think going
> in that direction may be a good idea.
>
> How about '$'?
$ is a GNU extension for identifiers already.
See https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Dollar-Signs.html#Dollar-Signs
Thanks,
Andrew
>
> It's been used for function parameters since... forever? in sh(1). And it's
> being added to the source character set in C23, so it seems to be a good choice.
> It should also be intuitive what it means.
>
> What do you think about it? I'm not asking for your opinion about adding it to
> GCC, but rather for replacing the current '.' in the man-pages before I release
> later this month. Do you think I should apply that change?
>
> Cheers,
>
> Alex
>
>
> --
> <http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
@ 2022-12-03 21:15 ` Martin Uecker
2022-12-03 21:18 ` Alejandro Colomar
2022-12-06 2:08 ` Joseph Myers
2 siblings, 1 reply; 69+ messages in thread
From: Martin Uecker @ 2022-12-03 21:15 UTC (permalink / raw)
To: Alejandro Colomar, Jonathan Wakely, Joseph Myers, Michael Matz
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, dem 03.12.2022 um 22:03 +0100 schrieb Alejandro Colomar:
...
> Since we've seen that using a '.' prefix seems to be problematic
> because of lookahead, and recently Michael Matz proposed using a
> different punctuator (he proposed '@') for differentiating parameters
> from struct members, I think going in that direction may be a good
> idea.
>
> How about '$'?
I don't see how the lookahead issue has anything to do with the choice
of the symbol. Here, too, the context would fully disambiguate
between other uses, so I do not think there is any issue with using this
syntax. '$' is much more problematic as people use it in identifiers,
'@' may cause confusion with Objective-C.
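A minimal sketch of the '$' problem, assuming a compiler such as GCC with its
default settings, where '$' is accepted in identifiers on most targets. The
function name zero_fill() is hypothetical. Because '$n' is already a valid
identifier, 'buf[$n]' already has a meaning and could not be reused as new
syntax without ambiguity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* '$n' is an ordinary parameter name here (GNU extension), so the
   declarator 'buf[$n]' simply uses the preceding parameter as its bound. */
void zero_fill(size_t $n, char buf[$n])
{
    assert(buf != NULL);
    memset(buf, 0, $n);
}
```

With -pedantic, GCC is expected to warn about the '$' but still accept the
code.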
Martin
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:15 ` Martin Uecker
@ 2022-12-03 21:18 ` Alejandro Colomar
0 siblings, 0 replies; 69+ messages in thread
From: Alejandro Colomar @ 2022-12-03 21:18 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers, Michael Matz
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1435 bytes --]
Hi Martin and Andrew!
On 12/3/22 22:15, Martin Uecker wrote:
> Am Samstag, dem 03.12.2022 um 22:03 +0100 schrieb Alejandro Colomar:
> ...
>> Since we've seen that using a '.' prefix seems to be problematic
>> because of lookahead, and recently Michael Matz proposed using a
>> different punctuator (he proposed '@') for differentiating parameters
>> from struct members, I think going in that direction may be a good
>> idea.
>>
>> How about '$'?
>
> I don't see how the lookahead issue has anything to do with the choice
> of the symbol.
In simple [.identifier] expressions it's not a problem. I was foreseeing more
complex expressions, as I suggested earlier.
> Here, too, the context would fully disambiguate
> between other uses, so I do not think there is any issue with using this
> syntax. '$' is much more problematic as people use it in identifiers,
> '@' may cause confusion with Objective-C.
On 12/3/22 22:13, Andrew Pinski wrote:
> $ is a GNU extension for identifiers already.
> See https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Dollar-Signs.html#Dollar-Signs
>
> Thanks,
> Andrew
>
Hmmm, I see. '$' is too bad. '@' is confusing. I think I'll keep the '.' for
now then, and assume that there's a high possibility that we'll never have
complex expressions with it.
>
> Martin
>
Thank you!
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
2022-12-03 21:15 ` Martin Uecker
@ 2022-12-06 2:08 ` Joseph Myers
2 siblings, 0 replies; 69+ messages in thread
From: Joseph Myers @ 2022-12-06 2:08 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Jonathan Wakely, Michael Matz, Ingo Schwarze,
JeanHeyd Meneide, linux-man, gcc
On Sat, 3 Dec 2022, Alejandro Colomar via Gcc wrote:
> What do you think about it? I'm not asking for your opinion about adding it
> to GCC, but rather for replacing the current '.' in the man-pages before I
> release later this month. Do you think I should apply that change?
I think man pages should not use any novel syntax - even syntax newly
added to the C standard or GCC, unless required to express the standard
prototype for a function. They should be written for maximal
comprehensibility to C users in general, who are often behind on knowledge
of standard features let alone the more obscure extensions - and certainly
don't know about random, highly speculative suggestions for possible
features suggested in random mailing list threads. So: don't use any
invented syntax (even if you explain it somewhere in the man pages), don't
use any syntax newly introduced in C23 unless strictly necessary and
you're sure it's already extremely widely understood among C users, be
wary of syntax introduced in C11. If a new feature in this area were
introduced in C29, waiting at least several years after that standard is
released (*not* just after the feature gets added to a draft) to start
using the new syntax in man pages would be a good idea.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 69+ messages in thread
end of thread, other threads:[~2022-12-06 2:08 UTC | newest]
Thread overview: 69+ messages
[not found] <20220826210710.35237-1-alx.manpages@gmail.com>
[not found] ` <Ywn7jMtB5ppSW0PB@asta-kit.de>
[not found] ` <89d79095-d1cd-ab2b-00e4-caa31126751e@gmail.com>
[not found] ` <YwoXTGD8ljB8Gg6s@asta-kit.de>
[not found] ` <e29de088-ae10-bbc8-0bfd-90bbb63aaf06@gmail.com>
[not found] ` <5ba53bad-019e-8a94-d61e-85b2f13223a9@gmail.com>
[not found] ` <CACqA6+mfaj6Viw+LVOG=nE350gQhCwVKXRzycVru5Oi4EJzgTg@mail.gmail.com>
[not found] ` <491a930d-47eb-7c86-c0c4-25eef4ac0be0@gmail.com>
2022-09-02 21:57 ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters Alejandro Colomar
2022-09-03 12:47 ` Martin Uecker
2022-09-03 13:29 ` Ingo Schwarze
2022-09-03 15:08 ` Alejandro Colomar
2022-09-03 13:41 ` Alejandro Colomar
2022-09-03 14:35 ` Martin Uecker
2022-09-03 14:59 ` Alejandro Colomar
2022-09-03 15:31 ` Martin Uecker
2022-09-03 20:02 ` Alejandro Colomar
2022-09-05 14:31 ` Alejandro Colomar
2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
2022-11-10 1:33 ` Joseph Myers
2022-11-10 1:39 ` Joseph Myers
2022-11-10 6:21 ` Martin Uecker
2022-11-10 10:09 ` Alejandro Colomar
2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
2022-11-11 19:52 ` Martin Uecker
2022-11-12 1:09 ` Joseph Myers
2022-11-12 7:24 ` Martin Uecker
2022-11-12 12:34 ` Alejandro Colomar
2022-11-12 12:46 ` Alejandro Colomar
2022-11-12 13:03 ` Joseph Myers
2022-11-12 13:40 ` Alejandro Colomar
2022-11-12 13:58 ` Alejandro Colomar
2022-11-12 14:54 ` Joseph Myers
2022-11-12 15:35 ` Alejandro Colomar
2022-11-12 17:02 ` Joseph Myers
2022-11-12 17:08 ` Alejandro Colomar
2022-11-12 15:56 ` Martin Uecker
2022-11-13 13:19 ` Alejandro Colomar
2022-11-13 13:33 ` Alejandro Colomar
2022-11-13 14:02 ` Alejandro Colomar
2022-11-13 14:58 ` Martin Uecker
2022-11-13 15:15 ` Alejandro Colomar
2022-11-13 15:32 ` Martin Uecker
2022-11-13 16:25 ` Alejandro Colomar
2022-11-13 16:28 ` Alejandro Colomar
2022-11-13 16:31 ` Alejandro Colomar
2022-11-13 16:34 ` Alejandro Colomar
2022-11-13 16:56 ` Alejandro Colomar
2022-11-13 19:05 ` Alejandro Colomar
2022-11-14 18:13 ` Joseph Myers
2022-11-28 22:59 ` Alex Colomar
2022-11-28 23:18 ` Alex Colomar
2022-11-29 0:05 ` Joseph Myers
2022-11-29 14:58 ` Michael Matz
2022-11-29 15:17 ` Uecker, Martin
2022-11-29 15:44 ` Michael Matz
2022-11-29 16:58 ` Uecker, Martin
2022-11-29 17:28 ` Alex Colomar
2022-11-29 16:49 ` Joseph Myers
2022-11-29 16:53 ` Jonathan Wakely
2022-11-29 17:00 ` Martin Uecker
2022-11-29 17:19 ` Alex Colomar
2022-11-29 17:29 ` Alex Colomar
2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
2022-12-03 21:15 ` Martin Uecker
2022-12-03 21:18 ` Alejandro Colomar
2022-12-06 2:08 ` Joseph Myers
2022-11-14 17:52 ` Joseph Myers
2022-11-14 17:57 ` Alejandro Colomar
2022-11-14 18:26 ` Joseph Myers
2022-11-28 23:02 ` Alex Colomar
2022-11-10 9:40 ` G. Branden Robinson
2022-11-10 10:59 ` Alejandro Colomar
2022-11-10 22:25 ` G. Branden Robinson