Hi Martin!

On 11/12/22 16:56, Martin Uecker wrote:
> On Saturday, 12.11.2022 at 14:54 +0000, Joseph Myers wrote:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>>> postfix-expression wouldn't be the right one.
>>
>> Several forms of postfix-expression are only rvalues.
>>
>>>> (with a special rule about how the identifier is interpreted, different
>>>> from the normal scope rules)?  If so, then ".a = 1" could either match
>>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>>
>>> No, assigning to a function parameter from within another parameter
>>> declaration wouldn't make sense.  They should be readonly.  Side effects
>>> should be forbidden, I think.
>>
>> Such assignments are already allowed.  In a function definition, the side
>> effects (including in size expressions for array parameters adjusted to
>> pointers) take place before entry to the function body.
>>
>> And, in any case, if you did have a constraint disallowing such
>> assignments, it wouldn't suffice for syntactic disambiguation (see the
>> previous point I made about that; I have some rough notes towards a WG14
>> paper on syntactic disambiguation, but haven't converted them into a
>> coherent paper).
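
For reference, a minimal sketch of that point about side effects.  It uses a
pointer-to-array parameter rather than an array parameter adjusted to a
pointer, because for variably modified parameters the standard says explicitly
that the size expressions are evaluated on entry to the function.  counted()
and row_size() are names made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>

static int evaluations;	/* counts how often the size expression runs */

/* Hypothetical helper: returns n and records that it was called. */
static size_t
counted(size_t n)
{
	evaluations++;
	return n;
}

/* The size expression of 'row' has a side effect: it calls counted(). */
static size_t
row_size(size_t n, char (*row)[counted(n)])
{
	/* The side effect has already happened when the body starts. */
	assert(evaluations > 0);
	return sizeof(*row);
}
```

Calling row_size(5, &row) with a char row[5] returns 5, and counted() has run
before the first statement of the body.
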
> 
> My idea was to only allow
> 
> array-declarator : direct-declarator [ . identifier ]
> 
> and only for parameter (not nested inside structs declared
> in parameter list) as a first step because it seems this
> would exclude all difficult cases.
> 
> But if we need to allow more complicated expressions, then
> it starts getting more complicated.

Ahh, I guess my work in documenting the man-pages prototypes got me thinking of 
those extensions to the idea.  I don't remember all the details :)

> 
> One could allow more generic expressions, and
> specify that the .identifier refers to a
> parameter in
> the nearest lexically enclosing parameter list or
> struct/union.
> 
> Then
> 
> void foo(struct bar { int x; char c[.x] } a, int x);
> 
> would not be allowed (which is good because then we
> could later use the syntax also inside structs). If
> we apply scoping rules, the following would work:
> 
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .x }).y], int x);

Makes sense.

> 
> But not:
> 
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .y }).y], int y);

Although it clearly is nonsense, I'm not sure I'd make it a constraint 
violation; I'd rather make it Undefined Behavior.  How is it different from this?

$ cat foo.c
int main(void)
{
	int i = i;
	return i;
}


$ gcc --version | head -n1
gcc (Debian 12.2.0-9) 12.2.0
$ gcc -Wall -Wextra -Werror foo.c
$

$ clang --version | head -n1
Debian clang version 14.0.6
$ clang -Wall -Wextra -Werror foo.c
foo.c:3:10: error: variable 'i' is uninitialized when used within its own 
initialization [-Werror,-Wuninitialized]
         int i = i;
             ~   ^
1 error generated.


BTW, I just freaked out that GCC can't catch this trivial bug.  Should I open a 
bug report?

> 
> 
> But there are not only syntactical problems, because
> also the type of the parameter might become relevant
> and then you can get circular dependencies:
> 
> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);

This looks like a serious obstacle.

> 
> I am not sure what would be the best way to fix it. One
> could specify that parameters referred to by
> the .identifier syntax must be of some integer type and
> that the sub-expression .identifier is always
> converted to a 'size_t'.

That makes sense, but then overnight some quite useful thing came to my mind 
that would not be possible with this limitation:


<https://software.codidact.com/posts/285946>

char *
stpecpy(char dst[.end - .dst], char *src, char end[1])
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';

#if !defined(NDEBUG)
	/* Consume the rest of the input string. */
	while (*src++) {};
#endif

	return end + 1;
}


stpecpy() is a function similar to strlcat(3) that gets a pointer to the end of 
the array instead of the size of the buffer.  This allows chaining calls without 
rescanning the destination string on every call[1].

[1]: <https://en.wikichip.org/wiki/schlemiel_the_painter%27s_algorithm>
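
To make the chaining concrete, here is a rough sketch in today's C.  It repeats
the stpecpy() logic from above without the proposed [.end - .dst] bound; the
const on src, the omission of the debug-only loop, and the join3() helper are
my additions for the illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>	/* strcmp(), only needed to check the results */

/* Same logic as stpecpy() above; 'end' points to the last byte of the buffer. */
static char *
stpecpy(char *dst, const char *src, char end[1])
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;	/* points at the terminating NUL */
	}
	/* Truncation detected */
	*end = '\0';
	return end + 1;
}

/*
 * Hypothetical helper: concatenate three strings into buf[size].
 * Returns 0 on success, -1 if the result was truncated.
 */
static int
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
	char  *p, *end;

	assert(size > 0);
	end = buf + size - 1;	/* last byte of the buffer, not one past it */

	p = buf;
	p = stpecpy(p, a, end);	/* each call resumes where the last one stopped */
	p = stpecpy(p, b, end);
	p = stpecpy(p, c, end);

	return (p == end + 1) ? -1 : 0;
}
```

With a char buf[8], join3(buf, sizeof(buf), "ab", "cd", "e") leaves "abcde" in
buf and returns 0, while join3(buf, sizeof(buf), "abcd", "efgh", "ij") leaves
"abcdefg" and returns -1.  Once a call has truncated, the following calls copy
nothing and keep returning end + 1, so a single check at the end is enough.
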


Maybe allowing integer types and pointers would be enough.  However, foreseeing 
that the _Lengthof() proposal (BTW, which paper was it?) will succeed, and 
combining it with this one, _Lengthof(pointer) would ideally give the length of 
the array, so allowing pointers would conflict.

My solution is to disallow sizeof() and _Lengthof() on .identifier.  That could 
be done simply by saying that variably-modified types (VMT) are incomplete types 
until immediately after the comma that follows the parameter declaration. 
Therefore it would be allowed only in the same way as it is allowed right now 
with the normal syntax (i.e., after the parameter has been seen).
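
As a sketch of what "the normal syntax" already permits today (the function
name bytes_in_row() is made up for the example): sizeof can be applied to a
parameter in a later declarator, as long as that parameter has been declared
earlier in the list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * n is declared before a, and a before b, so every size expression
 * refers only to parameters that have already been seen.
 */
static size_t
bytes_in_row(size_t n, int (*a)[n], char (*b)[sizeof(*a)])
{
	assert(n > 0);
	(void) a;
	return sizeof(*b);	/* n * sizeof(int), computed at run time */
}
```

Given int a[4] and char b[sizeof(a)], bytes_in_row(4, &a, &b) returns
4 * sizeof(int).  Swapping the order of a and b in the parameter list is what
the current rules reject, and what the incomplete-type rule would keep
rejecting for .identifier.
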

BTW, what was the number of the latest paper for _Lengthof() and what happened 
to it?  I guess it's likely to be added to C3x, isn't it?

And another BTW:  there's some kind of consistency in (some) projects for naming 
sizes, and I have a pending review of the Linux man-pages to make them 
consistent too.

See the following table of usual conventions:

Operator/macro:                 variable names;    Description.
------------------------------|------------------|---------------------
strlen(3):                      length, len, l;    String length.
sizeof():                       size, sz, nbytes;  Identifier size in bytes.
nitems(), nelems():             n, nelem, nitems;  Array number of elements.
sizeof_array(), array_bytes():  size, sz, nbytes;  Array size in bytes.


Naming the operator that gets the number of elements in an array _Lengthof() 
would create naming confusion, since "length" would then mean two different 
things.  I suggest _Nitemsof().
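
A small sketch of the two meanings, assuming a locally defined nitems() macro
(the macro definition and the greeting array are just for illustration):

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Defined locally for this sketch; only meaningful on true arrays. */
#define nitems(arr)  (sizeof(arr) / sizeof((arr)[0]))

static const char greeting[] = "hello";

/* Number of elements, terminating NUL included; known at compile time. */
static_assert(nitems(greeting) == 6, "5 characters plus the NUL");

/* String length, terminating NUL excluded; computed at run time. */
static size_t
greeting_length(void)
{
	return strlen(greeting);
}
```

For the same object, the string length is 5 and the number of elements is 6,
which is why calling both of them "length" invites off-by-one mistakes.
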


> 
> Maybe one should also add a constraint that all new
> type length expressions, i.e. using the syntax,
> cannot have side effects. Or even that they follow
> all the rules of integer constant expressions with
> the fictitious assumption that all . identifier
> sub-expressions are integer constant expressions.
> The rationale being that this would facilitate
> compile time reasoning about length expressions.
>   
> 
> Martin
> 

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>