From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mailrelay.tugraz.at (mailrelay.tugraz.at [129.27.2.202]) by sourceware.org (Postfix) with ESMTPS id 1C2663858418 for ; Sat, 3 Sep 2022 14:35:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1C2663858418 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=tugraz.at Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tugraz.at Received: from [192.168.0.150] (84-115-222-2.cable.dynamic.surfer.at [84.115.222.2]) by mailrelay.tugraz.at (Postfix) with ESMTPSA id 4MKclB2CF0z1LLF5; Sat, 3 Sep 2022 16:35:34 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.11.0 mailrelay.tugraz.at 4MKclB2CF0z1LLF5 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tugraz.at; s=mailrelay; t=1662215735; bh=8363BPQnUTApnD288wpG6D1koeGL49gmWIsQYozDpzY=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=HB0kapb+ETfUjmrFyD7uoYDJDvMPSpk60RQSR+kobg0RuRzGgUltE+XB/WweB5P9W I27CHaE41GEbqcXaj4V9juxwQrq1EhfzjTpA1DjLjwjYRGI8Buxvy0kVurr/EcDP1n GxYTWdkmP/ZccSdLnmg+PAJMtjg1kpZ6jRPyOYwc= Message-ID: Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters From: Martin Uecker To: Alejandro Colomar Cc: Ingo Schwarze , JeanHeyd Meneide , linux-man@vger.kernel.org, gcc@gcc.gnu.org Date: Sat, 03 Sep 2022 16:35:33 +0200 In-Reply-To: References: <20220826210710.35237-1-alx.manpages@gmail.com> <89d79095-d1cd-ab2b-00e4-caa31126751e@gmail.com> <5ba53bad-019e-8a94-d61e-85b2f13223a9@gmail.com> <491a930d-47eb-7c86-c0c4-25eef4ac0be0@gmail.com> <2abccaa2-d472-4c5b-aea6-7a2dddd665da@gmail.com> <4475b350c2a4d60da540c0f3055f466640e6c409.camel@tugraz.at> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.30.5-1.1 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-TUG-Backscatter-control: G/VXY7/6zeyuAY/PU2/0qw X-Spam-Scanner: SpamAssassin 3.003001 X-Spam-Score-relay: -0.4 X-Scanned-By: MIMEDefang 2.74 on 129.27.10.116 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar: > Hi Martin, > > On 9/3/22 14:47, Martin Uecker wrote: > [...] > > > GCC will warn if the bound is specified inconsistently between > > declarations and also emit warnings if it can see that a buffer > > which is passed is too small: > > > > https://godbolt.org/z/PsjPG1nv7 > > That's very good news! > > BTW, it's nice to see that GCC doesn't need 'static' for array > parameters. I never understood what the static keyword adds there. > There's no way one can specify an array size an mean anything other than > requiring that, for a non-null pointer, the array should have at least > that size. >From the C standard's point of view, void foo(int n, char buf[n]); is semantically equivalent to void foo(int, char *buf); and without 'static' the 'n' has no further meaning (this is different for pointers to arrays). The static keyword implies that the pointer is be valid and non-zero and that there must be at least 'n' elements accessible, so in some sense it is stronger (it implies alid non-zero pointers), but at the same time it does not imply a bound. But I agree that 'n' without 'static' should simply imply a bound and I think we should use it this way even when the standard currently does not attach a meaning to it. > > > > BTW: If you declare pointers to arrays (not first elements) you > > can get run-time bounds checking with UBSan: > > > > https://godbolt.org/z/TvMo89WfP > > Couldn't that be caught at compile time? n is certainly out of bounds > always for such an array, since the last element is n-1. Yes, in this example it could (and ideally should) be detected at compile time. But this notation already today allows passing of a bound across API boundaries and thus enables run-time detection of out-of-bound accesses even in scenarious where it could not be found at compile time. > > > > > Also, new code can be designed from the beginning so that sizes go > > > before their corresponding arrays, so that new code won't typically be > > > affected by the lack of this feature in the language. > > > > > > This leaves us with legacy code, especially libc, which just works, and > > > doesn't have any urgent needs to change their prototypes in this regard > > > (they could, to improve static analysis, but not what we'd call urgent). > > > > It would be useful step to find out-of-bounds problem in > > applications using libc. > > Yep, it would be very useful for that. Not urgent, but yes, very useful. > > > > > Let's take an example: > > > > > > > > > int getnameinfo(const struct sockaddr *restrict addr, > > > socklen_t addrlen, > > > char *restrict host, socklen_t hostlen, > > > char *restrict serv, socklen_t servlen, > > > int flags); > > > > > > and some transformations: > > > > > > > > > int getnameinfo(const struct sockaddr *restrict addr, > > > socklen_t addrlen, > > > char host[restrict hostlen], socklen_t hostlen, > > > char serv[restrict servlen], socklen_t servlen, > > > int flags); > > > > > > > > > int getnameinfo(socklen_t hostlen; > > > socklen_t servlen; > > > const struct sockaddr *restrict addr, > > > socklen_t addrlen, > > > char host[restrict hostlen], socklen_t hostlen, > > > char serv[restrict servlen], socklen_t servlen, > > > int flags); > > > > > > (I'm not sure if I used correct GNU syntax, since I never used that > > > extension myself.) > > > > > > The first transformation above is non-ambiguous, as concise as possible, > > > and its only issue is that it might complicate the implementation a bit > > > too much. I don't think forward-using a parameter's size would be too > > > much of a parsing problem for human readers. > > > > I personally find the second form not terrible. Being > > able to read code left-to-right, top-down is helpful in more > > complicated examples. > > > > > > > > > The second one is unnecessarily long and verbose, and semicolons are not > > > very distinguishable from commas, for human readers, which may be very > > > confusing. > > > > > > int foo(int a; int b[a], int a); > > > int foo(int a, int b[a], int o); > > > > > > Those two are very different to the compiler, and yet very similar to > > > the human eye. I don't like it. The fact that it allows for simpler > > > compilers isn't enough to overcome the readability issues. > > > > This is true, I would probably use it with a comma and/or > > syntax highlighting. > > > > > > > I think I'd prefer having the forward-using syntax as a non-standard > > > extension --or a standard but optional language feature-- to avoid > > > forcing small compilers to implement it, rather than having the GNU > > > extension standardized in all compilers. > > > > The problems with the second form are: > > > > - it is not 100% backwards compatible (which maybe ok though) as > > the semantics of the following code changes: > > > > int n; > > int foo(int a[n], int n); // refers to different n! > > > > Code written for new compilers could then be misunderstood > > by old compilers when a variable with 'n' is in scope. > > > > > > Hmmm, this one is serious. I can't seem to solve it with that syntax. > > > - it would generally be fundamentally new to C to have > > backwards references and parser might need to be changes > > to allow this > > > > > > - a compiler or tool then has to deal also with ugly > > corner cases such as mutual references: > > > > int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]); > > > > > > > > We could consider new syntax such as > > > > int foo(char buf[.n], int n); > > > > > > Personally, I would prefer the conceptual simplicity of forward > > declarations and the fact that these exist already in GCC > > over any alternative. I would also not mind new syntax, but > > then one has to define the rules more precisely to avoid the > > aforementioned problems. > > What about taking something from K&R functions for this?: > > int foo(q; w; int a[q], int q, int s[w], int w); > > By not specifying the types, the syntax is again short. > This is left-to-right, so no problems with global variables, and no need > for complex parsers. > Also, by not specifying types, now it's more obvious to the naked eye > that there's a difference: I am ok with the syntax, but I am not sure how this would work. If the type is determined only later you would still have to change parsers (some C compilers do type checking and folding during parsing, so need the types to be known during parsing) and you also still have the problem with the mutual dependencies. We thought about using this syntax int foo(char buf[.n], int n); because it is new syntax which means we can restrict the size to be the name of a parameter instead of allowing arbitrary expressions, which then makes forward references less problematic. It is also consistent with designators in initializers and could also be extend to annotate flexible array members or for storing pointers to arrays in structures: struct { int n; char buf[.n]; }; struct { int n; char (*buf)[.n]; }; Martin > > int foo(a; int b[a], int a); > int foo(int a, int b[a], int o); > > > What do you think about this syntax?