From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by sourceware.org (Postfix) with ESMTPS id ADAC53858D1E for ; Mon, 18 Mar 2024 09:19:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org ADAC53858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=kernel.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org ADAC53858D1E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2604:1380:4641:c500::1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710753588; cv=none; b=MMjW6HGym7C4lvNZCywi6mpALfa9cLNPW0sJvJN3M4H2+mHGNfSVA/b/dPhx1PpTDtY//eqMO6XtE2Qz1uFCe+jvwZnsEEnItQviS8zPQwOglWvkOkoB5GLliNG6GzZ/NAjTzzrwcjlBGWelZ2zfmDbCnr4oCHTRQT/d6q0HstM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710753588; c=relaxed/simple; bh=Or5/MfNMn6WfELjXiPFvzMT6k0TRWExopbftMJocBWA=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=KjSfV39t943jZ8Wsr//5KYezBwaOjBQm09itr7dKGtsuiuMmiUNv6Z1Ose7+HKvl5W5BHqHwWwaTZ1A0jXzcizaHzclwsZKQroCvd9P1DEPi7l8vmg0rLU1dcUqwK/kF3Ic60mW/DhSASjzFTNzeza+lW7VgCfAcckOT3M3miNc= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 0BEE860B47; Mon, 18 Mar 2024 09:19:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0EA7AC433C7; Mon, 18 Mar 2024 09:19:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1710753584; bh=Or5/MfNMn6WfELjXiPFvzMT6k0TRWExopbftMJocBWA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=JqtITvgMODDqhuZOubVd+VEcMtec7OSryebABS0KUAKAESUl5IdU2WE8fCm2q4jJW maxTb1SlolUM/7Puv1zyZMtEsLdd971GkBp0JVzF2VgYEPy3u/dQuJGUbccOV25QYw u0WFpuShbAjUnlE8XTjVC2vH+cHZqG04OrbhaMksZL/6gVb5v72nGsqz68WctW4Va3 T1gEQoSTRAH8IjwIdBtEnqyPPfcf31i+BYOeGTkxJz+jq7G993gKDMFJxBTeaKEUcS xZsUiDCgmm+xFajsLEtiKy0XzGzCFeU48K/QWp4sndasKTbU4WughPjEZM/p7sgBRP uVPrXPkLithrQ== Date: Mon, 18 Mar 2024 10:19:35 +0100 From: Alejandro Colomar To: Amol Surati Cc: gcc-help@gcc.gnu.org Subject: Re: Assignment of union containing const-qualifier member Message-ID: References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="CNx2sj5EiXrCrR39" Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --CNx2sj5EiXrCrR39 Content-Type: text/plain; protected-headers=v1; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Date: Mon, 18 Mar 2024 10:19:35 +0100 From: Alejandro Colomar To: Amol Surati Cc: gcc-help@gcc.gnu.org Subject: Re: Assignment of union containing const-qualifier member Hi Amol, On Mon, Feb 12, 2024 at 04:15:24PM +0530, Amol Surati wrote: > Hi Alex, >=20 > On Wed, 7 Feb 2024 at 18:59, Alejandro Colomar wrote: > > > > Hi Amol, > > > > On Wed, Feb 07, 2024 at 09:43:55AM +0530, Amol Surati wrote: > [ ... ] > > The fact that the compiler does the equivalent of memcpy(3) doesn't > > imply a removal of type safety. > > > > The copiler is aware of the union type at the assignment, and so can > > emit diagnostics if anything is wrong. >=20 > True. That's the diagnostics that was emitted when running 'v=3Du'. >=20 > > > > When assigning to a union member, only that member needs to be copied, > > and I'm not sure what happens to other members that have a larger size. > > As far as I can read in the standard, it's Undefined Behavior. Maybe > > someone can correct me. >=20 > The std says: > "When a value is stored in a member of an object of union type, the bytes > of the object representation that do not correspond to that member but > do correspond to other members take unspecified values." Hmm, interesting. > > When assigning to a union (not its members), the standard doesn't seem > > to specify at all the semantics, so I can't really tell. Depending on > > the semantics of assigning to a member, I can tell. At first glance, I > > don't see anything we should worry. >=20 > [ ... ] > > Not really. If well used, they do actually improve type safety. Think > > of my string unions vs the struct proposed by Martin. My union has > > const correctness, which is impossible to achieve via structs. A >=20 > I am not sure I fully understand how using structs forbids achieving > const-correctness (vs the unions as declared), > but I am also not aware > of all the ways in which nginx uses strings. >=20 > [ ... ] >=20 > > Indeed. But I'm not sure I'd like a string type that I must treat as a > > FILE*. Also, how do you initialize such strings? With an EMPTY_STRING > > macro? Do you always get a heap pointer, so you can't declare a string > > in the stack as this? > > > > struct string s; > > > > What's the content of that string? Is a NULL pointer a valid state of > > the string? In NGINX we found this week several cases where we were > > doing pointer arithmetics on strings that were holding NULL, precisely > > due to this (or it was after some calloc(3), more likely, I don't > > remember). It was NULL + 0, which is the least of undefined behaviors > > I'm concerned about, but hey, there's another NULL to worry about. > > > > So, you'll need to do > > > > struct string s =3D EMPTY_STRING; > > > > to avoid ever having a NULL, which is yet another thing to worry about. > > > > Having suffered these strings, I wouldn't recommend them as something > > better than a char*. In NGINX, we have them due to historic reasons, > > and rewriting the entire code base would be unreasonable, and while they > > may have been a good optimization 2 decades ago, I bet that it's a > > useless micro-optimization today, and we could just use char* everywhere > > to have less problems. >=20 > If a variable is going to be overwritten without being read first, then t= here's > no need to initialize it. Within my implementation, the state resulting f= rom > zero-init, i.e. 'struct string s =3D {0};', is deliberately chosen to be > well-defined > for the implementation. For e.g. appending a char to a zero-init'd string= is > well-defined. Good. Reading a zero-init'd string would return something like "", I guess. Probably needs some special-case in the code, but ok. > > > > You can only make them const, if you use two distinct types: a read= -only > > > > version, let's call it rstring, and a read-write version, let's cal= l it > > > > string. > > > > > > But do we need both views (r/w and r/o) at the same time? > > > > Yes: I want to be able to take a r/w string and pass it to a function > > that accepts a r/o string. Since distinct types are incompatible, and > > this is the main problem with things like sockaddr, you can't just cast; > > that's likely to invoke Undefined Behavior. >=20 > The std says: > "A pointer to an object type may be converted to a pointer to a different > object type. If the resulting pointer is not correctly aligned for the > referenced type, the behavior is undefined. Otherwise, when converted > back again, the result shall compare equal to the original pointer." >=20 > So typecasting between object types is well-defined, given that the Yes, casting pointers is fine. > resulting pointer is indeed correctly aligned for the target type. The But having the resulting pointer be correctly aligned is enough for allowing a pointer conversion. But the only thing that the standard allows to do with that pointer is to convert it back. It is not allowed to dereference it, even if the alignment is correct. > various sock_addr types all impose a common default alignment, > because they all begin with a member of the same type. >=20 > The std. also says: > "An object shall have its stored value accessed only by an lvalue > expression that has one of the following types: > . . . > =E2=80=94 a character type." >=20 > For e.g., the _sys_bind system-call under Linux kernel receives the > same 'const struct sockaddr *', along with the size of the data, from > glibc. The kernel then copies the data using memcpy (or equivalent > routines). >=20 > Does the pointer conversion forbid the kernel from memcpy'ing the > object using that converted pointer? (It is likely that the converted > pointer, if it is not of compatible type, may not be dereferenced using > that type; but here the kernel isn't dereferencing the pointer using > the 'const struct sockaddr *' type; it is reading the obj-representation > using memcpy.) >=20 > If no, then this behaviour is well-defined. Nah, memcpy'ing is fair game. > If yes, then this behaviour is undefined, although, in practice, both > gcc and clang (the compilers that can compile Linux kernels) must be > allowing the expected behaviour here, since the compiled kernel > binaries do indeed work wherever they are deployed. This only means > that, *if* the behaviour turns out to be theoretically undefined, *then* > the std. can be augmented to specifically allow reading of > representations of objects pointed to by such suitably converted pointers, > if it does not already allow such reads. However, when the kernel writes data that the user should read, it does so via memcpy on a sockaddr_storage. How is the user supposed to read that data? Say we define a sockaddr_storage, pass it to getpeername(2), where the kernel writes to it, and then we want to read it. Which type should we use? Imagine that it's a Unix socket, so it should be as sockaddr_un. But it was never written as that type, so we can't. > [ ... ] >=20 > > > A function that is passed 'struct string *' can still be allowed to > > > inplace-modify data[i] by invoking appropriate string APIs that tempo= rarily > > > casts away the constness of data[i]. > > > > That would be a violation of const correctness. If necessary, yeah, go > > ahead. But personally, I'd prefer avoiding that. For something that is > > proposed as an inclusion to ISO C, I'd rather not have it. >=20 > I keep two bits of info within the string: > (1) whether the 'struct string' is heap-allocated, or not. > (2) whether the data (const char *) it points to is heap-allocated, or no= t. >=20 > The std. says: > "If an attempt is made to modify an object defined with a const-qualified > type through use of an lvalue with non-const-qualified type, the behavior > is undefined." >=20 > Given that a malloc'd object doesn't have its own declaration or a > declared type to begin with, it also doesn't have a definition, let alone > a const-qualified definition. Modifying malloc'd objects should not cause > const-correctness to be violated, even when casting away const on a > pointer that points to such an object. >=20 > Modifiable strings can be created from statically allocated data (for e.g= =2E, > pointing inside the .rodata section), and the data can be malloc'd and > changed/appended-to on a copy-on-write basis. Hmmm, while casting away const is valid, it's not safe. I'm not sure I'd call that const-correct. It's not UB, that I agree. But you're bypassing type safety when you do a cast. Have a lovely day! Alex --=20 Looking for a remote C programming job at the moment. --CNx2sj5EiXrCrR39 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jqH8KTroDDkXfJAnowa+77/2zIFAmX4BycACgkQnowa+77/ 2zKbDw//RE7GytsWEZCCWZB5Hx703yDliAQRbniQ8fzK9rhGNb0Hoik1oExe66R5 c3RLjJgva2h+0sHj1nxQ3HkROnQbLlSp3cE+npxgXqtEBR56hnnFQ8MbW8pP1FFv 2iwR3/7i7MO9WgM4e20cJ4YLC8ysolieuzE36GWgZxWfqn6wxDl2fFUqsUAURlet ofcmsHbR/ftXBqupJJiFdCDzINQyASW9Jeh5//5g+gtjS8B7ovUuEPGwjBJMIQW8 ol+fD6qebQ80Hxurra+6oYV36BOVz9ctuMBAozHEEdAhKJBUekWbrkwR7yJbvBHw ckP6tNgp0N9x+3mjjR7L9zEKKiScZWCK1TQb0lBcMcWYdIWpaHs81pTqP/zUQ48v 8S8vLrJ88OU5cqCZpgoX2AHIhS/Dcbk8LYjBwUj+/n+i7wr2UsjLLI4zACG3kPPl UzJlG2wGU+7qip7+wmv25ySgQCfKhAW/1fV8/jQ72rXYkPutERkva4/Y36Iw/BoC n2Udpt7zt7RFqOBT+Bu5J/qVQrWV5AHLUT2Il9zqpg4wgQ+fVfPzADo4+5gJWrGu KWcH9uYmoRl/6XRR4VXHO02aHNY7RQXux/8RkZKcc4WCMvc+c/6U/meXgUDRNnW9 TDjgE+tM71Iwqgf4zevlarVcUE/93YG0449diXkv193OBa1z9jA= =btlF -----END PGP SIGNATURE----- --CNx2sj5EiXrCrR39--