public inbox for gcc@gcc.gnu.org
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Stefan Puiu <stefan.puiu@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	linux-man <linux-man@vger.kernel.org>,
	gcc@gcc.gnu.org, Igor Sysoev <igor@sysoev.ru>
Subject: Re: struct sockaddr_storage
Date: Fri, 20 Jan 2023 13:39:51 +0100
Message-ID: <61bbb556-ff9b-ebdc-5566-bc1ae533c0aa@gmail.com>
In-Reply-To: <CACKs7VAXOXLw5Zm0wqVt8dDwam_=w8aeAu5wNpXcTRSqObimyQ@mail.gmail.com>



Hi Stefan,

On 1/20/23 11:06, Stefan Puiu wrote:
> Hi Alex,
> 
> On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Hi!
>>
>> I just received a report about struct sockaddr_storage in the man pages.  It
>> reminded me of some concern I've always had about it: it doesn't seem to be a
>> usable type.
>>
>> It has some alignment promises that make it "just work" most of the time, but
>> it's still a UB mine, according to ISO C.
>>
>> According to strict aliasing rules, if you declare a variable of type 'struct
>> sockaddr_storage', that's what you get, and trying to access it later as some
>> other sockaddr_* is simply not legal.  The compiler may assume those accesses
>> can't happen, and optimize as it pleases.
> 
> Can you detail the "is not legal" part?

I mean that it's Undefined Behavior contraband.

> How about the APIs like
> connect() etc that use pointers to struct sockaddr, where the
> underlying type is different, why would that be legal while using
> sockaddr_storage isn't?

That's also bad.  However, it can be fixed by fixing `sockaddr_storage` and 
telling everyone to use it instead of using whatever other `sockaddr_*`.  You 
need a union for the underlying storage, so that the library functions can 
access both as `sockaddr` and as `sockaddr_*`.

The problem isn't really in the implementation of connect(2), but in the type.
The implementation of connect(2) would be fine if we just fixed the type.  Here's
an example:

struct my_sockaddr_storage {
	union {
		sa_family_t          ss_family;
		struct sockaddr      sa;
		struct sockaddr_in   sin;
		struct sockaddr_in6  sin6;
		struct sockaddr_un   sun;
	};
};


void
foo(void)
{
	struct my_sockaddr_storage  mss;
	struct sockaddr_storage     ss;

	// initialize mss and ss

	inet_sockaddr2str(&mss.sa);  // correct
	inet_sockaddr2str((struct sockaddr *)&ss);  // UB
}

/* This function is correct, as long as the accessed object has the
 * type we're using.  That's only possible through a `union`, since
 * we're accessing it with 2 different types: `sockaddr` for the
 * `sa_family` and then the appropriate subtype for the address
 * itself.
 */
const char *
inet_sockaddr2str(const struct sockaddr *sa)
{
	const struct sockaddr_in   *sin;
	const struct sockaddr_in6  *sin6;

	static char                buf[INET6_ADDRSTRLEN];

	switch (sa->sa_family) {
	case AF_INET:
		sin = (const struct sockaddr_in *) sa;
		inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
		return buf;
	case AF_INET6:
		sin6 = (const struct sockaddr_in6 *) sa;
		inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf));
		return buf;
		return buf;
	default:
		errno = EAFNOSUPPORT;
		return NULL;
	}
}
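
For completeness, here is a minimal self-contained sketch of the same idea that
should compile on its own.  The names `union inet_sockaddr`,
`inet_sockaddr_str()`, and `inet_sockaddr_loopback4()` are made up for this
illustration (they are not libc APIs), and I pass a caller-supplied buffer
instead of the static one above.  The object is only ever accessed through
members of the union it was declared as.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical type for illustration; not provided by libc. */
union inet_sockaddr {
	struct sockaddr      sa;
	struct sockaddr_in   sin;
	struct sockaddr_in6  sin6;
};

/* Format the address stored in `u` into `buf`.
 * Returns `buf` on success, or NULL with errno set on failure.
 */
static const char *
inet_sockaddr_str(const union inet_sockaddr *u, char *buf, socklen_t size)
{
	switch (u->sa.sa_family) {
	case AF_INET:
		return inet_ntop(AF_INET, &u->sin.sin_addr, buf, size);
	case AF_INET6:
		return inet_ntop(AF_INET6, &u->sin6.sin6_addr, buf, size);
	default:
		errno = EAFNOSUPPORT;
		return NULL;
	}
}

/* Store 127.0.0.1:port in `u` through its sockaddr_in member. */
static void
inet_sockaddr_loopback4(union inet_sockaddr *u, in_port_t port)
{
	memset(u, 0, sizeof(*u));
	u->sin.sin_family = AF_INET;
	u->sin.sin_port = htons(port);
	u->sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}
```

A caller would declare a `union inet_sockaddr`, fill it with
`inet_sockaddr_loopback4()`, and pass `&u.sa` to any function that expects a
`struct sockaddr *`, with no cast needed.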


BTW, you need a union _even if_ you only care about a single address family. 
That is, if you only care about Unix sockets, you can't declare your variable of 
type sockaddr_un, because the libc functions and syscalls still need to access 
it as a sockaddr to see which family it has.

> Will code break in practice?

Well, it depends on how much compilers advance.  Here's an interesting experiment:

<https://software.codidact.com/posts/287748/287750#answer-287750>

I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get 
them, you can only kill them with garlic.

> 
>>
>> That means that one needs to declare a union with all possible sockaddr_* types
>> that are of interest, so that access as any of them is later allowed by the
>> compiler (of course, the user still needs to access the correct one, but that's
>> of course).
>>
>> In that union, one could add a member that is of type sockaddr_storage for
>> getting a more consistent structure size (for example, if some members are
>> conditional on preprocessor stuff), but I don't see much value in that.
>> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:
>>
>>    * struct sockaddr_storage is:
>>    *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
>>    *    256 bytes on Solaris, OpenBSD, and HP-UX;
>>    *   1288 bytes on AIX.
>>    *
>>    * struct sockaddr_storage is too large on some platforms
>>    * or less than real maximum struct sockaddr_un length.
>>
>> Which makes it even more useless as a type.
> 
> I'm not sure using struct sockaddr_storage for storing sockaddr_un's
> (UNIX domain socket addresses, right?) is that common a usage. I've
> used it in the past to store either a sockaddr_in or a sockaddr_in6,
> and I think that would be a more common scenario. The comment above
> probably makes sense for nginx, but different projects have different
> needs.
> 
> As for the size, I guess it might matter if you want to port your code
> to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
> be portable, though (or portable to those platforms). Maybe a warning
> is in order that, for portable code, developers should check its size
> on the other platforms targeted.
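
A check like that could be done at compile time.  This is only a sketch,
assuming a C11 compiler and that sockaddr_un is the largest address type the
program cares about; `sockaddr_storage_slack()` is a made-up helper name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fail the build on platforms where sockaddr_storage is too small
 * to hold a sockaddr_un.
 */
_Static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_un),
               "sockaddr_storage cannot hold a sockaddr_un on this platform");

/* Hypothetical helper: bytes that sockaddr_storage has left over
 * after holding a sockaddr_un.
 */
static size_t
sockaddr_storage_slack(void)
{
	return sizeof(struct sockaddr_storage) - sizeof(struct sockaddr_un);
}
```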

The size thing is just an added problem.  The deep problem is that you need to 
use a union that contains all types that you care about _plus_ plain sockaddr, 
because the structure will be accessed at least as a sockaddr, plus one of the 
different specialized structures.  So even for only sockaddr_un, you need at 
least the following:

union my_unix_sockaddr {
	struct sockaddr     sa;
	struct sockaddr_un  sun;
};

Not doing that will necessarily result in invoking Undefined Behavior at some point.
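
As a rough sketch of how such a union might be used (the helper name
`unix_sockaddr_init()` is made up for this example, and I'm assuming a
pathname socket rather than an abstract one):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

union my_unix_sockaddr {
	struct sockaddr     sa;
	struct sockaddr_un  sun;
};

/* Hypothetical helper: store the pathname `path` in `u`.
 * Returns 0 on success, or -1 with errno set to ENAMETOOLONG if the
 * path (including its terminating null byte) doesn't fit in sun_path.
 */
static int
unix_sockaddr_init(union my_unix_sockaddr *u, const char *path)
{
	size_t  len = strlen(path);

	if (len >= sizeof(u->sun.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	memset(u, 0, sizeof(*u));
	u->sun.sun_family = AF_UNIX;
	memcpy(u->sun.sun_path, path, len + 1);
	return 0;
}
```

The caller can then write `bind(fd, &u.sa, sizeof(u.sun))` without any cast,
and both the caller and the callee access the object through types that are
members of the declared union.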

> 
> Just my 2 cents, as always,
> Stefan.

The good thing is that fixing sockaddr_storage and telling everybody to always
use it fixes the problem, so I'm preparing a patch for glibc.

Cheers,

Alex

> 
>>
>>
>> Should we warn about uses of this type?  Should we recommend against using it in
>> the manual page, since there are no legitimate uses of it?
>>
>> Cheers,
>>
>> Alex
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>


