From: Stefan Puiu <stefan.puiu@gmail.com>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
linux-man <linux-man@vger.kernel.org>,
gcc@gcc.gnu.org, Igor Sysoev <igor@sysoev.ru>
Subject: Re: struct sockaddr_storage
Date: Mon, 23 Jan 2023 09:40:57 +0200
Message-ID: <CACKs7VDGAaSXkjeuBdvEkFbFJ_OnwObTf1_9eVb44RJf-O3Fwg@mail.gmail.com>
In-Reply-To: <61bbb556-ff9b-ebdc-5566-bc1ae533c0aa@gmail.com>
Hi Alex,
On Fri, Jan 20, 2023 at 2:40 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Stefan,
>
> On 1/20/23 11:06, Stefan Puiu wrote:
> > Hi Alex,
> >
> > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> I just received a report about struct sockaddr_storage in the man pages. It
> >> reminded me of some concern I've always had about it: it doesn't seem to be a
> >> usable type.
> >>
> >> It has some alignment promises that make it "just work" most of the time, but
> >> it's still a UB mine, according to ISO C.
> >>
> >> According to strict aliasing rules, if you declare a variable of type 'struct
> >> sockaddr_storage', that's what you get, and trying to access it later as some
> >> other sockaddr_* is simply not legal. The compiler may assume those accesses
> >> can't happen, and optimize as it pleases.
> >
> > Can you detail the "is not legal" part?
>
> I mean that it's Undefined Behavior contraband.
OK, next question. Is this theoretical or practical UB? People check
documentation about how to write code today, I think.
>
> > How about the APIs like
> > connect() etc that use pointers to struct sockaddr, where the
> > underlying type is different, why would that be legal while using
> > sockaddr_storage isn't?
>
> That's also bad. However, it can be fixed by fixing `sockaddr_storage` and
> telling everyone to use it instead of using whatever other `sockaddr_*`. You
> need a union for the underlying storage, so that the library functions can
> access both as `sockaddr` and as `sockaddr_*`.
>
> The problem isn't really in the implementation of connect(2), but in the type.
> The implementation of connect(2) would be fine if we just fixed the type. See
> this example:
>
> struct my_sockaddr_storage {
>     union {
>         sa_family_t          ss_family;
>         struct sockaddr      sa;
>         struct sockaddr_in   sin;
>         struct sockaddr_in6  sin6;
>         struct sockaddr_un   sun;
>     };
> };
>
>
> void
> foo(void)
> {
>     struct my_sockaddr_storage  mss;
>     struct sockaddr_storage     ss;
>
>     // initialize mss and ss
>
>     inet_sockaddr2str(&mss.sa);                 // correct
>     inet_sockaddr2str((struct sockaddr *)&ss);  // UB
> }
>
> /* This function is correct as long as the accessed object has the
>  * type we're using. That's only possible through a `union`, since
>  * we're accessing it with 2 different types: `sockaddr` for the
>  * `sa_family` and then the appropriate subtype for the address
>  * itself.
>  */
> const char *
> inet_sockaddr2str(const struct sockaddr *sa)
> {
>     struct sockaddr_in   *sin;
>     struct sockaddr_in6  *sin6;
>
>     static char  buf[INET_ADDRSTRLENMAX];
>
>     switch (sa->sa_family) {
>     case AF_INET:
>         sin = (struct sockaddr_in *) sa;
>         inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
>         return buf;
>     case AF_INET6:
>         sin6 = (struct sockaddr_in6 *) sa;
>         inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
>         return buf;
>     default:
>         errno = EAFNOSUPPORT;
>         return NULL;
>     }
> }
>
>
> BTW, you need a union _even if_ you only care about a single address family.
> That is, if you only care about Unix sockets, you can't declare your variable of
> type sockaddr_un, because the libc functions and syscalls still need to access
> it as a sockaddr to see which family it has.
>
> > Will code break in practice?
>
> Well, it depends on how much compilers advance. Here's an interesting experiment:
>
> <https://software.codidact.com/posts/287748/287750#answer-287750>
That code plays with 2 pointers to the same area, one to double and
one to int, so I don't think it's that similar to the sockaddr
situation. At least for struct sockaddr, the sa_family field is the
same for all struct sockaddr_* variants. Also, in practical terms, I
don't think any compiler optimization that breaks socket APIs (and, if
I recall correctly, there are instances of this pattern in the kernel
as well) is going to be an easy sell. It's possible, but realistically
speaking, I don't think it's going to happen.
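To make the sa_family point concrete, here is a minimal sketch that compares
where the family member lives in each variant. The helper name
family_offsets_match() is made up for this mail; it is not part of any
library. It only shows that the layouts agree on the platform it is built
on. It says nothing about whether ISO C allows the cross-type access, which
is the part Alex is worried about.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: returns 1 if every sockaddr variant keeps its
 * address-family member at the same offset as struct sockaddr does. */
int
family_offsets_match(void)
{
    const size_t off = offsetof(struct sockaddr, sa_family);

    return offsetof(struct sockaddr_in, sin_family) == off
        && offsetof(struct sockaddr_in6, sin6_family) == off
        && offsetof(struct sockaddr_un, sun_family) == off
        && offsetof(struct sockaddr_storage, ss_family) == off;
}
```

The same comparisons could be turned into compile-time assertions if a
project wants the build to fail on a platform where they do not hold.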
>
> I wouldn't rely on Undefined Behavior not causing nasal demons. When you get
> them, you can only kill them with garlic.
OK, but not all theoretical issues have practical implications. Is
there code that can show UB in practical terms with struct
sockaddr_storage today? Like Eric mentioned in another thread, does
UBSan complain about code using struct sockaddr_storage?
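For anyone who wants to try, below is a minimal sketch of the pattern I have
in mind, reduced to one function. The name storage_roundtrip() is made up
for this mail. It deliberately contains the access we are debating: the
object is declared as sockaddr_storage, written as a sockaddr_in, and read
back as a plain sockaddr. It could be built with -fsanitize=undefined and at
different optimization levels to see whether anything is flagged; I am not
asserting what the sanitizer prints.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical reproducer: fill a sockaddr_storage the way much existing
 * code does, then return the family as seen through struct sockaddr. */
sa_family_t
storage_roundtrip(void)
{
    struct sockaddr_storage  ss;
    struct sockaddr_in       *sin;
    const struct sockaddr    *sa;

    memset(&ss, 0, sizeof(ss));

    /* The access under discussion: ss is written as a sockaddr_in... */
    sin = (struct sockaddr_in *) &ss;
    sin->sin_family = AF_INET;
    sin->sin_port = htons(80);
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* ...and read back as a plain sockaddr, the type connect(2) takes. */
    sa = (const struct sockaddr *) &ss;
    return sa->sa_family;
}
```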
Thanks,
Stefan.
>
> >
> >>
> >> That means that one needs to declare a union with all possible sockaddr_* types
> >> that are of interest, so that access as any of them is later allowed by the
> >> compiler (the user still needs to access the correct one, of course).
> >>
> >> In that union, one could add a member that is of type sockaddr_storage for
> >> getting a more consistent structure size (for example, if some members are
> >> conditional on preprocessor stuff), but I don't see much value in that.
> >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:
> >>
> >> * struct sockaddr_storage is:
> >> * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
> >> * 256 bytes on Solaris, OpenBSD, and HP-UX;
> >> * 1288 bytes on AIX.
> >> *
> >> * struct sockaddr_storage is too large on some platforms
> >> * or less than real maximum struct sockaddr_un length.
> >>
> >> Which makes it even more useless as a type.
> >
> > I'm not sure using struct sockaddr_storage for storing sockaddr_un's
> > (UNIX domain socket addresses, right?) is that common a usage. I've
> > used it in the past to store either a sockaddr_in or a sockaddr_in6,
> > and I think that would be a more common scenario. The comment above
> > probably makes sense for nginx, but different projects have different
> > needs.
> >
> > As for the size, I guess it might matter if you want to port your code
> > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
> > be portable, though (or portable to those platforms). Maybe a warning
> > is in order that, for portable code, developers should check its size
> > on the other platforms targeted.
>
> The size thing is just an added problem. The deep problem is that you need to
> use a union that contains all types that you care about _plus_ plain sockaddr,
> because the structure will be accessed at least as a sockaddr, plus one of the
> different specialized structures. So even for only sockaddr_un, you need at
> least the following:
>
> union my_unix_sockaddr {
>     struct sockaddr     sa;
>     struct sockaddr_un  sun;
> };
>
> Not doing that will necessarily result in invoking Undefined Behavior at some point.
>
> >
> > Just my 2 cents, as always,
> > Stefan.
>
> The good thing is that fixing sockaddr_storage and telling everybody to use it
> always fixes the problem, so I'm preparing a patch for glibc.
>
> Cheers,
>
> Alex
>
> >
> >>
> >>
> >> Should we warn about uses of this type? Should we recommend against using it in
> >> the manual page, since there are no legitimate uses of it?
> >>
> >> Cheers,
> >>
> >> Alex
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>