public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Stefan Puiu <stefan.puiu@gmail.com>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	linux-man <linux-man@vger.kernel.org>,
	 gcc@gcc.gnu.org, Igor Sysoev <igor@sysoev.ru>
Subject: Re: struct sockaddr_storage
Date: Mon, 23 Jan 2023 09:40:57 +0200	[thread overview]
Message-ID: <CACKs7VDGAaSXkjeuBdvEkFbFJ_OnwObTf1_9eVb44RJf-O3Fwg@mail.gmail.com> (raw)
In-Reply-To: <61bbb556-ff9b-ebdc-5566-bc1ae533c0aa@gmail.com>

Hi Alex,

On Fri, Jan 20, 2023 at 2:40 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Stefan,
>
> On 1/20/23 11:06, Stefan Puiu wrote:
> > Hi Alex,
> >
> > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> I just received a report about struct sockaddr_storage in the man pages.  It
> >> reminded me of some concern I've always had about it: it doesn't seem to be a
> >> usable type.
> >>
> >> It has some alignment promises that make it "just work" most of the time, but
> >> it's still a UB mine, according to ISO C.
> >>
> >> According to strict aliasing rules, if you declare a variable of type 'struct
> >> sockaddr_storage', that's what you get, and trying to access it later as some
> >> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
> >> can't happen, and optimize as it pleases.
> >
> > Can you detail the "is not legal" part?
>
> I mean that it's Undefined Behavior contraband.

OK, next question. Is this theoretical or practical UB? People check
documentation about how to write code today, I think.

>
> > How about the APIs like
> > connect() etc that use pointers to struct sockaddr, where the
> > underlying type is different, why would that be legal while using
> > sockaddr_storage isn't?
>
> That's also bad.  However, it can be fixed by fixing `sockaddr_storage` and
> telling everyone to use it instead of using whatever other `sockaddr_*`.  You
> need a union for the underlying storage, so that the library functions can
> access both as `sockaddr` and as `sockaddr_*`.
>
> The problem isn't really in the implementation of connect(2), but on the type.
> The implementation of connect(2) would be fine if we just fixed the type.  See
> some example:
>
> struct my_sockaddr_storage {
>         union {
>                 sa_family_t          ss_family;
>                 struct sockaddr      sa;
>                 struct sockaddr_in   sin;
>                 struct sockaddr_in6  sin6;
>                 struct sockaddr_un   sun;
>         };
> };
>
>
> void
> foo(foo)
> {
>         struct my_sockaddr_storage  mss;
>         struct sockaddr_storage     ss;
>
>         // initialize mss and ss
>
>         inet_sockaddr2str(&mss.sa);  // correct
>         inet_sockaddr2str((struct sockaddr_storage *)&ss);  // UB
> }
>
> /* This function is correct, as far as the accessed object has the
>   * type we're using.  That's only possible through a `union`, since
>   * we're accessing it with 2 different types: `sockaddr` for the
>   * `sa_family` and then the appropriate subtype for the address
>   * itself.
>   */
> const char *
> inet_sockaddr2str(const struct sockaddr *sa)
> {
>         struct sockaddr_in   *sin;
>         struct sockaddr_in6  *sin6;
>
>         static char          buf[INET_ADDRSTRLENMAX];
>
>         switch (sa->sa_family) {
>         case AF_INET:
>                 sin = (struct sockaddr_in *) sa;
>                 inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
>                 return buf;
>         case AF_INET6:
>                 sin6 = (struct sockaddr_in6 *) sa;
>                 inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
>                 return buf;
>         default:
>                 errno = EAFNOSUPPORT;
>                 return NULL;
>         }
> }
>
>
> BTW, you need a union _even if_ you only care about a single address family.
> That is, if you only care about Unix sockets, you can't declare your variable of
> type sockaddr_un, because the libc functions and syscalls still need to access
> it as a sockaddr to see which family it has.
>
> > Will code break in practice?
>
> Well, it depends on how much compilers advance.  Here's some interesting experiment:
>
> <https://software.codidact.com/posts/287748/287750#answer-287750>

That code plays with 2 pointers to the same area, one to double and
one to int, so I don't think it's that similar to the sockaddr
situation. At least for struct sockaddr, the sa_family field is the
same for all struct sockaddr_* variants. Also, in practical terms, I
don't think any compiler optimization that breaks socket APIs (and, if
I recall correctly, there are instances of this pattern in the kernel
as well) is going to be an easy sell. It's possible, but realistically
speaking, I don't think it's going to happen.

>
> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
> them, you can only kill them with garlic.

OK, but not all theoretical issues have practical implications. Is
there code that can show UB in practical terms with struct
sockaddr_storage today? Like Eric mentioned in another thread, does
UBSan complain about code using struct sockaddr_storage?

Thanks,
Stefan.

>
> >
> >>
> >> That means that one needs to declare a union with all possible sockaddr_* types
> >> that are of interest, so that access as any of them is later allowed by the
> >> compiler (of course, the user still needs to access the correct one, but that's
> >> of course).
> >>
> >> In that union, one could add a member that is of type sockaddr_storage for
> >> getting a more consistent structure size (for example, if some members are
> >> conditional on preprocessor stuff), but I don't see much value in that.
> >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:
> >>
> >>    * struct sockaddr_storage is:
> >>    *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
> >>    *    256 bytes on Solaris, OpenBSD, and HP-UX;
> >>    *   1288 bytes on AIX.
> >>    *
> >>    * struct sockaddr_storage is too large on some platforms
> >>    * or less than real maximum struct sockaddr_un length.
> >>
> >> Which makes it even more useless as a type.
> >
> > I'm not sure using struct sockaddr_storage for storing sockaddr_un's
> > (UNIX domain socket addresses, right?) is that common a usage. I've
> > used it in the past to store either a sockaddr_in or a sockaddr_in6,
> > and I think that would be a more common scenario. The comment above
> > probably makes sense for nginx, but different projects have different
> > needs.
> >
> > As for the size, I guess it might matter if you want to port your code
> > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
> > be portable, though (or portable to those platforms). Maybe a warning
> > is in order that, for portable code, developers should check its size
> > on the other platforms targeted.
>
> The size thing is just an added problem.  The deep problem is that you need to
> use a union that contains all types that you care about _plus_ plain sockaddr,
> because the structure will be accessed at least as a sockaddr, plus one of the
> different specialized structures.  So even for only sockaddr_un, you need at
> least the following:
>
> union my_unix_sockaddr {
>         struct sockaddr     sa;
>         struct sockaddr_un  sun;
> };
>
> Not doing that will necessarily result in invoking Undefined Behavior at some point.
>
> >
> > Just my 2 cents, as always,
> > Stefan.
>
> The good thing is that fixing sockaddr_storage and telling everybody to use it
> always fixes the problem, so I'm preparing a patch for glibc.
>
> Cheers,
>
> Alex
>
> >
> >>
> >>
> >> Should we warn about uses of this type?  Should we recommend against using it in
> >> the manual page, since there's no legitimate uses of it?
> >>
> >> Cheers,
> >>
> >> Alex
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>

  reply	other threads:[~2023-01-23  7:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-19 14:11 Alejandro Colomar
2023-01-20 10:06 ` Stefan Puiu
2023-01-20 12:39   ` Alejandro Colomar
2023-01-23  7:40     ` Stefan Puiu [this message]
2023-01-23 16:03       ` Alejandro Colomar
2023-01-23 16:28         ` Richard Biener
2023-01-24 16:38           ` Alex Colomar
2023-01-23 16:37         ` Jakub Jelinek
2023-01-24 16:40           ` Alex Colomar
2023-01-24 18:00           ` Alex Colomar
2023-01-24 11:16   ` Rich Felker
2023-01-24 16:53     ` Alex Colomar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACKs7VDGAaSXkjeuBdvEkFbFJ_OnwObTf1_9eVb44RJf-O3Fwg@mail.gmail.com \
    --to=stefan.puiu@gmail.com \
    --cc=alx.manpages@gmail.com \
    --cc=gcc@gcc.gnu.org \
    --cc=igor@sysoev.ru \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-man@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).