Hi Stefan, On 1/20/23 11:06, Stefan Puiu wrote: > Hi Alex, > > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar > wrote: >> >> Hi! >> >> I just received a report about struct sockaddr_storage in the man pages. It >> reminded me of some concern I've always had about it: it doesn't seem to be a >> usable type. >> >> It has some alignment promises that make it "just work" most of the time, but >> it's still a UB mine, according to ISO C. >> >> According to strict aliasing rules, if you declare a variable of type 'struct >> sockaddr_storage', that's what you get, and trying to access it later as some >> other sockaddr_8 is simply not legal. The compiler may assume those accesses >> can't happen, and optimize as it pleases. > > Can you detail the "is not legal" part? I mean that it's Undefined Behavior contraband. > How about the APIs like > connect() etc that use pointers to struct sockaddr, where the > underlying type is different, why would that be legal while using > sockaddr_storage isn't? That's also bad. However, it can be fixed by fixing `sockaddr_storage` and telling everyone to use it instead of using whatever other `sockaddr_*`. You need a union for the underlying storage, so that the library functions can access both as `sockaddr` and as `sockaddr_*`. The problem isn't really in the implementation of connect(2), but on the type. The implementation of connect(2) would be fine if we just fixed the type. See some example: struct my_sockaddr_storage { union { sa_family_t ss_family; struct sockaddr sa; struct sockaddr_in sin; struct sockaddr_in6 sin6; struct sockaddr_un sun; }; }; void foo(foo) { struct my_sockaddr_storage mss; struct sockaddr_storage ss; // initialize mss and ss inet_sockaddr2str(&mss.sa); // correct inet_sockaddr2str((struct sockaddr_storage *)&ss); // UB } /* This function is correct, as far as the accessed object has the * type we're using. That's only possible through a `union`, since * we're accessing it with 2 different types: `sockaddr` for the * `sa_family` and then the appropriate subtype for the address * itself. */ const char * inet_sockaddr2str(const struct sockaddr *sa) { struct sockaddr_in *sin; struct sockaddr_in6 *sin6; static char buf[INET_ADDRSTRLENMAX]; switch (sa->sa_family) { case AF_INET: sin = (struct sockaddr_in *) sa; inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf)); return buf; case AF_INET6: sin6 = (struct sockaddr_in6 *) sa; inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf)); return buf; default: errno = EAFNOSUPPORT; return NULL; } } BTW, you need a union _even if_ you only care about a single address family. That is, if you only care about Unix sockets, you can't declare your variable of type sockaddr_un, because the libc functions and syscalls still need to access it as a sockaddr to see which family it has. > Will code break in practice? Well, it depends on how much compilers advance. Here's some interesting experiment: I wouldn't rely on Undefined Behavior not causing nasal demons. When you get them, you can only kill them with garlic. > >> >> That means that one needs to declare a union with all possible sockaddr_* types >> that are of interest, so that access as any of them is later allowed by the >> compiler (of course, the user still needs to access the correct one, but that's >> of course). >> >> In that union, one could add a member that is of type sockaddr_storage for >> getting a more consistent structure size (for example, if some members are >> conditional on preprocessor stuff), but I don't see much value in that. >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code: >> >> * struct sockaddr_storage is: >> * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD; >> * 256 bytes on Solaris, OpenBSD, and HP-UX; >> * 1288 bytes on AIX. >> * >> * struct sockaddr_storage is too large on some platforms >> * or less than real maximum struct sockaddr_un length. >> >> Which makes it even more useless as a type. > > I'm not sure using struct sockaddr_storage for storing sockaddr_un's > (UNIX domain socket addresses, right?) is that common a usage. I've > used it in the past to store either a sockaddr_in or a sockaddr_in6, > and I think that would be a more common scenario. The comment above > probably makes sense for nginx, but different projects have different > needs. > > As for the size, I guess it might matter if you want to port your code > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to > be portable, though (or portable to those platforms). Maybe a warning > is in order that, for portable code, developers should check its size > on the other platforms targeted. The size thing is just an added problem. The deep problem is that you need to use a union that contains all types that you care about _plus_ plain sockaddr, because the structure will be accessed at least as a sockaddr, plus one of the different specialized structures. So even for only sockaddr_un, you need at least the following: union my_unix_sockaddr { struct sockaddr sa; struct sockaddr_un sun; }; Not doing that will necessarily result in invoking Undefined Behavior at some point. > > Just my 2 cents, as always, > Stefan. The good thing is that fixing sockaddr_storage and telling everybody to use it always fixes the problem, so I'm preparing a patch for glibc. Cheers, Alex > >> >> >> Should we warn about uses of this type? Should we recommend against using it in >> the manual page, since there's no legitimate uses of it? >> >> Cheers, >> >> Alex >> >> -- >> --