Hi Zack, On 1/21/23 03:38, Alejandro Colomar wrote: > Hi Zack, > > On 1/20/23 20:25, Alejandro Colomar wrote: >> [CC += GCC]  // pun not intended :P >> >> Hi Zack, >> >> On 1/20/23 19:04, Zack Weinberg wrote: >>> On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote: >>>> The historical design of `sockaddr_storage` makes it impossible to use >>>> without breaking strict aliasing rules.  Not only this type is unusable, >>>> but even the use of other `sockaddr_*` structures directly (when the >>>> programmer only cares about a single address family) is also incorrect, >>>> since at some point the structure will be accessed as a `sockaddr`, and >>>> that breaks strict aliasing rules too. >>>> >>>> So, the only way for a programmer to not invoke Undefined Behavior is to >>>> declare a union that includes `sockaddr` and any `sockaddr_*` structures >>>> that are of interest, which allows later accessing as either the correct >>>> structure or plain `sockaddr` for the sa_family. >>> >>> ... >>> >>>>      struct new_sockaddr_storage  nss; >>>> >>>>      // ... (initialize oss and nss) >>>> >>>>      inet_sockaddr2str(&nss.sa);  // correct (and has no casts) >>> >>> I think we need to move slowly here and be _sure_ that any proposed change >>> does in fact reduce the amount of UB. >> >> Sure, I'm just sending the patch to polish the idea around some concrete code. >> While I checked that it compiles, I didn't add any tests about it or anything, >> to see that it's usable (and Joseph's email showed me that it's far from being >> finished).  I expect that this'll take some time. >> >> >>>  This construct, in particular, might >>> not actually be correct in practice: see https://godbolt.org/z/rn51cracn for >>> a case where, if I'm reading it right, the compiler assumes that a write >>> through a `struct fancy *` cannot alter the values accessible through a >>> `struct simple *` even though both pointers point into the same union. >>> (Test case provided by ; >> > > [...] > > I was wrong in my guess; the correct output is 3/3; I think I had read it the > other way around.  So yes, I believe it's doing what you just wrote there, but > can't understand why. > > I reduced @supercat's example to a smaller reproducer program (I couldn't > minimize it any more than this; any further simplification removes the incorrect > behavior): > > #include > > struct a { int y[1];}; > struct b { int y[1];}; > union u  { struct a a; struct b b; }; > > > int read_a(struct a *a) > { >     return a->y[0]; > } > > void write_b(struct b *b, int j) > { >     b->y[j] = 2; > } > > int use_union(union u *u, int j) > { >     if (u->a.y[0] == 0) >         write_b(&u->b, j); >         //write_b((struct b *)u, j);   // this has the same issue >     return read_a(&u->a); >     return read_a((struct a *)u);      // this has the same issue > } > > int (*volatile vtest)(union u *u, int j) = use_union; > > int main(void) > { >     int       r1, r2; >     union u   u; >     struct b  b = {0}; > >     u.b = b; >     r1 = vtest(&u, 0); >     r2 = u.a.y[0]; > >     printf("%d/%d\n", r1, r2); > } Ahh, indeed it seems to be UB. It's in the same 6.5.2.3/6: there's a requirement that the information about the union is kept in the function in which it's accessed. The standard presents an example, which is a bit ambiguous: The following is not a valid fragment (because the union type is not visible within function f): struct t1 { int m; }; struct t2 { int m; }; int f(struct t1 *p1, struct t2 *p2) { if (p1->m < 0) p2->m = -p2->m; return p1->m; } int g() { union { struct t1 s1; struct t2 s2; } u; /* ... */ return f(&u.s1, &u.s2); } I don't know what's the intention if the union type was global but the variable `u` was still not seen by f(). But it seems GCC's interpretation is UB, according to the test we just saw. The solution that I can see for that is making sockaddr also be a union. That way, the union is kept across all calls (since they all use sockaddr). struct sockaddr { union { struct { sa_family_t sa_family; char sa_data[14]; // why 14? } struct sockaddr_in sin; struct sockaddr_in6 sin6; struct sockaddr_un sun; }; }; struct sockaddr_storage { union { sa_family_t ss_family; struct sockaddr sa; }; }; With this, sockaddr_storage becomes useless, but still usable. New code, could just use sockaddr and use the internal union members as necessary. Union info is kept across all function boundaries. Cheers, Alex --