From: "Bastien Roucariès" <rouca@debian.org>
To: Zack Weinberg <zack@owlfolio.org>,
GNU libc development <libc-alpha@sourceware.org>,
gcc@gcc.gnu.org, Alejandro Colomar <alx.manpages@gmail.com>
Cc: Alejandro Colomar <alx@kernel.org>,
'linux-man' <linux-man@vger.kernel.org>,
Eric Blake <eblake@redhat.com>,
Stefan Puiu <stefan.puiu@gmail.com>, Igor Sysoev <igor@sysoev.ru>
Subject: Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
Date: Sat, 21 Jan 2023 13:30:49 +0000 [thread overview]
Message-ID: <2611573.oxQNTHoq5J@portable-bastien> (raw)
In-Reply-To: <0eaedb18-54df-9c2d-49ab-397fd6b4f63a@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 6047 bytes --]
Le samedi 21 janvier 2023, 03:17:39 UTC Alejandro Colomar a écrit :
> Hi Zack,
>
> On 1/21/23 03:38, Alejandro Colomar wrote:
> > Hi Zack,
> >
> > On 1/20/23 20:25, Alejandro Colomar wrote:
> >> [CC += GCC] // pun not intended :P
> >>
> >> Hi Zack,
> >>
> >> On 1/20/23 19:04, Zack Weinberg wrote:
> >>> On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote:
> >>>> The historical design of `sockaddr_storage` makes it impossible to use
> >>>> without breaking strict aliasing rules. Not only this type is unusable,
> >>>> but even the use of other `sockaddr_*` structures directly (when the
> >>>> programmer only cares about a single address family) is also incorrect,
> >>>> since at some point the structure will be accessed as a `sockaddr`, and
> >>>> that breaks strict aliasing rules too.
> >>>>
> >>>> So, the only way for a programmer to not invoke Undefined Behavior is to
> >>>> declare a union that includes `sockaddr` and any `sockaddr_*` structures
> >>>> that are of interest, which allows later accessing as either the correct
> >>>> structure or plain `sockaddr` for the sa_family.
> >>>
> >>> ...
> >>>
> >>>> struct new_sockaddr_storage nss;
> >>>>
> >>>> // ... (initialize oss and nss)
> >>>>
> >>>> inet_sockaddr2str(&nss.sa); // correct (and has no casts)
> >>>
> >>> I think we need to move slowly here and be _sure_ that any proposed change
> >>> does in fact reduce the amount of UB.
> >>
> >> Sure, I'm just sending the patch to polish the idea around some concrete code.
> >> While I checked that it compiles, I didn't add any tests about it or anything,
> >> to see that it's usable (and Joseph's email showed me that it's far from being
> >> finished). I expect that this'll take some time.
> >>
> >>
> >>> This construct, in particular, might
> >>> not actually be correct in practice: see https://godbolt.org/z/rn51cracn for
> >>> a case where, if I'm reading it right, the compiler assumes that a write
> >>> through a `struct fancy *` cannot alter the values accessible through a
> >>> `struct simple *` even though both pointers point into the same union.
> >>> (Test case provided by <https://stackoverflow.com/users/363751/supercat>;
> >>
> >
> > [...]
> >
> > I was wrong in my guess; the correct output is 3/3; I think I had read it the
> > other way around. So yes, I believe it's doing what you just wrote there, but
> > can't understand why.
> >
> > I reduced @supercat's example to a smaller reproducer program (I couldn't
> > minimize it any more than this; any further simplification removes the incorrect
> > behavior):
> >
> > #include <stdio.h>
> >
> > struct a { int y[1];};
> > struct b { int y[1];};
> > union u { struct a a; struct b b; };
> >
> >
> > int read_a(struct a *a)
> > {
> > return a->y[0];
> > }
> >
> > void write_b(struct b *b, int j)
> > {
> > b->y[j] = 2;
> > }
> >
> > int use_union(union u *u, int j)
> > {
> > if (u->a.y[0] == 0)
> > write_b(&u->b, j);
> > //write_b((struct b *)u, j); // this has the same issue
> > return read_a(&u->a);
> > return read_a((struct a *)u); // this has the same issue
> > }
> >
> > int (*volatile vtest)(union u *u, int j) = use_union;
> >
> > int main(void)
> > {
> > int r1, r2;
> > union u u;
> > struct b b = {0};
> >
> > u.b = b;
> > r1 = vtest(&u, 0);
> > r2 = u.a.y[0];
> >
> > printf("%d/%d\n", r1, r2);
> > }
>
>
> Ahh, indeed it seems to be UB. It's in the same 6.5.2.3/6: there's a
> requirement that the information about the union is kept in the function in
> which it's accessed.
>
> The standard presents an example, which is a bit ambiguous:
>
> The following is not a valid fragment (because the union type is not
> visible within function f):
>
> struct t1 { int m; };
> struct t2 { int m; };
> int f(struct t1 *p1, struct t2 *p2)
> {
> if (p1->m < 0)
> p2->m = -p2->m;
> return p1->m;
> }
> int g()
> {
> union {
> struct t1 s1;
> struct t2 s2;
> } u;
> /* ... */
> return f(&u.s1, &u.s2);
> }
>
> I don't know what's the intention if the union type was global but the variable
> `u` was still not seen by f(). But it seems GCC's interpretation is UB,
> according to the test we just saw.
>
> The solution that I can see for that is making sockaddr also be a union. That
> way, the union is kept across all calls (since they all use sockaddr).
>
> struct sockaddr {
> union {
> struct {
> sa_family_t sa_family;
> char sa_data[14]; // why 14?
> }
> struct sockaddr_in sin;
> struct sockaddr_in6 sin6;
> struct sockaddr_un sun;
> };
> };
No the solution is to avoid sockaddr and mark as deprecated. The problem it should be part of union without raising a warning each time we use a safe type...
The other solution is to render public and ABI stable the type here
https://github.com/bminor/glibc/blob/ae612c45efb5e34713859a5facf92368307efb6e/socket/sys/socket.h#L78
under for instance sockaddr_ptr and sockaddr_const_ptr
Moreover this are some patch arch by arch
https://sourceware.org/legacy-ml/libc-alpha/2016-02/msg00340.html that should be made default
Bastien
>
> struct sockaddr_storage {
> union {
> sa_family_t ss_family;
> struct sockaddr sa;
> };
> };
>
>
> With this, sockaddr_storage becomes useless, but still usable. New code, could
> just use sockaddr and use the internal union members as necessary. Union info
> is kept across all function boundaries.
>
> Cheers,
>
> Alex
>
> --
> <http://www.alejandro-colomar.es/>
>
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-01-21 13:30 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 13:40 Alejandro Colomar
2023-01-20 17:49 ` Joseph Myers
2023-01-20 19:26 ` Alejandro Colomar
2023-01-20 18:04 ` Zack Weinberg
2023-01-20 19:25 ` Alejandro Colomar
2023-01-21 2:38 ` Alejandro Colomar
2023-01-21 3:17 ` Alejandro Colomar
2023-01-21 13:30 ` Bastien Roucariès [this message]
2023-01-21 14:30 ` Alejandro Colomar
2023-01-22 14:12 ` Bastien Roucariès
2023-01-20 20:32 ` Bastien Roucariès
2023-01-20 20:38 ` Alejandro Colomar
2023-01-20 20:46 ` Bastien Roucariès
2023-01-20 20:51 ` Alejandro Colomar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2611573.oxQNTHoq5J@portable-bastien \
--to=rouca@debian.org \
--cc=alx.manpages@gmail.com \
--cc=alx@kernel.org \
--cc=eblake@redhat.com \
--cc=gcc@gcc.gnu.org \
--cc=igor@sysoev.ru \
--cc=libc-alpha@sourceware.org \
--cc=linux-man@vger.kernel.org \
--cc=stefan.puiu@gmail.com \
--cc=zack@owlfolio.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).