public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
       [not found] ` <d77b529d-e54d-4919-87a4-d90fd816ba8b@app.fastmail.com>
@ 2023-01-20 19:25   ` Alejandro Colomar
  2023-01-21  2:38     ` Alejandro Colomar
  0 siblings, 1 reply; 9+ messages in thread
From: Alejandro Colomar @ 2023-01-20 19:25 UTC (permalink / raw)
  To: Zack Weinberg, GNU libc development, gcc
  Cc: Alejandro Colomar, 'linux-man',
	Bastien Roucariès, Eric Blake, Stefan Puiu, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 3975 bytes --]

[CC += GCC]  // pun not intended :P

Hi Zack,

On 1/20/23 19:04, Zack Weinberg wrote:
> On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote:
>> The historical design of `sockaddr_storage` makes it impossible to use
>> without breaking strict aliasing rules.  Not only this type is unusable,
>> but even the use of other `sockaddr_*` structures directly (when the
>> programmer only cares about a single address family) is also incorrect,
>> since at some point the structure will be accessed as a `sockaddr`, and
>> that breaks strict aliasing rules too.
>>
>> So, the only way for a programmer to not invoke Undefined Behavior is to
>> declare a union that includes `sockaddr` and any `sockaddr_*` structures
>> that are of interest, which allows later accessing as either the correct
>> structure or plain `sockaddr` for the sa_family.
> 
> ...
> 
>>      struct new_sockaddr_storage  nss;
>>
>>      // ... (initialize oss and nss)
>>
>>      inet_sockaddr2str(&nss.sa);  // correct (and has no casts)
> 
> I think we need to move slowly here and be _sure_ that any proposed change
> does in fact reduce the amount of UB.

Sure, I'm just sending the patch to polish the idea around some concrete code. 
While I checked that it compiles, I didn't add any tests about it or anything, 
to see that it's usable (and Joseph's email showed me that it's far from being 
finished).  I expect that this'll take some time.


>  This construct, in particular, might
> not actually be correct in practice: see https://godbolt.org/z/rn51cracn for
> a case where, if I'm reading it right, the compiler assumes that a write
> through a `struct fancy *` cannot alter the values accessible through a
> `struct simple *` even though both pointers point into the same union.
> (Test case provided by <https://stackoverflow.com/users/363751/supercat>;

I @supercat around?  Maybe you want to open a question on SO (or Codidact) and 
we can discuss it there; it would be interesting.  :)

About the program, I have doubts.  It's more or less what I asked on Codidact 
yesterday[1].  I can't find anything in the standard to support GCC's behavior, 
and am inclined to think that it's a compiler bug.  @Lundin's answer[2] seems 
reasonable.  But still, my understanding until yesterday seemed to be in line 
with the compiler behavior that you showed: the compiler sees a pointer to a 
different type, and assumes things.

I believe it's fine according to the common initial sequence rule in 
C11::6.5.2.3/6 <https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6>
in combination with the pointer conversion rule in C11::6.3.2.3/7 
<https://port70.net/%7Ensz/c/c11/n1570.html#6.3.2.3p7>.

The test you showed in godbolt shows two behaviors: sometimes it prints 1/3 
(this is correct; it happens for high values of -O) and sometimes it prints 3/3 
(this is invalid; it happens for low values of -O).  BTW, I'm a bit surprised 
that GCC optimizes out the local in -O0 or even -Og.

What I guess that GCC is doing in the invalid case is that in main() it sees 
that myThing is modified in vtest(), and then the structure is only used through 
a pointer to a different member, which according to 6.5.2.3/6 is only allowed 
for reading but not writing, so it assumes that loading the structure again at 
printf() will be fine, so it optimizes out the local.  However, GCC didn't take 
into account that the pointer can later be casted back to the right member, and 
that way it's allowed to modify it, according to 6.3.2.3/7.

I had dropped GCC from the CC in the email to not spam them too much, but since 
this concerns them, I'll keep them in the loop from now on.



[1]:  <https://software.codidact.com/posts/287748>

[2]:  <https://software.codidact.com/posts/287748/287750#answer-287750>

> I don't know any other identifier for them.)
> 
> zw

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
       [not found] ` <5187043.CeDsVVrsAm@portable-bastien>
@ 2023-01-20 20:38   ` Alejandro Colomar
  2023-01-20 20:46     ` Bastien Roucariès
  0 siblings, 1 reply; 9+ messages in thread
From: Alejandro Colomar @ 2023-01-20 20:38 UTC (permalink / raw)
  To: Bastien Roucariès, libc-alpha
  Cc: Alejandro Colomar, linux-man, Eric Blake, Zack Weinberg,
	Stefan Puiu, Igor Sysoev, gcc, Joseph Myers


[-- Attachment #1.1: Type: text/plain, Size: 2037 bytes --]

Hi Bastien,

On 1/20/23 21:32, Bastien Roucariès wrote:
[...]

>> diff --git a/bits/socket.h b/bits/socket.h
>> index aac8c49b00..c0c23b4e84 100644
>> --- a/bits/socket.h
>> +++ b/bits/socket.h
>> @@ -168,9 +168,14 @@ struct sockaddr
>>   
>>   struct sockaddr_storage
>>     {
>> -    __SOCKADDR_COMMON (ss_);	/* Address family, etc.  */
>> -    char __ss_padding[_SS_PADSIZE];
>> -    __ss_aligntype __ss_align;	/* Force desired alignment.  */
> no this is not correct you break ABI by reducing size
>> +    union
>> +      {
>> +        __SOCKADDR_COMMON (ss_);	/* Address family, etc.  */
>> +        struct sockaddr      sa;
>> +        struct sockaddr_in   sin;
>> +        struct sockaddr_in6  sin6;
>> +        struct sockaddr_un   sun;
>> +      };
>>     };
> 
> Correct one structure is
> 
> struct __private_sock_storage {
>      __SOCKADDR_COMMON (ssprivate_);   /* Address family, etc. */
>      char __ss_padding[_SS_PADSIZE];
>      __ss_aligntype __ss_align; /* Force desired alignment. */
> }

What is this structure for?  I expect that it's for declaring a wide-enough and 
correctly aligned type, but the union containing all the other types already 
guarantees a size as wide as any other sockaddr_* and with the widest alignment.

Also, any member that is necessary for superalignment or padding could be added 
at the end of sockaddr_storage, after the anon union; you don't need the extra 
struct, I guess.

Right?

> 
>   struct sockaddr_storage
>     {
>         union
>        {
>           __SOCKADDR_COMMON (ss_);       /* Address family, etc. */
>          struct sockaddr      sa;
>           struct sockaddr_in   sin;
>          struct sockaddr_in6  sin6;
>          struct sockaddr_un   sun;
>          struct __private_sock_storage _private;
>        };
> };
> 
> May it could be dropped later using align construct for modern C and padding
> 

Cheers,

Alex

> Bastien
>>   
>>   
>>
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-20 20:38   ` Alejandro Colomar
@ 2023-01-20 20:46     ` Bastien Roucariès
  2023-01-20 20:51       ` Alejandro Colomar
  0 siblings, 1 reply; 9+ messages in thread
From: Bastien Roucariès @ 2023-01-20 20:46 UTC (permalink / raw)
  To: libc-alpha, Alejandro Colomar
  Cc: Alejandro Colomar, linux-man, Eric Blake, Zack Weinberg,
	Stefan Puiu, Igor Sysoev, gcc, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 2703 bytes --]

Le vendredi 20 janvier 2023, 20:38:32 UTC Alejandro Colomar a écrit :
> Hi Bastien,
> 
> On 1/20/23 21:32, Bastien Roucariès wrote:
> [...]
> 
> >> diff --git a/bits/socket.h b/bits/socket.h
> >> index aac8c49b00..c0c23b4e84 100644
> >> --- a/bits/socket.h
> >> +++ b/bits/socket.h
> >> @@ -168,9 +168,14 @@ struct sockaddr
> >>   
> >>   struct sockaddr_storage
> >>     {
> >> -    __SOCKADDR_COMMON (ss_);	/* Address family, etc.  */
> >> -    char __ss_padding[_SS_PADSIZE];
> >> -    __ss_aligntype __ss_align;	/* Force desired alignment.  */
> > no this is not correct you break ABI by reducing size
> >> +    union
> >> +      {
> >> +        __SOCKADDR_COMMON (ss_);	/* Address family, etc.  */
> >> +        struct sockaddr      sa;
> >> +        struct sockaddr_in   sin;
> >> +        struct sockaddr_in6  sin6;
> >> +        struct sockaddr_un   sun;
> >> +      };
> >>     };
> > 
> > Correct one structure is
> > 
> > struct __private_sock_storage {
> >      __SOCKADDR_COMMON (ssprivate_);   /* Address family, etc. */
> >      char __ss_padding[_SS_PADSIZE];
> >      __ss_aligntype __ss_align; /* Force desired alignment. */
> > }
> 
> What is this structure for?  I expect that it's for declaring a wide-enough and 
> correctly aligned type, but the union containing all the other types already 
> guarantees a size as wide as any other sockaddr_* and with the widest alignment.
> 
> Also, any member that is necessary for superalignment or padding could be added 
> at the end of sockaddr_storage, after the anon union; you don't need the extra 
> struct, I guess.

No we need it, max of structure is struct sockaddr_un   sun and is size of 108.
sizeof(sockaddr_storage) is 128...

Did you see the line of the kernel source I send you ? kernel expect size of 109 for un aka we should pad by a nul byte...

I think it is safer in a first step, to keep the old structure... Maybe later simplify

Did you also see 
https://github.com/bminor/glibc/blob/master/socket/sys/socket.h#L63

Bastien


> 
> Right?
> 
> > 
> >   struct sockaddr_storage
> >     {
> >         union
> >        {
> >           __SOCKADDR_COMMON (ss_);       /* Address family, etc. */
> >          struct sockaddr      sa;
> >           struct sockaddr_in   sin;
> >          struct sockaddr_in6  sin6;
> >          struct sockaddr_un   sun;
> >          struct __private_sock_storage _private;
> >        };
> > };
> > 
> > May it could be dropped later using align construct for modern C and padding
> > 
> 
> Cheers,
> 
> Alex
> 
> > Bastien
> >>   
> >>   
> >>
> > 
> 
> -- 
> <http://www.alejandro-colomar.es/>
> 


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-20 20:46     ` Bastien Roucariès
@ 2023-01-20 20:51       ` Alejandro Colomar
  0 siblings, 0 replies; 9+ messages in thread
From: Alejandro Colomar @ 2023-01-20 20:51 UTC (permalink / raw)
  To: Bastien Roucariès, libc-alpha
  Cc: Alejandro Colomar, linux-man, Eric Blake, Zack Weinberg,
	Stefan Puiu, Igor Sysoev, gcc, Joseph Myers


[-- Attachment #1.1: Type: text/plain, Size: 2801 bytes --]



On 1/20/23 21:46, Bastien Roucariès wrote:
> Le vendredi 20 janvier 2023, 20:38:32 UTC Alejandro Colomar a écrit :
>> Hi Bastien,
>>
>> On 1/20/23 21:32, Bastien Roucariès wrote:
>> [...]
>>
>>>> diff --git a/bits/socket.h b/bits/socket.h
>>>> index aac8c49b00..c0c23b4e84 100644
>>>> --- a/bits/socket.h
>>>> +++ b/bits/socket.h
>>>> @@ -168,9 +168,14 @@ struct sockaddr
>>>>    
>>>>    struct sockaddr_storage
>>>>      {
>>>> -    __SOCKADDR_COMMON (ss_);	/* Address family, etc.  */
>>>> -    char __ss_padding[_SS_PADSIZE];
>>>> -    __ss_aligntype __ss_align;	/* Force desired alignment.  */
>>> no this is not correct you break ABI by reducing size
>>>> +    union
>>>> +      {
>>>> +        __SOCKADDR_COMMON (ss_);	/* Address family, etc.  */
>>>> +        struct sockaddr      sa;
>>>> +        struct sockaddr_in   sin;
>>>> +        struct sockaddr_in6  sin6;
>>>> +        struct sockaddr_un   sun;
>>>> +      };
>>>>      };
>>>
>>> Correct one structure is
>>>
>>> struct __private_sock_storage {
>>>       __SOCKADDR_COMMON (ssprivate_);   /* Address family, etc. */
>>>       char __ss_padding[_SS_PADSIZE];
>>>       __ss_aligntype __ss_align; /* Force desired alignment. */
>>> }
>>
>> What is this structure for?  I expect that it's for declaring a wide-enough and
>> correctly aligned type, but the union containing all the other types already
>> guarantees a size as wide as any other sockaddr_* and with the widest alignment.
>>
>> Also, any member that is necessary for superalignment or padding could be added
>> at the end of sockaddr_storage, after the anon union; you don't need the extra
>> struct, I guess.
> 
> No we need it, max of structure is struct sockaddr_un   sun and is size of 108.
> sizeof(sockaddr_storage) is 128...
> 
> Did you see the line of the kernel source I send you ? kernel expect size of 109 for un aka we should pad by a nul byte...

Yes, I saw it.  But that line from the kernel is already Undefined Behavior. 
The correct fix should go to sockaddr_un, not sockaddr_storage, IMO.

However, applying this change to sockaddr_storage would expose that kernel bug, 
so I think a prepatch to sockaddr_un that adds a padding byte to sockaddr_un 
would make sense.

struct sockaddr_un {
	__kernel_sa_family_t sun_family; /* AF_UNIX */
	char sun_path[UNIX_PATH_MAX];	/* pathname */
	char __null;  // make sure sun_path is terminated
};


> 
> I think it is safer in a first step, to keep the old structure... Maybe later simplify
> 
> Did you also see
> https://github.com/bminor/glibc/blob/master/socket/sys/socket.h#L63

Heh, I didn't see that one :)
I'll take it into account for a revision of the patch.

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-20 19:25   ` [PATCH v2] socket: Implement sockaddr_storage with an anonymous union Alejandro Colomar
@ 2023-01-21  2:38     ` Alejandro Colomar
  2023-01-21  3:17       ` Alejandro Colomar
  0 siblings, 1 reply; 9+ messages in thread
From: Alejandro Colomar @ 2023-01-21  2:38 UTC (permalink / raw)
  To: Zack Weinberg, GNU libc development, gcc
  Cc: Alejandro Colomar, 'linux-man',
	Bastien Roucariès, Eric Blake, Stefan Puiu, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 3179 bytes --]

Hi Zack,

On 1/20/23 20:25, Alejandro Colomar wrote:
> [CC += GCC]  // pun not intended :P
> 
> Hi Zack,
> 
> On 1/20/23 19:04, Zack Weinberg wrote:
>> On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote:
>>> The historical design of `sockaddr_storage` makes it impossible to use
>>> without breaking strict aliasing rules.  Not only this type is unusable,
>>> but even the use of other `sockaddr_*` structures directly (when the
>>> programmer only cares about a single address family) is also incorrect,
>>> since at some point the structure will be accessed as a `sockaddr`, and
>>> that breaks strict aliasing rules too.
>>>
>>> So, the only way for a programmer to not invoke Undefined Behavior is to
>>> declare a union that includes `sockaddr` and any `sockaddr_*` structures
>>> that are of interest, which allows later accessing as either the correct
>>> structure or plain `sockaddr` for the sa_family.
>>
>> ...
>>
>>>      struct new_sockaddr_storage  nss;
>>>
>>>      // ... (initialize oss and nss)
>>>
>>>      inet_sockaddr2str(&nss.sa);  // correct (and has no casts)
>>
>> I think we need to move slowly here and be _sure_ that any proposed change
>> does in fact reduce the amount of UB.
> 
> Sure, I'm just sending the patch to polish the idea around some concrete code. 
> While I checked that it compiles, I didn't add any tests about it or anything, 
> to see that it's usable (and Joseph's email showed me that it's far from being 
> finished).  I expect that this'll take some time.
> 
> 
>>  This construct, in particular, might
>> not actually be correct in practice: see https://godbolt.org/z/rn51cracn for
>> a case where, if I'm reading it right, the compiler assumes that a write
>> through a `struct fancy *` cannot alter the values accessible through a
>> `struct simple *` even though both pointers point into the same union.
>> (Test case provided by <https://stackoverflow.com/users/363751/supercat>;
> 

[...]

I was wrong in my guess; the correct output is 3/3; I think I had read it the 
other way around.  So yes, I believe it's doing what you just wrote there, but 
can't understand why.

I reduced @supercat's example to a smaller reproducer program (I couldn't 
minimize it any more than this; any further simplification removes the incorrect 
behavior):

#include <stdio.h>

struct a { int y[1];};
struct b { int y[1];};
union u  { struct a a; struct b b; };


int read_a(struct a *a)
{
     return a->y[0];
}

void write_b(struct b *b, int j)
{
     b->y[j] = 2;
}

int use_union(union u *u, int j)
{
     if (u->a.y[0] == 0)
         write_b(&u->b, j);
         //write_b((struct b *)u, j);   // this has the same issue
     return read_a(&u->a);
     return read_a((struct a *)u);      // this has the same issue
}

int (*volatile vtest)(union u *u, int j) = use_union;

int main(void)
{
     int       r1, r2;
     union u   u;
     struct b  b = {0};

     u.b = b;
     r1 = vtest(&u, 0);
     r2 = u.a.y[0];

     printf("%d/%d\n", r1, r2);
}


Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-21  2:38     ` Alejandro Colomar
@ 2023-01-21  3:17       ` Alejandro Colomar
  2023-01-21 13:30         ` Bastien Roucariès
  0 siblings, 1 reply; 9+ messages in thread
From: Alejandro Colomar @ 2023-01-21  3:17 UTC (permalink / raw)
  To: Zack Weinberg, GNU libc development, gcc
  Cc: Alejandro Colomar, 'linux-man',
	Bastien Roucariès, Eric Blake, Stefan Puiu, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 5179 bytes --]

Hi Zack,

On 1/21/23 03:38, Alejandro Colomar wrote:
> Hi Zack,
> 
> On 1/20/23 20:25, Alejandro Colomar wrote:
>> [CC += GCC]  // pun not intended :P
>>
>> Hi Zack,
>>
>> On 1/20/23 19:04, Zack Weinberg wrote:
>>> On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote:
>>>> The historical design of `sockaddr_storage` makes it impossible to use
>>>> without breaking strict aliasing rules.  Not only this type is unusable,
>>>> but even the use of other `sockaddr_*` structures directly (when the
>>>> programmer only cares about a single address family) is also incorrect,
>>>> since at some point the structure will be accessed as a `sockaddr`, and
>>>> that breaks strict aliasing rules too.
>>>>
>>>> So, the only way for a programmer to not invoke Undefined Behavior is to
>>>> declare a union that includes `sockaddr` and any `sockaddr_*` structures
>>>> that are of interest, which allows later accessing as either the correct
>>>> structure or plain `sockaddr` for the sa_family.
>>>
>>> ...
>>>
>>>>      struct new_sockaddr_storage  nss;
>>>>
>>>>      // ... (initialize oss and nss)
>>>>
>>>>      inet_sockaddr2str(&nss.sa);  // correct (and has no casts)
>>>
>>> I think we need to move slowly here and be _sure_ that any proposed change
>>> does in fact reduce the amount of UB.
>>
>> Sure, I'm just sending the patch to polish the idea around some concrete code. 
>> While I checked that it compiles, I didn't add any tests about it or anything, 
>> to see that it's usable (and Joseph's email showed me that it's far from being 
>> finished).  I expect that this'll take some time.
>>
>>
>>>  This construct, in particular, might
>>> not actually be correct in practice: see https://godbolt.org/z/rn51cracn for
>>> a case where, if I'm reading it right, the compiler assumes that a write
>>> through a `struct fancy *` cannot alter the values accessible through a
>>> `struct simple *` even though both pointers point into the same union.
>>> (Test case provided by <https://stackoverflow.com/users/363751/supercat>;
>>
> 
> [...]
> 
> I was wrong in my guess; the correct output is 3/3; I think I had read it the 
> other way around.  So yes, I believe it's doing what you just wrote there, but 
> can't understand why.
> 
> I reduced @supercat's example to a smaller reproducer program (I couldn't 
> minimize it any more than this; any further simplification removes the incorrect 
> behavior):
> 
> #include <stdio.h>
> 
> struct a { int y[1];};
> struct b { int y[1];};
> union u  { struct a a; struct b b; };
> 
> 
> int read_a(struct a *a)
> {
>      return a->y[0];
> }
> 
> void write_b(struct b *b, int j)
> {
>      b->y[j] = 2;
> }
> 
> int use_union(union u *u, int j)
> {
>      if (u->a.y[0] == 0)
>          write_b(&u->b, j);
>          //write_b((struct b *)u, j);   // this has the same issue
>      return read_a(&u->a);
>      return read_a((struct a *)u);      // this has the same issue
> }
> 
> int (*volatile vtest)(union u *u, int j) = use_union;
> 
> int main(void)
> {
>      int       r1, r2;
>      union u   u;
>      struct b  b = {0};
> 
>      u.b = b;
>      r1 = vtest(&u, 0);
>      r2 = u.a.y[0];
> 
>      printf("%d/%d\n", r1, r2);
> }


Ahh, indeed it seems to be UB.  It's in the same 6.5.2.3/6:  there's a 
requirement that the information about the union is kept in the function in 
which it's accessed.

The standard presents an example, which is a bit ambiguous:

      The following is not a valid fragment (because the union type is not 
visible within function f):

           struct t1 { int m; };
           struct t2 { int m; };
           int f(struct t1 *p1, struct t2 *p2)
           {
                 if (p1->m < 0)
                         p2->m = -p2->m;
                 return p1->m;
           }
           int g()
           {
                 union {
                         struct t1 s1;
                         struct t2 s2;
                 } u;
                 /* ... */
                 return f(&u.s1, &u.s2);
           }

I don't know what's the intention if the union type was global but the variable 
`u` was still not seen by f().  But it seems GCC's interpretation is UB, 
according to the test we just saw.

The solution that I can see for that is making sockaddr also be a union.  That 
way, the union is kept across all calls (since they all use sockaddr).

struct sockaddr {
	union {
		struct {
			sa_family_t  sa_family;
			char         sa_data[14];  // why 14?
		}
		struct sockaddr_in   sin;
		struct sockaddr_in6  sin6;
		struct sockaddr_un   sun;
	};
};


struct sockaddr_storage {
	union {
		sa_family_t          ss_family;
		struct sockaddr      sa;
	};
};


With this, sockaddr_storage becomes useless, but still usable.  New code, could 
just use sockaddr and use the internal union members as necessary.  Union info 
is kept across all function boundaries.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-21  3:17       ` Alejandro Colomar
@ 2023-01-21 13:30         ` Bastien Roucariès
  2023-01-21 14:30           ` Alejandro Colomar
  0 siblings, 1 reply; 9+ messages in thread
From: Bastien Roucariès @ 2023-01-21 13:30 UTC (permalink / raw)
  To: Zack Weinberg, GNU libc development, gcc, Alejandro Colomar
  Cc: Alejandro Colomar, 'linux-man',
	Eric Blake, Stefan Puiu, Igor Sysoev

[-- Attachment #1: Type: text/plain, Size: 6047 bytes --]

Le samedi 21 janvier 2023, 03:17:39 UTC Alejandro Colomar a écrit :
> Hi Zack,
> 
> On 1/21/23 03:38, Alejandro Colomar wrote:
> > Hi Zack,
> > 
> > On 1/20/23 20:25, Alejandro Colomar wrote:
> >> [CC += GCC]  // pun not intended :P
> >>
> >> Hi Zack,
> >>
> >> On 1/20/23 19:04, Zack Weinberg wrote:
> >>> On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote:
> >>>> The historical design of `sockaddr_storage` makes it impossible to use
> >>>> without breaking strict aliasing rules.  Not only this type is unusable,
> >>>> but even the use of other `sockaddr_*` structures directly (when the
> >>>> programmer only cares about a single address family) is also incorrect,
> >>>> since at some point the structure will be accessed as a `sockaddr`, and
> >>>> that breaks strict aliasing rules too.
> >>>>
> >>>> So, the only way for a programmer to not invoke Undefined Behavior is to
> >>>> declare a union that includes `sockaddr` and any `sockaddr_*` structures
> >>>> that are of interest, which allows later accessing as either the correct
> >>>> structure or plain `sockaddr` for the sa_family.
> >>>
> >>> ...
> >>>
> >>>>      struct new_sockaddr_storage  nss;
> >>>>
> >>>>      // ... (initialize oss and nss)
> >>>>
> >>>>      inet_sockaddr2str(&nss.sa);  // correct (and has no casts)
> >>>
> >>> I think we need to move slowly here and be _sure_ that any proposed change
> >>> does in fact reduce the amount of UB.
> >>
> >> Sure, I'm just sending the patch to polish the idea around some concrete code. 
> >> While I checked that it compiles, I didn't add any tests about it or anything, 
> >> to see that it's usable (and Joseph's email showed me that it's far from being 
> >> finished).  I expect that this'll take some time.
> >>
> >>
> >>>  This construct, in particular, might
> >>> not actually be correct in practice: see https://godbolt.org/z/rn51cracn for
> >>> a case where, if I'm reading it right, the compiler assumes that a write
> >>> through a `struct fancy *` cannot alter the values accessible through a
> >>> `struct simple *` even though both pointers point into the same union.
> >>> (Test case provided by <https://stackoverflow.com/users/363751/supercat>;
> >>
> > 
> > [...]
> > 
> > I was wrong in my guess; the correct output is 3/3; I think I had read it the 
> > other way around.  So yes, I believe it's doing what you just wrote there, but 
> > can't understand why.
> > 
> > I reduced @supercat's example to a smaller reproducer program (I couldn't 
> > minimize it any more than this; any further simplification removes the incorrect 
> > behavior):
> > 
> > #include <stdio.h>
> > 
> > struct a { int y[1];};
> > struct b { int y[1];};
> > union u  { struct a a; struct b b; };
> > 
> > 
> > int read_a(struct a *a)
> > {
> >      return a->y[0];
> > }
> > 
> > void write_b(struct b *b, int j)
> > {
> >      b->y[j] = 2;
> > }
> > 
> > int use_union(union u *u, int j)
> > {
> >      if (u->a.y[0] == 0)
> >          write_b(&u->b, j);
> >          //write_b((struct b *)u, j);   // this has the same issue
> >      return read_a(&u->a);
> >      return read_a((struct a *)u);      // this has the same issue
> > }
> > 
> > int (*volatile vtest)(union u *u, int j) = use_union;
> > 
> > int main(void)
> > {
> >      int       r1, r2;
> >      union u   u;
> >      struct b  b = {0};
> > 
> >      u.b = b;
> >      r1 = vtest(&u, 0);
> >      r2 = u.a.y[0];
> > 
> >      printf("%d/%d\n", r1, r2);
> > }
> 
> 
> Ahh, indeed it seems to be UB.  It's in the same 6.5.2.3/6:  there's a 
> requirement that the information about the union is kept in the function in 
> which it's accessed.
> 
> The standard presents an example, which is a bit ambiguous:
> 
>       The following is not a valid fragment (because the union type is not 
> visible within function f):
> 
>            struct t1 { int m; };
>            struct t2 { int m; };
>            int f(struct t1 *p1, struct t2 *p2)
>            {
>                  if (p1->m < 0)
>                          p2->m = -p2->m;
>                  return p1->m;
>            }
>            int g()
>            {
>                  union {
>                          struct t1 s1;
>                          struct t2 s2;
>                  } u;
>                  /* ... */
>                  return f(&u.s1, &u.s2);
>            }
> 
> I don't know what's the intention if the union type was global but the variable 
> `u` was still not seen by f().  But it seems GCC's interpretation is UB, 
> according to the test we just saw.
> 
> The solution that I can see for that is making sockaddr also be a union.  That 
> way, the union is kept across all calls (since they all use sockaddr).
> 
> struct sockaddr {
> 	union {
> 		struct {
> 			sa_family_t  sa_family;
> 			char         sa_data[14];  // why 14?
> 		}
> 		struct sockaddr_in   sin;
> 		struct sockaddr_in6  sin6;
> 		struct sockaddr_un   sun;
> 	};
> };

No the solution is to avoid sockaddr and mark as deprecated. The problem it should be part of union without raising a warning each time we use a safe type...

The other solution is to render public  and ABI stable the type here
https://github.com/bminor/glibc/blob/ae612c45efb5e34713859a5facf92368307efb6e/socket/sys/socket.h#L78
under for instance sockaddr_ptr and sockaddr_const_ptr

Moreover this are some patch arch by arch
https://sourceware.org/legacy-ml/libc-alpha/2016-02/msg00340.html that should be made default 

Bastien



> 
> struct sockaddr_storage {
> 	union {
> 		sa_family_t          ss_family;
> 		struct sockaddr      sa;
> 	};
> };
> 
> 
> With this, sockaddr_storage becomes useless, but still usable.  New code, could 
> just use sockaddr and use the internal union members as necessary.  Union info 
> is kept across all function boundaries.
> 
> Cheers,
> 
> Alex
> 
> -- 
> <http://www.alejandro-colomar.es/>
> 


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-21 13:30         ` Bastien Roucariès
@ 2023-01-21 14:30           ` Alejandro Colomar
  2023-01-22 14:12             ` Bastien Roucariès
  0 siblings, 1 reply; 9+ messages in thread
From: Alejandro Colomar @ 2023-01-21 14:30 UTC (permalink / raw)
  To: Bastien Roucariès, Zack Weinberg, GNU libc development, gcc
  Cc: Alejandro Colomar, 'linux-man',
	Eric Blake, Stefan Puiu, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 4244 bytes --]

Hi Bastien,

On 1/21/23 14:30, Bastien Roucariès wrote:

[...]

>> Ahh, indeed it seems to be UB.  It's in the same 6.5.2.3/6:  there's a
>> requirement that the information about the union is kept in the function in
>> which it's accessed.
>>
>> The standard presents an example, which is a bit ambiguous:
>>
>>        The following is not a valid fragment (because the union type is not
>> visible within function f):
>>

[...]

>>
>> I don't know what's the intention if the union type was global but the variable
>> `u` was still not seen by f().  But it seems GCC's interpretation is UB,
>> according to the test we just saw.
>>
>> The solution that I can see for that is making sockaddr also be a union.  That
>> way, the union is kept across all calls (since they all use sockaddr).
>>
>> struct sockaddr {
>> 	union {
>> 		struct {
>> 			sa_family_t  sa_family;
>> 			char         sa_data[14];  // why 14?
>> 		}
>> 		struct sockaddr_in   sin;
>> 		struct sockaddr_in6  sin6;
>> 		struct sockaddr_un   sun;
>> 	};
>> };
> 
> No the solution is to avoid sockaddr and mark as deprecated.

Declaring `sockaddr` as deprecated means deprecating also:

accept(2)
accept4(2)
bind(2)
connect(2)
getpeername(2)
getsockname(2)
recvfrom(2)
sendto(2)
getnameinfo(3)

which use the type in their prototype.

Also, other types such as `addrinfo`, which contain `sockaddr` would also need 
to be deprecated, which would itself require deprecating:

getaddrinfo(3)
freeaddrinfo(3)

Since addrinfo is itself contained in other structures such as `gaicb`, we would 
also need to deprecate those, which would in turn require deprecating more APIs:

getaddrinfo_a(3)
gai_error(3)
gai_cancel(3)

And maybe I left some.  This feels like nuking the entire networking API, which 
I don't see happening soon.


Otherwise, we need to come up with a solution that keeps these APIs compatible 
with whatever new type we suggest using.  Changing them to use `void*` instead 
of `sockaddr*` would be possible, but would decrease type safety considerably, 
so there should be a good reason for that.

Suggesting to use always `sockaddr_storage` but using `sockaddr` in the function 
parameters keeps the current not-so-nice casting issues, which are not Undefined 
Behavior per se, but not ideal either (in fact, I don't think `void*` is much 
worse than code full of casts).  And it would also be error-prone, since users 
could get the idea that `sockaddr` can be used safely, since it's what gets 
passed as the parameter.

> The problem it should be part of union without raising a warning each time we use a safe type...

I don't understand this; please detail.

> 
> The other solution is to render public  and ABI stable the type here
> https://github.com/bminor/glibc/blob/ae612c45efb5e34713859a5facf92368307efb6e/socket/sys/socket.h#L78
> under for instance sockaddr_ptr and sockaddr_const_ptr

[[gnu::transparent_union]] (used in that link) seems like a "the design of this 
interface is bad, sorry, this workaround will just make it work".  I guess it 
just works, and probably it's the reason that so much undefined behavior hasn't 
exploded so far.  However, if we can solve this using core language features, 
I'd go that way.

> 
> Moreover this are some patch arch by arch
> https://sourceware.org/legacy-ml/libc-alpha/2016-02/msg00340.html that should be made default
> 
> Bastien
> 
> 
> 
>>
>> struct sockaddr_storage {
>> 	union {
>> 		sa_family_t          ss_family;
>> 		struct sockaddr      sa;
>> 	};
>> };

Hmm, this isn't still perfect.  Since the APIs get the sockaddr, this union 
information would be lost.  `sockaddr` needs to be the type that is declared. 
`sockaddr_storage` should just die; there's no way to make it compatible with 
APIs getting a `sockaddr`.  The attribute `transparent_union` is the only way to 
use is safely.

Cheers,

Alex

>>
>>
>> With this, sockaddr_storage becomes useless, but still usable.  New code, could
>> just use sockaddr and use the internal union members as necessary.  Union info
>> is kept across all function boundaries.

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union
  2023-01-21 14:30           ` Alejandro Colomar
@ 2023-01-22 14:12             ` Bastien Roucariès
  0 siblings, 0 replies; 9+ messages in thread
From: Bastien Roucariès @ 2023-01-22 14:12 UTC (permalink / raw)
  To: Zack Weinberg, GNU libc development, gcc, Alejandro Colomar
  Cc: Alejandro Colomar, 'linux-man',
	Eric Blake, Stefan Puiu, Igor Sysoev

[-- Attachment #1: Type: text/plain, Size: 5072 bytes --]

Le samedi 21 janvier 2023, 14:30:11 UTC Alejandro Colomar a écrit :
> Hi Bastien,
> 
> On 1/21/23 14:30, Bastien Roucariès wrote:
> 
> [...]
> 
> >> Ahh, indeed it seems to be UB.  It's in the same 6.5.2.3/6:  there's a
> >> requirement that the information about the union is kept in the function in
> >> which it's accessed.
> >>
> >> The standard presents an example, which is a bit ambiguous:
> >>
> >>        The following is not a valid fragment (because the union type is not
> >> visible within function f):
> >>
> 
> [...]
> 
> >>
> >> I don't know what's the intention if the union type was global but the variable
> >> `u` was still not seen by f().  But it seems GCC's interpretation is UB,
> >> according to the test we just saw.
> >>
> >> The solution that I can see for that is making sockaddr also be a union.  That
> >> way, the union is kept across all calls (since they all use sockaddr).
> >>
> >> struct sockaddr {
> >> 	union {
> >> 		struct {
> >> 			sa_family_t  sa_family;
> >> 			char         sa_data[14];  // why 14?
> >> 		}
> >> 		struct sockaddr_in   sin;
> >> 		struct sockaddr_in6  sin6;
> >> 		struct sockaddr_un   sun;
> >> 	};
> >> };
> > 
> > No the solution is to avoid sockaddr and mark as deprecated.
> 
> Declaring `sockaddr` as deprecated means deprecating also:
> 
> accept(2)
> accept4(2)
> bind(2)
> connect(2)
> getpeername(2)
> getsockname(2)
> recvfrom(2)
> sendto(2)
> getnameinfo(3)
> 
> which use the type in their prototype.
> 
> Also, other types such as `addrinfo`, which contain `sockaddr` would also need 
> to be deprecated, which would itself require deprecating:

No because this function will take a opaque transparent union pointer. I mean only raise 
a warning when user declare a variable (storage) of struct sockaddr...
> 
> getaddrinfo(3)
> freeaddrinfo(3)
> 
> Since addrinfo is itself contained in other structures such as `gaicb`, we would 
> also need to deprecate those, which would in turn require deprecating more APIs:
> 
> getaddrinfo_a(3)
> gai_error(3)
> gai_cancel(3)
> 
> And maybe I left some.  This feels like nuking the entire networking API, which 
> I don't see happening soon.
> 
> 
> Otherwise, we need to come up with a solution that keeps these APIs compatible 
> with whatever new type we suggest using.  Changing them to use `void*` instead 
> of `sockaddr*` would be possible, but would decrease type safety considerably, 
> so there should be a good reason for that.
> 
> Suggesting to use always `sockaddr_storage` but using `sockaddr` in the function 
> parameters keeps the current not-so-nice casting issues, which are not Undefined 
> Behavior per se, but not ideal either (in fact, I don't think `void*` is much 
> worse than code full of casts).  And it would also be error-prone, since users 
> could get the idea that `sockaddr` can be used safely, since it's what gets 
> passed as the parameter.
> 
> > The problem it should be part of union without raising a warning each time we use a safe type...
> 
> I don't understand this; please detail.

the transparent union will include sockaddr, thus even if we use it correctly raise a warning...
> 
> > 
> > The other solution is to render public  and ABI stable the type here
> > https://github.com/bminor/glibc/blob/ae612c45efb5e34713859a5facf92368307efb6e/socket/sys/socket.h#L78
> > under for instance sockaddr_ptr and sockaddr_const_ptr
> 
> [[gnu::transparent_union]] (used in that link) seems like a "the design of this 
> interface is bad, sorry, this workaround will just make it work".  I guess it 
> just works, and probably it's the reason that so much undefined behavior hasn't 
> exploded so far.  However, if we can solve this using core language features, 
> I'd go that way.

It solve the problems and could be used without the  [[gnu::transparent_union]], c++17 support transparent union.

The transparent union should also include pointer to sockaddr_storage and bluetooth socket.

Bastien
> 
> > 
> > Moreover this are some patch arch by arch
> > https://sourceware.org/legacy-ml/libc-alpha/2016-02/msg00340.html that should be made default
> > 
> > Bastien
> > 
> > 
> > 
> >>
> >> struct sockaddr_storage {
> >> 	union {
> >> 		sa_family_t          ss_family;
> >> 		struct sockaddr      sa;
> >> 	};
> >> };
> 
> Hmm, this isn't still perfect.  Since the APIs get the sockaddr, this union 
> information would be lost.  `sockaddr` needs to be the type that is declared. 
> `sockaddr_storage` should just die; there's no way to make it compatible with 
> APIs getting a `sockaddr`.  The attribute `transparent_union` is the only way to 
> use is safely.
> 
> Cheers,
> 
> Alex
> 
> >>
> >>
> >> With this, sockaddr_storage becomes useless, but still usable.  New code, could
> >> just use sockaddr and use the internal union members as necessary.  Union info
> >> is kept across all function boundaries.
> 
> -- 
> <http://www.alejandro-colomar.es/>
> 


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-01-22 14:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20230120134043.10247-1-alx@kernel.org>
     [not found] ` <d77b529d-e54d-4919-87a4-d90fd816ba8b@app.fastmail.com>
2023-01-20 19:25   ` [PATCH v2] socket: Implement sockaddr_storage with an anonymous union Alejandro Colomar
2023-01-21  2:38     ` Alejandro Colomar
2023-01-21  3:17       ` Alejandro Colomar
2023-01-21 13:30         ` Bastien Roucariès
2023-01-21 14:30           ` Alejandro Colomar
2023-01-22 14:12             ` Bastien Roucariès
     [not found] ` <5187043.CeDsVVrsAm@portable-bastien>
2023-01-20 20:38   ` Alejandro Colomar
2023-01-20 20:46     ` Bastien Roucariès
2023-01-20 20:51       ` Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).