public inbox for gcc@gcc.gnu.org
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Stefan Puiu <stefan.puiu@gmail.com>
Cc: "GNU C Library" <libc-alpha@sourceware.org>,
	linux-man <linux-man@vger.kernel.org>,
	gcc@gcc.gnu.org, "Igor Sysoev" <igor@sysoev.ru>,
	"Bastien Roucariès" <rouca@debian.org>
Subject: Re: struct sockaddr_storage
Date: Mon, 23 Jan 2023 17:03:00 +0100
Message-ID: <4c47dcb0-f665-d6ff-cc26-d5f4e99bd739@gmail.com>
In-Reply-To: <CACKs7VDGAaSXkjeuBdvEkFbFJ_OnwObTf1_9eVb44RJf-O3Fwg@mail.gmail.com>



Hi Stefan,

On 1/23/23 08:40, Stefan Puiu wrote:
>>>> According to strict aliasing rules, if you declare a variable of type 'struct
>>>> sockaddr_storage', that's what you get, and trying to access it later as some
>>>> other sockaddr_* is simply not legal.  The compiler may assume those accesses
>>>> can't happen, and optimize as it pleases.
>>>
>>> Can you detail the "is not legal" part?
>>
>> I mean that it's Undefined Behavior contraband.
> 
> OK, next question. Is this theoretical or practical UB?


The functions that use this type are rather large, so they are unlikely to be
inlined.  Their bodies are therefore not visible to the compiler at the call
site, and most of the optimizations that this UB enables are unlikely to happen.
Translation Unit (TU) boundaries are what keep most UB invocations from being
really dangerous.

Also, glibc seems to be using a GCC attribute (transparent_union) to make the 
code avoid UB even if it were inlined, so if you use glibc, you're fine.  If 
you're using some smaller libc with a less capable compiler, or maybe C++, you 
are less lucky, but TU boundaries will probably still save your day.
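
To illustrate what that attribute does, here is a minimal sketch, assuming GCC
or Clang.  The `addr_*` types, `FAM_V4`, and `addr_port()` are hypothetical
stand-ins for the real `sockaddr` family, not glibc's actual definitions.  It
only shows that a parameter of a transparent union type accepts a pointer to
any member type without a cast, so no object is accessed through a pointer of
the wrong structure type.

```c
#include <assert.h>

/* Hypothetical address types; stand-ins for struct sockaddr and friends. */
struct addr_generic { unsigned short family; char data[14]; };
struct addr_v4      { unsigned short family; unsigned short port; unsigned int addr; };

enum { FAM_V4 = 2 };  /* hypothetical family tag */

/* A parameter of this type accepts any of the member pointer types
 * directly.  The attribute is a GNU extension (GCC and Clang). */
typedef union {
    const struct addr_generic *generic;
    const struct addr_v4      *v4;
} addr_arg __attribute__((__transparent_union__));

/* Return the port of a v4 address, or -1 for any other family.
 * This sketch assumes the caller passed a pointer to a struct addr_v4,
 * so the object is read through a pointer to its real type. */
static int addr_port(addr_arg arg)
{
    const struct addr_v4 *a = arg.v4;

    assert(a != 0);
    return a->family == FAM_V4 ? a->port : -1;
}
```

A caller holding a `struct addr_v4 a` can then write `addr_port(&a)` with no
cast at the call site.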

> People check
> documentation about how to write code today, I think.

I'm fine documenting how to do it today.  But before changing the documentation,
I'd like to take some time to reflect on what we can do to fix the standard so
that we don't have this semi-broken state forever.  When we have a clear idea of
what we can do to fix the implementation and hopefully the standard long-term,
possibly keeping source code the same, we can make a better recommendation for
programmers.

Today, you can do 2 things:

-  You don't care about UB, and wish that C had always been K&R C, and GCC
just makes it work.  Then use `sockaddr_storage`.  It will just work.  When it
stops working, you can blame the compiler and libc for optimizing way too much.

-  You care a lot about UB.  Then write your own union, which is how the whole
`sockaddr` interface should have been designed from the ground up.  That's what
unions are for.

Which should we recommend?  That's my problem.

I don't want to be documenting the latter, because it's non-standard, and it's
still likely to end up invoking UB in a different way, because it's a difficult
part of the language, and when you roll your own, you're likely to make
mistakes.

So, ideally, I'd like to document the former, but for that, I'd like to make
sure that it will work forever, since otherwise we'd be blamed when somebody's
code is compiled on a platform with some combination of libc, compiler, and
phase of the moon that makes the UB become non-theoretical.

I think we can fix the definition of `sockaddr_storage` to have defined 
behavior, with the changes I'm discussing with Bastien, so I guess we'll 
document the former.

>>> Will code break in practice?
>>
>> Well, it depends on how much compilers advance.  Here's some interesting experiment:
>>
>> <https://software.codidact.com/posts/287748/287750#answer-287750>
> 
> That code plays with 2 pointers to the same area, one to double and
> one to int, so I don't think it's that similar to the sockaddr
> situation. At least for struct sockaddr, the sa_family field is the
> same for all struct sockaddr_* variants. Also, in practical terms, I
> don't think any compiler optimization that breaks socket APIs (and, if
> I recall correctly, there are instances of this pattern in the kernel
> as well) is going to be an easy sell. It's possible, but realistically
> speaking, I don't think it's going to happen.

Inspecting the common initial sequence of structures is only allowed if the
structures are members of a union, and the access is made where the completed
union type is visible (which is why to avoid UB you need a union; and still,
you need to make sure you don't invoke UB in a different way).
<https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6>
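
A minimal sketch of that rule, using hypothetical types that have nothing to
do with sockets (`circle`, `rect`, `shape`, and `shape_width()` are made up for
illustration):

```c
#include <assert.h>

enum shape_kind { KIND_CIRCLE, KIND_RECT };

/* Both structures start with 'int kind': that is their common initial
 * sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; double w; double h; };

/* The union declaration is visible wherever the structures are accessed. */
union shape {
    struct circle circle;
    struct rect   rect;
};

/* Return the horizontal extent of the shape, or -1.0 for an unknown kind. */
static double shape_width(const union shape *s)
{
    assert(s != 0);

    /* Reading circle.kind is allowed even if a struct rect was stored,
     * because 'kind' is in the common initial sequence and the access goes
     * through the union. */
    switch (s->circle.kind) {
    case KIND_CIRCLE:
        return 2.0 * s->circle.radius;
    case KIND_RECT:
        return s->rect.w;
    default:
        return -1.0;
    }
}
```

The same read through a plain `struct circle *` that actually points to a
standalone `struct rect` object would not be covered by this rule.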

> 
>>
>> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
>> them, you can only kill them with garlic.
> 
> OK, but not all theoretical issues have practical implications. Is
> there code that can show UB in practical terms with struct
> sockaddr_storage today? Like Eric mentioned in another thread, does
> UBSan complain about code using struct sockaddr_storage?

It's unlikely.  But I can't promise it will be safe under some random
combination of compiler and library.  It also depends on what you do in your
code, which will affect compiler optimizations.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


