public inbox for libc-alpha@sourceware.org
From: Florian Weimer <fweimer@redhat.com>
To: Zack Weinberg <zackw@panix.com>
Cc: libc-alpha@sourceware.org, Siddhesh Poyarekar <siddhesh@gotplt.org>
Subject: Re: Evolution of ELF symbol management
Date: Thu, 24 Nov 2016 10:01:00 -0000
Message-ID: <ba048956-9a10-d92b-360c-13d49de25a5a@redhat.com>
In-Reply-To: <4e84e53e-b86d-996f-f513-09c942359d66@panix.com>

On 11/23/2016 03:08 PM, Zack Weinberg wrote:

>> In general, headers should avoid using libc types, particularly
>> off_t, time_t, struct timeval, struct stat, and so on.  But there
>> might be exceptions.
>
> ... I'm not sure why we're suddenly discussing typedef names?

I'm trying to come up with reasons why the usual header file conflict 
avoidance mechanisms would not work.

I mean, if you use C, you have pretty much agreed to using separate 
compilation to work around header conflict issues.

>> For C, it's not clear at all to me whether we need any kind of
>> opt-in besides compiling with a particular _*_SOURCE variant (which
>> introduces the definition).
>
> I am having trouble articulating why I don't like this.  I think it
> might be mostly to do with the way new symbols are now different than
> old symbols.  I'd be okay with an across-the-board change to symbol
> resolution (as discussed below) that made interposition not work by
> default, but I'm not okay with the idea of its being supported only for
> symbols added before some arbitrary release.

I prefer the case-by-case approach because it allows us to review ABI 
changes individually.

> Third-party C libraries don't tend to put nearly as much code in inline
> functions, so the need for explicitly-useable __names is lessened there,
> I think.

Right, and separate compilation is available as a workaround.
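
A minimal sketch of that workaround, with a hypothetical wrapper: only one translation unit includes the system header, and the rest of the program sees an interface built from fixed-width integer types, so a conflicting declaration elsewhere never meets the libc one.

```c
/* stat_wrapper.c (hypothetical): the only file that includes <sys/stat.h>. */
#include <stdint.h>
#include <sys/stat.h>

/* Other translation units declare this themselves, using only <stdint.h>
   types:  int64_t wrapped_file_size (const char *path);  */
int64_t wrapped_file_size (const char *path)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return -1;                  /* errno is left as set by stat */
  return (int64_t) st.st_size;
}
```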

>> For C++, we might use something based on namespaces to get a clear
>> separation.  However, the problem there is that type names and
>> struct tags end up in C++ mangled identifiers and thus impact
>> application ABI. I have no good idea what to do there.
>
> Arguably that's a Good Thing -- a change to what off_t means is an ABI
> break whether or not it shows up in symbol names, and _making_ that one
> in particular show up in symbol names might solve some of the problems
> that lead to _FILE_OFFSET_BITS=64 still not being default for the older
> 32-bit architectures.
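
For context, a minimal sketch of the current opt-in, assuming glibc on a 32-bit target: defining _FILE_OFFSET_BITS=64 widens off_t, and the headers redirect calls such as lseek to the lseek64 symbol. The caller's off_t width is carried by which symbol it binds to, not by anything in the C-level name. The helper file_size is hypothetical.

```c
/* Must precede every #include to take effect. */
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>
#include <unistd.h>

/* With _FILE_OFFSET_BITS=64, off_t is 64 bits wide even on 32-bit targets,
   where glibc's <unistd.h> redirects lseek to lseek64 at the symbol level. */
_Static_assert (sizeof (off_t) == 8, "off_t must be 64-bit");

/* Hypothetical helper: size of an open file, or -1 on error. */
off_t file_size (int fd)
{
  return lseek (fd, 0, SEEK_END);
}
```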

Suppose we want to make struct sockaddr_un available to C++ code under a 
namespaced name.  Then C++ code has to use that name.  But this could 
change ABI on the C++ side merely due to name mangling (on top of 
potential type compatibility issues at the interface with user code).  If we do
not solve this in some way, I don't think many C++ projects will switch 
to internal names to avoid the header file collision because it's not 
worth the impact on compatibility.

>>> I was imagining a new annotation on _all_ undefined symbols in a
>>> shared object, giving the soname of the object that they were
>>> satisfied by at link time.  At load time, 'getrandom!libc.so.6'
>>> resolves to the 'getrandom' definition in libc.so.6, ignoring all
>>> other definitions of the same name.  If there are symbol versions
>>> involved, only the versions exported by libc.so.6 are considered.
>>> For instance, 'getrandom!libc.so.6@GLIBC_2.25' cannot be satisfied
>>> by 'getrandom@GLIBC_2.25' exported by libmissing-syscalls.so.1.
>>
>> We still need to support LD_PRELOAD and interposition of arbitrary
>> symbols, and not just malloc-related ones, for the benefit of
>> Address Sanitizer, fakeroot, cwrap, memstomp and other tools.
>>
>> This is why hard-coding the DSO name does not seem advisable.
>
> This argument applies equally to every new symbol we might add, and in
> fact to every _intra_-libc call that currently _can't_ be interposed.
> So I'm inclined to discount it.
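
The closest thing available today to a 'getrandom!libc.so.6@GLIBC_2.25' reference is an explicit lookup through a handle for one specific object. A minimal sketch, assuming glibc 2.25 or later (where getrandom carries the GLIBC_2.25 version on x86-64, for example) and the GNU extensions dlvsym and RTLD_NOLOAD; the helper lookup_in_libc is hypothetical. A handle lookup searches libc.so.6 and its own dependencies only, so it also bypasses LD_PRELOAD interposers, which is the property at issue above.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: resolve NAME@VERSION only in libc.so.6 (and the
   objects it depends on), ignoring definitions in the main program and in
   LD_PRELOAD libraries.  Returns NULL if there is no such definition. */
void *lookup_in_libc (const char *name, const char *version)
{
  /* RTLD_NOLOAD: only return a handle if libc.so.6 is already loaded,
     which is the case in any dynamically linked program. */
  void *handle = dlopen ("libc.so.6", RTLD_LAZY | RTLD_NOLOAD);
  if (handle == NULL)
    return NULL;
  void *sym = dlvsym (handle, name, version);
  /* Drop the reference dlopen added; libc.so.6 itself stays loaded. */
  dlclose (handle);
  return sym;
}
```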

I think there's a big difference if you have to write new interceptors 
to support newer glibc versions, or if you have to rewrite your whole 
library as an audit module because the ability to interpose the symbols 
you are interested in is gone completely.
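
For reference, this is roughly the shape of interceptor such tools depend on: a library, normally loaded via LD_PRELOAD, defines the libc function itself and forwards to the next definition in lookup order via dlsym (RTLD_NEXT, ...). A minimal sketch; the call counter is hypothetical instrumentation, and real tools add thread safety, errno preservation and many more wrapped functions.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical instrumentation: counts intercepted calls. */
int unlink_calls;

/* Interposed definition: when this object is preloaded, calls to unlink
   from the program and from other shared objects bind here instead of
   to libc.so.6.  Linked into a program, direct calls bind here too. */
int unlink (const char *path)
{
  static int (*next_unlink) (const char *);
  if (next_unlink == NULL)
    /* RTLD_NEXT: the next definition after this object in lookup order,
       normally the one in libc.so.6. */
    next_unlink = (int (*) (const char *)) dlsym (RTLD_NEXT, "unlink");
  if (next_unlink == NULL)
    {
      errno = ENOSYS;
      return -1;
    }
  ++unlink_calls;
  return next_unlink (path);
}
```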

> The solution I'm leaning toward involves each library designating a set
> of exported symbols, calls to which _can_ be interposed; the default is
> not to allow it.  We'd probably have to spend some time figuring out
> exactly which of libc's symbols should be interposeable.
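
glibc already does something similar for intra-libc calls through its hidden_proto/hidden_def macros. A stripped-down sketch of the pattern, with hypothetical names and assuming GCC or Clang on an ELF target: the exported symbol stays interposable, while calls inside the library go through a hidden alias that always binds locally.

```c
/* Exported, interposable entry point. */
int example_add (int a, int b)
{
  return a + b;
}

/* Hidden alias for the same code.  A hidden symbol is not placed in the
   dynamic symbol table, so references to it are resolved at link time
   and cannot be interposed. */
extern __typeof (example_add) example_add_internal
  __attribute__ ((alias ("example_add"), visibility ("hidden")));

/* Another exported function.  Its internal calls use the hidden alias,
   so interposing example_add does not change what example_sum3 does. */
int example_sum3 (int a, int b, int c)
{
  return example_add_internal (example_add_internal (a, b), c);
}
```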

It seems to me that interposition of arbitrary symbols is currently part 
of the programming interface.  We didn't plan for things like fakeroot 
and cwrap, but someone created those tools eventually, and they 
apparently address a real need.

>> When an application is linked against a shared object, if it
>> interposes any symbols in it, those symbols become exported, so that
>> interposition works at run time (otherwise, it could not happen).
>> You can see an example here:
>>
>> $ nm -Dg malloc/tst-interpose-nothread  | grep ' T '
>>
>> The application is *not* linked with -rdynamic or something like
>> that; it happens automatically.
>>
>> But the symbol version from libc.so.6 is not attached to this symbol
>> (“nm” would not show it, but you can check with eu-readelf, for
>> example).
>
> Well, OK, why don't we just fix that?  Is there a good reason why it
> _doesn't_ pick up the symbol version?

I'm not sure if interposition at load time will still happen.  But this 
should be easy to verify.  I'll give it a try.
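
One quick in-process check is to ask the dynamic linker which object supplies the definition that wins global symbol lookup, using dlsym (RTLD_DEFAULT, ...) followed by dladdr. A minimal sketch using GNU extensions; the helper defined_in is hypothetical. It reports the winner in the global scope, which starts with the main program and any LD_PRELOAD libraries; it does not inspect symbol versions.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the global-scope definition of NAME
   comes from an object whose file name contains FRAGMENT
   (for example "libc.so"). */
int defined_in (const char *name, const char *fragment)
{
  void *addr = dlsym (RTLD_DEFAULT, name);
  Dl_info info;
  if (addr == NULL || dladdr (addr, &info) == 0 || info.dli_fname == NULL)
    return 0;
  return strstr (info.dli_fname, fragment) != NULL;
}
```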

> We agreed that the unmangled name has to exist, so how about we move
> forward by introducing only the unmangled names for the new symbols
> currently proposed (getrandom, explicit_bzero), introduce mangling if
> necessary based on feedback, and work toward a long-term solution that
> can be applied across the board?

What kind of feedback would trigger mangled names?  Is having a 
real-world application which triggers accidental interposition sufficient?

For getrandom, not using the mangled name by default looks like a 
security bug in the making.  Less so for explicit_bzero.
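
One way to use a mangled name without changing source code is a header-level redirect, the same mechanism behind glibc's __REDIRECT macro: programs keep writing the public name, but object code references a reserved symbol, so an unrelated application function with the public name cannot accidentally interpose it. A minimal sketch with hypothetical names, assuming GCC or Clang, an ELF target with no user-label prefix (as on GNU/Linux), and glibc 2.25 or later for <sys/random.h>.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/* Hypothetical header excerpt: source code calls example_getrandom, but
   the asm label makes every call reference the reserved symbol
   __example_getrandom. */
extern ssize_t example_getrandom (void *buf, size_t len, unsigned int flags)
  __asm__ ("__example_getrandom");

/* Hypothetical library-side definition.  Because of the label above, the
   object file defines __example_getrandom, not example_getrandom.  The
   body simply forwards to glibc's getrandom. */
ssize_t example_getrandom (void *buf, size_t len, unsigned int flags)
{
  return getrandom (buf, len, flags);
}
```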

Thanks,
Florian
