From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-75126-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 30092 invoked by alias); 24 Nov 2016 10:01:59 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 30076 invoked by uid 89); 24 Nov 2016 10:01:59 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.0 required=5.0 tests=BAYES_40,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=equally, inclined, discount, arguably
X-HELO: mx1.redhat.com
Subject: Re: Evolution of ELF symbol management
To: Zack Weinberg <zackw@panix.com>
References: <9727f95a-df3d-ec11-8c1d-9b7ea6cbcaac@redhat.com>
 <0c682496-5c13-b3c5-ff66-0f8923a1d6e3@panix.com>
 <94e35eaa-a01d-db4e-eabe-6d100e581302@redhat.com>
 <ddf76775-1ff5-d7b0-50d0-a08867b5fafb@panix.com>
 <137dfcf1-eeeb-89c6-9882-b290983bc482@redhat.com>
 <4e84e53e-b86d-996f-f513-09c942359d66@panix.com>
Cc: libc-alpha@sourceware.org, Siddhesh Poyarekar <siddhesh@gotplt.org>
From: Florian Weimer <fweimer@redhat.com>
Message-ID: <ba048956-9a10-d92b-360c-13d49de25a5a@redhat.com>
Date: Thu, 24 Nov 2016 10:01:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <4e84e53e-b86d-996f-f513-09c942359d66@panix.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2016-11/txt/msg00879.txt.bz2

On 11/23/2016 03:08 PM, Zack Weinberg wrote:

>> In general, headers should avoid using libc types, particularly
>> off_t, time_t, struct timeval, struct stat, and so on.  But there
>> might be exceptions.
>
> ... I'm not sure why we're suddenly discussing typedef names?

I'm trying to come up with reasons why the usual header file conflict 
avoidance mechanisms would not work.

I mean, if you use C, you pretty much agreed to using separate 
compilation to work around header conflict issues.
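
A minimal sketch of that workaround, assuming a hypothetical wrapper named
wrapper_file_size: the public declaration uses only fundamental types, and
the header that might conflict (<sys/stat.h> here, chosen only for
illustration) is included in a single translation unit.  Both parts are
shown in one block; in practice they would be two files.

```c
/* ---- wrapper.h (hypothetical): fundamental types only, no off_t ---- */
long long wrapper_file_size(const char *path);

/* ---- wrapper.c (hypothetical): the only file that sees <sys/stat.h> ---- */
#include <sys/stat.h>

/* Return the size of the file at PATH in bytes, or -1 if stat fails. */
long long wrapper_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    /* Convert at the boundary so callers never depend on off_t. */
    return (long long)st.st_size;
}
```

Code that includes only the wrapper declaration never sees struct stat or
off_t, so a conflicting definition elsewhere cannot clash with it.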

>> For C, it's not clear at all to me whether we need any kind of
>> opt-in besides compiling with a particular _*_SOURCE variant (which
>> introduces the definition).
>
> I am having trouble articulating why I don't like this.  I think it
> might be mostly to do with the way new symbols are now different than
> old symbols.  I'd be okay with an across-the-board change to symbol
> resolution (as discussed below) that made interposition not work by
> default, but I'm not okay with the idea of its being supported only for
> symbols added before some arbitrary release.

I prefer the case-by-case approach because it allows us to review ABI 
changes individually.

> Third-party C libraries don't tend to put nearly as much code in inline
> functions, so the need for explicitly-useable __names is lessened there,
> I think.

Right, and separate compilation is available as workaround.

>> For C++, we might use something based on namespaces to get a clear
>> separation.  However, the problem there is that type names and
>> struct tags end up in C++ mangled identifiers and thus impact
>> application ABI. I have no good idea what to do there.
>
> Arguably that's a Good Thing -- a change to what off_t means is an ABI
> break whether or not it shows up in symbol names, and _making_ that one
> in particular show up in symbol names might solve some of the problems
> that lead to _FILE_OFFSET_BITS=64 still not being default for the older
> 32-bit architectures.

Suppose we want to make struct sockaddr_un available to C++ code under a 
namespaced name.  Then C++ code has to use that name.  But this could 
change ABI on the C++ side merely due to name mangling (on top of 
potential type compatibility issues at the interface with user code).  If we do 
not solve this in some way, I don't think many C++ projects will switch 
to internal names to avoid the header file collision because it's not 
worth the impact on compatibility.

>>> I was imagining a new annotation on _all_ undefined symbols in a
>>> shared object, giving the soname of the object that they were
>>> satisfied by at link time.  At load time, 'getrandom!libc.so.6'
>>> resolves to the 'getrandom' definition in libc.so.6, ignoring all
>>> other definitions of the same name.  If there are symbol versions
>>> involved, only the versions exported by libc.so.6 are considered.
>>> For instance, 'getrandom!libc.so.6@GLIBC_2.25' cannot be satisfied
>>> by 'getrandom@GLIBC_2.25' exported by libmissing-syscalls.so.1.
>>
>> We still need to support LD_PRELOAD and interposition of arbitrary
>> symbols, and not just malloc-related ones, for the benefit of
>> Address Sanitizer, fakeroot, cwrap, memstomp and other tools.
>>
>> This is why hard-coding the DSO name does not seem advisable.
>
> This argument applies equally to every new symbol we might add, and in
> fact to every _intra_-libc call that currently _can't_ be interposed.
> So I'm inclined to discount it.

I think there's a big difference between having to write new interceptors 
to support newer glibc versions and having to rewrite your whole 
library as an audit module because the ability to interpose the symbols 
you are interested in is gone completely.
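
For reference, a fakeroot-style interceptor can be as small as the sketch
below.  The FAKE_UID environment variable is an assumption made up for this
example, not something fakeroot or cwrap actually uses.  Real tools usually
forward to the next definition with dlsym(RTLD_NEXT, ...); this sketch
falls back to the raw system call instead so that it stays self-contained.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Interposed definition of getuid.  When this object comes earlier in the
   lookup order than libc.so.6 (for example via LD_PRELOAD), calls to
   getuid through the dynamic symbol table bind to it. */
uid_t getuid(void)
{
    /* FAKE_UID is a hypothetical variable used only in this sketch. */
    const char *fake = getenv("FAKE_UID");

    if (fake != NULL && *fake != '\0')
        return (uid_t)strtoul(fake, NULL, 10);

    /* No override requested: ask the kernel directly. */
    return (uid_t)syscall(SYS_getuid);
}
```

Built as a shared object (for instance with "cc -shared -fPIC") and listed
in LD_PRELOAD, this only works as long as the dynamic linker is still
allowed to bind getuid to a definition outside libc.so.6.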

> The solution I'm leaning toward involves each library designating a set
> of exported symbols, calls to which _can_ be interposed; the default is
> not to allow it.  We'd probably have to spend some time figuring out
> exactly which of libc's symbols should be interposeable.

It seems to me that interposition of arbitrary symbols is currently part 
of the programming interface.  We didn't plan for things like fakeroot 
and cwrap, but someone created those tools eventually, and they 
apparently address a real need.

>> When an application is linked against a shared object, if it
>> interposes any symbols in it, the symbols become exported, so that
>> interposition works at run time (otherwise, it could not happen).
>> You can see an example here:
>>
>> $ nm -Dg malloc/tst-interpose-nothread  | grep ' T '
>>
>> The application is *not* compiled with -Bdynamic or something like
>> that, it happens automatically.
>>
>> But the symbol version from libc.so.6 is not attached to this symbol
>> (‘nm’ would not show it, but you can check with eu-readelf, for
>> example).
>
> Well, OK, why don't we just fix that?  Is there a good reason why it
> _doesn't_ pick up the symbol version?

I'm not sure if interposition at load time will still happen.  But this 
should be easy to verify.  I'll give it a try.

> We agreed that the unmangled name has to exist, so how about we move
> forward by introducing only the unmangled names for the new symbols
> currently proposed (getrandom, explicit_bzero), introduce mangling if
> necessary based on feedback, and work toward a long-term solution that
> can be applied across the board?

What kind of feedback would trigger mangled names?  Is having a 
real-world application which triggers accidental interposition sufficient?

For getrandom, not using the mangled name by default looks like a 
security bug in the making.  Less so for explicit_bzero.
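
To make the concern concrete, here is a hypothetical application helper of
the kind I have in mind, written before libc exported getrandom.  The
different return convention (0 on success instead of a byte count) is an
assumption chosen for illustration, not taken from a specific program.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Application-private helper that happens to use the name getrandom.
   Contract (hypothetical): return 0 if LEN bytes were written,
   -1 otherwise.  libc's getrandom returns the number of bytes instead. */
int getrandom(void *buf, size_t len, unsigned int flags)
{
    long n = syscall(SYS_getrandom, buf, len, flags);

    return (n == (long)len) ? 0 : -1;
}
```

If a caller that expects the libc contract ends up bound to a definition
like this one through the unmangled name, it sees 0 on success and may
conclude that no random bytes were written, or ignore the result entirely.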

Thanks,
Florian