From: Florian Weimer
To: Zack Weinberg
Cc: libc-alpha@sourceware.org, Siddhesh Poyarekar
Subject: Re: Evolution of ELF symbol management
Date: Thu, 24 Nov 2016 10:01:00 -0000
In-Reply-To: <4e84e53e-b86d-996f-f513-09c942359d66@panix.com>

On 11/23/2016 03:08 PM, Zack Weinberg wrote:

>> In general, headers should avoid using libc types, particularly
>> off_t, time_t, struct timeval, struct stat, and so on. But there
>> might be exceptions.
>
> ... I'm not sure why we're suddenly discussing typedef names?

I'm trying to come up with reasons why the usual header file conflict
avoidance mechanisms would not work. I mean, if you use C, you pretty
much agreed to using separate compilation to work around header
conflict issues.

>> For C, it's not clear at all to me whether we need any kind of
>> opt-in besides compiling with a particular _*_SOURCE variant (which
>> introduces the definition).
>
> I am having trouble articulating why I don't like this. I think it
> might be mostly to do with the way new symbols are now different than
> old symbols. I'd be okay with an across-the-board change to symbol
> resolution (as discussed below) that made interposition not work by
> default, but I'm not okay with the idea of its being supported only for
> symbols added before some arbitrary release.

I prefer the case-by-case approach because it allows us to review ABI
changes individually.

> Third-party C libraries don't tend to put nearly as much code in inline
> functions, so the need for explicitly-useable __names is lessened there,
> I think.

Right, and separate compilation is available as a workaround.

>> For C++, we might use something based on namespaces to get a clear
>> separation. However, the problem there is that type names and
>> struct tags end up in C++ mangled identifiers and thus impact
>> application ABI. I have no good idea what to do there.
>
> Arguably that's a Good Thing -- a change to what off_t means is an ABI
> break whether or not it shows up in symbol names, and _making_ that one
> in particular show up in symbol names might solve some of the problems
> that lead to _FILE_OFFSET_BITS=64 still not being default for the older
> 32-bit architectures.

Suppose we want to make struct sockaddr_un available to C++ code under
a namespaced name. Then C++ code has to use that name. But this could
change ABI on the C++ side merely due to name mangling (on top of
potential type compatibility issues at the interface with user code).

If we do not solve this in some way, I don't think many C++ projects
will switch to internal names to avoid the header file collision
because it's not worth the impact on compatibility.
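
To make the mangling concern concrete, here is a minimal sketch. The
names sockaddr_un_demo, libc_demo and bind_path are hypothetical
stand-ins, not anything glibc provides or has proposed. It assumes a
compiler where typeid(...).name() returns the mangled encoding, as GCC
and Clang do; the standard only promises an implementation-defined
string.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for the type as today's headers declare it
// (global namespace). This is not the real struct sockaddr_un.
struct sockaddr_un_demo {
    unsigned short sun_family;
    char sun_path[108];
};

// Hypothetical namespaced variant with an identical layout.
namespace libc_demo {
struct sockaddr_un_demo {
    unsigned short sun_family;
    char sun_path[108];
};
}

// Same size, yet two distinct types as far as C++ is concerned.
static_assert(sizeof(::sockaddr_un_demo) == sizeof(libc_demo::sockaddr_un_demo),
              "demo types are meant to share a layout");

// A user library exporting one function per type. The overload set
// compiles only because the parameter types differ, so each function
// gets its own mangled symbol name.
void bind_path(const ::sockaddr_un_demo *) {}
void bind_path(const libc_demo::sockaddr_un_demo *) {}

// Implementation-defined encoded names of the two function types. With
// GCC and Clang these are Itanium-ABI manglings, which embed the
// qualified type name.
std::string global_signature() {
    return typeid(void (*)(const ::sockaddr_un_demo *)).name();
}
std::string namespaced_signature() {
    return typeid(void (*)(const libc_demo::sockaddr_un_demo *)).name();
}

// Switching a parameter to the namespaced name changes the encoding,
// even though the layout is untouched.
void self_check() {
    assert(global_signature() != namespaced_signature());
}
```

Printing the two strings with GCC or Clang should show the libc_demo
qualifier in only one of them. Any user symbol that takes the type by
pointer or reference would change name in the same way, which is the
compatibility cost described above.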

>>> I was imagining a new annotation on _all_ undefined symbols in a
>>> shared object, giving the soname of the object that they were
>>> satisfied by at link time. At load time, 'getrandom!libc.so.6'
>>> resolves to the 'getrandom' definition in libc.so.6, ignoring all
>>> other definitions of the same name. If there are symbol versions
>>> involved, only the versions exported by libc.so.6 are considered.
>>> For instance, 'getrandom!libc.so.6@GLIBC_2.25' cannot be satisfied
>>> by 'getrandom@GLIBC_2.25' exported by libmissing-syscalls.so.1.
>>
>> We still need to support LD_PRELOAD and interposition of arbitrary
>> symbols, and not just malloc-related ones, for the benefit of
>> Address Sanitizer, fakeroot, cwrap, memstomp and other tools.
>>
>> This is why hard-coding the DSO name does not seem advisable.
>
> This argument applies equally to every new symbol we might add, and in
> fact to every _intra_-libc call that currently _can't_ be interposed.
> So I'm inclined to discount it.

I think there's a big difference between having to write new
interceptors to support newer glibc versions and having to rewrite
your whole library as an audit module because the ability to interpose
the symbols you are interested in is gone completely.

> The solution I'm leaning toward involves each library designating a set
> of exported symbols, calls to which _can_ be interposed; the default is
> not to allow it. We'd probably have to spend some time figuring out
> exactly which of libc's symbols should be interposeable.

It seems to me that interposition of arbitrary symbols is currently
part of the programming interface. We didn't plan for things like
fakeroot and cwrap, but someone created those tools eventually, and
they apparently address a real need.

>> When an application is linked against a shared object, if it
>> interposes any symbols in it, the symbols become exported, so that
>> interposition works at run time (otherwise, it could not happen).
>> You can see an example here:
>>
>>   $ nm -Dg malloc/tst-interpose-nothread | grep ' T '
>>
>> The application is *not* compiled with -Bdynamic or something like
>> that, it happens automatically.
>>
>> But the symbol version from libc.so.6 is not attached to this symbol
>> (“nm” would not show it, but you can check with eu-readelf, for
>> example).
>
> Well, OK, why don't we just fix that? Is there a good reason why it
> _doesn't_ pick up the symbol version?

I'm not sure if interposition at load time will still happen. But this
should be easy to verify. I'll give it a try.

> We agreed that the unmangled name has to exist, so how about we move
> forward by introducing only the unmangled names for the new symbols
> currently proposed (getrandom, explicit_bzero), introduce mangling if
> necessary based on feedback, and work toward a long-term solution that
> can be applied across the board?

What kind of feedback would trigger mangled names? Is having a
real-world application which triggers accidental interposition
sufficient?

For getrandom, not using the mangled name by default looks like a
security bug in the making. Less so for explicit_bzero.

Thanks,
Florian