Mailing-List: contact binutils-help@sourceware.org; run by ezmlm
From: Nick Alcock
To: Joseph Myers
Subject: Re: [PATCH 00/19] libctf, and CTF support for objdump and readelf
References: <20190430225706.159422-1-nick.alcock@oracle.com>
Date: Fri, 03 May 2019 14:23:00 -0000
In-Reply-To: (Joseph Myers's message of "Thu, 2 May 2019 15:22:30 +0000")
Message-ID: <8736lvwr9p.fsf@esperi.org.uk>

[looking at your comments first because you were so very helpful last
time I contributed to glibc. :) ]

(And thank you! I haven't done quite everything you suggested, at least
not yet, but the 90% I have done is entirely beneficial and you spotted
a lot of things I overlooked.)

On 2 May 2019, Joseph Myers spake thusly:

> This patch series introduces a dependency of binutils on libctf.
>
> This means libctf should be portable to the same range of hosts as
> binutils, all of which can be used as hosts for cross toolchains for a
> range of targets (ELF and non-ELF).  For example, it should be portable to
> hosts such as MinGW or OS X.

Seems sensible. It might lose some functionality, though, at least to
start with. (Say, sharing string tables with the overarching container,
opening CTF files by handing them an fd to a containing binary, etc.
There are alternatives callers can use in all these cases.)
I'll probably arrange for the deduplicating linker plugin to be
ELF-only, at least to start with, because I have no way to test on
anything else, and it might always keep strings and symbols internal to
the CTF file rather than trying to share them with the enclosing binary,
until someone else contributes that sort of thing for non-ELF.

> Some apparent portability issues in this code include:
>
> * Use of dlfcn.h.  Such use in existing binutils code (e.g. bfd/plugin.c)
> is conditional, to avoid trying to use it on hosts without that
> functionality.

This was used by ancient code in the OpenSolaris days that endeavoured
to dlopen() zlib to avoid linking against it (why one would want to
avoid linking against zlib is opaque to me). No user these days:
dropped.

> * Use of sys/mman.h.  Again, mmap usage in existing code is appropriately
> conditional.

We can fall back to copying or malloc in that situation, in most cases.
However, the CTF archive code would be made significantly more
complicated, more than cancelling out the implementation simplicity
which was one reason for using mmap() there in the first place. So for
now my no-mmap() CTF archive code just fails: callers can detect the
failure and fall back to storing CTF containers separately in that case.
(Both reading and writing fail symmetrically, so you aren't going to end
up creating containers you then can't read.)

If there are really still platforms relevant outside industrial museums
without mmap(), we can rethink this, but I bet there aren't, or that any
such platforms aren't going to be storing huge numbers of CTF containers
in any case. (The use case for this is if you have so many TUs that you
can't store one per section without risking blowing the 64K section
limit. Any machine new enough to be dealing with anything in that size
range is going to have mmap() as well, right? Or something we can use
instead of it with similar semantics...)
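
[Illustrative sketch, not code from the patch series: the "fall back to
copying or malloc" idea above could look roughly like this. The names
sketch_acquire() and sketch_release() are hypothetical, HAVE_MMAP is
assumed to be a configure-time macro, and for simplicity only the first
LEN bytes of the file are handled.]

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_MMAP		/* Assumed: defined by configure where mmap() works.  */
# include <sys/mman.h>
#endif

/* Hypothetical helper, not part of libctf: return the first LEN bytes of
   the stream F, mmap()ed where possible, otherwise copied into malloc()ed
   memory.  *IS_MAPPED records which, so sketch_release() can undo it.  */
void *
sketch_acquire (FILE *f, size_t len, int *is_mapped)
{
  void *buf;

  *is_mapped = 0;
#ifdef HAVE_MMAP
  buf = mmap (NULL, len, PROT_READ, MAP_PRIVATE, fileno (f), 0);
  if (buf != MAP_FAILED)
    {
      *is_mapped = 1;
      return buf;
    }
#endif
  /* Fallback: plain read-and-copy.  */
  if ((buf = malloc (len)) == NULL)
    return NULL;
  if (fseek (f, 0L, SEEK_SET) != 0 || fread (buf, 1, len, f) != len)
    {
      free (buf);
      return NULL;
    }
  return buf;
}

/* Release memory obtained from sketch_acquire().  */
void
sketch_release (void *buf, size_t len, int is_mapped)
{
#ifdef HAVE_MMAP
  if (is_mapped)
    {
      munmap (buf, len);
      return;
    }
#endif
  (void) len;
  (void) is_mapped;
  free (buf);
}
```

[A caller never needs to know which path was taken, which is why the
fallback is cheap for plain reads; the archive-creation case discussed
here is harder because it writes through the mapping.]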
Note that it's only *creating* CTF archives without mmap() that is too
horrible to countenance. It is relatively easy to support reading CTF
archives on non-mmap-supporting systems, if quite inefficiently, so we
could arrange to fall back to read-and-copy in that case, allowing
people in cross environments to not need to worry about whether their
target supports mmap() before creating CTF archives. This might be a
reasonable middle ground, perhaps?

(Added fallbacks for mmap() in all cases but CTF archives: as noted
above, we can add fallbacks for archive usage, too, just not creation.)

(oh btw you missed a bit: we use pread() too, and badly, ignoring the
possibility of short reads or -EINTR returns. Fixing, and adding a
fallback for that as well.)

> * Use of sys/errno.h.  The standard name is errno.h.

Ancient historical wart: fixed, thank you! How did I miss that?!

> * Use of elf.h.  Non-ELF hosts won't have such a header.  You should be
> working with the existing include/elf/*.h definitions of ELF data
> structures in binutils.

This is all for build hosts that aren't ELF, right? I don't think we can
reasonably expect ctf_open() or ctf_fdopen() to work for anything but
raw CTF files on non-ELF hosts, given that by their very nature these
functions are getting CTF data out of ELF sections, and non-ELF formats
don't necessarily support anything like the named section concept ELF
has got at all.

The only other ELF-specificity is looking up types by symbol table
offset. Again, I don't know enough about non-ELF platforms to know if
this concept is even practical there, which would mean the data object
and function info sections would be empty on such hosts, and
ctf_lookup_by_symbol(), ctf_func_info() and ctf_func_args() would not
function or would be excluded from the set of exported symbols entirely.
This would reduce libctf's utility, but not eliminate it: external
systems can still look up types by name or CTF type ID even if they
can't do it by symbol.
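
[Illustrative sketch, not code from the patch series: returning to the
pread() point above, a wrapper that copes with short reads and with
EINTR might look roughly like this. The name sketch_pread_full() is
hypothetical. Note that pread() reports interruption by returning -1
with errno set to EINTR.]

```c
#define _POSIX_C_SOURCE 200809L	/* For pread() under strict ISO C modes.  */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the actual libctf code: read COUNT bytes at
   OFFSET from FD into BUF, retrying after EINTR and after short reads.
   Returns the number of bytes read (less than COUNT only if end of file
   was reached), or -1 with errno set on any other error.  */
ssize_t
sketch_pread_full (int fd, void *buf, size_t count, off_t offset)
{
  char *p = buf;
  size_t done = 0;

  while (done < count)
    {
      ssize_t n = pread (fd, p + done, count - done, offset + (off_t) done);

      if (n < 0)
	{
	  if (errno == EINTR)
	    continue;		/* Interrupted before any data arrived: retry.  */
	  return -1;
	}
      if (n == 0)
	break;			/* End of file.  */
      done += (size_t) n;	/* Short read: go round again for the rest.  */
    }
  return (ssize_t) done;
}
```

[On hosts without pread() at all, the same loop could presumably be
built on lseek() plus read(), at the cost of disturbing the file
position.]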
It is possible that such things could be revived: all we'd need for a
given non-ELF platform would be a way to consistently split whatever
they use as a symbol table into an ordered list of data and function
objects that could be referenced by those CTF sections. However, for
now, this functionality is intrinsically ELF-only in the sense that
nobody has ever considered how it might work on non-ELF platforms and it
certainly has no users there.

However, for now we can do a little better than this: see below.

> * Use of gelf.h.  This seems to be something from some versions of libelf,
> which isn't an existing build dependency of binutils at all (and given the
> existence of multiple, incompatible versions of libelf, one should be wary
> of depending on it).  The only gelf.h I have locally here is in a checkout
> of prelink sources.  Again, use existing ELF structures in headers present
> in binutils.

This is a historical thing: libelf was of course part of Solaris so its
usage was pervasive, even when unnecessary, as here. What we're actually
using is a few datatypes, nothing more: the Elf64_Sym, from <elf.h> (on
Linux, provided by glibc), the Elf*_GHdr and Elf*_SHdr, and the
primitive ELF-sized datatypes like Elf64_Word that those structures use.

I don't see any immediate replacement for most of this stuff in
binutils, even though I'd expect to find it: the Elf64_External_Sym's
members are entirely the wrong type (char arrays), and there doesn't
seem to be any common non-architecture-dependent structure with more
useful types at all! Elf64_Internal_Sym is very bfd-specific (and I'm
trying not to have libctf depend on libbfd unnecessarily, since it needs
little of its functionality), and the code in readelf that mocks up an
internal_sym from file data spends almost all its time getting around
the problem that its datatypes are different from the
(standard-guaranteed) data types in the ELF file itself.
This is more futzing about than seems sane given that we're not using
the rest of bfd at all. So I'd rather find a way to do the simple 'get a
bit of very simple data out of an ELF file we have an fd to (symbol
lookups and a couple of section lookups)' without needing to rejig
everything to use bfd just to do that, particularly given that libctf's
APIs that involve the caller passing info corresponding to a section
into libctf do not require the caller to use bfd and I have not the
least idea how to go from data+size-and-no-fd to a bfd_asection (it's
probably not possible).

I could just copy the (fairly small number of) relevant types from
glibc's installed elf.h header into the ctf internals (the license is
compatible, after all, as is the copyright ownership), using a different
(CTF-internal) name to avoid clashes causing trouble at build time.
Would that be acceptable?

This lets us operate unchanged on non-ELF hosts and when not targeting
ELF, and leave this code in and even functional in that situation: it
detects ELF files by their magic number, which will presumably never
match things passed in to ctf_open() on non-ELF targets, and nothing
would ever generate contents for the function info or data object
sections on such non-ELF targets either (until we figured out how to do
so), so the ELF-specific code involved in reading those sections is also
not a problem.

Adding more magic numbers for more executable types is possible: if we
started handling COFF or PE or Mach-O or something like that, we would
probably soon hit a stage where it would become useful to start using
some bfd abstractions, but I think the time is not yet. (I don't know
enough about these formats to know if they even *have* named sections.)

> * Use of byteswap.h and endian.h.  Such headers are not portably
> available.  Note how byteswap.h usage in gold / elfcpp is appropriately
> conditional.

Makes sense. I can easily arrange to use code like elfcpp does in that
case.

... (done.)
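
[Illustrative sketch, not code from the patch series: a conditional
byteswap.h arrangement in the general spirit of what elfcpp does could
look roughly like this. The sk_bswap_* names are hypothetical, and
HAVE_BYTESWAP_H is assumed to be a configure-time macro.]

```c
#include <stdint.h>

#ifdef HAVE_BYTESWAP_H		/* Assumed: set by configure when the header exists.  */
# include <byteswap.h>
# define sk_bswap_16(v) bswap_16 (v)
# define sk_bswap_32(v) bswap_32 (v)
# define sk_bswap_64(v) bswap_64 (v)
#else
/* Portable fallbacks for hosts without <byteswap.h>.  */
static inline uint16_t
sk_bswap_16 (uint16_t v)
{
  return (uint16_t) ((v >> 8) | (v << 8));
}

static inline uint32_t
sk_bswap_32 (uint32_t v)
{
  return ((v & 0xff000000u) >> 24)
	 | ((v & 0x00ff0000u) >> 8)
	 | ((v & 0x0000ff00u) << 8)
	 | ((v & 0x000000ffu) << 24);
}

static inline uint64_t
sk_bswap_64 (uint64_t v)
{
  /* Swap each 32-bit half, then exchange the halves.  */
  return ((uint64_t) sk_bswap_32 ((uint32_t) v) << 32)
	 | sk_bswap_32 ((uint32_t) (v >> 32));
}
#endif
```

[Callers then use only the sk_bswap_* names, so nothing outside this
one spot depends on whether the host ships byteswap.h.]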