From: Siddhesh Poyarekar
Date: Thu, 21 Apr 2022 21:24:26 +0530
Subject: Re: [PATCH v3 1/2] scripts: Add glibcelf.py module
To: Florian Weimer, libc-alpha@sourceware.org

On 11/04/2022 21:02, Florian Weimer via Libc-alpha wrote:
> Hopefully, this will lead to tests that are easier to maintain. The
> current approach of parsing readelf -W output using regular expressions
> is not necessarily easier than parsing the ELF data directly.
>
> This module is still somewhat incomplete (e.g., coverage of relocation
> types and versioning information is missing), but it is sufficient to
> perform basic symbol analysis or program header analysis.

This looks mostly OK, with some comments below. Apart from a couple of
possible typos and nits the comments are light suggestions and not
blockers for commit.

> ---
> v3: Unchanged.
>  scripts/glibcelf.py | 842 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 842 insertions(+)
>  create mode 100644 scripts/glibcelf.py
>
> diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
> new file mode 100644
> index 0000000000..053b9fa165
> --- /dev/null
> +++ b/scripts/glibcelf.py
> @@ -0,0 +1,842 @@
> +#!/usr/bin/python3
> +# ELF support functionality for Python.
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# .
> +
> +"""Basic ELF parser.
> +
> +Use Image.readfile(path) to read an ELF file into memory and begin
> +parsing it.
> +
> +"""
> +
> +import collections
> +import enum
> +import struct
> +
> +class _OpenIntEnum(enum.IntEnum):
> +    """Integer enumeration that supports arbitrary int values."""
> +    @classmethod
> +    def _missing_(cls, value):
> +        # See enum.IntFlag._create_pseudo_member_. This allows
> +        # creating of enum constants with arbitrary integer values.
> +        pseudo_member = int.__new__(cls, value)
> +        pseudo_member._name_ = None
> +        pseudo_member._value_ = value
> +        return pseudo_member
> +
> +    def __repr__(self):
> +        name = self._name_
> +        if name is not None:
> +            # The names have prefixes like SHT_, implying their type.
> +            return name
> +        return '{}({})'.format(self.__class__.__name__, self._value_)
> +
> +    def __str__(self):
> +        name = self._name_
> +        if name is not None:
> +            return name
> +        return str(self._value_)
> +
> +class ElfClass(_OpenIntEnum):
> +    """ELF word size. Type of EI_CLASS values."""
> +    ELFCLASSNONE = 0
> +    ELFCLASS32 = 1
> +    ELFCLASS64 = 2
> +
> +class ElfData(_OpenIntEnum):
> +    """ELF endianess. Type of EI_DATA values."""
> +    ELFDATANONE = 0
> +    ELFDATA2LSB = 1
> +    ELFDATA2MSB = 2
> +
> +class Machine(_OpenIntEnum):
> +    """ELF machine type.
> +    Type of values in Ehdr.e_machine field."""
> +    EM_NONE = 0
> +    EM_M32 = 1
> +    EM_SPARC = 2
> +    EM_386 = 3
> +    EM_68K = 4
> +    EM_88K = 5
> +    EM_IAMCU = 6
> +    EM_860 = 7
> +    EM_MIPS = 8
> +    EM_S370 = 9
> +    EM_MIPS_RS3_LE = 10
> +    EM_PARISC = 15
> +    EM_VPP500 = 17
> +    EM_SPARC32PLUS = 18
> +    EM_960 = 19
> +    EM_PPC = 20
> +    EM_PPC64 = 21
> +    EM_S390 = 22
> +    EM_SPU = 23
> +    EM_V800 = 36
> +    EM_FR20 = 37
> +    EM_RH32 = 38
> +    EM_RCE = 39
> +    EM_ARM = 40
> +    EM_FAKE_ALPHA = 41
> +    EM_SH = 42
> +    EM_SPARCV9 = 43
> +    EM_TRICORE = 44
> +    EM_ARC = 45
> +    EM_H8_300 = 46
> +    EM_H8_300H = 47
> +    EM_H8S = 48
> +    EM_H8_500 = 49
> +    EM_IA_64 = 50
> +    EM_MIPS_X = 51
> +    EM_COLDFIRE = 52
> +    EM_68HC12 = 53
> +    EM_MMA = 54
> +    EM_PCP = 55
> +    EM_NCPU = 56
> +    EM_NDR1 = 57
> +    EM_STARCORE = 58
> +    EM_ME16 = 59
> +    EM_ST100 = 60
> +    EM_TINYJ = 61
> +    EM_X86_64 = 62
> +    EM_PDSP = 63
> +    EM_PDP10 = 64
> +    EM_PDP11 = 65
> +    EM_FX66 = 66
> +    EM_ST9PLUS = 67
> +    EM_ST7 = 68
> +    EM_68HC16 = 69
> +    EM_68HC11 = 70
> +    EM_68HC08 = 71
> +    EM_68HC05 = 72
> +    EM_SVX = 73
> +    EM_ST19 = 74
> +    EM_VAX = 75
> +    EM_CRIS = 76
> +    EM_JAVELIN = 77
> +    EM_FIREPATH = 78
> +    EM_ZSP = 79
> +    EM_MMIX = 80
> +    EM_HUANY = 81
> +    EM_PRISM = 82
> +    EM_AVR = 83
> +    EM_FR30 = 84
> +    EM_D10V = 85
> +    EM_D30V = 86
> +    EM_V850 = 87
> +    EM_M32R = 88
> +    EM_MN10300 = 89
> +    EM_MN10200 = 90
> +    EM_PJ = 91
> +    EM_OPENRISC = 92
> +    EM_ARC_COMPACT = 93
> +    EM_XTENSA = 94
> +    EM_VIDEOCORE = 95
> +    EM_TMM_GPP = 96
> +    EM_NS32K = 97
> +    EM_TPC = 98
> +    EM_SNP1K = 99
> +    EM_ST200 = 100
> +    EM_IP2K = 101
> +    EM_MAX = 102
> +    EM_CR = 103
> +    EM_F2MC16 = 104
> +    EM_MSP430 = 105
> +    EM_BLACKFIN = 106
> +    EM_SE_C33 = 107
> +    EM_SEP = 108
> +    EM_ARCA = 109
> +    EM_UNICORE = 110
> +    EM_EXCESS = 111
> +    EM_DXP = 112
> +    EM_ALTERA_NIOS2 = 113
> +    EM_CRX = 114
> +    EM_XGATE = 115
> +    EM_C166 = 116
> +    EM_M16C = 117
> +    EM_DSPIC30F = 118
> +    EM_CE = 119
> +    EM_M32C = 120
> +    EM_TSK3000 = 131
> +    EM_RS08 = 132
> +    EM_SHARC = 133
> +    EM_ECOG2 = 134
> +    EM_SCORE7 = 135
> +    EM_DSP24 = 136
> +    EM_VIDEOCORE3 = 137
> +    EM_LATTICEMICO32 = 138
> +    EM_SE_C17 = 139
> +    EM_TI_C6000 = 140
> +    EM_TI_C2000 = 141
> +    EM_TI_C5500 = 142
> +    EM_TI_ARP32 = 143
> +    EM_TI_PRU = 144
> +    EM_MMDSP_PLUS = 160
> +    EM_CYPRESS_M8C = 161
> +    EM_R32C = 162
> +    EM_TRIMEDIA = 163
> +    EM_QDSP6 = 164
> +    EM_8051 = 165
> +    EM_STXP7X = 166
> +    EM_NDS32 = 167
> +    EM_ECOG1X = 168
> +    EM_MAXQ30 = 169
> +    EM_XIMO16 = 170
> +    EM_MANIK = 171
> +    EM_CRAYNV2 = 172
> +    EM_RX = 173
> +    EM_METAG = 174
> +    EM_MCST_ELBRUS = 175
> +    EM_ECOG16 = 176
> +    EM_CR16 = 177
> +    EM_ETPU = 178
> +    EM_SLE9X = 179
> +    EM_L10M = 180
> +    EM_K10M = 181
> +    EM_AARCH64 = 183
> +    EM_AVR32 = 185
> +    EM_STM8 = 186
> +    EM_TILE64 = 187
> +    EM_TILEPRO = 188
> +    EM_MICROBLAZE = 189
> +    EM_CUDA = 190
> +    EM_TILEGX = 191
> +    EM_CLOUDSHIELD = 192
> +    EM_COREA_1ST = 193
> +    EM_COREA_2ND = 194
> +    EM_ARCV2 = 195
> +    EM_OPEN8 = 196
> +    EM_RL78 = 197
> +    EM_VIDEOCORE5 = 198
> +    EM_78KOR = 199
> +    EM_56800EX = 200
> +    EM_BA1 = 201
> +    EM_BA2 = 202
> +    EM_XCORE = 203
> +    EM_MCHP_PIC = 204
> +    EM_INTELGT = 205
> +    EM_KM32 = 210
> +    EM_KMX32 = 211
> +    EM_EMX16 = 212
> +    EM_EMX8 = 213
> +    EM_KVARC = 214
> +    EM_CDP = 215
> +    EM_COGE = 216
> +    EM_COOL = 217
> +    EM_NORC = 218
> +    EM_CSR_KALIMBA = 219
> +    EM_Z80 = 220
> +    EM_VISIUM = 221
> +    EM_FT32 = 222
> +    EM_MOXIE = 223
> +    EM_AMDGPU = 224
> +    EM_RISCV = 243
> +    EM_BPF = 247
> +    EM_CSKY = 252
> +    EM_NUM = 253
> +    EM_ALPHA = 0x9026
> +
> +class Et(_OpenIntEnum):
> +    """ELF file type. Type of ET_* values and the Ehdr.e_type field."""
> +    ET_NONE = 0
> +    ET_REL = 1
> +    ET_EXEC = 2
> +    ET_DYN = 3
> +    ET_CORE = 4
> +
> +class Shn(_OpenIntEnum):
> +    """ELF reserved section indices."""
> +    SHN_UNDEF = 0
> +    SHN_ABS = 0xfff1
> +    SHN_COMMON = 0xfff2
> +    SHN_XINDEX = 0xffff
> +
> +class Sht(_OpenIntEnum):
> +    """ELF section types.
> +    Type of SHT_* values."""
> +    SHT_NULL = 0
> +    SHT_PROGBITS = 1
> +    SHT_SYMTAB = 2
> +    SHT_STRTAB = 3
> +    SHT_RELA = 4
> +    SHT_HASH = 5
> +    SHT_DYNAMIC = 6
> +    SHT_NOTE = 7
> +    SHT_NOBITS = 8
> +    SHT_REL = 9
> +    SHT_DYNSYM = 11
> +    SHT_INIT_ARRAY = 14
> +    SHT_FINI_ARRAY = 15
> +    SHT_PREINIT_ARRAY = 16
> +    SHT_GROUP = 17
> +    SHT_SYMTAB_SHNDX = 18
> +    SHT_GNU_ATTRIBUTES = 0x6ffffff5
> +    SHT_GNU_HASH = 0x6ffffff6
> +    SHT_GNU_LIBLIST = 0x6ffffff7
> +    SHT_CHECKSUM = 0x6ffffff8
> +    SHT_GNU_verdef = 0x6ffffffd
> +    SHT_GNU_verneed = 0x6ffffffe
> +    SHT_GNU_versym = 0x6fffffff
> +
> +class Pf(enum.IntFlag):
> +    """Program header flags. Type of Phdr.p_flags values."""
> +    PF_X = 1
> +    PF_W = 2
> +    PF_R = 4
> +
> +class Shf(enum.IntFlag):
> +    """Section flags. Type of Shdr.sh_type values."""
> +    SHF_WRITE = 1 << 0
> +    SHF_ALLOC = 1 << 1
> +    SHF_EXECINSTR = 1 << 2
> +    SHF_MERGE = 1 << 4
> +    SHF_STRINGS = 1 << 5
> +    SHF_INFO_LINK = 1 << 6
> +    SHF_LINK_ORDER = 1 << 7
> +    SHF_OS_NONCONFORMING = 256
> +    SHF_GROUP = 1 << 9
> +    SHF_TLS = 1 << 10
> +    SHF_COMPRESSED = 1 << 11
> +    SHF_GNU_RETAIN = 1 << 21
> +    SHF_ORDERED = 1 << 30
> +    SHF_RETAIN = 1 << 31
> +
> +class Stb(_OpenIntEnum):
> +    """ELF symbol binding type."""
> +    STB_LOCAL = 0
> +    STB_GLOBAL = 1
> +    STB_WEAK = 3
> +    STB_GNU_UNIQUE = 10
> +
> +class Stt(_OpenIntEnum):
> +    """ELF symbol type."""
> +    STT_NOTYPE = 0
> +    STT_OBJECT = 1
> +    STT_FUNC = 2
> +    STT_SECTION = 3
> +    STT_FILE = 4
> +    STT_COMMON = 5
> +    STT_TLS = 6
> +    STT_GNU_IFUNC = 10
> +
> +class Pt(_OpenIntEnum):
> +    """ELF program header types. Type of Phdr.p_type."""
> +    PT_NULL = 0
> +    PT_LOAD = 1
> +    PT_DYNAMIC = 2
> +    PT_INTERP = 3
> +    PT_NOTE = 4
> +    PT_SHLIB = 5
> +    PT_PHDR = 6
> +    PT_TLS = 7
> +    PT_NUM = 8
> +    PT_GNU_EH_FRAME = 0x6474e550
> +    PT_GNU_STACK = 0x6474e551
> +    PT_GNU_RELRO = 0x6474e552
> +    PT_GNU_PROPERTY = 0x6474e553
> +    PT_SUNWBSS = 0x6ffffffa
> +    PT_SUNWSTACK = 0x6ffffffb
> +
> +class Dt(_OpenIntEnum):
> +    """ELF dynamic segment tags.
> +    Type of Dyn.d_val."""
> +    DT_NULL = 0
> +    DT_NEEDED = 1
> +    DT_PLTRELSZ = 2
> +    DT_PLTGOT = 3
> +    DT_HASH = 4
> +    DT_STRTAB = 5
> +    DT_SYMTAB = 6
> +    DT_RELA = 7
> +    DT_RELASZ = 8
> +    DT_RELAENT = 9
> +    DT_STRSZ = 10
> +    DT_SYMENT = 11
> +    DT_INIT = 12
> +    DT_FINI = 13
> +    DT_SONAME = 14
> +    DT_RPATH = 15
> +    DT_SYMBOLIC = 16
> +    DT_REL = 17
> +    DT_RELSZ = 18
> +    DT_RELENT = 19
> +    DT_PLTREL = 20
> +    DT_DEBUG = 21
> +    DT_TEXTREL = 22
> +    DT_JMPREL = 23
> +    DT_RUNPATH = 29
> +    DT_FLAGS = 30
> +    DT_ENCODING = 32
> +    DT_PREINIT_ARRAY = 32
> +    DT_PREINIT_ARRAYSZ = 33
> +    DT_SYMTAB_SHNDX = 34
> +    DT_GNU_PRELINKED = 0x6ffffdf5
> +    DT_GNU_CONFLICTSZ = 0x6ffffdf6
> +    DT_GNU_LIBLISTSZ = 0x6ffffdf7
> +    DT_CHECKSUM = 0x6ffffdf8
> +    DT_PLTPADSZ = 0x6ffffdf9
> +    DT_MOVEENT = 0x6ffffdfa
> +    DT_MOVESZ = 0x6ffffdfb
> +    DT_FEATURE_1 = 0x6ffffdfc
> +    DT_POSFLAG_1 = 0x6ffffdfd
> +    DT_SYMINSZ = 0x6ffffdfe
> +    DT_SYMINENT = 0x6ffffdff
> +    DT_GNU_HASH = 0x6ffffef5
> +    DT_TLSDESC_PLT = 0x6ffffef6
> +    DT_TLSDESC_GOT = 0x6ffffef7
> +    DT_GNU_CONFLICT = 0x6ffffef8
> +    DT_GNU_LIBLIST = 0x6ffffef9
> +    DT_CONFIG = 0x6ffffefa
> +    DT_DEPAUDIT = 0x6ffffefb
> +    DT_AUDIT = 0x6ffffefc
> +    DT_SYMINFO = 0x6ffffeff
> +    DT_VERSYM = 0x6ffffff0
> +    DT_RELACOUNT = 0x6ffffff9
> +    DT_RELCOUNT = 0x6ffffffa
> +    DT_FLAGS_1 = 0x6ffffffb
> +    DT_VERDEF = 0x6ffffffc
> +    DT_VERDEFNUM = 0x6ffffffd
> +    DT_VERNEED = 0x6ffffffe
> +    DT_VERNEEDNUM = 0x6fffffff
> +    DT_AUXILIARY = 0x7ffffffd
> +    DT_FILTER = 0x7fffffff

Could we generate all these from elf.h, or generate both this file and
elf.h from some common source? At the moment a lot of platform-specific
macros are missing, and some common ones too, e.g. DT_BIND_NOW. Further,
STB_WEAK is 3 here but 2 in elf.h, and there's an SHF_RETAIN that doesn't
have a corresponding macro in elf.h. We could avoid all this if
everything were generated from a single source of truth.

Minimally, ISTM that STB_WEAK needs to be fixed and SHF_RETAIN dropped,
but the rest is more of a suggestion to ease future maintenance, so you
could do it either now or later as the script evolves.

> +
> +class StInfo:
> +    """ELF symbol binding and type. Type of the Sym.st_info field."""
> +    def __init__(self, arg0, arg1=None):
> +        if isinstance(arg0, int) and arg1 is None:
> +            self.bind = Stb(arg0 >> 4)
> +            self.type = Stt(arg0 & 15)
> +        else:
> +            self.bind = Stb(arg0)
> +            self.type = Stt(arg1)
> +
> +    def value(self):
> +        """Returns the raw value for the bind/type combination."""
> +        return (self.bind.value() << 4) | (self.type.value())

OK.

> +
> +# Type in an ELF file. Used for deserialization.
> +_Layout = collections.namedtuple('_Layout', 'unpack size')
> +
> +def _define_layouts(baseclass: type, layout32: str, layout64: str,
> +                    types=None, fields32=None):
> +    """Assign variants dict to baseclass.
> +
> +    The variants dict is indexed by (ElfClass, ElfData) pairs, and its
> +    values are _Layout instances.
> +
> +    """
> +    struct32 = struct.Struct(layout32)
> +    struct64 = struct.Struct(layout64)
> +
> +    # Check that the struct formats yield the right number of components.
> +    for s in (struct32, struct64):
> +        example = s.unpack(b' ' * s.size)
> +        if len(example) != len(baseclass._fields):
> +            raise ValueError('{!r} yields wrong field count: {} != {}'.format(
> +                s.format, len(example), len(baseclass._fields)))
> +
> +    # Check that field names in types are correct.
> +    if types is None:
> +        types = ()
> +    for n in types:
> +        if n not in baseclass._fields:
> +            raise ValueError('{} does not have field {!r}'.format(
> +                baseclass.__name__, n))
> +
> +    if fields32 is not None \
> +       and set(fields32) != set(baseclass._fields):
> +        raise ValueError('{!r} is not a permutation of the fields {!r}'.format(
> +            fields32, baseclass._fields))

Validations. OK.
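
To make the single-source-of-truth suggestion above a bit more concrete,
here is a minimal sketch, assuming the constants of interest appear in
elf.h as plain "#define NAME <integer literal>" lines. ELF_H_SAMPLE,
parse_defines and mismatches are hypothetical names used only for
illustration; they are not part of the patch or of glibc, and a real
check would read the actual elf.h instead of the inline excerpt.

```python
import enum
import re

# Hypothetical excerpt standing in for the real elf.h (assumption: a real
# check would read the header from the glibc source tree instead).
ELF_H_SAMPLE = """
#define STB_LOCAL       0
#define STB_GLOBAL      1
#define STB_WEAK        2
#define STB_GNU_UNIQUE  10
"""

# Matches "#define NAME <decimal or hex literal>".  Macros defined through
# expressions or other macros are deliberately ignored in this sketch.
_DEFINE_RE = re.compile(
    r'^#define[ \t]+(\w+)[ \t]+(0[xX][0-9a-fA-F]+|[0-9]+)\b', re.MULTILINE)


def parse_defines(header_text, prefix):
    """Return {name: value} for simple integer macros starting with prefix."""
    return {name: int(literal, 0)
            for name, literal in _DEFINE_RE.findall(header_text)
            if name.startswith(prefix)}


def mismatches(enum_class, header_text, prefix):
    """Map member name -> (enum value, header value) where the two differ.

    A header value of None means the header has no such macro.
    """
    expected = parse_defines(header_text, prefix)
    return {member.name: (member.value, expected.get(member.name))
            for member in enum_class
            if expected.get(member.name) != member.value}


# Hand-maintained enum carrying the STB_WEAK value from the patch.
class Stb(enum.IntEnum):
    STB_LOCAL = 0
    STB_GLOBAL = 1
    STB_WEAK = 3
    STB_GNU_UNIQUE = 10


# Alternative: build the enum directly from the header text.
StbFromHeader = enum.IntEnum('StbFromHeader',
                             parse_defines(ELF_H_SAMPLE, 'STB_'))

print(mismatches(Stb, ELF_H_SAMPLE, 'STB_'))  # {'STB_WEAK': (3, 2)}
```

Either direction would work: a test that compares the hand-written enums
against the header, or enums that are built from the header outright. The
former is less invasive, the latter removes the duplication.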
> +
> +    def unique_name(name, used_names = (set((baseclass.__name__,))
> +                                        | set(baseclass._fields)
> +                                        | {n.__name__
> +                                           for n in (types or {}).values()})):
> +        """Find a name that is not used for a class or field name."""
> +        candidate = name
> +        n = 0
> +        while candidate in used_names:
> +            n += 1
> +            candidate = '{}{}'.format(name, n)
> +        used_names.add(candidate)
> +        return candidate

Another newline here please, so that the nested function stands out a bit.

> +    blob_name = unique_name('blob')
> +    struct_unpack_name = unique_name('struct_unpack')
> +    comps_name = unique_name('comps')
> +
> +    layouts = {}
> +    for (bits, elfclass, layout, fields) in (
> +            (32, ElfClass.ELFCLASS32, layout32, fields32),
> +            (64, ElfClass.ELFCLASS64, layout64, None),
> +    ):
> +        for (elfdata, structprefix, funcsuffix) in (
> +                (ElfData.ELFDATA2LSB, '<', 'LE'),
> +                (ElfData.ELFDATA2MSB, '>', 'BE'),
> +        ):
> +            env = {
> +                baseclass.__name__: baseclass,
> +                struct_unpack_name: struct.unpack,
> +            }
> +
> +            # Add the type converters.
> +            if types:
> +                for cls in types.values():
> +                    env[cls.__name__] = cls
> +
> +            funcname = ''.join(
> +                ('unpack_', baseclass.__name__, str(bits), funcsuffix))
> +
> +            code = '''
> +def {funcname}({blob_name}):
> +'''.format(funcname=funcname, blob_name=blob_name)
> +
> +            indent = ' ' * 4
> +            unpack_call = '{}({!r}, {})'.format(
> +                struct_unpack_name, structprefix + layout, blob_name)
> +            field_names = ', '.join(baseclass._fields)
> +            if types is None and fields is None:
> +                code += '{}return {}({})\n'.format(
> +                    indent, baseclass.__name__, unpack_call)
> +            else:
> +                # Destructuring tuple assignment.
> +                if fields is None:
> +                    code += '{}{} = {}\n'.format(
> +                        indent, field_names, unpack_call)
> +                else:
> +                    # Use custom field order.
> +                    code += '{}{} = {}\n'.format(
> +                        indent, ', '.join(fields), unpack_call)
> +
> +                # Perform the type conversions.
> +                for n in baseclass._fields:
> +                    if n in types:
> +                        code += '{}{} = {}({})\n'.format(
> +                            indent, n, types[n].__name__, n)
> +                # Create the named tuple.
> +                code += '{}return {}({})\n'.format(
> +                    indent, baseclass.__name__, field_names)
> +
> +            exec(code, env)
> +            layouts[(elfclass, elfdata)] = _Layout(
> +                env[funcname], struct.calcsize(layout))

Building layouts for all wordsize and endianness permutations. OK.

> +    baseclass.layouts = layouts
> +
> +
> +# Corresponds to EI_* indices into Elf*_Ehdr.e_indent.
> +class Ident(collections.namedtuple('Ident',
> +    'ei_mag ei_class ei_data ei_version ei_osabi ei_abiversion ei_pad')):
> +
> +    def __new__(cls, *args):
> +        """Construct an object from a blob or its constituent fields."""
> +        if len(args) == 1:
> +            return cls.unpack(args[0])
> +        return cls.__base__.__new__(cls, *args)
> +
> +    @staticmethod
> +    def unpack(blob: memoryview) -> 'Ident':
> +        """Parse raws data into a tuple."""
> +        ei_mag, ei_class, ei_data, ei_version, ei_osabi, ei_abiversion, \
> +            ei_pad = struct.unpack('4s5B7s', blob)
> +        return Ident(ei_mag, ElfClass(ei_class), ElfData(ei_data),
> +                     ei_version, ei_osabi, ei_abiversion, ei_pad)
> +    size = 16

OK.

> +
> +# Corresponds to Elf32_Ehdr and Elf64_Ehdr.
> +Ehdr = collections.namedtuple('Ehdr',
> +    'e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags'
> +    + ' e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx')
> +_define_layouts(Ehdr,
> +                layout32='16s2H5I6H',
> +                layout64='16s2HI3QI6H',

Verified against the structs. OK.

> +                types=dict(e_ident=Ident,
> +                           e_machine=Machine,
> +                           e_type=Et,
> +                           e_shstrndx=Shn))
> +
> +# Corresponds to Elf32_Phdr and Elf64_Pdhr. Order follows the latter.
> +Phdr = collections.namedtuple('Phdr',
> +    'p_type p_flags p_offset p_vaddr p_paddr p_filesz p_memsz p_align')
> +_define_layouts(Phdr,
> +                layout32='8I',
> +                fields32=('p_type', 'p_offset', 'p_vaddr', 'p_paddr',
> +                          'p_filesz', 'p_memsz', 'p_flags', 'p_align'),
> +                layout64='2I6Q',
> +                types=dict(p_type=Pt, p_flags=Pf))

Likewise. OK.

> +
> +
> +# Corresponds to Elf32_Shdr and Elf64_Shdr.
> +class Shdr(collections.namedtuple('Shdr',
> +    'sh_name sh_type sh_flags sh_addr sh_offset sh_size sh_link sh_info'
> +    + ' sh_addralign sh_entsize')):
> +    def resolve(self, strtab: 'StringTable') -> 'Shdr':
> +        """Resolve sh_name using a string table."""
> +        return self.__class__(strtab.get(self[0]), *self[1:])
> +_define_layouts(Shdr,
> +                layout32='10I',
> +                layout64='2I4Q2I2Q',
> +                types=dict(sh_type=Sht,
> +                           sh_flags=Shf,
> +                           sh_link=Shn))

OK.

> +
> +# Corresponds to Elf32_Dyn and Elf64_Dyn. The nesting through the
> +# d_un union is skipped, and d_ptr is missing (its representation in
> +# Python would be identical to d_val).
> +Dyn = collections.namedtuple('Dyn', 'd_tag d_val')
> +_define_layouts(Dyn,
> +                layout32='2i',
> +                layout64='2q',
> +                types=dict(d_tag=Dt))

OK.

> +
> +# Corresponds to Elf32_Sym and Elf64_Sym.
> +class Sym(collections.namedtuple('Sym',
> +    'st_name st_info st_other st_shndx st_value st_size')):
> +    def resolve(self, strtab: 'StringTable') -> 'Sym':
> +        """Resolve st_name using a string table."""
> +        return self.__class__(strtab.get(self[0]), *self[1:])
> +_define_layouts(Sym,
> +                layout32='3I2BH',
> +                layout64='I2BH2Q',
> +                fields32=('st_name', 'st_value', 'st_size', 'st_info',
> +                          'st_other', 'st_shndx'),
> +                types=dict(st_shndx=Shn,
> +                           st_info=StInfo))

OK.

> +
> +# Corresponds to Elf32_Rel and Elf64_Rel.
> +Rel = collections.namedtuple('Rel', 'r_offset r_info')
> +_define_layouts(Rel,
> +                layout32='2I',
> +                layout64='2Q')
> +

OK.

> +# Corresponds to Elf32_Rel and Elf64_Rel.
> +Rela = collections.namedtuple('Rela', 'r_offset r_info r_addend')
> +_define_layouts(Rela,
> +                layout32='3I',
> +                layout64='3Q')

OK.

> +
> +class StringTable:
> +    """ELF string table."""
> +    def __init__(self, blob):
> +        """Create a new string table backed by the data in the blob.
> +
> +        blob: a memoryview-like object
> +
> +        """
> +        self.blob = blob
> +
> +    def get(self, index) -> bytes:
> +        """Returns the null-terminated byte string at the index."""
> +        blob = self.blob
> +        endindex = index
> +        while True:
> +            if blob[endindex] == 0:
> +                return bytes(blob[index:endindex])
> +            endindex += 1

OK.

> +
> +class Image:
> +    """ELF image parser."""
> +    def __init__(self, image):
> +        """Create an ELF image from binary image data.
> +
> +        image: a memoryview-like object that supports efficient range
> +        subscripting.
> +
> +        """
> +        self.image = image
> +        ident = self.read(Ident, 0)
> +        classdata = (ident.ei_class, ident.ei_data)
> +        # Set self.Ehdr etc. to the subtypes with the right parsers.
> +        for typ in (Ehdr, Phdr, Shdr, Dyn, Sym, Rel, Rela):
> +            setattr(self, typ.__name__, typ.layouts.get(classdata, None))
> +
> +        if self.Ehdr is not None:
> +            self.ehdr = self.read(self.Ehdr, 0)
> +            self._shdr_num = self._compute_shdr_num()
> +        else:
> +            self.ehdr = None
> +            self._shdr_num = 0
> +
> +        self._section = {}
> +        self._stringtab = {}
> +
> +        if self._shdr_num > 0:
> +            self._shdr_strtab = self._find_shdr_strtab()
> +        else:
> +            self._shdr_strtab = None

OK.

> +
> +    @staticmethod
> +    def readfile(path: str) -> 'Image':
> +        """Reads the ELF file at the specified path."""
> +        with open(path, 'rb') as inp:
> +            return Image(memoryview(inp.read()))

OK.

> +
> +    def _compute_shdr_num(self) -> int:
> +        """Computes the actual number of section headers."""
> +        shnum = self.ehdr.e_shnum
> +        if shnum == 0:
> +            if self.ehdr.e_shoff == 0 or self.ehdr.e_shentsize == 0:
> +                # No section headers.
> +                return 0
> +            # Otherwise the extension mechanism is used (which may be
> +            # needed because e_shnum is just 16 bits).
> +            return self.read(self.Shdr, self.ehdr.e_shoff).sh_size
> +        return shnum

OK.

> +
> +    def _find_shdr_strtab(self) -> StringTable:
> +        """Finds the section header string table (maybe via extensions)."""
> +        shstrndx = self.ehdr.e_shstrndx
> +        if shstrndx == Shn.SHN_XINDEX:
> +            shstrndx = self.read(self.Shdr, self.ehdr.e_shoff).sh_link
> +        return self._find_stringtab(shstrndx)

OK.

> +
> +    def read(self, typ: type, offset:int ):
> +        """Reads an object at a specific offset.
> +
> +        The type must have been enhanced using _define_variants.
> +
> +        """
> +        return typ.unpack(self.image[offset: offset + typ.size])

OK.

> +
> +    def phdrs(self) -> Phdr:
> +        """Generator iterating over the program headers."""
> +        if self.ehdr is None:
> +            return
> +        size = self.ehdr.e_phentsize
> +        if size != self.Phdr.size:
> +            raise ValueError('Unexpected Phdr size in ELF header: {} != {}'
> +                             .format(size, self.Phdr.size))
> +
> +        offset = self.ehdr.e_phoff
> +        for _ in range(self.ehdr.e_phnum):
> +            yield self.read(self.Phdr, offset)
> +            offset += size

OK.

> +
> +    def shdrs(self, resolve: bool=True) -> Shdr:
> +        """Generator iterating over the section headers.
> +
> +        If resolve, section names are automatically translated
> +        using the section header string table.
> +
> +        """
> +        if self._shdr_num == 0:
> +            return
> +
> +        size = self.ehdr.e_shentsize
> +        if size != self.Shdr.size:
> +            raise ValueError('Unexpected Shdr size in ELF header: {} != {}'
> +                             .format(size, self.Shdr.size))
> +
> +        offset = self.ehdr.e_shoff
> +        for _ in range(self._shdr_num):
> +            shdr = self.read(self.Shdr, offset)
> +            if resolve:
> +                shdr = shdr.resolve(self._shdr_strtab)
> +            yield shdr
> +            offset += size

OK.
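
For readers unfamiliar with the e_shnum extension that _compute_shdr_num
handles above, a standalone sketch of the same rule follows. It assumes a
little-endian ELF64 image only, reuses the 64-bit format strings from the
patch, and runs on a synthetic header rather than a real file;
section_count and synthetic_image are hypothetical helper names, not part
of glibcelf.py.

```python
import struct

# Little-endian ELF64 layouts, reusing the 64-bit format strings from the
# patch (assumption: no other class or byte order is handled here).
EHDR64 = struct.Struct('<16s2HI3QI6H')
SHDR64 = struct.Struct('<2I4Q2I2Q')


def section_count(image):
    """Return the number of section headers in a little-endian ELF64 image."""
    ehdr = EHDR64.unpack_from(image, 0)
    e_shoff, e_shentsize, e_shnum = ehdr[6], ehdr[11], ehdr[12]
    if e_shnum != 0:
        return e_shnum
    if e_shoff == 0 or e_shentsize == 0:
        # No section headers at all.
        return 0
    # Extension: e_shnum is only 16 bits wide, so large counts are stored
    # in sh_size (field index 5) of section header 0.
    return SHDR64.unpack_from(image, e_shoff)[5]


def synthetic_image(e_shnum, sh_size_of_first):
    """Build a minimal ELF header followed directly by section header 0."""
    ident = b'\x7fELF' + bytes([2, 1, 1]) + bytes(9)   # ELFCLASS64, LSB
    ehdr = EHDR64.pack(ident, 3, 62, 1, 0, 0, EHDR64.size, 0,
                       EHDR64.size, 56, 0, SHDR64.size, e_shnum, 0)
    shdr0 = SHDR64.pack(0, 0, 0, 0, 0, sh_size_of_first, 0, 0, 0, 0)
    return ehdr + shdr0


print(section_count(synthetic_image(0, 70000)))  # 70000
print(section_count(synthetic_image(5, 0)))      # 5
```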
> +
> +    def dynamic(self) -> Dyn:
> +        """Generator iterating over the dynamic segment."""
> +        for phdr in self.phdrs():
> +            if phdr.p_type == Pt.PT_DYNAMIC:
> +                # Pick the first dynamic segment, like the loader.
> +                if phdr.p_filesz == 0:
> +                    # Probably separated debuginfo.
> +                    return
> +                offset = phdr.p_offset
> +                end = offset + phdr.p_memsz
> +                size = self.Dyn.size
> +                while True:
> +                    next_offset = offset + size
> +                    if next_offset > end:
> +                        raise ValueError(
> +                            'Dynamic segment size {} is not a multiple of Dyn size {}'.format(
> +                                phdr.p_memsz, size))
> +                    yield self.read(self.Dyn, offset)
> +                    if next_offset == end:
> +                        return
> +                    offset = next_offset

OK.

> +
> +    def syms(self, shdr: Shdr, resolve: bool=True) -> Sym:
> +        """A generator iterating over a symbol table.
> +
> +        If resolve, symbol names are automatically translated using
> +        the string table for the symbol table.
> +
> +        """
> +        assert shdr.sh_type == Sht.SHT_SYMTAB
> +        size = shdr.sh_entsize
> +        if size != self.Sym.size:
> +            raise ValueError('Invalid symbol table entry size {}'.format(size))
> +        offset = shdr.sh_offset
> +        end = shdr.sh_offset + shdr.sh_size
> +        if resolve:
> +            strtab = self._find_stringtab(shdr.sh_link)
> +        while offset < end:
> +            sym = self.read(self.Sym, offset)
> +            if resolve:
> +                sym = sym.resolve(strtab)
> +            yield sym
> +            offset += size
> +        if offset != end:
> +            raise ValueError('Symbol table is not a multiple of entry size')

OK.

> +
> +    def lookup_string(self, strtab_index: int, strtab_offset: int) -> bytes:
> +        """Looks up a string in a string table identified by its link index."""
> +        try:
> +            strtab = self._stringtab[strtab_index]
> +        except KeyError:
> +            strtab = self._find_stringtab(strtab_index)
> +        return strtab.get(strtab_offset)

OK.

> +
> +    def find_section(self, shndx: Shn) -> Shdr:
> +        """Returns the section header for the indexed section.
> +
> +        The section name is not resolved.
> +        """
> +        try:
> +            return self._section[shndx]
> +        except KeyError:
> +            pass
> +        if shndx in Shn:
> +            raise ValueError('Reserved section index {}'.format(shndx))
> +        idx = shndx.value
> +        if idx < 0 or idx > self._shdr_num:
> +            raise ValueError('Section index {} out of range [0, {})'.format(
> +                idx, self._shdr_num))
> +        shdr = self.read(
> +            self.Shdr, self.ehdr.e_shoff + idx * self.Shdr.size)
> +        self._section[shndx] = shdr
> +        return shdr

OK.

> +
> +    def _find_stringtab(self, sh_link: int) -> StringTable:
> +        if sh_link in self._stringtab:
> +            return self._stringtab
> +        if sh_link < 0 or sh_link >= self._shdr_num:
> +            raise ValueError('Section index {} out of range [0, {})'.format(
> +                sh_link, self._shdr_num))
> +        shdr = self.read(
> +            self.Shdr, self.ehdr.e_shoff + sh_link * self.Shdr.size)
> +        if shdr.sh_type != Sht.SHT_STRTAB:
> +            raise ValueError(
> +                'Section {} is not a string table: {}'.format(
> +                    sh_link, shdr.sh_type))
> +        strtab = StringTable(
> +            self.image[shdr.sh_offset:shdr.sh_offset + shdr.sh_size])
> +        # This could retrain essentially arbitrary amounts of data,
> +        # but caching string tables seems important for performance.
> +        self._stringtab[sh_link] = strtab
> +        return strtab
> +
> +
> +__all__ = [name for name in dir() if name[0].isupper()]

OK.

>
> base-commit: 1a85970f41ea1e5abe6da2298a5e8fedcea26b70
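
The "verified against the structs" checks above can also be done
mechanically. The sketch below, which is only an illustration and not part
of the patch, compares struct.calcsize() for each layout string from the
patch with the sizes of the corresponding Elf32_* and Elf64_* C structures.
LAYOUTS, EXPECTED_SIZES and check_layouts are hypothetical names. Note that
this checks total size only, not field order or signedness.

```python
import struct

# (layout32, layout64) format strings as they appear in the patch.
LAYOUTS = {
    'Ehdr': ('16s2H5I6H', '16s2HI3QI6H'),
    'Phdr': ('8I', '2I6Q'),
    'Shdr': ('10I', '2I4Q2I2Q'),
    'Dyn': ('2i', '2q'),
    'Sym': ('3I2BH', 'I2BH2Q'),
    'Rel': ('2I', '2Q'),
    'Rela': ('3I', '3Q'),
}

# (sizeof (Elf32_X), sizeof (Elf64_X)) in bytes.
EXPECTED_SIZES = {
    'Ehdr': (52, 64),
    'Phdr': (32, 56),
    'Shdr': (40, 64),
    'Dyn': (8, 16),
    'Sym': (16, 24),
    'Rel': (8, 16),
    'Rela': (12, 24),
}


def check_layouts():
    """Return a list of (name, format, actual, expected) size mismatches."""
    problems = []
    for name, layouts in LAYOUTS.items():
        for layout, expected in zip(layouts, EXPECTED_SIZES[name]):
            # '<' and '>' select standard sizes without alignment padding,
            # matching the prefixes the generated unpack functions use.
            for prefix in '<>':
                actual = struct.calcsize(prefix + layout)
                if actual != expected:
                    problems.append((name, prefix + layout, actual, expected))
    return problems


print(check_layouts())  # [] when every layout has the expected size
```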