From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <siddhesh@gotplt.org>
Message-ID: <f23ce788-9506-026c-eda0-b1bb04a1fd8a@gotplt.org>
Date: Thu, 21 Apr 2022 21:24:26 +0530
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.7.0
From: Siddhesh Poyarekar <siddhesh@gotplt.org>
Subject: Re: [PATCH v3 1/2] scripts: Add glibcelf.py module
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
References: <d187ef1019edfe5d94a6a15cde4b537cccf44aad.1649691083.git.fweimer@redhat.com>
Content-Language: en-US
In-Reply-To: <d187ef1019edfe5d94a6a15cde4b537cccf44aad.1649691083.git.fweimer@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>

On 11/04/2022 21:02, Florian Weimer via Libc-alpha wrote:
> Hopefully, this will lead to tests that are easier to maintain.  The
> current approach of parsing readelf -W output using regular expressions
> is not necessarily easier than parsing the ELF data directly.
> 
> This module is still somewhat incomplete (e.g., coverage of relocation
> types and versioning information is missing), but it is sufficient to
> perform basic symbol analysis or program header analysis.

This looks mostly OK; comments are inline below.  Apart from a couple of
possible typos and nits, they are light suggestions and not blockers for
commit.

> ---
> v3: Unchanged.
>   scripts/glibcelf.py | 842 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 842 insertions(+)
>   create mode 100644 scripts/glibcelf.py
> 
> diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
> new file mode 100644
> index 0000000000..053b9fa165
> --- /dev/null
> +++ b/scripts/glibcelf.py
> @@ -0,0 +1,842 @@
> +#!/usr/bin/python3
> +# ELF support functionality for Python.
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +"""Basic ELF parser.
> +
> +Use Image.readfile(path) to read an ELF file into memory and begin
> +parsing it.
> +
> +"""
> +
> +import collections
> +import enum
> +import struct
> +
> +class _OpenIntEnum(enum.IntEnum):
> +    """Integer enumeration that supports arbitrary int values."""
> +    @classmethod
> +    def _missing_(cls, value):
> +        # See enum.IntFlag._create_pseudo_member_.  This allows
> +        # creation of enum constants with arbitrary integer values.
> +        pseudo_member = int.__new__(cls, value)
> +        pseudo_member._name_ = None
> +        pseudo_member._value_ = value
> +        return pseudo_member
> +
> +    def __repr__(self):
> +        name = self._name_
> +        if name is not None:
> +            # The names have prefixes like SHT_, implying their type.
> +            return name
> +        return '{}({})'.format(self.__class__.__name__, self._value_)
> +
> +    def __str__(self):
> +        name = self._name_
> +        if name is not None:
> +            return name
> +        return str(self._value_)
> +
> +class ElfClass(_OpenIntEnum):
> +    """ELF word size.  Type of EI_CLASS values."""
> +    ELFCLASSNONE = 0
> +    ELFCLASS32 = 1
> +    ELFCLASS64 = 2
> +
> +class ElfData(_OpenIntEnum):
> +    """ELF endianness.  Type of EI_DATA values."""
> +    ELFDATANONE = 0
> +    ELFDATA2LSB = 1
> +    ELFDATA2MSB = 2
> +
> +class Machine(_OpenIntEnum):
> +    """ELF machine type.  Type of values in Ehdr.e_machine field."""
> +    EM_NONE = 0
> +    EM_M32 = 1
> +    EM_SPARC = 2
> +    EM_386 = 3
> +    EM_68K = 4
> +    EM_88K = 5
> +    EM_IAMCU = 6
> +    EM_860 = 7
> +    EM_MIPS = 8
> +    EM_S370 = 9
> +    EM_MIPS_RS3_LE = 10
> +    EM_PARISC = 15
> +    EM_VPP500 = 17
> +    EM_SPARC32PLUS = 18
> +    EM_960 = 19
> +    EM_PPC = 20
> +    EM_PPC64 = 21
> +    EM_S390 = 22
> +    EM_SPU = 23
> +    EM_V800 = 36
> +    EM_FR20 = 37
> +    EM_RH32 = 38
> +    EM_RCE = 39
> +    EM_ARM = 40
> +    EM_FAKE_ALPHA = 41
> +    EM_SH = 42
> +    EM_SPARCV9 = 43
> +    EM_TRICORE = 44
> +    EM_ARC = 45
> +    EM_H8_300 = 46
> +    EM_H8_300H = 47
> +    EM_H8S = 48
> +    EM_H8_500 = 49
> +    EM_IA_64 = 50
> +    EM_MIPS_X = 51
> +    EM_COLDFIRE = 52
> +    EM_68HC12 = 53
> +    EM_MMA = 54
> +    EM_PCP = 55
> +    EM_NCPU = 56
> +    EM_NDR1 = 57
> +    EM_STARCORE = 58
> +    EM_ME16 = 59
> +    EM_ST100 = 60
> +    EM_TINYJ = 61
> +    EM_X86_64 = 62
> +    EM_PDSP = 63
> +    EM_PDP10 = 64
> +    EM_PDP11 = 65
> +    EM_FX66 = 66
> +    EM_ST9PLUS = 67
> +    EM_ST7 = 68
> +    EM_68HC16 = 69
> +    EM_68HC11 = 70
> +    EM_68HC08 = 71
> +    EM_68HC05 = 72
> +    EM_SVX = 73
> +    EM_ST19 = 74
> +    EM_VAX = 75
> +    EM_CRIS = 76
> +    EM_JAVELIN = 77
> +    EM_FIREPATH = 78
> +    EM_ZSP = 79
> +    EM_MMIX = 80
> +    EM_HUANY = 81
> +    EM_PRISM = 82
> +    EM_AVR = 83
> +    EM_FR30 = 84
> +    EM_D10V = 85
> +    EM_D30V = 86
> +    EM_V850 = 87
> +    EM_M32R = 88
> +    EM_MN10300 = 89
> +    EM_MN10200 = 90
> +    EM_PJ = 91
> +    EM_OPENRISC = 92
> +    EM_ARC_COMPACT = 93
> +    EM_XTENSA = 94
> +    EM_VIDEOCORE = 95
> +    EM_TMM_GPP = 96
> +    EM_NS32K = 97
> +    EM_TPC = 98
> +    EM_SNP1K = 99
> +    EM_ST200 = 100
> +    EM_IP2K = 101
> +    EM_MAX = 102
> +    EM_CR = 103
> +    EM_F2MC16 = 104
> +    EM_MSP430 = 105
> +    EM_BLACKFIN = 106
> +    EM_SE_C33 = 107
> +    EM_SEP = 108
> +    EM_ARCA = 109
> +    EM_UNICORE = 110
> +    EM_EXCESS = 111
> +    EM_DXP = 112
> +    EM_ALTERA_NIOS2 = 113
> +    EM_CRX = 114
> +    EM_XGATE = 115
> +    EM_C166 = 116
> +    EM_M16C = 117
> +    EM_DSPIC30F = 118
> +    EM_CE = 119
> +    EM_M32C = 120
> +    EM_TSK3000 = 131
> +    EM_RS08 = 132
> +    EM_SHARC = 133
> +    EM_ECOG2 = 134
> +    EM_SCORE7 = 135
> +    EM_DSP24 = 136
> +    EM_VIDEOCORE3 = 137
> +    EM_LATTICEMICO32 = 138
> +    EM_SE_C17 = 139
> +    EM_TI_C6000 = 140
> +    EM_TI_C2000 = 141
> +    EM_TI_C5500 = 142
> +    EM_TI_ARP32 = 143
> +    EM_TI_PRU = 144
> +    EM_MMDSP_PLUS = 160
> +    EM_CYPRESS_M8C = 161
> +    EM_R32C = 162
> +    EM_TRIMEDIA = 163
> +    EM_QDSP6 = 164
> +    EM_8051 = 165
> +    EM_STXP7X = 166
> +    EM_NDS32 = 167
> +    EM_ECOG1X = 168
> +    EM_MAXQ30 = 169
> +    EM_XIMO16 = 170
> +    EM_MANIK = 171
> +    EM_CRAYNV2 = 172
> +    EM_RX = 173
> +    EM_METAG = 174
> +    EM_MCST_ELBRUS = 175
> +    EM_ECOG16 = 176
> +    EM_CR16 = 177
> +    EM_ETPU = 178
> +    EM_SLE9X = 179
> +    EM_L10M = 180
> +    EM_K10M = 181
> +    EM_AARCH64 = 183
> +    EM_AVR32 = 185
> +    EM_STM8 = 186
> +    EM_TILE64 = 187
> +    EM_TILEPRO = 188
> +    EM_MICROBLAZE = 189
> +    EM_CUDA = 190
> +    EM_TILEGX = 191
> +    EM_CLOUDSHIELD = 192
> +    EM_COREA_1ST = 193
> +    EM_COREA_2ND = 194
> +    EM_ARCV2 = 195
> +    EM_OPEN8 = 196
> +    EM_RL78 = 197
> +    EM_VIDEOCORE5 = 198
> +    EM_78KOR = 199
> +    EM_56800EX = 200
> +    EM_BA1 = 201
> +    EM_BA2 = 202
> +    EM_XCORE = 203
> +    EM_MCHP_PIC = 204
> +    EM_INTELGT = 205
> +    EM_KM32 = 210
> +    EM_KMX32 = 211
> +    EM_EMX16 = 212
> +    EM_EMX8 = 213
> +    EM_KVARC = 214
> +    EM_CDP = 215
> +    EM_COGE = 216
> +    EM_COOL = 217
> +    EM_NORC = 218
> +    EM_CSR_KALIMBA = 219
> +    EM_Z80 = 220
> +    EM_VISIUM = 221
> +    EM_FT32 = 222
> +    EM_MOXIE = 223
> +    EM_AMDGPU = 224
> +    EM_RISCV = 243
> +    EM_BPF = 247
> +    EM_CSKY = 252
> +    EM_NUM = 253
> +    EM_ALPHA = 0x9026
> +
> +class Et(_OpenIntEnum):
> +    """ELF file type.  Type of ET_* values and the Ehdr.e_type field."""
> +    ET_NONE = 0
> +    ET_REL = 1
> +    ET_EXEC = 2
> +    ET_DYN = 3
> +    ET_CORE = 4
> +
> +class Shn(_OpenIntEnum):
> +    """ELF reserved section indices."""
> +    SHN_UNDEF = 0
> +    SHN_ABS = 0xfff1
> +    SHN_COMMON = 0xfff2
> +    SHN_XINDEX = 0xffff
> +
> +class Sht(_OpenIntEnum):
> +    """ELF section types.  Type of SHT_* values."""
> +    SHT_NULL = 0
> +    SHT_PROGBITS = 1
> +    SHT_SYMTAB = 2
> +    SHT_STRTAB = 3
> +    SHT_RELA = 4
> +    SHT_HASH = 5
> +    SHT_DYNAMIC = 6
> +    SHT_NOTE = 7
> +    SHT_NOBITS = 8
> +    SHT_REL = 9
> +    SHT_DYNSYM = 11
> +    SHT_INIT_ARRAY = 14
> +    SHT_FINI_ARRAY = 15
> +    SHT_PREINIT_ARRAY = 16
> +    SHT_GROUP = 17
> +    SHT_SYMTAB_SHNDX = 18
> +    SHT_GNU_ATTRIBUTES = 0x6ffffff5
> +    SHT_GNU_HASH = 0x6ffffff6
> +    SHT_GNU_LIBLIST = 0x6ffffff7
> +    SHT_CHECKSUM = 0x6ffffff8
> +    SHT_GNU_verdef = 0x6ffffffd
> +    SHT_GNU_verneed = 0x6ffffffe
> +    SHT_GNU_versym = 0x6fffffff
> +
> +class Pf(enum.IntFlag):
> +    """Program header flags.  Type of Phdr.p_flags values."""
> +    PF_X = 1
> +    PF_W = 2
> +    PF_R = 4
> +
> +class Shf(enum.IntFlag):
> +    """Section flags.  Type of Shdr.sh_flags values."""
> +    SHF_WRITE = 1 << 0
> +    SHF_ALLOC = 1 << 1
> +    SHF_EXECINSTR = 1 << 2
> +    SHF_MERGE = 1 << 4
> +    SHF_STRINGS = 1 << 5
> +    SHF_INFO_LINK = 1 << 6
> +    SHF_LINK_ORDER = 1 << 7
> +    SHF_OS_NONCONFORMING = 256
> +    SHF_GROUP = 1 << 9
> +    SHF_TLS = 1 << 10
> +    SHF_COMPRESSED = 1 << 11
> +    SHF_GNU_RETAIN = 1 << 21
> +    SHF_ORDERED = 1 << 30
> +    SHF_RETAIN = 1 << 31
> +
> +class Stb(_OpenIntEnum):
> +    """ELF symbol binding type."""
> +    STB_LOCAL = 0
> +    STB_GLOBAL = 1
> +    STB_WEAK = 3
> +    STB_GNU_UNIQUE = 10
> +
> +class Stt(_OpenIntEnum):
> +    """ELF symbol type."""
> +    STT_NOTYPE = 0
> +    STT_OBJECT = 1
> +    STT_FUNC = 2
> +    STT_SECTION = 3
> +    STT_FILE = 4
> +    STT_COMMON = 5
> +    STT_TLS = 6
> +    STT_GNU_IFUNC = 10
> +
> +class Pt(_OpenIntEnum):
> +    """ELF program header types.  Type of Phdr.p_type."""
> +    PT_NULL = 0
> +    PT_LOAD = 1
> +    PT_DYNAMIC = 2
> +    PT_INTERP = 3
> +    PT_NOTE = 4
> +    PT_SHLIB = 5
> +    PT_PHDR = 6
> +    PT_TLS = 7
> +    PT_NUM = 8
> +    PT_GNU_EH_FRAME = 0x6474e550
> +    PT_GNU_STACK = 0x6474e551
> +    PT_GNU_RELRO = 0x6474e552
> +    PT_GNU_PROPERTY = 0x6474e553
> +    PT_SUNWBSS = 0x6ffffffa
> +    PT_SUNWSTACK = 0x6ffffffb
> +
> +class Dt(_OpenIntEnum):
> +    """ELF dynamic segment tags.  Type of Dyn.d_tag."""
> +    DT_NULL = 0
> +    DT_NEEDED = 1
> +    DT_PLTRELSZ = 2
> +    DT_PLTGOT = 3
> +    DT_HASH = 4
> +    DT_STRTAB = 5
> +    DT_SYMTAB = 6
> +    DT_RELA = 7
> +    DT_RELASZ = 8
> +    DT_RELAENT = 9
> +    DT_STRSZ = 10
> +    DT_SYMENT = 11
> +    DT_INIT = 12
> +    DT_FINI = 13
> +    DT_SONAME = 14
> +    DT_RPATH = 15
> +    DT_SYMBOLIC = 16
> +    DT_REL = 17
> +    DT_RELSZ = 18
> +    DT_RELENT = 19
> +    DT_PLTREL = 20
> +    DT_DEBUG = 21
> +    DT_TEXTREL = 22
> +    DT_JMPREL = 23
> +    DT_RUNPATH = 29
> +    DT_FLAGS = 30
> +    DT_ENCODING = 32
> +    DT_PREINIT_ARRAY = 32
> +    DT_PREINIT_ARRAYSZ = 33
> +    DT_SYMTAB_SHNDX = 34
> +    DT_GNU_PRELINKED = 0x6ffffdf5
> +    DT_GNU_CONFLICTSZ = 0x6ffffdf6
> +    DT_GNU_LIBLISTSZ = 0x6ffffdf7
> +    DT_CHECKSUM = 0x6ffffdf8
> +    DT_PLTPADSZ = 0x6ffffdf9
> +    DT_MOVEENT = 0x6ffffdfa
> +    DT_MOVESZ = 0x6ffffdfb
> +    DT_FEATURE_1 = 0x6ffffdfc
> +    DT_POSFLAG_1 = 0x6ffffdfd
> +    DT_SYMINSZ = 0x6ffffdfe
> +    DT_SYMINENT = 0x6ffffdff
> +    DT_GNU_HASH = 0x6ffffef5
> +    DT_TLSDESC_PLT = 0x6ffffef6
> +    DT_TLSDESC_GOT = 0x6ffffef7
> +    DT_GNU_CONFLICT = 0x6ffffef8
> +    DT_GNU_LIBLIST = 0x6ffffef9
> +    DT_CONFIG = 0x6ffffefa
> +    DT_DEPAUDIT = 0x6ffffefb
> +    DT_AUDIT = 0x6ffffefc
> +    DT_SYMINFO = 0x6ffffeff
> +    DT_VERSYM = 0x6ffffff0
> +    DT_RELACOUNT = 0x6ffffff9
> +    DT_RELCOUNT = 0x6ffffffa
> +    DT_FLAGS_1 = 0x6ffffffb
> +    DT_VERDEF = 0x6ffffffc
> +    DT_VERDEFNUM = 0x6ffffffd
> +    DT_VERNEED = 0x6ffffffe
> +    DT_VERNEEDNUM = 0x6fffffff
> +    DT_AUXILIARY = 0x7ffffffd
> +    DT_FILTER = 0x7fffffff

Could we generate all of these from elf.h, or generate both this module
and elf.h from some common source?  At the moment many platform-specific
macros are missing, and some common ones too, e.g. DT_BIND_NOW.  Further,
STB_WEAK has a different value here (3) than in elf.h (2), and there is an
SHF_RETAIN that has no corresponding macro in elf.h.

We could avoid all of this if everything were generated from a single
source of truth.  At a minimum, it seems to me that STB_WEAK needs to be
fixed and SHF_RETAIN dropped; the rest is more a suggestion to ease future
maintenance, so you could do it either now or later as the script evolves.
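
To illustrate the single-source idea, here is a minimal sketch that builds
an enum from "#define" lines.  The header excerpt, the regular expression
and the helper name constants_with_prefix are all assumptions made for
this example; a real generator would read elf.h from the source tree and
would need to handle more macro forms.

```python
import enum
import re

# Hypothetical excerpt written in the style of elf.h.  A real generator
# would read the header from the glibc source tree instead.
ELF_H_EXCERPT = """\
#define STB_LOCAL       0       /* Local symbol */
#define STB_GLOBAL      1       /* Global symbol */
#define STB_WEAK        2       /* Weak symbol */
#define STB_GNU_UNIQUE  10      /* Unique symbol */
"""

# Matches only simple "#define NAME <integer literal>" lines (decimal or
# hexadecimal).  C octal literals such as 010 are not handled.
_DEFINE_RE = re.compile(
    r'^#define[ \t]+(\w+)[ \t]+(0[xX][0-9a-fA-F]+|[0-9]+)\b', re.MULTILINE)

def constants_with_prefix(header_text, prefix):
    """Return {name: value} for integer macros whose name has the prefix."""
    return {name: int(literal, 0)
            for name, literal in _DEFINE_RE.findall(header_text)
            if name.startswith(prefix)}

# The functional enum API accepts a mapping of names to values.
Stb = enum.IntEnum('Stb', constants_with_prefix(ELF_H_EXCERPT, 'STB_'))
```

With something along these lines, a value mismatch such as the STB_WEAK
one could not creep in, because the Python constants would never be typed
by hand.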

> +
> +class StInfo:
> +    """ELF symbol binding and type.  Type of the Sym.st_info field."""
> +    def __init__(self, arg0, arg1=None):
> +        if isinstance(arg0, int) and arg1 is None:
> +            self.bind = Stb(arg0 >> 4)
> +            self.type = Stt(arg0 & 15)
> +        else:
> +            self.bind = Stb(arg0)
> +            self.type = Stt(arg1)
> +
> +    def value(self):
> +        """Returns the raw value for the bind/type combination."""
> +        return (self.bind.value << 4) | self.type.value

OK.
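
For reference, the st_info encoding can be sketched on plain integers as
below.  The function names are made up for this example; the arithmetic
follows the usual binding-in-the-high-nibble, type-in-the-low-nibble
layout that StInfo decodes above.

```python
def st_info_pack(bind: int, typ: int) -> int:
    """Combine a symbol binding and type into a raw st_info byte."""
    return ((bind & 0xf) << 4) | (typ & 0xf)

def st_info_unpack(st_info: int):
    """Split a raw st_info byte into (binding, type)."""
    return st_info >> 4, st_info & 0xf
```

When the inputs are enum members rather than plain integers, the raw
number is available through the value attribute of the member.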

> +
> +# Type in an ELF file.  Used for deserialization.
> +_Layout = collections.namedtuple('_Layout', 'unpack size')
> +
> +def _define_layouts(baseclass: type, layout32: str, layout64: str,
> +                    types=None, fields32=None):
> +    """Assign variants dict to baseclass.
> +
> +    The variants dict is indexed by (ElfClass, ElfData) pairs, and its
> +    values are _Layout instances.
> +
> +    """
> +    struct32 = struct.Struct(layout32)
> +    struct64 = struct.Struct(layout64)
> +
> +    # Check that the struct formats yield the right number of components.
> +    for s in (struct32, struct64):
> +        example = s.unpack(b' ' * s.size)
> +        if len(example) != len(baseclass._fields):
> +            raise ValueError('{!r} yields wrong field count: {} != {}'.format(
> +                s.format, len(example),  len(baseclass._fields)))
> +
> +    # Check that field names in types are correct.
> +    if types is None:
> +        types = ()
> +    for n in types:
> +        if n not in baseclass._fields:
> +            raise ValueError('{} does not have field {!r}'.format(
> +                baseclass.__name__, n))
> +
> +    if fields32 is not None \
> +       and set(fields32) != set(baseclass._fields):
> +        raise ValueError('{!r} is not a permutation of the fields {!r}'.format(
> +            fields32, baseclass._fields))

Validations.  OK.

> +
> +    def unique_name(name, used_names = (set((baseclass.__name__,))
> +                                        | set(baseclass._fields)
> +                                        | {n.__name__
> +                                           for n in (types or {}).values()})):
> +        """Find a name that is not used for a class or field name."""
> +        candidate = name
> +        n = 0
> +        while candidate in used_names:
> +            n += 1
> +            candidate = '{}{}'.format(name, n)
> +        used_names.add(candidate)
> +        return candidate

Please add a blank line here so that the nested function stands out a bit.

> +    blob_name = unique_name('blob')
> +    struct_unpack_name = unique_name('struct_unpack')
> +    comps_name = unique_name('comps')
> +
> +    layouts = {}
> +    for (bits, elfclass, layout, fields) in (
> +            (32, ElfClass.ELFCLASS32, layout32, fields32),
> +            (64, ElfClass.ELFCLASS64, layout64, None),
> +    ):
> +        for (elfdata, structprefix, funcsuffix) in (
> +                (ElfData.ELFDATA2LSB, '<', 'LE'),
> +                (ElfData.ELFDATA2MSB, '>', 'BE'),
> +        ):
> +            env = {
> +                baseclass.__name__: baseclass,
> +                struct_unpack_name: struct.unpack,
> +            }
> +
> +            # Add the type converters.
> +            if types:
> +                for cls in types.values():
> +                    env[cls.__name__] = cls
> +
> +            funcname = ''.join(
> +                ('unpack_', baseclass.__name__, str(bits), funcsuffix))
> +
> +            code = '''
> +def {funcname}({blob_name}):
> +'''.format(funcname=funcname, blob_name=blob_name)
> +
> +            indent = ' ' * 4
> +            unpack_call = '{}({!r}, {})'.format(
> +                struct_unpack_name, structprefix + layout, blob_name)
> +            field_names = ', '.join(baseclass._fields)
> +            if types is None and fields is None:
> +                code += '{}return {}({})\n'.format(
> +                    indent, baseclass.__name__, unpack_call)
> +            else:
> +                # Destructuring tuple assignment.
> +                if fields is None:
> +                    code += '{}{} = {}\n'.format(
> +                        indent, field_names, unpack_call)
> +                else:
> +                    # Use custom field order.
> +                    code += '{}{} = {}\n'.format(
> +                        indent, ', '.join(fields), unpack_call)
> +
> +                # Perform the type conversions.
> +                for n in baseclass._fields:
> +                    if n in types:
> +                        code += '{}{} = {}({})\n'.format(
> +                            indent, n, types[n].__name__, n)
> +                # Create the named tuple.
> +                code += '{}return {}({})\n'.format(
> +                    indent, baseclass.__name__, field_names)
> +
> +            exec(code, env)
> +            layouts[(elfclass, elfdata)] = _Layout(
> +                env[funcname], struct.calcsize(layout))

Building layouts for all wordsize and endianness permutations.  OK.
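
As a reading aid, the code-generation approach can be reduced to the
following sketch.  It is a simplified stand-in, not the patch's
implementation: make_unpacker is a hypothetical helper, it handles neither
type converters nor custom field orders, and the namedtuple is a local
copy for the example.

```python
import collections
import struct

# Local stand-in for the Dyn tuple defined later in the patch.
Dyn = collections.namedtuple('Dyn', 'd_tag d_val')

def make_unpacker(cls, fmt):
    """Generate and compile an unpack function for one struct format."""
    funcname = 'unpack_' + cls.__name__
    source = (
        'def {funcname}(blob):\n'
        '    return {clsname}(*struct_unpack({fmt!r}, blob))\n'
    ).format(funcname=funcname, clsname=cls.__name__, fmt=fmt)
    # The generated function sees only the names placed in env.
    env = {cls.__name__: cls, 'struct_unpack': struct.unpack}
    exec(source, env)
    return env[funcname]

# One unpacker per (word size, byte order) pair; here 64-bit little-endian.
unpack_dyn64le = make_unpacker(Dyn, '<2q')
```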

> +    baseclass.layouts = layouts
> +
> +
> +# Corresponds to EI_* indices into Elf*_Ehdr.e_ident.
> +class Ident(collections.namedtuple('Ident',
> +    'ei_mag ei_class ei_data ei_version ei_osabi ei_abiversion ei_pad')):
> +
> +    def __new__(cls, *args):
> +        """Construct an object from a blob or its constituent fields."""
> +        if len(args) == 1:
> +            return cls.unpack(args[0])
> +        return cls.__base__.__new__(cls, *args)
> +
> +    @staticmethod
> +    def unpack(blob: memoryview) -> 'Ident':
> +        """Parse raw data into a tuple."""
> +        ei_mag, ei_class, ei_data, ei_version, ei_osabi, ei_abiversion, \
> +            ei_pad = struct.unpack('4s5B7s', blob)
> +        return Ident(ei_mag, ElfClass(ei_class), ElfData(ei_data),
> +                     ei_version, ei_osabi, ei_abiversion, ei_pad)
> +    size = 16

OK.
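
A quick way to convince oneself of the '4s5B7s' format is to run it on a
synthetic e_ident.  The byte values below are assumptions chosen for the
example (64-bit, little-endian, version 1), not data from a real file.

```python
import struct

# Synthetic 16-byte e_ident: magic, EI_CLASS=2, EI_DATA=1, EI_VERSION=1,
# EI_OSABI=0, EI_ABIVERSION=0, followed by 7 bytes of padding.
E_IDENT = b'\x7fELF' + bytes([2, 1, 1, 0, 0]) + bytes(7)

(ei_mag, ei_class, ei_data, ei_version,
 ei_osabi, ei_abiversion, ei_pad) = struct.unpack('4s5B7s', E_IDENT)
```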

> +
> +# Corresponds to Elf32_Ehdr and Elf64_Ehdr.
> +Ehdr = collections.namedtuple('Ehdr',
> +   'e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags'
> +    + ' e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx')
> +_define_layouts(Ehdr,
> +                layout32='16s2H5I6H',
> +                layout64='16s2HI3QI6H',

Verified against the structs.  OK.
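
The check I did by hand could also be automated, roughly as sketched
below.  The table and the function name are made up for this example; the
expected sizes are those of Elf32_Ehdr (52 bytes) and Elf64_Ehdr (64
bytes).

```python
import struct

# Format strings from the patch, paired with the expected C struct sizes.
EHDR_LAYOUTS = {
    'Elf32_Ehdr': ('16s2H5I6H', 52),
    'Elf64_Ehdr': ('16s2HI3QI6H', 64),
}

def check_layout_sizes(layouts):
    """Raise ValueError if a format does not match its expected size."""
    for name, (fmt, expected) in layouts.items():
        # '<' and '>' select standard sizes without implicit padding.
        for prefix in ('<', '>'):
            actual = struct.calcsize(prefix + fmt)
            if actual != expected:
                raise ValueError('{}: {} != {}'.format(
                    name, actual, expected))

check_layout_sizes(EHDR_LAYOUTS)
```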

> +                types=dict(e_ident=Ident,
> +                           e_machine=Machine,
> +                           e_type=Et,
> +                           e_shstrndx=Shn))
> +
> +# Corresponds to Elf32_Phdr and Elf64_Phdr.  Order follows the latter.
> +Phdr = collections.namedtuple('Phdr',
> +    'p_type p_flags p_offset p_vaddr p_paddr p_filesz p_memsz p_align')
> +_define_layouts(Phdr,
> +                layout32='8I',
> +                fields32=('p_type', 'p_offset', 'p_vaddr', 'p_paddr',
> +                          'p_filesz', 'p_memsz', 'p_flags', 'p_align'),
> +                layout64='2I6Q',
> +                types=dict(p_type=Pt, p_flags=Pf))

Likewise.  OK.
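
The point of fields32 is that p_flags sits in a different position in the
32-bit struct.  A minimal sketch of the reordering, using a local copy of
the tuple and a hypothetical helper name:

```python
import collections
import struct

# Local copy of the tuple from the patch; field order follows Elf64_Phdr.
Phdr = collections.namedtuple(
    'Phdr',
    'p_type p_flags p_offset p_vaddr p_paddr p_filesz p_memsz p_align')

# On-disk field order of Elf32_Phdr, where p_flags comes second to last.
FIELDS32 = ('p_type', 'p_offset', 'p_vaddr', 'p_paddr',
            'p_filesz', 'p_memsz', 'p_flags', 'p_align')

def unpack_phdr32le(blob):
    """Unpack a little-endian Elf32_Phdr into the common field order."""
    values = struct.unpack('<8I', blob)
    return Phdr(**dict(zip(FIELDS32, values)))
```

The patch achieves the same effect with a destructuring assignment in the
generated code, which avoids building a dict for every record.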

> +
> +
> +# Corresponds to Elf32_Shdr and Elf64_Shdr.
> +class Shdr(collections.namedtuple('Shdr',
> +    'sh_name sh_type sh_flags sh_addr sh_offset sh_size sh_link sh_info'
> +    + ' sh_addralign sh_entsize')):
> +    def resolve(self, strtab: 'StringTable') -> 'Shdr':
> +        """Resolve sh_name using a string table."""
> +        return self.__class__(strtab.get(self[0]), *self[1:])
> +_define_layouts(Shdr,
> +                layout32='10I',
> +                layout64='2I4Q2I2Q',
> +                types=dict(sh_type=Sht,
> +                           sh_flags=Shf,
> +                           sh_link=Shn))

OK.

> +
> +# Corresponds to Elf32_Dyn and Elf64_Dyn.  The nesting through the
> +# d_un union is skipped, and d_ptr is missing (its representation in
> +# Python would be identical to d_val).
> +Dyn = collections.namedtuple('Dyn', 'd_tag d_val')
> +_define_layouts(Dyn,
> +                layout32='2i',
> +                layout64='2q',
> +                types=dict(d_tag=Dt))

OK.

> +
> +# Corresponds to Elf32_Sym and Elf64_Sym.
> +class Sym(collections.namedtuple('Sym',
> +    'st_name st_info st_other st_shndx st_value st_size')):
> +    def resolve(self, strtab: 'StringTable') -> 'Sym':
> +        """Resolve st_name using a string table."""
> +        return self.__class__(strtab.get(self[0]), *self[1:])
> +_define_layouts(Sym,
> +                layout32='3I2BH',
> +                layout64='I2BH2Q',
> +                fields32=('st_name', 'st_value', 'st_size', 'st_info',
> +                          'st_other', 'st_shndx'),
> +                types=dict(st_shndx=Shn,
> +                           st_info=StInfo))

OK.

> +
> +# Corresponds to Elf32_Rel and Elf64_Rel.
> +Rel = collections.namedtuple('Rel', 'r_offset r_info')
> +_define_layouts(Rel,
> +                layout32='2I',
> +                layout64='2Q')
> +

OK.

> +# Corresponds to Elf32_Rela and Elf64_Rela.
> +Rela = collections.namedtuple('Rela', 'r_offset r_info r_addend')
> +_define_layouts(Rela,
> +                layout32='2Ii',
> +                layout64='2Qq')

OK.

> +
> +class StringTable:
> +    """ELF string table."""
> +    def __init__(self, blob):
> +        """Create a new string table backed by the data in the blob.
> +
> +        blob: a memoryview-like object
> +
> +        """
> +        self.blob = blob
> +
> +    def get(self, index) -> bytes:
> +        """Returns the null-terminated byte string at the index."""
> +        blob = self.blob
> +        endindex = index
> +        while True:
> +            if blob[endindex] == 0:
> +                return bytes(blob[index:endindex])
> +            endindex += 1

OK.
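
For comparison, the same lookup on a plain bytes object can be sketched
with bytes.index.  The function name and the sample table are assumptions
for this example; unlike the loop above, a missing terminator shows up as
ValueError rather than IndexError.

```python
def get_string(table: bytes, index: int) -> bytes:
    """Return the NUL-terminated byte string starting at index."""
    end = table.index(b'\0', index)  # ValueError if no terminator follows
    return table[index:end]

# Synthetic string table: an empty string at offset 0, then two names.
STRTAB = b'\0.text\0.data\0'
```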

> +
> +class Image:
> +    """ELF image parser."""
> +    def __init__(self, image):
> +        """Create an ELF image from binary image data.
> +
> +        image: a memoryview-like object that supports efficient range
> +        subscripting.
> +
> +        """
> +        self.image = image
> +        ident = self.read(Ident, 0)
> +        classdata = (ident.ei_class, ident.ei_data)
> +        # Set self.Ehdr etc. to the subtypes with the right parsers.
> +        for typ in (Ehdr, Phdr, Shdr, Dyn, Sym, Rel, Rela):
> +            setattr(self, typ.__name__, typ.layouts.get(classdata, None))
> +
> +        if self.Ehdr is not None:
> +            self.ehdr = self.read(self.Ehdr, 0)
> +            self._shdr_num = self._compute_shdr_num()
> +        else:
> +            self.ehdr = None
> +            self._shdr_num = 0
> +
> +        self._section = {}
> +        self._stringtab = {}
> +
> +        if self._shdr_num > 0:
> +            self._shdr_strtab = self._find_shdr_strtab()
> +        else:
> +            self._shdr_strtab = None

OK.

> +
> +    @staticmethod
> +    def readfile(path: str) -> 'Image':
> +        """Reads the ELF file at the specified path."""
> +        with open(path, 'rb') as inp:
> +            return Image(memoryview(inp.read()))

OK.

> +
> +    def _compute_shdr_num(self) -> int:
> +        """Computes the actual number of section headers."""
> +        shnum = self.ehdr.e_shnum
> +        if shnum == 0:
> +            if self.ehdr.e_shoff == 0 or self.ehdr.e_shentsize == 0:
> +                # No section headers.
> +                return 0
> +            # Otherwise the extension mechanism is used (which may be
> +            # needed because e_shnum is just 16 bits).
> +            return self.read(self.Shdr, self.ehdr.e_shoff).sh_size
> +        return shnum

OK.
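
The decision logic here, pulled out into a standalone sketch for clarity.
The function and the read_first_sh_size callable are hypothetical; the
callable stands in for reading sh_size from the section header at index 0.

```python
def effective_shnum(e_shnum, e_shoff, e_shentsize, read_first_sh_size):
    """Return the real section header count, honouring the extension."""
    if e_shnum != 0:
        return e_shnum
    if e_shoff == 0 or e_shentsize == 0:
        # No section header table at all.
        return 0
    # e_shnum is only 16 bits wide, so large counts are stored in the
    # sh_size field of the first section header instead.
    return read_first_sh_size()
```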

> +
> +    def _find_shdr_strtab(self) -> StringTable:
> +        """Finds the section header string table (maybe via extensions)."""
> +        shstrndx = self.ehdr.e_shstrndx
> +        if shstrndx == Shn.SHN_XINDEX:
> +            shstrndx = self.read(self.Shdr, self.ehdr.e_shoff).sh_link
> +        return self._find_stringtab(shstrndx)

OK.

> +
> +    def read(self, typ: type, offset: int):
> +        """Reads an object at a specific offset.
> +
> +        The type must have been enhanced using _define_layouts.
> +
> +        """
> +        return typ.unpack(self.image[offset: offset + typ.size])

OK.

> +
> +    def phdrs(self) -> Phdr:
> +        """Generator iterating over the program headers."""
> +        if self.ehdr is None:
> +            return
> +        size = self.ehdr.e_phentsize
> +        if size != self.Phdr.size:
> +            raise ValueError('Unexpected Phdr size in ELF header: {} != {}'
> +                             .format(size, self.Phdr.size))
> +
> +        offset = self.ehdr.e_phoff
> +        for _ in range(self.ehdr.e_phnum):
> +            yield self.read(self.Phdr, offset)
> +            offset += size

OK.

> +
> +    def shdrs(self, resolve: bool=True) -> Shdr:
> +        """Generator iterating over the section headers.
> +
> +        If resolve, section names are automatically translated
> +        using the section header string table.
> +
> +        """
> +        if self._shdr_num == 0:
> +            return
> +
> +        size = self.ehdr.e_shentsize
> +        if size != self.Shdr.size:
> +            raise ValueError('Unexpected Shdr size in ELF header: {} != {}'
> +                             .format(size, self.Shdr.size))
> +
> +        offset = self.ehdr.e_shoff
> +        for _ in range(self._shdr_num):
> +            shdr = self.read(self.Shdr, offset)
> +            if resolve:
> +                shdr = shdr.resolve(self._shdr_strtab)
> +            yield shdr
> +            offset += size

OK.

> +
> +    def dynamic(self) -> Dyn:
> +        """Generator iterating over the dynamic segment."""
> +        for phdr in self.phdrs():
> +            if phdr.p_type == Pt.PT_DYNAMIC:
> +                # Pick the first dynamic segment, like the loader.
> +                if phdr.p_filesz == 0:
> +                    # Probably separated debuginfo.
> +                    return
> +                offset = phdr.p_offset
> +                end = offset + phdr.p_memsz
> +                size = self.Dyn.size
> +                while True:
> +                    next_offset = offset + size
> +                    if next_offset > end:
> +                        raise ValueError(
> +                            'Dynamic segment size {} is not a multiple of Dyn size {}'.format(
> +                                phdr.p_memsz, size))
> +                    yield self.read(self.Dyn, offset)
> +                    if next_offset == end:
> +                        return
> +                    offset = next_offset

OK.

> +
> +    def syms(self, shdr: Shdr, resolve: bool=True) -> Sym:
> +        """A generator iterating over a symbol table.
> +
> +        If resolve, symbol names are automatically translated using
> +        the string table for the symbol table.
> +
> +        """
> +        assert shdr.sh_type == Sht.SHT_SYMTAB
> +        size = shdr.sh_entsize
> +        if size != self.Sym.size:
> +            raise ValueError('Invalid symbol table entry size {}'.format(size))
> +        offset = shdr.sh_offset
> +        end = shdr.sh_offset + shdr.sh_size
> +        if resolve:
> +            strtab = self._find_stringtab(shdr.sh_link)
> +        while offset < end:
> +            sym = self.read(self.Sym, offset)
> +            if resolve:
> +                sym = sym.resolve(strtab)
> +            yield sym
> +            offset += size
> +        if offset != end:
> +            raise ValueError('Symbol table is not a multiple of entry size')

OK.
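
The iterate-and-check pattern used for the dynamic segment and the symbol
table can be sketched generically as follows.  The helper name is made up
for this example, and it checks the size up front rather than after the
loop as syms() does.

```python
import struct

def iter_records(blob: bytes, fmt: str):
    """Yield unpacked tuples for each fixed-size record in blob."""
    size = struct.calcsize(fmt)
    if len(blob) % size != 0:
        raise ValueError(
            'Table size {} is not a multiple of entry size {}'.format(
                len(blob), size))
    for offset in range(0, len(blob), size):
        yield struct.unpack_from(fmt, blob, offset)
```

struct.iter_unpack from the standard library offers similar behaviour and
raises struct.error when the buffer size is not a multiple of the record
size.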

> +
> +    def lookup_string(self, strtab_index: int, strtab_offset: int) -> bytes:
> +        """Looks up a string in a string table identified by its link index."""
> +        try:
> +            strtab = self._stringtab[strtab_index]
> +        except KeyError:
> +            strtab = self._find_stringtab(strtab_index)
> +        return strtab.get(strtab_offset)

OK.

> +
> +    def find_section(self, shndx: Shn) -> Shdr:
> +        """Returns the section header for the indexed section.
> +
> +        The section name is not resolved.
> +        """
> +        try:
> +            return self._section[shndx]
> +        except KeyError:
> +            pass
> +        if shndx in Shn:
> +            raise ValueError('Reserved section index {}'.format(shndx))
> +        idx = shndx.value
> +        if idx < 0 or idx >= self._shdr_num:
> +            raise ValueError('Section index {} out of range [0, {})'.format(
> +                idx, self._shdr_num))
> +        shdr = self.read(
> +            self.Shdr, self.ehdr.e_shoff + idx * self.Shdr.size)
> +        self._section[shndx] = shdr
> +        return shdr

OK.

> +
> +    def _find_stringtab(self, sh_link: int) -> StringTable:
> +        if sh_link in self._stringtab:
> +            return self._stringtab[sh_link]
> +        if sh_link < 0 or sh_link >= self._shdr_num:
> +            raise ValueError('Section index {} out of range [0, {})'.format(
> +                sh_link, self._shdr_num))
> +        shdr = self.read(
> +            self.Shdr, self.ehdr.e_shoff + sh_link * self.Shdr.size)
> +        if shdr.sh_type != Sht.SHT_STRTAB:
> +            raise ValueError(
> +                'Section {} is not a string table: {}'.format(
> +                    sh_link, shdr.sh_type))
> +        strtab = StringTable(
> +            self.image[shdr.sh_offset:shdr.sh_offset + shdr.sh_size])
> +        # This could retain essentially arbitrary amounts of data,
> +        # but caching string tables seems important for performance.
> +        self._stringtab[sh_link] = strtab
> +        return strtab
> +
> +
> +__all__ = [name for name in dir() if name[0].isupper()]

OK.

> 
> base-commit: 1a85970f41ea1e5abe6da2298a5e8fedcea26b70