From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by sourceware.org (Postfix) with ESMTPS id 3717B3858C83 for ; Thu, 11 Aug 2022 23:26:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3717B3858C83 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=harmstone.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wr1-x42b.google.com with SMTP id bs25so4478695wrb.2 for ; Thu, 11 Aug 2022 16:26:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :sender:from:to:cc; bh=9CLMgMOtFgNk6Avjk/mvoSwMOuz8j5xt0xHlFbnLyAc=; b=oaKVU/F3Y3x9Y6T8S2Evgw2gZwkh0BuXw8gl+KZk552UjxitCgLABAJuWQHVlt+U28 OPLTuKPL4iwHFlMTZiHQStVuvZeFIW672eqBb9BT3JvbWh+DrHdMhIfCjZlOjnwb4w+q cMYPjDWsENPOwSgo2T64jovc/C7D9/fefmLGNXLx1bacTFBMKlwbvNh5D8PWS4fQesSU 9S1H/eWVMyWnanHIK+ZZW/e4TIfWUmjCnwDuqZf+TKOTweNC1DYl7tQ7dQr8UZOOSHGd yCcgBFBgqid+55pCu0qzoQJ19TEvY5iKyarGlnNsEbQzc8ivfjJEqGf29Xw99U2+5deD mKbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :sender:x-gm-message-state:from:to:cc; bh=9CLMgMOtFgNk6Avjk/mvoSwMOuz8j5xt0xHlFbnLyAc=; b=qqMwP0lyYQW2bXGbNOBNXIU3VN6RAygLn1ZJQc+E7Qt4jUw0hrpcSwPHp8C+mKtPdB zxs3gzciZE2Nh/2ljYC1hquzh+OOnN0j3O3FfMyZY26lg0HSSTvKIuVAnNSSvzp+v2Hd Tb94+3tFxhh6Nuez6tBwS63Xy6NAzpfTxyMkSY2Pfy4ysDrDPmsgd9VNLQXEK2rOE+Di zQnbWlcVOcSeC0RFC2dGa7ton2rvfJQORueazXuVVbJXpt3dNCdVDFXQZsEht145RKVS NO7T0weu6UbjZhyvCybqU1EQJ0Zub8lThkMoYpGlrlDdoJMfya8hG4Ilqhq4TsCjpky2 AkUQ== X-Gm-Message-State: ACgBeo3O7B61HSRXDeanepqPz1wy/yKQ3Gauw/Iz1SHuSxODQdFuVHJB TNe01QrpXdBOoLY7kZacYf+Szc7h8aEf6w== X-Google-Smtp-Source: AA6agR7gacygMrq8TstLVN5VBphXM4XfJziULZ4fmtQyv7AUs+qQnEbGX95+SjvXCIiECux09GfAiQ== X-Received: by 2002:a5d:6212:0:b0:222:cf83:48f0 with SMTP id y18-20020a5d6212000000b00222cf8348f0mr527225wru.123.1660260400599; Thu, 11 Aug 2022 16:26:40 -0700 (PDT) Received: from ?IPV6:2a02:8010:64ea:0:8eb8:7eff:fe53:9d5f? ([2a02:8010:64ea:0:8eb8:7eff:fe53:9d5f]) by smtp.googlemail.com with ESMTPSA id g6-20020adfa486000000b0021ee65426a2sm372707wrb.65.2022.08.11.16.26.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 11 Aug 2022 16:26:40 -0700 (PDT) Sender: Mark Harmstone Message-ID: <182ff63d-9b76-3897-4da0-9641ac27f5c9@harmstone.com> Date: Fri, 12 Aug 2022 00:26:29 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.0 Subject: Re: [PATCH] Add pdb archive target To: Jan Beulich Cc: binutils@sourceware.org References: <20220725234405.19880-1-mark@harmstone.com> Content-Language: en-US From: Mark Harmstone In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.0 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: binutils@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Binutils mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Aug 2022 23:26:47 -0000 Thanks Jan. On 11/8/22 14:02, Jan Beulich wrote: > On 26.07.2022 01:44, Mark Harmstone wrote: >> This adds support for the "Multi-Stream Format" container format that >> MSVC uses for its PDB debugging files, as documented at >> https://llvm.org/docs/PDB/MsfFile.html. > > Looking at binutils/testsuite/binutils-all/pdb*.d I wonder what "support" > here means: What is dumped is the binary contents of the file (claimed > to be coming from section .data) rather than the inner file structure. I'm not quite sure what you're getting at. This is purely for PDB files as archives, there is no inner file structure. The tests check that the hex dump of the files matches one possible way to represent an archive of the dummy files. > Also this looks to cover only one of several flavors/versions, which may > want calling out here and which also may influence the naming of certain > things throughout the patch. I assume you're basing this off the "Microsoft C/C++ MSF 7.00" in the header. MSVC++ 7 came out in 2002, and it's been the same since... there's only one living version of the format. >> This is a prerequisite for gdb to be able to read Microsoft's debug >> files, and for gcc and ld to generate debugging info that can be read by >> Microsoft's tools. > > Are there plans in any of those three directions? Yes! See https://github.com/maharmstone/binutils-gdb for my messy dev repo which adds PDB support for ld - it needs cleaning up and rebasing, but it's fully functional. I submitted patches to gcc last year, which were rejected because they'd changed their policy on how debug hooks should work, and I'd written it the old way. See https://www.phoronix.com/news/GCC-PE-Specific-CodeView. I've since come round to the view that it'd be pretty useless having the patches in gcc before they're in ld, which is why I've not resubmitted them. > >> --- a/bfd/config.bfd >> +++ b/bfd/config.bfd >> @@ -388,7 +388,7 @@ case "${targ}" in >>      ;; >>    arm-wince-pe | arm-*-wince | arm*-*-mingw32ce* | arm*-*-cegcc*) >>      targ_defvec=arm_pe_wince_le_vec >> -    targ_selvecs="arm_pe_wince_le_vec arm_pe_wince_be_vec arm_pei_wince_le_vec arm_pei_wince_be_vec" >> +    targ_selvecs="arm_pe_wince_le_vec arm_pe_wince_be_vec arm_pei_wince_le_vec arm_pei_wince_be_vec pdb_vec" >>      targ_underscore=no >>      targ_cflags="-DARM_WINCE -DARM_COFF_BUGFIX" >>      ;; >> @@ -708,7 +708,7 @@ case "${targ}" in >>      ;; >>    x86_64-*-mingw* | x86_64-*-pe | x86_64-*-pep | x86_64-*-cygwin) >>      targ_defvec=x86_64_pe_vec >> -    targ_selvecs="x86_64_pe_vec x86_64_pei_vec x86_64_pe_big_vec x86_64_elf64_vec l1om_elf64_vec k1om_elf64_vec i386_pe_vec i386_pei_vec i386_elf32_vec iamcu_elf32_vec" >> +    targ_selvecs="x86_64_pe_vec x86_64_pei_vec x86_64_pe_big_vec x86_64_elf64_vec l1om_elf64_vec k1om_elf64_vec i386_pe_vec i386_pei_vec i386_elf32_vec iamcu_elf32_vec pdb_vec" > > I'm not sure adding this as a default for -pe and -pep is appropriate; I > certainly agree with MingW and Cygwin. I don't agree: the format's specific to PE files, not specific to Windows. There's no reason why you can't have PDB files linked to EFI images (and Microsoft's bootloaders do just that.) > Same for ix86's -pe then, whereas I'm unclear about what arm*-*-cegcc* is. I'm not sure either. Happy to nix it for Windows CE. >> --- a/bfd/configure.ac >> +++ b/bfd/configure.ac >> @@ -416,7 +416,7 @@ tb= >> >>  elf="elf.lo elflink.lo elf-attrs.lo elf-strtab.lo elf-eh-frame.lo >>       dwarf1.lo dwarf2.lo" >> -coffgen="coffgen.lo dwarf2.lo" >> +coffgen="coffgen.lo dwarf2.lo pdb.lo" >>  coff="cofflink.lo $coffgen" >>  ecoff="ecofflink.lo $coffgen" >>  xcoff="xcofflink.lo $coffgen" > > Similarly here - is this really relevant to ecoff and xcoff as well? > I'm not even convinced this wants universally tying to coff. I think quite possibly that I naively thought that COFF == PE, when I now realize that's not the case... >> --- /dev/null >> +++ b/bfd/pdb.c >> @@ -0,0 +1,804 @@ >> +/* BFD back-end for PDB Multi-Stream Format archives. >> +   Copyright (C) 2022 Free Software Foundation, Inc. >> + >> +   This file is part of BFD, the Binary File Descriptor library. >> + >> +   This program is free software; you can redistribute it and/or modify >> +   it under the terms of the GNU General Public License as published by >> +   the Free Software Foundation; either version 3 of the License, or >> +   (at your option) any later version. >> + >> +   This program is distributed in the hope that it will be useful, >> +   but WITHOUT ANY WARRANTY; without even the implied warranty of >> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +   GNU General Public License for more details. >> + >> +   You should have received a copy of the GNU General Public License >> +   along with this program; if not, write to the Free Software >> +   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, >> +   MA 02110-1301, USA. */ >> + >> +/* This describes the MSF file archive format, which is used for the >> +   PDB debug info generated by MSVC. See https://llvm.org/docs/PDB/MsfFile.html >> +   for a full description of the format. */ >> + >> +#include "sysdep.h" >> +#include "bfd.h" >> +#include "libbfd.h" >> + >> +/* "Microsoft C/C++ MSF 7.00\r\n\x1a\x44\x53\0\0\0" */ >> +static const uint8_t pdb_magic[] = >> +{ 0x4d, 0x69, 0x63, 0x72, 0x6f, 0x73, 0x6f, 0x66, >> +  0x74, 0x20, 0x43, 0x2f, 0x43, 0x2b, 0x2b, 0x20, >> +  0x4d, 0x53, 0x46, 0x20, 0x37, 0x2e, 0x30, 0x30, >> +  0x0d, 0x0a, 0x1a, 0x44, 0x53, 0x00, 0x00, 0x00 }; >> + >> +#define arch_eltdata(bfd) ((struct areltdata *) ((bfd)->arelt_data)) >> + >> +static bfd_cleanup >> +pdb_archive_p (bfd *abfd) >> +{ >> +  int ret; >> +  char magic[sizeof (pdb_magic)]; >> + >> +  ret = bfd_bread (magic, sizeof (magic), abfd); >> +  if (ret != sizeof (magic)) >> +    { >> +      bfd_set_error (bfd_error_wrong_format); >> +      return NULL; >> +    } >> + >> +  if (memcmp (magic, pdb_magic, sizeof (magic))) >> +    { >> +      bfd_set_error (bfd_error_wrong_format); >> +      return NULL; >> +    } >> + >> +  return _bfd_no_cleanup; >> +} >> + >> +static bfd * >> +pdb_get_elt_at_index (bfd *abfd, symindex sym_index) >> +{ >> +  char int_buf[sizeof (uint32_t)]; >> +  uint32_t block_size, block_map_addr, block, num_files; >> +  uint32_t first_dir_block, dir_offset, file_size, block_off, left; >> +  char name[10]; >> +  bfd *file; >> +  char *buf; >> + >> +  /* get block_size */ >> + >> +  if (bfd_seek (abfd, sizeof (pdb_magic), SEEK_SET)) >> +    return NULL; >> + >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t)) >> +    { >> +      bfd_set_error (bfd_error_malformed_archive); >> +      return NULL; >> +    } >> + >> +  block_size = bfd_getl32 (int_buf); >> + >> +  /* get block_map_addr */ >> + >> +  if (bfd_seek (abfd, 4 * sizeof (uint32_t), SEEK_CUR)) >> +    return NULL; >> + >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t)) >> +    { >> +      bfd_set_error (bfd_error_malformed_archive); >> +      return NULL; >> +    } >> + >> +  block_map_addr = bfd_getl32 (int_buf); >> + >> +  /* get num_files */ >> + >> +  if (bfd_seek (abfd, block_map_addr * block_size, SEEK_SET)) >> +    return NULL; >> + >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t)) >> +    { >> +      bfd_set_error (bfd_error_malformed_archive); >> +      return NULL; >> +    } >> + >> +  first_dir_block = bfd_getl32 (int_buf); >> + >> +  if (bfd_seek (abfd, first_dir_block * block_size, SEEK_SET)) >> +    return NULL; >> + >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t)) >> +    { >> +      bfd_set_error (bfd_error_malformed_archive); >> +      return NULL; >> +    } >> + >> +  num_files = bfd_getl32 (int_buf); >> + >> +  if (sym_index >= num_files) >> +    { >> +      bfd_set_error (bfd_error_no_more_archived_files); >> +      return NULL; >> +    } >> + >> +  /* read file size */ >> + >> +  dir_offset = sizeof (uint32_t) * (sym_index + 1); >> + >> +  if (dir_offset >= block_size) >> +    { >> +      uint32_t block_map_addr_off; >> + >> +      block_map_addr_off = ((dir_offset / block_size) * sizeof (uint32_t)); >> + >> +      if (bfd_seek (abfd, (block_map_addr * block_size) + block_map_addr_off, >> +            SEEK_SET)) >> +    return NULL; >> + >> +      if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t)) >> +    { >> +      bfd_set_error (bfd_error_malformed_archive); >> +      return NULL; >> +    } >> + >> +      block = bfd_getl32 (int_buf); >> +    } >> +  else >> +    { >> +      block = first_dir_block; >> +    } >> + >> +  if (bfd_seek (abfd, (block * block_size) + (dir_offset % block_size), >> +        SEEK_SET)) >> +    return NULL; >> + >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t)) >> +    { >> +      bfd_set_error (bfd_error_malformed_archive); >> +      return NULL; >> +    } >> + >> +  file_size = bfd_getl32 (int_buf); >> + >> +  /* create BFD */ >> + >> +  sprintf (name, "%04lx", sym_index); > > Is there a reason for this 4-or-more digits naming of the file? Would > it make sense to use 8 digits (beyond which the index apparently > cannot grow)? In practice, 4 digits is plenty. The number of files in the archive is proportional to the number of object files linked into the image... for the NT kernel, which is probably the most complicated EXE out there, the PDB has 1,100 files. I can't imagine anybody will ever go over 65,535 - and it's not visible anyway, unless you play around with ar. > >> --- /dev/null >> +++ b/binutils/testsuite/binutils-all/pdb.exp >> @@ -0,0 +1,157 @@ >> +#   Copyright (C) 2022 Free Software Foundation, Inc. >> + >> +# This file is part of the GNU Binutils. >> +# >> +# This program is free software; you can redistribute it and/or modify >> +# it under the terms of the GNU General Public License as published by >> +# the Free Software Foundation; either version 3 of the License, or >> +# (at your option) any later version. >> +# >> +# This program is distributed in the hope that it will be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program; if not, write to the Free Software >> +# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. >> + >> +proc pdb_archive_1 { } { >> +    global AR >> +    global OBJDUMP >> +    global srcdir >> +    global subdir >> + >> +    set testname "pdb archive 1" >> + >> +    file delete tmpdir/test.pdb >> + >> +    # add short file (less than block size) >> + >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile1"] >> +    if ![string match "" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"] >> +    if ![string match "rw-r--r-- 0/0      3 *0000*" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"] >> +    set exp [file_contents "$srcdir/$subdir/pdb1.d"] >> +    if ![string equal $got $exp] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0000 --output=tmpdir"] >> +    if ![string match "" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [file_contents tmpdir/0000] >> +    set exp [file_contents "$srcdir/$subdir/pdbfile1"] >> +    if ![string equal $got $exp] { >> +    fail $testname >> +    return >> +    } >> + >> +    pass $testname >> +} >> + >> +proc pdb_archive_2 { } { >> +    global AR >> +    global OBJDUMP >> +    global srcdir >> +    global subdir >> + >> +    set testname "pdb archive 2" >> + >> +    # add empty file >> + >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb /dev/null"] >> +    if ![string match "" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"] >> +    if ![string match "*\nrw-r--r-- 0/0      0 *0001*" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"] >> +    set exp [file_contents "$srcdir/$subdir/pdb2.d"] >> +    if ![string equal $got $exp] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0001 --output=tmpdir"] >> +    if ![string match "" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [file_contents tmpdir/0001] >> +    if ![string equal $got ""] { >> +    fail $testname >> +    return >> +    } >> + >> +    pass $testname >> +} >> + >> +proc pdb_archive_3 { } { >> +    global AR >> +    global OBJDUMP >> +    global srcdir >> +    global subdir >> + >> +    set testname "pdb archive 3" >> + >> +    # add long file (greater than block size) >> + >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile2"] >> +    if ![string match "" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"] >> +    if ![string match "*\nrw-r--r-- 0/0   1032 *0002*" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"] >> +    set exp [file_contents "$srcdir/$subdir/pdb3.d"] >> +    if ![string equal $got $exp] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0002 --output=tmpdir"] >> +    if ![string match "" $got] { >> +    fail $testname >> +    return >> +    } >> + >> +    set got [file_contents tmpdir/0002] >> +    set exp [file_contents "$srcdir/$subdir/pdbfile2"] >> +    if ![string equal $got $exp] { >> +    fail $testname >> +    return >> +    } >> + >> +    pass $testname >> +} >> + >> +pdb_archive_1 >> +pdb_archive_2 >> +pdb_archive_3 > > The three functions look pretty similar. Any chance of folding them into > just one, suitably parametrized? I'm not sure - to my mind, that would imply that the functions were independent, when they need to be run one after the other. I think we do need to test a small file, an empty file, and a long file, but could I get away with just having one test, which adds all three files? Mark