public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: Mark Harmstone <mark@harmstone.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: binutils@sourceware.org
Subject: Re: [PATCH] Add pdb archive target
Date: Fri, 12 Aug 2022 00:26:29 +0100	[thread overview]
Message-ID: <182ff63d-9b76-3897-4da0-9641ac27f5c9@harmstone.com> (raw)
In-Reply-To: <c3c43971-bcd5-8b60-3790-b54df98f7df9@suse.com>

Thanks Jan.

On 11/8/22 14:02, Jan Beulich wrote:
 > On 26.07.2022 01:44, Mark Harmstone wrote:
 >> This adds support for the "Multi-Stream Format" container format that
 >> MSVC uses for its PDB debugging files, as documented at
 >> https://llvm.org/docs/PDB/MsfFile.html.
 >
 > Looking at binutils/testsuite/binutils-all/pdb*.d I wonder what "support"
 > here means: What is dumped is the binary contents of the file (claimed
 > to be coming from section .data) rather than the inner file structure.

I'm not quite sure what you're getting at. This is purely for PDB files as
archives, there is no inner file structure. The tests check that the hex
dump of the files matches one possible way to represent an archive of the
dummy files.

 > Also this looks to cover only one of several flavors/versions, which may
 > want calling out here and which also may influence the naming of certain
 > things throughout the patch.

I assume you're basing this off the "Microsoft C/C++ MSF 7.00" in the header.
MSVC++ 7 came out in 2002, and it's been the same since... there's only one
living version of the format.

 >> This is a prerequisite for gdb to be able to read Microsoft's debug
 >> files, and for gcc and ld to generate debugging info that can be read by
 >> Microsoft's tools.
 >
 > Are there plans in any of those three directions?

Yes! See https://github.com/maharmstone/binutils-gdb for my messy dev repo
which adds PDB support for ld - it needs cleaning up and rebasing, but it's
fully functional.

I submitted patches to gcc last year, which were rejected because they'd
changed their policy on how debug hooks should work, and I'd written it the
old way. See https://www.phoronix.com/news/GCC-PE-Specific-CodeView.

I've since come round to the view that it'd be pretty useless having the
patches in gcc before they're in ld, which is why I've not resubmitted them.

 >
 >> --- a/bfd/config.bfd
 >> +++ b/bfd/config.bfd
 >> @@ -388,7 +388,7 @@ case "${targ}" in
 >>      ;;
 >>    arm-wince-pe | arm-*-wince | arm*-*-mingw32ce* | arm*-*-cegcc*)
 >>      targ_defvec=arm_pe_wince_le_vec
 >> -    targ_selvecs="arm_pe_wince_le_vec arm_pe_wince_be_vec arm_pei_wince_le_vec arm_pei_wince_be_vec"
 >> +    targ_selvecs="arm_pe_wince_le_vec arm_pe_wince_be_vec arm_pei_wince_le_vec arm_pei_wince_be_vec pdb_vec"
 >>      targ_underscore=no
 >>      targ_cflags="-DARM_WINCE -DARM_COFF_BUGFIX"
 >>      ;;
 >> @@ -708,7 +708,7 @@ case "${targ}" in
 >>      ;;
 >>    x86_64-*-mingw* | x86_64-*-pe | x86_64-*-pep | x86_64-*-cygwin)
 >>      targ_defvec=x86_64_pe_vec
 >> -    targ_selvecs="x86_64_pe_vec x86_64_pei_vec x86_64_pe_big_vec x86_64_elf64_vec l1om_elf64_vec k1om_elf64_vec i386_pe_vec i386_pei_vec i386_elf32_vec iamcu_elf32_vec"
 >> +    targ_selvecs="x86_64_pe_vec x86_64_pei_vec x86_64_pe_big_vec x86_64_elf64_vec l1om_elf64_vec k1om_elf64_vec i386_pe_vec i386_pei_vec i386_elf32_vec iamcu_elf32_vec pdb_vec"
 >
 > I'm not sure adding this as a default for -pe and -pep is appropriate; I
 > certainly agree with MingW and Cygwin.

I don't agree: the format's specific to PE files, not specific to Windows.
There's no reason why you can't have PDB files linked to EFI images (and
Microsoft's bootloaders do just that.)

 > Same for ix86's -pe then, whereas I'm unclear about what arm*-*-cegcc* is.

I'm not sure either. Happy to nix it for Windows CE.

 >> --- a/bfd/configure.ac
 >> +++ b/bfd/configure.ac
 >> @@ -416,7 +416,7 @@ tb=
 >>
 >>  elf="elf.lo elflink.lo elf-attrs.lo elf-strtab.lo elf-eh-frame.lo
 >>       dwarf1.lo dwarf2.lo"
 >> -coffgen="coffgen.lo dwarf2.lo"
 >> +coffgen="coffgen.lo dwarf2.lo pdb.lo"
 >>  coff="cofflink.lo $coffgen"
 >>  ecoff="ecofflink.lo $coffgen"
 >>  xcoff="xcofflink.lo $coffgen"
 >
 > Similarly here - is this really relevant to ecoff and xcoff as well?
 > I'm not even convinced this wants universally tying to coff.

I think quite possibly that I naively thought that COFF == PE, when I
now realize that's not the case...

 >> --- /dev/null
 >> +++ b/bfd/pdb.c
 >> @@ -0,0 +1,804 @@
 >> +/* BFD back-end for PDB Multi-Stream Format archives.
 >> +   Copyright (C) 2022 Free Software Foundation, Inc.
 >> +
 >> +   This file is part of BFD, the Binary File Descriptor library.
 >> +
 >> +   This program is free software; you can redistribute it and/or modify
 >> +   it under the terms of the GNU General Public License as published by
 >> +   the Free Software Foundation; either version 3 of the License, or
 >> +   (at your option) any later version.
 >> +
 >> +   This program is distributed in the hope that it will be useful,
 >> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
 >> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 >> +   GNU General Public License for more details.
 >> +
 >> +   You should have received a copy of the GNU General Public License
 >> +   along with this program; if not, write to the Free Software
 >> +   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
 >> +   MA 02110-1301, USA. */
 >> +
 >> +/* This describes the MSF file archive format, which is used for the
 >> +   PDB debug info generated by MSVC. See https://llvm.org/docs/PDB/MsfFile.html
 >> +   for a full description of the format. */
 >> +
 >> +#include "sysdep.h"
 >> +#include "bfd.h"
 >> +#include "libbfd.h"
 >> +
 >> +/* "Microsoft C/C++ MSF 7.00\r\n\x1a\x44\x53\0\0\0" */
 >> +static const uint8_t pdb_magic[] =
 >> +{ 0x4d, 0x69, 0x63, 0x72, 0x6f, 0x73, 0x6f, 0x66,
 >> +  0x74, 0x20, 0x43, 0x2f, 0x43, 0x2b, 0x2b, 0x20,
 >> +  0x4d, 0x53, 0x46, 0x20, 0x37, 0x2e, 0x30, 0x30,
 >> +  0x0d, 0x0a, 0x1a, 0x44, 0x53, 0x00, 0x00, 0x00 };
 >> +
 >> +#define arch_eltdata(bfd) ((struct areltdata *) ((bfd)->arelt_data))
 >> +
 >> +static bfd_cleanup
 >> +pdb_archive_p (bfd *abfd)
 >> +{
 >> +  int ret;
 >> +  char magic[sizeof (pdb_magic)];
 >> +
 >> +  ret = bfd_bread (magic, sizeof (magic), abfd);
 >> +  if (ret != sizeof (magic))
 >> +    {
 >> +      bfd_set_error (bfd_error_wrong_format);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  if (memcmp (magic, pdb_magic, sizeof (magic)))
 >> +    {
 >> +      bfd_set_error (bfd_error_wrong_format);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  return _bfd_no_cleanup;
 >> +}
 >> +
 >> +static bfd *
 >> +pdb_get_elt_at_index (bfd *abfd, symindex sym_index)
 >> +{
 >> +  char int_buf[sizeof (uint32_t)];
 >> +  uint32_t block_size, block_map_addr, block, num_files;
 >> +  uint32_t first_dir_block, dir_offset, file_size, block_off, left;
 >> +  char name[10];
 >> +  bfd *file;
 >> +  char *buf;
 >> +
 >> +  /* get block_size */
 >> +
 >> +  if (bfd_seek (abfd, sizeof (pdb_magic), SEEK_SET))
 >> +    return NULL;
 >> +
 >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
 >> +    {
 >> +      bfd_set_error (bfd_error_malformed_archive);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  block_size = bfd_getl32 (int_buf);
 >> +
 >> +  /* get block_map_addr */
 >> +
 >> +  if (bfd_seek (abfd, 4 * sizeof (uint32_t), SEEK_CUR))
 >> +    return NULL;
 >> +
 >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
 >> +    {
 >> +      bfd_set_error (bfd_error_malformed_archive);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  block_map_addr = bfd_getl32 (int_buf);
 >> +
 >> +  /* get num_files */
 >> +
 >> +  if (bfd_seek (abfd, block_map_addr * block_size, SEEK_SET))
 >> +    return NULL;
 >> +
 >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
 >> +    {
 >> +      bfd_set_error (bfd_error_malformed_archive);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  first_dir_block = bfd_getl32 (int_buf);
 >> +
 >> +  if (bfd_seek (abfd, first_dir_block * block_size, SEEK_SET))
 >> +    return NULL;
 >> +
 >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
 >> +    {
 >> +      bfd_set_error (bfd_error_malformed_archive);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  num_files = bfd_getl32 (int_buf);
 >> +
 >> +  if (sym_index >= num_files)
 >> +    {
 >> +      bfd_set_error (bfd_error_no_more_archived_files);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  /* read file size */
 >> +
 >> +  dir_offset = sizeof (uint32_t) * (sym_index + 1);
 >> +
 >> +  if (dir_offset >= block_size)
 >> +    {
 >> +      uint32_t block_map_addr_off;
 >> +
 >> +      block_map_addr_off = ((dir_offset / block_size) * sizeof (uint32_t));
 >> +
 >> +      if (bfd_seek (abfd, (block_map_addr * block_size) + block_map_addr_off,
 >> +            SEEK_SET))
 >> +    return NULL;
 >> +
 >> +      if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
 >> +    {
 >> +      bfd_set_error (bfd_error_malformed_archive);
 >> +      return NULL;
 >> +    }
 >> +
 >> +      block = bfd_getl32 (int_buf);
 >> +    }
 >> +  else
 >> +    {
 >> +      block = first_dir_block;
 >> +    }
 >> +
 >> +  if (bfd_seek (abfd, (block * block_size) + (dir_offset % block_size),
 >> +        SEEK_SET))
 >> +    return NULL;
 >> +
 >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
 >> +    {
 >> +      bfd_set_error (bfd_error_malformed_archive);
 >> +      return NULL;
 >> +    }
 >> +
 >> +  file_size = bfd_getl32 (int_buf);
 >> +
 >> +  /* create BFD */
 >> +
 >> +  sprintf (name, "%04lx", sym_index);
 >
 > Is there a reason for this 4-or-more digits naming of the file? Would
 > it make sense to use 8 digits (beyond which the index apparently
 > cannot grow)?

In practice, 4 digits is plenty. The number of files in the archive is
proportional to the number of object files linked into the image... for the
NT kernel, which is probably the most complicated EXE out there, the PDB
has 1,100 files. I can't imagine anybody will ever go over 65,535 - and it's
not visible anyway, unless you play around with ar.

 >
 >> --- /dev/null
 >> +++ b/binutils/testsuite/binutils-all/pdb.exp
 >> @@ -0,0 +1,157 @@
 >> +#   Copyright (C) 2022 Free Software Foundation, Inc.
 >> +
 >> +# This file is part of the GNU Binutils.
 >> +#
 >> +# This program is free software; you can redistribute it and/or modify
 >> +# it under the terms of the GNU General Public License as published by
 >> +# the Free Software Foundation; either version 3 of the License, or
 >> +# (at your option) any later version.
 >> +#
 >> +# This program is distributed in the hope that it will be useful,
 >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 >> +# GNU General Public License for more details.
 >> +#
 >> +# You should have received a copy of the GNU General Public License
 >> +# along with this program; if not, write to the Free Software
 >> +# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
 >> +
 >> +proc pdb_archive_1 { } {
 >> +    global AR
 >> +    global OBJDUMP
 >> +    global srcdir
 >> +    global subdir
 >> +
 >> +    set testname "pdb archive 1"
 >> +
 >> +    file delete tmpdir/test.pdb
 >> +
 >> +    # add short file (less than block size)
 >> +
 >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile1"]
 >> +    if ![string match "" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"]
 >> +    if ![string match "rw-r--r-- 0/0      3 *0000*" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
 >> +    set exp [file_contents "$srcdir/$subdir/pdb1.d"]
 >> +    if ![string equal $got $exp] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0000 --output=tmpdir"]
 >> +    if ![string match "" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [file_contents tmpdir/0000]
 >> +    set exp [file_contents "$srcdir/$subdir/pdbfile1"]
 >> +    if ![string equal $got $exp] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    pass $testname
 >> +}
 >> +
 >> +proc pdb_archive_2 { } {
 >> +    global AR
 >> +    global OBJDUMP
 >> +    global srcdir
 >> +    global subdir
 >> +
 >> +    set testname "pdb archive 2"
 >> +
 >> +    # add empty file
 >> +
 >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb /dev/null"]
 >> +    if ![string match "" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"]
 >> +    if ![string match "*\nrw-r--r-- 0/0      0 *0001*" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
 >> +    set exp [file_contents "$srcdir/$subdir/pdb2.d"]
 >> +    if ![string equal $got $exp] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0001 --output=tmpdir"]
 >> +    if ![string match "" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [file_contents tmpdir/0001]
 >> +    if ![string equal $got ""] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    pass $testname
 >> +}
 >> +
 >> +proc pdb_archive_3 { } {
 >> +    global AR
 >> +    global OBJDUMP
 >> +    global srcdir
 >> +    global subdir
 >> +
 >> +    set testname "pdb archive 3"
 >> +
 >> +    # add long file (greater than block size)
 >> +
 >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile2"]
 >> +    if ![string match "" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"]
 >> +    if ![string match "*\nrw-r--r-- 0/0   1032 *0002*" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
 >> +    set exp [file_contents "$srcdir/$subdir/pdb3.d"]
 >> +    if ![string equal $got $exp] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0002 --output=tmpdir"]
 >> +    if ![string match "" $got] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    set got [file_contents tmpdir/0002]
 >> +    set exp [file_contents "$srcdir/$subdir/pdbfile2"]
 >> +    if ![string equal $got $exp] {
 >> +    fail $testname
 >> +    return
 >> +    }
 >> +
 >> +    pass $testname
 >> +}
 >> +
 >> +pdb_archive_1
 >> +pdb_archive_2
 >> +pdb_archive_3
 >
 > The three functions look pretty similar. Any chance of folding them into
 > just one, suitably parametrized?

I'm not sure - to my mind, that would imply that the functions were
independent, when they need to be run one after the other.

I think we do need to test a small file, an empty file, and a long file, but
could I get away with just having one test, which adds all three files?

Mark

  reply	other threads:[~2022-08-11 23:26 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-25 23:44 Mark Harmstone
2022-08-11 13:02 ` Jan Beulich
2022-08-11 23:26   ` Mark Harmstone [this message]
2022-08-12  6:14     ` Jan Beulich
2022-08-15 17:06       ` Mark Harmstone
2022-08-15 19:27     ` NightStrike

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=182ff63d-9b76-3897-4da0-9641ac27f5c9@harmstone.com \
    --to=mark@harmstone.com \
    --cc=binutils@sourceware.org \
    --cc=jbeulich@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).