[cygport] enabling a replacement for "objdump -d -l"

public inbox for cygwin-apps@cygwin.com

From: ASSI <Stromeko@nexgo.de>
To: cygwin-apps@cygwin.com
Subject: [cygport] enabling a replacement for "objdump -d -l"
Date: Sun, 18 Feb 2024 20:51:41 +0100
Message-ID: <87a5nx5z5e.fsf@Gerda.invalid>

Cygport uses "objdump -d -l" to extract the list of source files that
need to be copied into the debuginfo package.  This operation has
O(N²) or even worse complexity and has in addition been getting
slower in recent binutils releases, as more and more information is
put into the object files.  For gcc-11, extracting the debug source
files takes up to 45 minutes per executable (up from about 15 minutes
with binutils up to 2.39), and for gcc-13 (with about 1.5 times the
number of lines to extract) it already takes more than two hours.  So
if you package gcc-13 using a single thread, you'd be looking at
something on the order of 20 hours of wall clock time, which is
unacceptable.
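
To make the current extraction step concrete, here is a minimal sketch
of the filter that cygport applies to the "objdump -d -l" output.  The
sed expression is the one from lib/src_postinst.cygpart; the function
name extract_debug_sources and its argument (standing in for cygport's
${PF}) are made up for illustration.

```shell
# Hypothetical helper mirroring the filter in cygport's existing pipeline.
# Reads "objdump -d -l" output on stdin and prints the unique source
# paths below /usr/src/debug/<PF>/.  $1 stands in for cygport's ${PF}.
extract_debug_sources() {
    pf=$1
    # Keep only "path:line" location lines under the debug source
    # prefix, strip the line number, then deduplicate.
    sed -ne "s|.*\(/usr/src/debug/${pf}/.*\):[0-9]*$|\1|gp" | sort -u
}
```

It would be used as
objdump -d -l "$exe" 2>/dev/null | extract_debug_sources "$PF",
so the whole disassembly is generated only to keep the location lines.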

The disassembly implied by "-d" (which, by the way, is not the part
with the superlinear complexity, but accounts for a baseline of 2 hours
of single-thread runtime all by itself) is also unnecessary for
extracting just the file names of the source files, since we throw away
the location information anyway.  So I've written a small parser that
works on the DWARF dump instead, which can be produced in linear time
with a very small scaling factor, so practically in constant time even
for very large executables.  Unfortunately binutils does not yet offer
a machine-readable format for these dumps, but parsing the text is not
too difficult even though the format is undocumented.  The DWARF-5
documentation isn't the most enjoyable read, but it was helpful enough
to figure it all out.  I've also integrated the filtering of unrelated
source file information (from system headers and external libraries).
The end result is the same runtime as before on small object files, a
speedup of up to a factor of 100 for medium-sized object files, and
speedups in the several-thousands range for large ones (or a total
single-thread runtime of less than 20 seconds for gcc-13).
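
As a rough illustration of why parsing the text dump is manageable,
the following sketch joins the directory table and the file name table
of such a dump into full paths.  It is a simplified stand-in, not the
parser itself: it ignores the line number statements, the function name
list_table_files is made up, and the input layout is assumed from the
patterns the parser matches (tab-separated entries with the name after
the last ": ").

```shell
# Hypothetical helper: print the full path of every file name table
# entry found in a raw line-table dump read on stdin.
list_table_files() {
    awk -F '\t' '
        /^ The Directory Table/ { sect = "dir";  nd = 0; next }
        /^ The File Name Table/ { sect = "file"; next }
        /^$/                    { sect = "";     next }
        sect != "" && /^  [0-9]+\t/ {
            name = $NF
            sub(/^.*: /, "", name)   # keep only the text after the last ": "
            if (sect == "dir") {
                # Relative directories are resolved against entry 0.
                if (name !~ /^\//) name = dirs[0] "/" name
                dirs[nd++] = name
            } else {
                # Field 2 of a file entry is its directory index.
                print dirs[$2 + 0] "/" name
            }
        }
    '
}
```

With the objdump options used by the parser, this could be fed as
objdump -WNl "$exe" | list_table_files.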

dwarf-parse.pl
--8<---------------cut here---------------start------------->8---
#!perl -w
use common::sense;
use List::Util qw( sum );

my $filter = shift @ARGV
    or die "not enough arguments";
my $obj = shift @ARGV
    or die "not enough arguments";
my @objdump = qw( /usr/bin/objdump -WNl );
open my $DWARF, "-|", @objdump, $obj
    or die "can't invoke objdump\n$!";

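my ( @dirs, @files, %fn, %rn );
while (<$DWARF>) {
    # Directory table: collect the entries, resolving relative
    # directories against entry 0.
    if (/^ The Directory Table/../^$/) {
	if (/^  \d+/) {
	    my ( $entry, $dir ) = m/^  (\d+)\t.+: (.+)$/;
	    $dir = "$dirs[0]/$dir" if ($dir =~ m:\A[^/]:);
	    push @dirs, $dir;
	}
    }
    # File name table: join each file name with its directory entry.
    if (/^ The File Name Table/../^$/) {
	if (/^  \d+/) {
	    my ( $idx, $fn, undef ) = m/^  \d+\t(\d+)\t.+: (.+)$/;
	    $rn{"$dirs[$idx]/$fn"}++;
	    push @files, "$dirs[$idx]/$fn";
	}
    }
    # Line number statements: count only the files that are actually
    # referenced.  The range operator returns a sequence number that
    # ends in "E0" on the last line of the range, at which point the
    # tables collected so far are discarded.
    if (my $rc = /^ Line Number Statements/../^  Offset:/) {
	$fn{"$files[0]"}++ if ($rc == 1);
	$fn{"$files[$1]"}++ if m/ Set File Name to entry (\d+) in the File Name Table/;
	@files = () if ($rc =~ m/E0$/);
	@dirs  = () if ($rc =~ m/E0$/);
    }
    # No line number statements: discard the tables as well.
    if (/^ No Line Number Statements./../^$/) {
	@files = ();
	@dirs  = ();
    }
}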
foreach my $fn (grep m:^$filter:, sort keys %fn) {
    say sprintf "%s", $fn;
}
say STDERR sprintf "\tLNS: %6d (%6d locations) <=> FNT: %6d ( %6d locations)",
    0+grep( m:^$filter:, keys %fn ), sum( values %fn ),
    0+grep( m:^$filter:, keys %rn ), sum( values %rn )
    if (0);

close $DWARF
    or die "failed to close objdump\n$!";
--8<---------------cut here---------------end--------------->8---

Integration into cygport is configurable via a variable that can be
set, for instance, in .cygportrc, so that it is easy to revert to the
original objdump invocation if necessary.  I've been producing packages
with that setup for a while now and have not noticed any errors.  In
principle the new parser actually produces more complete output, as
there can be multiple line number statements and hence source files per
location, but objdump only lists one of them in the disassembly (at
least sometimes).  In practice I haven't yet found a package where
the final list (after filtering) is different.
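
A minimal sketch of the setup and of one way to check that both methods
agree, under some assumptions: the install path of the parser is only
an example, and compare_source_lists is a made-up helper that expects
the two source lists to have been saved to files beforehand.

```shell
# In ~/.cygportrc: point DWARF_PARSE at the parser to enable it.
# The path is an assumption; adjust it to wherever the script lives.
DWARF_PARSE=/usr/local/bin/dwarf-parse.pl

# Hypothetical helper: compare two saved source lists (for example one
# per extraction method).  Prints the paths found in only one of them
# (second-file-only paths are indented by a tab) and returns 0 only if
# both lists contain the same set of paths.
compare_source_lists() {
    tmp=$(mktemp -d) || return 2
    sort -u "$1" > "$tmp/old"
    sort -u "$2" > "$tmp/new"
    comm -3 "$tmp/old" "$tmp/new"
    cmp -s "$tmp/old" "$tmp/new"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

Leaving DWARF_PARSE unset keeps the original "objdump -d -l" pipeline.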

https://repo.or.cz/cygport/rpm-style.git/commitdiff_plain/7ab8b26aaefb8a6ce050a196ddc97ce416ebe7a9
--8<---------------cut here---------------start------------->8---
lib/src_postinst.cygpart: use DWARF_PARSE optionally instead of objdump -dl
---

diff --git a/lib/src_postinst.cygpart b/lib/src_postinst.cygpart
index f06004e4..3dd6e893 100644
--- a/lib/src_postinst.cygpart
+++ b/lib/src_postinst.cygpart
@@ -1096,7 +1096,12 @@ __prepstrip_one() {
 	else
 		dbg="/usr/lib/debug/${exe}.dbg";

-		lines=$(${objdump} -d -l "${exe}" 2>/dev/null | sed -ne "s|.*\(/usr/src/debug/${PF}/.*\):[0-9]*$|\1|gp" | sort -u | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
+		if defined DWARF_PARSE
+		then
+			lines=$(${DWARF_PARSE} /usr/src/debug/${PF}/ "${exe}" | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
+		else
+			lines=$(${objdump} -d -l "${exe}" 2>/dev/null | sed -ne "s|.*\(/usr/src/debug/${PF}/.*\):[0-9]*$|\1|gp" | sort -u | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
+		fi

 		# we expect --add-gnu-debuglink to fail if a
 		# .gnu_debuglink section already exists (e.g. binutils,
--8<---------------cut here---------------end--------------->8---

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves
