public inbox for cygwin-apps@cygwin.com
 help / color / mirror / Atom feed
* [cygport] enabling a replacement for "objdump -d -l"
@ 2024-02-18 19:51 ASSI
  2024-02-20  3:42 ` Marco Atzeri
  2024-02-26 19:29 ` Jon Turney
  0 siblings, 2 replies; 8+ messages in thread
From: ASSI @ 2024-02-18 19:51 UTC (permalink / raw)
  To: cygwin-apps


Cygport uses "objdump -d -l" to extract the list of source files that
need to be copied into the debuginfo package.  This operation triggers
some O(N²) or even higher complexity and in addition has been getting
slower in recent binutils releases due to more and more information
being put into the object files.  For gcc-11 extracting the debug source
files takes up to 45 minutes per executable (up from about 15 minutes
until 2.39) and for gcc-13 (with about 1.5 times the number of lines to
extract) it is already taking more than two hours.  So if you just
package gcc-13 using a single thread you'd be looking on the order of 20
hours wall clock time, which is unacceptable.

The deassembly implied by the "-d" (which is not the part that has the
superlinear complexity btw, but produces a baseline of 2 hours single
thread runtime all by itself) is also unnecessary to extract just the
filenames of the source files as we throw away the location information
anyway and so I've written a small parser that works on the DWARF dump
instead (which can be produced in linear time with a very small scaling
factor, so practically constant time even for very large executables).
Unfortunately binutils does not yet offer a machine readable format for
these dumps, but parsing the text is not too difficult even though the
format is undocumented.  The DWARF-5 documentation isn't the most
enjoyable read, but it was helpful enough to figure it all out.  I've
also integrated the filtering of unrelated source file information (from
system headers and external libraries).  The end result is the same
runtime as before on small object files, a factor up to 100 speedup for
medium sized object files and speedups in the several thousands range
for large sized ones (or a total single-thread runtime of less than 20
seconds for gcc-13).

dwarf-parse.-pl
--8<---------------cut here---------------start------------->8---
#!perl -w
use common::sense;
use List::Util qw( sum );

my $filter = shift @ARGV
    or die "not enough arguments";
my $obj = shift @ARGV
    or die "not enough arguments";
my @objdump = qw( /usr/bin/objdump -WNl );
open my $DWARF, "-|", @objdump, $obj
    or die "can't invoke objdump\n$!";

my ( @dirs, @files, %fn, %rn );
while (<$DWARF>) {
    if (/^ The Directory Table/../^$/) {
	if (/^  \d+/) {
	    
	    my ( $entry, $dir ) = m/^  (\d+)\t.+: (.+)$/;
	    $dir = "$dirs[0]/$dir" if ($dir =~ m:\A[^/]:);
	    push @dirs, $dir;
	}
    }
    if (/^ The File Name Table/../^$/) {
	if (/^  \d+/) {
	    my ( $idx, $fn, undef ) = m/^  \d+\t(\d+)\t.+: (.+)$/;
	    $rn{"$dirs[$idx]/$fn"}++;
	    push @files, "$dirs[$idx]/$fn";
	}
    }
    if (my $rc = /^ Line Number Statements/../^  Offset:/) {
	$fn{"$files[0]"}++ if ($rc == 1);
	$fn{"$files[$1]"}++ if m/ Set File Name to entry (\d+) in the File Name Table/;
	@files = () if ($rc =~ m/E0$/);
	@dirs  = () if ($rc =~ m/E0$/);
    }
    if (/^ No Line Number Statements./../^$/) {
	@files = ();
	@dirs  = ();
    }
}
foreach my $fn (grep m:^$filter:, sort keys %fn) {
    say sprintf "%s", $fn;
}
say STDERR sprintf "\tLNS: %6d (%6d locations) <=> FNT: %6d ( %6d locations)",
    0+grep( m:^$filter:, keys %fn ), sum( values %fn ),
    0+grep( m:^$filter:, keys %rn ), sum( values %rn )
    if (0);

close $DWARF
    or die "failed to close objdump\n$!";
--8<---------------cut here---------------end--------------->8---

Integration into cygport is made configurable via a variable to be set
in .cygportrc for instance in order to easily revert back to the
original objdump invocation if necessary.  I've been producing packages
with that setup for a while now and have not noticed any errors.  In
principle the new parser actually produces more complete output as there
can be multiple line number statements and hence source files per
location, but objdump only lists one of them in the disassembly (at
least sometimes).  In practise I haven't found a package until now where
the final list (after filtering) is different.

https://repo.or.cz/cygport/rpm-style.git/commitdiff_plain/7ab8b26aaefb8a6ce050a196ddc97ce416ebe7a9
--8<---------------cut here---------------start------------->8---
lib/src_postinst.cygpart: use DWARF_PARSE optionally instead of objdump -dl
---

diff --git a/lib/src_postinst.cygpart b/lib/src_postinst.cygpart
index f06004e4..3dd6e893 100644
--- a/lib/src_postinst.cygpart
+++ b/lib/src_postinst.cygpart
@@ -1096,7 +1096,12 @@ __prepstrip_one() {
 	else
 		dbg="/usr/lib/debug/${exe}.dbg";
 
-		lines=$(${objdump} -d -l "${exe}" 2>/dev/null | sed -ne "s|.*\(/usr/src/debug/${PF}/.*\):[0-9]*$|\1|gp" | sort -u | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
+		if defined DWARF_PARSE
+		then
+			lines=$(${DWARF_PARSE} /usr/src/debug/${PF}/ "${exe}" | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
+		else
+			lines=$(${objdump} -d -l "${exe}" 2>/dev/null | sed -ne "s|.*\(/usr/src/debug/${PF}/.*\):[0-9]*$|\1|gp" | sort -u | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
+		fi
 
 		# we expect --add-gnu-debuglink to fail if a
 		# .gnu_debuglink section already exists (e.g. binutils,
--8<---------------cut here---------------end--------------->8---


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-02-18 19:51 [cygport] enabling a replacement for "objdump -d -l" ASSI
@ 2024-02-20  3:42 ` Marco Atzeri
  2024-02-20 18:21   ` ASSI
  2024-02-26 19:29 ` Jon Turney
  1 sibling, 1 reply; 8+ messages in thread
From: Marco Atzeri @ 2024-02-20  3:42 UTC (permalink / raw)
  To: cygwin-apps

On 18/02/2024 20:51, ASSI via Cygwin-apps wrote:
> 
> Cygport uses "objdump -d -l" to extract the list of source files that
> need to be copied into the debuginfo package.  This operation triggers
> some O(N²) or even higher complexity and in addition has been getting
> slower in recent binutils releases due to more and more information
> being put into the object files.  For gcc-11 extracting the debug source
> files takes up to 45 minutes per executable (up from about 15 minutes
> until 2.39) and for gcc-13 (with about 1.5 times the number of lines to
> extract) it is already taking more than two hours.  So if you just
> package gcc-13 using a single thread you'd be looking on the order of 20
> hours wall clock time, which is unacceptable.
> 
> The deassembly implied by the "-d" (which is not the part that has the
> superlinear complexity btw, but produces a baseline of 2 hours single
> thread runtime all by itself) is also unnecessary to extract just the
> filenames of the source files as we throw away the location information
> anyway and so I've written a small parser that works on the DWARF dump
> instead (which can be produced in linear time with a very small scaling
> factor, so practically constant time even for very large executables).
> Unfortunately binutils does not yet offer a machine readable format for
> these dumps, but parsing the text is not too difficult even though the
> format is undocumented.  The DWARF-5 documentation isn't the most
> enjoyable read, but it was helpful enough to figure it all out.  I've
> also integrated the filtering of unrelated source file information (from
> system headers and external libraries).  The end result is the same
> runtime as before on small object files, a factor up to 100 speedup for
> medium sized object files and speedups in the several thousands range
> for large sized ones (or a total single-thread runtime of less than 20
> seconds for gcc-13).
> 
>
> Integration into cygport is made configurable via a variable to be set
> in .cygportrc for instance in order to easily revert back to the
> original objdump invocation if necessary.  I've been producing packages
> with that setup for a while now and have not noticed any errors.  In
> principle the new parser actually produces more complete output as there
> can be multiple line number statements and hence source files per
> location, but objdump only lists one of them in the disassembly (at
> least sometimes).  In practise I haven't found a package until now where
> the final list (after filtering) is different.
> 

if works should not be the default ?
Reducing that time is very interesting for the big stuff


> 
> Regards,
> Achim.

Thanks
Marco


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-02-20  3:42 ` Marco Atzeri
@ 2024-02-20 18:21   ` ASSI
  0 siblings, 0 replies; 8+ messages in thread
From: ASSI @ 2024-02-20 18:21 UTC (permalink / raw)
  To: cygwin-apps

Marco Atzeri via Cygwin-apps writes:
> if works should not be the default ?

Let's start with interested parties giving it a spin, shall we?  ONce
configured into your .cygportrc it essentially is the default for all
your builds.

> Reducing that time is very interesting for the big stuff

That's the raison d'être…

Anyway, I still hope to wrap my head around how libbfd works and either
produce a standalone executable or objdump sub-command that implements
the same functionality (probably sans the filtering).


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for Waldorf Q V3.00R3 and Q+ V3.54R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-02-18 19:51 [cygport] enabling a replacement for "objdump -d -l" ASSI
  2024-02-20  3:42 ` Marco Atzeri
@ 2024-02-26 19:29 ` Jon Turney
  2024-03-11 19:35   ` ASSI
  1 sibling, 1 reply; 8+ messages in thread
From: Jon Turney @ 2024-02-26 19:29 UTC (permalink / raw)
  To: Achim Gratz; +Cc: cygwin-apps


Thanks, this is great!

On 18/02/2024 19:51, ASSI via Cygwin-apps wrote:
> 
[...]
> dwarf-parse.-pl

There should be some build system changes which install this file, 
probably in /usr/share/cygport/tools/, which it can then be run from.

> --8<---------------cut here---------------start------------->8---

Please, please make a patch with git format-patch, which I can then just 
apply.

> #!perl -w

Fifty lines of perl with no comments! This is just line noise to me 
unless I spend lots of time staring at it :)

Seriously, this should at least say "I'm running objdump -Wl to dump out 
the .debug_line section containing DWARF XYZ information.

Then maybe some comments about what assumptions it's making about the 
human-readable output it's parsing.

> use common::sense;
> use List::Util qw( sum );
> 
> my $filter = shift @ARGV
>      or die "not enough arguments";
> my $obj = shift @ARGV
>      or die "not enough arguments";
> my @objdump = qw( /usr/bin/objdump -WNl );

cygport goes to some lengths to identify the correct objdump to use when 
cross-building, so it should probably should be used here (passed in as 
an arg?), rather than assuming it's /usr/bin/objdump.

> open my $DWARF, "-|", @objdump, $obj
>      or die "can't invoke objdump\n$!";
> 
> my ( @dirs, @files, %fn, %rn );
> while (<$DWARF>) {
>      if (/^ The Directory Table/../^$/) {
> 	if (/^  \d+/) {
> 	
> 	    my ( $entry, $dir ) = m/^  (\d+)\t.+: (.+)$/;
> 	    $dir = "$dirs[0]/$dir" if ($dir =~ m:\A[^/]:);
> 	    push @dirs, $dir;
> 	}
>      }
>      if (/^ The File Name Table/../^$/) {
> 	if (/^  \d+/) {
> 	    my ( $idx, $fn, undef ) = m/^  \d+\t(\d+)\t.+: (.+)$/;
> 	    $rn{"$dirs[$idx]/$fn"}++;
> 	    push @files, "$dirs[$idx]/$fn";
> 	}
>      }
>      if (my $rc = /^ Line Number Statements/../^  Offset:/) {
> 	$fn{"$files[0]"}++ if ($rc == 1);
> 	$fn{"$files[$1]"}++ if m/ Set File Name to entry (\d+) in the File Name Table/;

What this line is doing is obvious, the rest of this block, not so much.

You might also like to touch on why we bother looking at the line number 
information at all, rather than just producing a (filtered) list of all 
the pathnames mentioned?

> 	@files = () if ($rc =~ m/E0$/);
> 	@dirs  = () if ($rc =~ m/E0$/);
>      }
>      if (/^ No Line Number Statements./../^$/) {
> 	@files = ();
> 	@dirs  = ();
>      }
> }
> foreach my $fn (grep m:^$filter:, sort keys %fn) {
>      say sprintf "%s", $fn;
> }
> say STDERR sprintf "\tLNS: %6d (%6d locations) <=> FNT: %6d ( %6d locations)",
>      0+grep( m:^$filter:, keys %fn ), sum( values %fn ),
>      0+grep( m:^$filter:, keys %rn ), sum( values %rn )
>      if (0);

If you're going to keep this (which you probably should), perhaps it 
should be under some 'if (DEBUG)' conditional.

> 
> close $DWARF
>      or die "failed to close objdump\n$!";
> --8<---------------cut here---------------end--------------->8---
> 
> Integration into cygport is made configurable via a variable to be set
> in .cygportrc for instance in order to easily revert back to the
> original objdump invocation if necessary.  I've been producing packages

DWARF_PARSE should be mentioned in the documentation for cygport.conf

Since the helper script will be installed, it could be made a boolean.

> with that setup for a while now and have not noticed any errors.  In
> principle the new parser actually produces more complete output as there
> can be multiple line number statements and hence source files per
> location, but objdump only lists one of them in the disassembly (at
> least sometimes).  In practise I haven't found a package until now where
> the final list (after filtering) is different.
> 
> --8<---------------cut here---------------start------------->8---
> lib/src_postinst.cygpart: use DWARF_PARSE optionally instead of objdump -dl
> ---
> 
> diff --git a/lib/src_postinst.cygpart b/lib/src_postinst.cygpart
> index f06004e4..3dd6e893 100644
> --- a/lib/src_postinst.cygpart
> +++ b/lib/src_postinst.cygpart
> @@ -1096,7 +1096,12 @@ __prepstrip_one() {
>   	else
>   		dbg="/usr/lib/debug/${exe}.dbg";
>   
> -		lines=$(${objdump} -d -l "${exe}" 2>/dev/null | sed -ne "s|.*\(/usr/src/debug/${PF}/.*\):[0-9]*$|\1|gp" | sort -u | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
> +		if defined DWARF_PARSE
> +		then
> +			lines=$(${DWARF_PARSE} /usr/src/debug/${PF}/ "${exe}" | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
> +		else
> +			lines=$(${objdump} -d -l "${exe}" 2>/dev/null | sed -ne "s|.*\(/usr/src/debug/${PF}/.*\):[0-9]*$|\1|gp" | sort -u | tee -a ${T}/.dbgsrc.out.${oxt} | wc -l);
> +		fi
>   
>   		# we expect --add-gnu-debuglink to fail if a
>   		# .gnu_debuglink section already exists (e.g. binutils,
> --8<---------------cut here---------------end--------------->8---


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-02-26 19:29 ` Jon Turney
@ 2024-03-11 19:35   ` ASSI
  2024-03-12 17:41     ` Jon Turney
  0 siblings, 1 reply; 8+ messages in thread
From: ASSI @ 2024-03-11 19:35 UTC (permalink / raw)
  To: cygwin-apps

Jon Turney via Cygwin-apps writes:
> Thanks, this is great!

You're welcome.

> Please, please make a patch with git format-patch, which I can then
> just apply.

You can always just pull it in from my repo… when it's ready.

> Fifty lines of perl with no comments! This is just line noise to me
> unless I spend lots of time staring at it :)

That's what you get from an experiment that went rather more well than
planned.

> Seriously, this should at least say "I'm running objdump -Wl to dump
> out the .debug_line section containing DWARF XYZ information.
>
> Then maybe some comments about what assumptions it's making about the
> human-readable output it's parsing.

So you're asking for a manpage, really.  Should be doable with enough
round tuits.

> cygport goes to some lengths to identify the correct objdump to use
> when cross-building, so it should probably should be used here (passed
> in as an arg?), rather than assuming it's /usr/bin/objdump.

Yes, either that or using whatever variable cygport sets up with the
correct objdump.

> What this line is doing is obvious, the rest of this block, not so much.

Nothing to see here, move along… :-P

> You might also like to touch on why we bother looking at the line
> number information at all, rather than just producing a (filtered)
> list of all the pathnames mentioned?

I was using this to figure out why the "objdump -d -l" was missing some
of the file names I was seeing (in general, again, it comes to the same
set of files in the end).

> If you're going to keep this (which you probably should), perhaps it
> should be under some 'if (DEBUG)' conditional.

Yeah, can do if I use GetOpt::Long, which I should probably do anyway
just in case this gets extended later on.

> DWARF_PARSE should be mentioned in the documentation for cygport.conf

Yes.

> Since the helper script will be installed, it could be made a boolean.

Out of habit grown over decades, I always keep an escape hatch for using
local (modified) copies in such scripts.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldUserWavetables

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-03-11 19:35   ` ASSI
@ 2024-03-12 17:41     ` Jon Turney
  2024-03-12 17:49       ` ASSI
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Turney @ 2024-03-12 17:41 UTC (permalink / raw)
  To: Achim Gratz; +Cc: cygwin-apps

On 11/03/2024 19:35, ASSI via Cygwin-apps wrote:
> Jon Turney via Cygwin-apps writes:
[...]
> 
>> Fifty lines of perl with no comments! This is just line noise to me
>> unless I spend lots of time staring at it :)
> 
> That's what you get from an experiment that went rather more well than
> planned.
> 
>> Seriously, this should at least say "I'm running objdump -Wl to dump
>> out the .debug_line section containing DWARF XYZ information.
>>
>> Then maybe some comments about what assumptions it's making about the
>> human-readable output it's parsing.
> 
> So you're asking for a manpage, really.  Should be doable with enough
> round tuits.

Well, that would be nice too, but there is is difference between 
describing what it does and describing how it does what it does.

But I'm not being oblique here. I really do want comments.

I'm not sure what's so astounding about the suggestion that a fifty line 
script should have some comments in it that you can't believe I mean 
that literally...

[...]
>> What this line is doing is obvious, the rest of this block, not so much.
> 
> Nothing to see here, move along… :-P

...

[...]
>> Since the helper script will be installed, it could be made a boolean.
> 
> Out of habit grown over decades, I always keep an escape hatch for using
> local (modified) copies in such scripts.

Well, OK.  This is less useful to people who actually want to use it, 
though, requiring them to know that 
"DWARF_PARSE=/usr/share/cygport/tools/dwarf-parse.pl" is the right 
incantation.

Perhaps "DWARF_PARSE=1" could be a shorthand for that?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-03-12 17:41     ` Jon Turney
@ 2024-03-12 17:49       ` ASSI
  2024-03-12 21:39         ` Brian Inglis
  0 siblings, 1 reply; 8+ messages in thread
From: ASSI @ 2024-03-12 17:49 UTC (permalink / raw)
  To: cygwin-apps

Jon Turney via Cygwin-apps writes:
> But I'm not being oblique here. I really do want comments.

Well, adding comments or proper POD is about the same effort, so I'd
tend towards the latter.

> I'm not sure what's so astounding about the suggestion that a fifty
> line script should have some comments in it that you can't believe I
> mean that literally...

As the saying goes: Communication is possible, but improbable…

> Well, OK.  This is less useful to people who actually want to use it,
> though, requiring them to know that
> "DWARF_PARSE=/usr/share/cygport/tools/dwarf-parse.pl" is the right
> incantation.
>
> Perhaps "DWARF_PARSE=1" could be a shorthand for that?

I don't see why not, it just requires one extra test.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cygport] enabling a replacement for "objdump -d -l"
  2024-03-12 17:49       ` ASSI
@ 2024-03-12 21:39         ` Brian Inglis
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Inglis @ 2024-03-12 21:39 UTC (permalink / raw)
  To: cygwin-apps

On 2024-03-12 11:49, ASSI via Cygwin-apps wrote:
> Jon Turney via Cygwin-apps writes:
>> But I'm not being oblique here. I really do want comments.
> 
> Well, adding comments or proper POD is about the same effort, so I'd
> tend towards the latter.
> 
>> I'm not sure what's so astounding about the suggestion that a fifty
>> line script should have some comments in it that you can't believe I
>> mean that literally...
> 
> As the saying goes: Communication is possible, but improbable…
> 
>> Well, OK.  This is less useful to people who actually want to use it,
>> though, requiring them to know that
>> "DWARF_PARSE=/usr/share/cygport/tools/dwarf-parse.pl" is the right
>> incantation.
>>
>> Perhaps "DWARF_PARSE=1" could be a shorthand for that?

[Sorry still makes me think Dwarvish and Runes e.g.
	https://www.evertype.com/standards/iso10646/pdf/cirth.pdf
and if you say ELF I still think zwolf, dreizehn, ... ;^> ]

> I don't see why not, it just requires one extra test.

Maybe even more meaningful names, to those of us not so familiar with binutils 
and compiler debug esoterica, stating the connection between the build and the 
function, something like debuginfo_fast_source_filter.pl and 
DEBUGINFO_FAST_SOURCE_FILTER flag, default on if .pl installed?

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-12 21:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-18 19:51 [cygport] enabling a replacement for "objdump -d -l" ASSI
2024-02-20  3:42 ` Marco Atzeri
2024-02-20 18:21   ` ASSI
2024-02-26 19:29 ` Jon Turney
2024-03-11 19:35   ` ASSI
2024-03-12 17:41     ` Jon Turney
2024-03-12 17:49       ` ASSI
2024-03-12 21:39         ` Brian Inglis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).