From: Mark Wielaard <mark@klomp.org>
To: Denys Vlasenko <dvlasenk@redhat.com>
Cc: debugedit@sourceware.org
Subject: Re: [PATCH] find-debuginfo: remove duplicate filenames when creating debugsources.list
Date: Wed, 14 Jun 2023 18:30:45 +0200
Message-ID: <4fd77ae2cbc3fdc194bcd0fc37c0c2f92efb66e6.camel@klomp.org>
In-Reply-To: <20230614145638.7830-1-dvlasenk@redhat.com>
Hi Denys,
CCing the debugedit devel list.
On Wed, 2023-06-14 at 16:56 +0200, Denys Vlasenko wrote:
> We remove duplicate filenames when we _process_ debugsources.list.
> However, this means that momentarily we may have a very large
> (in the range of *giga*bytes) debugsources.list.
>
> This is unnecessary; we can also remove dups when we *create* it.
We can also teach debugedit itself not to emit duplicate lines
(currently it simply outputs every file and directory it finds in the
.debug_info and .debug_line tables). That wouldn't make this change
unnecessary, since debugedit cannot know about the other file lists,
but it might be more efficient and create smaller temporary files.
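To illustrate why deduplicating inside debugedit would not make the final pass unnecessary, here is a small hypothetical sketch. The two list files and the paths in them are made up, and it assumes the lists are NUL-terminated (as the `-z` flag in the patch implies). Each list is already free of duplicates on its own, as it would be if debugedit stopped repeating paths, yet the concatenation still repeats the header the two binaries share. Note that `sort -z` is a GNU/BSD extension, not POSIX.

```shell
#!/bin/sh
# Hypothetical per-binary lists; each one has no internal duplicates,
# as if debugedit itself had stopped emitting repeated paths.
tmp=$(mktemp -d)
printf '%s\0' src/common.h src/a.c >"$tmp/debugsources.a"
printf '%s\0' src/common.h src/b.c >"$tmp/debugsources.b"

# Plain concatenation (the old behaviour) still repeats src/common.h,
# because neither list knows about the other.
total=$(cat "$tmp"/debugsources.* | tr '\0' '\n' | wc -l | tr -d ' ')

# A global pass over all lists is still needed to drop the repeat.
unique=$(LC_ALL=C sort -z -u "$tmp"/debugsources.* | tr '\0' '\n' | wc -l | tr -d ' ')

echo "concatenated: $total records, unique: $unique records"
rm -rf "$tmp"
```

Running it prints "concatenated: 4 records, unique: 3 records". Per-binary deduplication would shrink each temporary file, but only a pass over all the lists removes duplicates across binaries.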
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> ---
> scripts/find-debuginfo.in | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/find-debuginfo.in b/scripts/find-debuginfo.in
> index 7dec3c3..e7ac095 100755
> --- a/scripts/find-debuginfo.in
> +++ b/scripts/find-debuginfo.in
> @@ -575,7 +575,10 @@ else
> exit 1
> fi
> done
> - cat "$temp"/debugsources.* >"$SOURCEFILE"
> + # List of sources may have lots of duplicates. A kernel build was seen
> + # with this list reaching 448 megabytes in size. "sort" helps to not have
> + # _two_ sets of 448 megabytes of temp files here.
> + LC_ALL=C sort -z -u "$temp"/debugsources.* >"$SOURCEFILE"
> cat "$temp"/elfbins.* >"$ELFBINSFILE"
> fi
>
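A minimal sketch of what the new `sort` line does. It assumes, as the `-z` flag in the patch implies, that the per-binary debugsources.* files hold NUL-terminated paths. The temporary directory and the paths are invented for illustration, and `sort -z` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch only: made-up paths standing in for per-binary debugsources.* files.
tmp=$(mktemp -d)
printf '%s\0' /usr/src/debug/foo/b.h /usr/src/debug/foo/a.c >"$tmp/debugsources.1"
printf '%s\0' /usr/src/debug/foo/a.c /usr/src/debug/foo/c.c /usr/src/debug/foo/b.h >"$tmp/debugsources.2"

# -z: records are NUL-terminated, so paths containing newlines stay intact
# -u: output only the first of each run of equal records
# LC_ALL=C: byte-wise comparison, so only byte-identical paths are merged
LC_ALL=C sort -z -u "$tmp"/debugsources.* >"$tmp/debugsources.list"

# Turn NULs into newlines just for display
result=$(tr '\0' '\n' <"$tmp/debugsources.list")
printf '%s\n' "$result"
rm -rf "$tmp"
```

It prints the three distinct paths once each, in byte order (a.c, b.h, c.c). Unlike `cat`, this never writes the duplicated records to "$SOURCEFILE", so the large intermediate list is not created in the first place.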
Looks good, applied as commit 41fc1335b8b364c95a8ee2ed2956bbdfe7957853
Thanks,
Mark