From: Mark Wielaard <mark@klomp.org>
To: Denys Vlasenko <dvlasenk@redhat.com>
Cc: debugedit@sourceware.org
Subject: Re: [PATCH] find-debuginfo: remove duplicate filenames when creating debugsources.list
Date: Wed, 14 Jun 2023 18:30:45 +0200
Message-ID: <4fd77ae2cbc3fdc194bcd0fc37c0c2f92efb66e6.camel@klomp.org>
In-Reply-To: <20230614145638.7830-1-dvlasenk@redhat.com>

Hi Denys,

CCing the debugedit devel list.

On Wed, 2023-06-14 at 16:56 +0200, Denys Vlasenko wrote:
> We remove duplicate filenames when we _process_ debugsources.list.
> However, this means that momentarily we may have a very large
> (in the range of *giga*bytes) debugsources.list.
> 
> This is unnecessary, we can also remove dups when we *create* it.

We could also teach debugedit itself not to emit duplicate lines
(currently it simply outputs every file/dir found in the .debug_info
and .debug_line tables). That wouldn't make this patch unnecessary,
since debugedit cannot know about the other file lists, but it might
be more efficient and create smaller temporary files.

> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> ---
>  scripts/find-debuginfo.in | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/find-debuginfo.in b/scripts/find-debuginfo.in
> index 7dec3c3..e7ac095 100755
> --- a/scripts/find-debuginfo.in
> +++ b/scripts/find-debuginfo.in
> @@ -575,7 +575,10 @@ else
>        exit 1
>      fi
>    done
> -  cat "$temp"/debugsources.* >"$SOURCEFILE"
> +  # List of sources may have lots of duplicates. A kernel build was seen
> +  # with this list reaching 448 megabytes in size. "sort" helps to not have
> +  # _two_ sets of 448 megabytes of temp files here.
> +  LC_ALL=C sort -z -u "$temp"/debugsources.* >"$SOURCEFILE"
>    cat "$temp"/elfbins.* >"$ELFBINSFILE"
>  fi
>  

Looks good, applied as commit 41fc1335b8b364c95a8ee2ed2956bbdfe7957853
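
For anyone reading along, here is a minimal sketch of what the new
sort line does compared to the old cat. The temp directory, the file
names and the source paths are made up for illustration and are not
taken from find-debuginfo. It assumes the lists are NUL-terminated (as
the -z in the patch implies) and a sort/tr that handle NUL bytes, as
GNU coreutils does:

```shell
#!/bin/sh
# Hypothetical demo, not part of find-debuginfo.
temp=$(mktemp -d)

# Two NUL-terminated lists with overlapping entries.
printf 'src/a.c\0src/b.c\0src/a.c\0' >"$temp/debugsources.1"
printf 'src/b.c\0src/c.c\0' >"$temp/debugsources.2"

# Old behaviour: plain concatenation keeps every entry, duplicates included.
cat_count=$(($(cat "$temp"/debugsources.* | tr -cd '\0' | wc -c)))

# New behaviour: -z treats NUL as the record terminator, -u keeps one
# copy of each record, LC_ALL=C makes the comparison plain byte order.
LC_ALL=C sort -z -u "$temp"/debugsources.* >"$temp/merged"
uniq_count=$(($(tr -cd '\0' <"$temp/merged" | wc -c)))

echo "concatenated: $cat_count, deduplicated: $uniq_count"
rm -rf "$temp"
```

The counts are the number of NUL terminators, i.e. the number of
entries, so the difference between the two shows how much the merged
list shrinks.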

Thanks,

Mark
