From: Mark Wielaard <>
To: Denys Vlasenko <>
Subject: Re: [PATCH] find-debuginfo: remove duplicate filenames when creating debugsources.list
Date: Wed, 14 Jun 2023 18:30:45 +0200
Message-ID: <>
In-Reply-To: <>

Hi Denys,

CCing the debugedit devel list.

On Wed, 2023-06-14 at 16:56 +0200, Denys Vlasenko wrote:
> We remove duplicate filenames when we _process_ debugsources.list.
> However, this means that momentarily we may have a very large
> (in the range of *giga*bytes) debugsources.list.
> This is unnecessary, we can also remove dups when we *create* it.

We could also teach debugedit itself not to emit duplicate lines
(currently it simply outputs every file/dir found in the .debug_info
and .debug_line tables). But that wouldn't make this patch unnecessary,
since debugedit cannot know about the other file lists. It might be
more efficient and create smaller temporary files, though.
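
To illustrate why the final merge would still need to deduplicate, here
is a small sketch. The file names are made up, and it assumes the lists
are NUL-terminated (as the -z in the patch suggests). Each input is
already duplicate-free, as if debugedit had deduplicated its own output,
yet the merged list still repeats the shared header:

```shell
# Hypothetical scratch directory standing in for "$temp" in the script.
tmp=$(mktemp -d)

# Each list has no internal duplicates, but both mention src/common.h.
printf 'src/common.h\0src/a.c\0' >"$tmp/debugsources.1"
printf 'src/common.h\0src/b.c\0' >"$tmp/debugsources.2"

# Count NUL-terminated records after plain concatenation ...
merged_count=$(cat "$tmp"/debugsources.* | tr -cd '\0' | wc -c | tr -d ' ')
# ... and after a deduplicating merge across all lists.
unique_count=$(LC_ALL=C sort -z -u "$tmp"/debugsources.* | tr -cd '\0' | wc -c | tr -d ' ')

echo "merged: $merged_count, unique: $unique_count"
rm -rf "$tmp"
```

So per-file deduplication would only shrink the inputs; the cross-file
duplicates are removed by the merge step.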

> Signed-off-by: Denys Vlasenko <>
> ---
>  scripts/ | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> diff --git a/scripts/ b/scripts/
> index 7dec3c3..e7ac095 100755
> --- a/scripts/
> +++ b/scripts/
> @@ -575,7 +575,10 @@ else
>        exit 1
>      fi
>    done
> -  cat "$temp"/debugsources.* >"$SOURCEFILE"
> +  # List of sources may have lots of duplicates. A kernel build was seen
> +  # with this list reaching 448 megabytes in size. "sort" helps to not have
> +  # _two_ sets of 448 megabytes of temp files here.
> +  LC_ALL=C sort -z -u "$temp"/debugsources.* >"$SOURCEFILE"
>    cat "$temp"/elfbins.* >"$ELFBINSFILE"
>  fi
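
For reference, a minimal sketch of what the changed line does, using
made-up file names in a scratch directory and assuming NUL-terminated
entries (sort -z is the GNU coreutils option for NUL-terminated records):

```shell
# Hypothetical scratch directory standing in for "$temp" in the script.
tmp=$(mktemp -d)

# Two NUL-terminated lists with duplicates within and across files.
printf 'src/a.c\0src/b.c\0src/a.c\0' >"$tmp/debugsources.1"
printf 'src/b.c\0src/c.c\0' >"$tmp/debugsources.2"

# Old behaviour: plain concatenation keeps every duplicate.
cat "$tmp"/debugsources.* >"$tmp/concat.list"

# New behaviour: bytewise sort, keeping one copy of each record.
LC_ALL=C sort -z -u "$tmp"/debugsources.* >"$tmp/sorted.list"

# Count records by counting NUL bytes; list names one per line.
concat_count=$(tr -cd '\0' <"$tmp/concat.list" | wc -c | tr -d ' ')
sorted_count=$(tr -cd '\0' <"$tmp/sorted.list" | wc -c | tr -d ' ')
sorted_names=$(tr '\0' '\n' <"$tmp/sorted.list")

echo "concatenated: $concat_count records, deduplicated: $sorted_count records"
rm -rf "$tmp"
```

LC_ALL=C makes the comparison bytewise, so the result does not depend on
the locale of the build environment.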

Looks good, applied as commit 41fc1335b8b364c95a8ee2ed2956bbdfe7957853


