public inbox for debugedit@sourceware.org
* Re: [PATCH] find-debuginfo: remove duplicate filenames when creating debugsources.list
       [not found] <20230614145638.7830-1-dvlasenk@redhat.com>
@ 2023-06-14 16:30 ` Mark Wielaard
  0 siblings, 0 replies; only message in thread
From: Mark Wielaard @ 2023-06-14 16:30 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: debugedit

Hi Denys,

CCing the debugedit devel list.

On Wed, 2023-06-14 at 16:56 +0200, Denys Vlasenko wrote:
> We remove duplicate filenames when we _process_ debugsources.list.
> However, this means that momentarily we may have a very large
> (in the range of *giga*bytes) debugsources.list.
> 
> This is unnecessary, we can also remove dups when we *create* it.

We could also teach debugedit itself not to emit duplicate lines
(currently it simply outputs every file/dir found in the .debug_info
and .debug_line tables). But that wouldn't make this patch unnecessary,
since debugedit cannot know about the other file lists. It might be
more efficient and create smaller temporary files, though.

> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> ---
>  scripts/find-debuginfo.in | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/find-debuginfo.in b/scripts/find-debuginfo.in
> index 7dec3c3..e7ac095 100755
> --- a/scripts/find-debuginfo.in
> +++ b/scripts/find-debuginfo.in
> @@ -575,7 +575,10 @@ else
>        exit 1
>      fi
>    done
> -  cat "$temp"/debugsources.* >"$SOURCEFILE"
> +  # List of sources may have lots of duplicates. A kernel build was seen
> +  # with this list reaching 448 megabytes in size. "sort" helps to not have
> +  # _two_ sets of 448 megabytes of temp files here.
> +  LC_ALL=C sort -z -u "$temp"/debugsources.* >"$SOURCEFILE"
>    cat "$temp"/elfbins.* >"$ELFBINSFILE"
>  fi
>  
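
For illustration, here is a minimal sketch of the difference between the
old "cat" and the new "sort -z -u" on NUL-terminated lists (which the -z
in the patch implies). The temporary directory, the source paths and the
count_records helper are made up for this example and are not part of
find-debuginfo; it assumes a sort that supports -z, such as GNU sort.

```shell
#!/bin/sh
# Hypothetical demo data: these paths are invented for illustration only.
tmp=$(mktemp -d)

# Two per-binary lists of NUL-terminated source paths with overlapping entries.
printf '%s\0' /usr/src/a.c /usr/src/b.h /usr/src/a.c >"$tmp"/debugsources.1
printf '%s\0' /usr/src/b.h /usr/src/c.c >"$tmp"/debugsources.2

# Old behaviour: plain concatenation keeps every duplicate record.
cat "$tmp"/debugsources.* >"$tmp"/all.list

# New behaviour: merge all inputs and drop duplicate NUL-terminated records.
LC_ALL=C sort -z -u "$tmp"/debugsources.* >"$tmp"/unique.list

# Helper (hypothetical): count records by counting NUL bytes in a file.
count_records() { tr -cd '\0' <"$1" | wc -c | tr -d ' '; }

n_all=$(count_records "$tmp"/all.list)
n_unique=$(count_records "$tmp"/unique.list)
echo "concatenated: $n_all records"
echo "deduplicated: $n_unique records"

rm -rf "$tmp"
```

With the sample data above this reports 5 records for the concatenated
list and 3 for the deduplicated one. Note that /usr/src/b.h is unique
within each input list and is only collapsed when the lists are merged.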

Looks good, applied as commit 41fc1335b8b364c95a8ee2ed2956bbdfe7957853

Thanks,

Mark