* Re: [PATCH] find-debuginfo: remove duplicate filenames when creating debugsources.list
From: Mark Wielaard @ 2023-06-14 16:30 UTC
To: Denys Vlasenko; +Cc: debugedit
Hi Denys,
CCing the debugedit devel list.
On Wed, 2023-06-14 at 16:56 +0200, Denys Vlasenko wrote:
> We remove duplicate filenames when we _process_ debugsources.list.
> However, this means that momentarily we may have a very large
> (in the range of *giga*bytes) debugsources.list.
>
> This is unnecessary, we can also remove dups when we *create* it.
We could also teach debugedit itself not to emit duplicate lines
(currently it simply outputs every file/dir found in the .debug_info
and .debug_line tables). That wouldn't make this patch unnecessary,
since debugedit cannot know about the other file lists, but it might
be more efficient and create smaller temporary files.
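To illustrate why the cross-list pass is still needed, here is a minimal
sketch, assuming GNU sort (for -z) and NUL-terminated entries as the -z in
the patch implies. The temporary directory, file names and source paths are
made up for the example and are not taken from find-debuginfo.

```shell
#!/bin/sh
# Hypothetical stand-in for the "$temp" directory used by find-debuginfo.
temp=$(mktemp -d)

# Two per-binary lists of NUL-terminated paths that share one entry.
printf '/usr/src/a.c\0/usr/src/common.h\0' >"$temp/debugsources.1"
printf '/usr/src/common.h\0/usr/src/b.c\0' >"$temp/debugsources.2"

# Plain concatenation keeps the entry that appears in both lists.
cat "$temp"/debugsources.* >"$temp/concat.list"

# One sort over all lists removes duplicates that span lists.
LC_ALL=C sort -z -u "$temp"/debugsources.* >"$temp/unique.list"

# Count NUL-terminated records in each result.
concat_count=$(tr -cd '\0' <"$temp/concat.list" | wc -c | tr -d ' ')
unique_count=$(tr -cd '\0' <"$temp/unique.list" | wc -c | tr -d ' ')
echo "concatenated: $concat_count entries, deduplicated: $unique_count entries"

rm -rf "$temp"
```

With these inputs the concatenated list has 4 entries and the deduplicated
one has 3. Even if each individual list were already free of duplicates,
the shared header would still show up twice without the final sort -u.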
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> ---
> scripts/find-debuginfo.in | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/find-debuginfo.in b/scripts/find-debuginfo.in
> index 7dec3c3..e7ac095 100755
> --- a/scripts/find-debuginfo.in
> +++ b/scripts/find-debuginfo.in
> @@ -575,7 +575,10 @@ else
> exit 1
> fi
> done
> - cat "$temp"/debugsources.* >"$SOURCEFILE"
> + # List of sources may have lots of duplicates. A kernel build was seen
> + # with this list reaching 448 megabytes in size. "sort" helps to not have
> + # _two_ sets of 448 megabytes of temp files here.
> + LC_ALL=C sort -z -u "$temp"/debugsources.* >"$SOURCEFILE"
> cat "$temp"/elfbins.* >"$ELFBINSFILE"
> fi
>
Looks good, applied as commit 41fc1335b8b364c95a8ee2ed2956bbdfe7957853
Thanks,
Mark