From: Mark Wielaard <mark@klomp.org>
To: Aaron Merey <amerey@redhat.com>, elfutils-devel@sourceware.org
Subject: Re: [PATCH] libdw: dwarf_getsrcfiles should not imply dwarf_getsrclines
Date: Wed, 10 Apr 2024 21:47:56 +0200 [thread overview]
Message-ID: <2a6f04357e6cb900fe055939673a7abc542ae333.camel@klomp.org> (raw)
In-Reply-To: <20240410034539.164402-1-amerey@redhat.com>
Hi Aaron,
On Tue, 2024-04-09 at 23:45 -0400, Aaron Merey wrote:
> dwarf_getsrcfiles causes line data to be read in addition to file data.
> This is wasteful for programs which only need file or directory names.
> Debuginfod server is one such example.
>
> Fix this by moving the srcfile reading in read_srclines into a separate
> function read_srcfiles. This change improves debuginfod server's max
> resident set size by up to 75% during rpm indexing.
>
> * libdw/dwarf_getsrcfiles.c (dwarf_getsrcfiles): Replace
> dwarf_getsrclines and __libdw_getsrclines with
> __libdw_getsrcfiles.
> * libdw/dwarf_getsrclines.c (read_line_header): New function.
> (read_srcfiles): New function.
> (read_srclines): Move srcfile reading into read_srcfiles.
> Add parameter to use cached srcfiles if available.
> Also merge srcfiles with any files from DW_LNE_define_file.
> (__libdw_getsrclines): Changed to call get_lines_or_files.
> (__libdw_getsrcfiles): New function. Calls get_lines_or_files.
> (get_lines_or_files): New function based on the old
> __libdw_getsrclines. Call read_srcfiles if linesp is NULL,
> otherwise call read_srclines. Pass previously read srcfiles
> to read_srclines if available.
> * libdw/dwarf_macro_getsrcfiles.c (dwarf_macro_getsrcfiles):
> Replace __libdw_getsrclines with __libdw_getsrcfiles.
> * libdw/libdwP.h (__libdw_getsrcfiles): New declaration.
> * tests/.gitignore: Add new test binary.
> * tests/get-files.c: Verify that dwarf_getsrcfiles does
> not cause srclines to be read.
> * tests/get-lines.c: Verify that srclines can be read
> after srcfiles have been read.
> * tests/Makefile.am: Add new testfiles.
> * tests/get-files-define-file.c: Print file names before
> and after reading DW_LNE_define_file.
> * tests/run-get-files.sh: Add get-files-define-file test.
> * tests/testfile-define-file.bz2: New testfile. Copy of
> testfile36.debug but with a line program consisting of two
> DW_LNE_define_file opcodes.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=27405
>
> Signed-off-by: Aaron Merey <amerey@redhat.com>
> ---
>
> v2 changes:
> Restored support for DW_LNE_define_file.
Great. And sorry I first suggested to just drop it and then said I
would like it back. This was more work than I though.
> Added DW_LNE_define_file testcase and testfile.
Nice. How did you edit this file?
> Added new function get_lines_or_files which is based on the old
> __libdw_getsrclines. __libdw_getsrclines and __libdw_getsrcfiles now
> call get_lines_or_files.
It is just a simple rename, but so much nicer to my eyes. Thanks.
> Mentioned max resident set size improvement in the commit message. I
> did not mention a 5% speed up to rpm indexing that I have previously
> talked about since I could not reliably reproduce it with v2 of the patch.
> I suspect that speed up may have been due to the smaller max RSS reducing
> swapping during that round of testing.
Makes sense. And a reduction of 75% of the max resident size it pretty
huge in itself.
> + case DW_LNE_define_file:
> + {
> + char *fname = (char *) linep;
> + uint8_t *endp = memchr (linep, '\0', lineendp - linep);
> + if (endp == NULL)
> + goto invalid_data;
> + size_t fnamelen = endp - linep;
> + linep = endp + 1;
> +
> + unsigned int diridx;
> + if (unlikely (linep >= lineendp))
> + goto invalid_data;
> + get_uleb128 (diridx, linep, lineendp);
> +
> + size_t ndirs = (*filesp)->ndirs;
This one line uses tabs for indentation, all others spaces.
> + if (unlikely (diridx >= ndirs))
> + {
> + __libdw_seterrno (DWARF_E_INVALID_DIR_IDX);
> + goto invalid_data;
> + }
> + Dwarf_Word mtime;
> + if (unlikely (linep >= lineendp))
> + goto invalid_data;
> + get_uleb128 (mtime, linep, lineendp);
> + Dwarf_Word filelength;
> + if (unlikely (linep >= lineendp))
> + goto invalid_data;
> + get_uleb128 (filelength, linep, lineendp);
> +
> + /* Add new_file to filelist that will be merged with filesp. */
> + struct filelist *new_file = malloc (sizeof (struct filelist));
> + if (unlikely (new_file == NULL))
> {
> - __libdw_seterrno (DWARF_E_INVALID_DIR_IDX);
> - goto invalid_data;
> + __libdw_seterrno (DWARF_E_NOMEM);
> + goto out;
> }
And this last block uses tabs again.
> - Dwarf_Word mtime;
> - if (unlikely (linep >= lineendp))
> - goto invalid_data;
> - get_uleb128 (mtime, linep, lineendp);
> - Dwarf_Word filelength;
> - if (unlikely (linep >= lineendp))
> - goto invalid_data;
> - get_uleb128 (filelength, linep, lineendp);
> -
> - struct filelist *new_file = NEW_FILE ();
> - if (fname[0] == '/')
> - new_file->info.name = fname;
> - else
> - {
> - new_file->info.name =
> - libdw_alloc (dbg, char, 1, (dirarray[diridx].len + 1
> - + fnamelen + 1));
> - char *cp = new_file->info.name;
> -
> - if (dirarray[diridx].dir != NULL)
> - /* This value could be NULL in case the
> - DW_AT_comp_dir was not present. We
> - cannot do much in this case. Just
> - keep the file relative. */
> - {
> - cp = stpcpy (cp, dirarray[diridx].dir);
> - *cp++ = '/';
> - }
> - strcpy (cp, fname);
> - }
> -
> - new_file->info.mtime = mtime;
> - new_file->info.length = filelength;
> - }
> - break;
> + nfilelist++;
> + new_file->next = filelist;
> + filelist = new_file;
Here tabs again, then the rest spaces.
> + if (fname[0] == '/')
> + new_file->info.name = fname;
> + else
> + {
> + /* Directory names are stored in a char *[ndirs] located
> + after the last Dwarf_Fileinfo_s. */
> + size_t nfiles = (*filesp)->nfiles;
> + const char **dirarray
> + = (const char **) &((*filesp)->info[nfiles]);
> +
> + const char *dname = dirarray[diridx];
> + size_t dnamelen = strlen (dname);
> +
> + new_file->info.name =
> + libdw_alloc (dbg, char, 1, (dnamelen + fnamelen + 2));
> + char *cp = new_file->info.name;
> +
> + if (dname != NULL)
> +
> + /* This value could be NULL in case the
> + DW_AT_comp_dir was not present. We
> + cannot do much in this case. Just
> + keep the file relative. */
> +
> + {
> + cp = stpcpy (cp, dname);
> + *cp++ = '/';
> + }
> + strcpy (cp, fname);
> + }
> +
> + new_file->info.mtime = mtime;
> + new_file->info.length = filelength;
> + }
> + break;
>
So inconsistent indentation, but the code looks good.
> @@ -1027,32 +1183,49 @@ read_srclines (Dwarf *dbg,
> }
> }
>
> - /* Put all the files in an array. */
> - Dwarf_Files *files = libdw_alloc (dbg, Dwarf_Files,
> - sizeof (Dwarf_Files)
> - + nfilelist * sizeof (Dwarf_Fileinfo)
> - + (ndirlist + 1) * sizeof (char *),
> - 1);
> - const char **dirs = (void *) &files->info[nfilelist];
> -
> - struct filelist *fileslist = filelist;
> - files->nfiles = nfilelist;
> - for (size_t n = nfilelist; n > 0; n--)
> + /* Merge filesp with the files from DW_LNE_define_file, if any. */
> + if (unlikely (filelist != NULL))
> {
> - files->info[n - 1] = fileslist->info;
> - fileslist = fileslist->next;
> - }
> - assert (fileslist == NULL);
> + Dwarf_Files *prevfiles = *filesp;
> + size_t ndirs = prevfiles->ndirs;
> + size_t nprevfiles = prevfiles->nfiles;
> + size_t nnewfiles = nprevfiles + nfilelist;
> +
> + Dwarf_Files *newfiles
> + = libdw_alloc (dbg, Dwarf_Files,
> + sizeof (Dwarf_Files)
> + + nnewfiles * sizeof (Dwarf_Fileinfo)
> + + (ndirs + 1) * sizeof (char *),
> + 1);
> +
> +
> + /* Copy prevfiles to newfiles. */
> + for (size_t n = 0; n < nprevfiles; n++)
> + newfiles->info[n] = prevfiles->info[n];
> +
> + /* Add files from DW_LNE_define_file to newfiles. */
> + struct filelist *fileslist = filelist;
> + for (size_t n = nfilelist; n > 0; n--)
> + {
> + newfiles->info[nprevfiles + n - 1] = fileslist->info;
> + fileslist = fileslist->next;
> + }
>
> - /* Put all the directory strings in an array. */
> - files->ndirs = ndirlist;
> - for (unsigned int i = 0; i < ndirlist; ++i)
> - dirs[i] = dirarray[i].dir;
> - dirs[ndirlist] = NULL;
> + if (fileslist != NULL)
> + goto invalid_data;
>
> - /* Pass the file data structure to the caller. */
> - if (filesp != NULL)
> - *filesp = files;
> + const char **newdirs = (void *) &newfiles->info[nnewfiles];
> + const char **prevdirs = (void *) &prevfiles->info[nprevfiles];
> +
> + /* Copy prevdirs to newdirs. */
> + for (size_t n = 0; n < ndirs; n++)
> + newdirs[n] = prevdirs[n];
Again slightly off indentation.
But I also don't fully follow the prevdirs/newdirs copying.
Why is this? No newdirs are defined here, are there?
Maybe I don't understand the data-structure used here.
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 0289959d..e40ad238 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -74,6 +74,7 @@
> /funcscopes
> /get-aranges
> /get-files
> +/get-files-define_file
> /get-lines
> /get-pubnames
> /get-units-invalid
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 9141074f..7b016dc8 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -35,7 +35,7 @@ endif
> check_PROGRAMS = arextract arsymtest newfile saridx scnnames sectiondump \
> showptable update1 update2 update3 update4 test-nlist \
> show-die-info get-files next-files get-lines next-lines \
> - get-pubnames \
> + get-pubnames get-files-define-file \
> get-aranges allfcts line2addr addrscopes funcscopes \
> show-abbrev hash newscn ecp dwflmodtest \
> find-prologues funcretval allregs rdwrmmap \
> @@ -646,7 +646,8 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
> testfile-dwp-5-cu-index-overflow.dwp.bz2 \
> testfile-dwp-4-cu-index-overflow.bz2 \
> testfile-dwp-4-cu-index-overflow.dwp.bz2 \
> - testfile-dwp-cu-index-overflow.source
> + testfile-dwp-cu-index-overflow.source \
> + testfile-define-file.bz2
>
>
> if USE_VALGRIND
> @@ -725,6 +726,7 @@ show_abbrev_LDADD = $(libdw) $(libelf)
> get_lines_LDADD = $(libdw) $(libelf)
> next_lines_LDADD = $(libdw) $(libelf)
> get_files_LDADD = $(libdw) $(libelf)
> +get_files_define_file_LDADD = $(libdw) $(libelf)
> next_files_LDADD = $(libdw) $(libelf)
> get_aranges_LDADD = $(libdw) $(libelf)
> allfcts_LDADD = $(libdw) $(libelf)
> diff --git a/tests/get-files-define-file.c b/tests/get-files-define-file.c
> new file mode 100644
> index 00000000..583f9852
> --- /dev/null
> +++ b/tests/get-files-define-file.c
> @@ -0,0 +1,162 @@
> +/* Copyright (C) 2002, 2004, 2005, 2007 Red Hat, Inc.
> + This file is part of elfutils.
> + Written by Ulrich Drepper <drepper@redhat.com>, 2002.
> +
> + This file is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + elfutils is distributed in the hope that it will be useful, but
> + WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +#ifdef HAVE_CONFIG_H
> +# include <config.h>
> +#endif
> +
> +#include <fcntl.h>
> +#include <libelf.h>
> +#include ELFUTILS_HEADER(dw)
> +#include <stdio.h>
> +#include <unistd.h>
> +#include "../libdw/libdwP.h"
> +
> +static void
> +print_dirs_and_files (Dwarf_Files *files, const char *const *dirs,
> + size_t nfiles, size_t ndirs)
> +{
> + if (dirs[0] == NULL)
> + puts (" dirs[0] = (null)");
> + else
> + printf (" dirs[0] = \"%s\"\n", dirs[0]);
> + for (size_t i = 1; i < ndirs; ++i)
> + printf (" dirs[%zu] = \"%s\"\n", i, dirs[i]);
> +
> + for (size_t i = 0; i < nfiles; ++i)
> + printf (" file[%zu] = \"%s\"\n", i,
> + dwarf_filesrc (files, i, NULL, NULL));
> +}
> +
> +
> +int
> +main (int argc, char *argv[])
> +{
> + int result = 0;
> + int cnt = argc - 1;
> +
> + int fd = open (argv[cnt], O_RDONLY);
> +
> + Dwarf *dbg = dwarf_begin (fd, DWARF_C_READ);
> + if (dbg == NULL)
> + {
> + printf ("%s not usable\n", argv[cnt]);
> + result = 1;
> + if (fd != -1)
> + close (fd);
> + goto out;
> + }
> +
> + Dwarf_Off o = 0;
> + Dwarf_Off ncu;
> + size_t cuhl;
> +
> + /* Just inspect the first CU. */
> + if (dwarf_nextcu (dbg, o, &ncu, &cuhl, NULL, NULL, NULL) != 0)
> + {
> + printf ("%s: cannot get CU\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + Dwarf_Die die_mem;
> + Dwarf_Die *die = dwarf_offdie (dbg, o + cuhl, &die_mem);
> +
> + if (die == NULL)
> + {
> + printf ("%s: cannot get CU die\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + Dwarf_Files *files;
> + size_t nfiles;
> +
> + /* The files from DW_LNE_define_file should not be included
> + until dwarf_getsrclines is called. */
> + if (dwarf_getsrcfiles (die, &files, &nfiles) != 0)
> + {
> + printf ("%s: cannot get files\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + if (die->cu->lines != NULL)
> + {
> + printf ("%s: dwarf_getsrcfiles should not get lines\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + const char *const *dirs;
> + size_t ndirs;
> + if (dwarf_getsrcdirs (files, &dirs, &ndirs) != 0)
> + {
> + printf ("%s: cannot get include directories\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + /* Print file info without files from DW_LNE_define_file. */
> + print_dirs_and_files (files, dirs, nfiles, ndirs);
> +
> + Dwarf_Lines *lines;
> + size_t nlines;
> +
> + /* Reading the line program should add the new files. */
> + if (dwarf_getsrclines (die, &lines, &nlines) != 0)
> + {
> + printf ("%s: cannot get lines\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + Dwarf_Files *updated_files;
> + size_t num_updated_files;
> +
> + /* Get the new files. */
> + if (dwarf_getsrcfiles (die, &updated_files, &num_updated_files) != 0)
> + {
> + printf ("%s: cannot get files\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + const char *const *updated_dirs;
> + size_t num_updated_dirs;
> +
> + /* The dirs shouldn't change but verify that getsrcdirs still works. */
> + if (dwarf_getsrcdirs (updated_files, &updated_dirs, &num_updated_dirs) != 0)
> + {
> + printf ("%s: cannot get include directories\n", argv[cnt]);
> + result = 1;
> + goto out;
> + }
> +
> + /* Verify that we didn't invalidate the old file info. */
> + print_dirs_and_files (files, dirs, nfiles, ndirs);
> +
> + /* Print all files including those from DW_LNE_define_file. */
> + print_dirs_and_files (updated_files, updated_dirs,
> + num_updated_files, num_updated_dirs);
> +
> +out:
> + dwarf_end (dbg);
> + close (fd);
> +
> + return result;
> +}
>
> [...]
> diff --git a/tests/run-get-files.sh b/tests/run-get-files.sh
> index 1306544d..c2bc01bf 100755
> --- a/tests/run-get-files.sh
> +++ b/tests/run-get-files.sh
> @@ -18,7 +18,7 @@
>
> . $srcdir/test-subr.sh
>
> -testfiles testfile testfile2
> +testfiles testfile testfile2 testfile-define-file
>
> testrun_compare ${abs_builddir}/get-files testfile testfile2 <<\EOF
> cuhl = 11, o = 0, asz = 4, osz = 4, ncu = 191
> @@ -245,4 +245,67 @@ cuhl = 11, o = 0, asz = 8, osz = 4, ncu = 857
> file[3] = "/usr/include/stdc-predef.h"
> EOF
>
> +tempfiles files define-files.out get-files-define-file.out
> +
> +cat > files <<\EOF
> + dirs[0] = "session"
> + dirs[1] = "/home/wcohen/minimal_mod"
> + dirs[2] = "include/asm"
> + dirs[3] = "include/linux"
> + dirs[4] = "include/asm-generic"
> + file[0] = "???"
> + file[1] = "/home/wcohen/minimal_mod/minimal_mod.c"
> + file[2] = "include/asm/gcc_intrin.h"
> + file[3] = "include/linux/kernel.h"
> + file[4] = "include/asm/processor.h"
> + file[5] = "include/asm/types.h"
> + file[6] = "include/asm/ptrace.h"
> + file[7] = "include/linux/sched.h"
> + file[8] = "include/asm/thread_info.h"
> + file[9] = "include/linux/thread_info.h"
> + file[10] = "include/asm/atomic.h"
> + file[11] = "include/linux/list.h"
> + file[12] = "include/linux/cpumask.h"
> + file[13] = "include/linux/rbtree.h"
> + file[14] = "include/asm/page.h"
> + file[15] = "include/linux/rwsem.h"
> + file[16] = "include/asm/rwsem.h"
> + file[17] = "include/asm/spinlock.h"
> + file[18] = "include/linux/completion.h"
> + file[19] = "include/linux/wait.h"
> + file[20] = "include/linux/aio.h"
> + file[21] = "include/linux/workqueue.h"
> + file[22] = "include/linux/timer.h"
> + file[23] = "include/linux/types.h"
> + file[24] = "include/asm/posix_types.h"
> + file[25] = "include/linux/pid.h"
> + file[26] = "include/linux/time.h"
> + file[27] = "include/linux/capability.h"
> + file[28] = "include/linux/signal.h"
> + file[29] = "include/linux/resource.h"
> + file[30] = "include/linux/sem.h"
> + file[31] = "include/asm/fpu.h"
> + file[32] = "include/linux/fs_struct.h"
> + file[33] = "include/asm/signal.h"
> + file[34] = "include/asm/siginfo.h"
> + file[35] = "include/asm-generic/siginfo.h"
> + file[36] = "include/asm/nodedata.h"
> + file[37] = "include/linux/mmzone.h"
> + file[38] = "include/linux/jiffies.h"
> + file[39] = "include/asm/io.h"
> + file[40] = "include/asm/machvec.h"
> + file[41] = "include/asm/smp.h"
> + file[42] = "include/asm/numa.h"
> + file[43] = "include/linux/slab.h"
> +EOF
> +
> +# Files should be printed 3 times, followed by the files from DW_LNE_define_file
> +cat files > define-files.out
> +cat files >> define-files.out
> +cat files >> define-files.out
> +echo ' file[44] = "include/asm/abc.c"' >> define-files.out
> +echo ' file[45] = "include/linux/01.c"' >> define-files.out
> +
> +cat define-files.out | testrun_compare ${abs_builddir}/get-files-define-file testfile-define-file
> +
> exit 0
Nice.
So testfile-define-file is actually testfile36.debug but with a new
line program? How did you edit/insert that one?
Cheers,
Mark
next prev parent reply other threads:[~2024-04-10 19:47 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-10 3:45 Aaron Merey
2024-04-10 19:47 ` Mark Wielaard [this message]
2024-04-10 20:43 ` Aaron Merey
2024-04-11 9:55 ` Mark Wielaard
2024-04-11 16:41 ` Aaron Merey
-- strict thread matches above, loose matches on Subject: below --
2024-03-29 1:12 Aaron Merey
2024-04-02 14:47 ` Mark Wielaard
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2a6f04357e6cb900fe055939673a7abc542ae333.camel@klomp.org \
--to=mark@klomp.org \
--cc=amerey@redhat.com \
--cc=elfutils-devel@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).