public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
From: Mark Wielaard <mark@klomp.org>
To: Aaron Merey <amerey@redhat.com>, elfutils-devel@sourceware.org
Subject: Re: [PATCH] libdw: dwarf_getsrcfiles should not imply dwarf_getsrclines
Date: Wed, 10 Apr 2024 21:47:56 +0200	[thread overview]
Message-ID: <2a6f04357e6cb900fe055939673a7abc542ae333.camel@klomp.org> (raw)
In-Reply-To: <20240410034539.164402-1-amerey@redhat.com>

Hi Aaron,

On Tue, 2024-04-09 at 23:45 -0400, Aaron Merey wrote:
> dwarf_getsrcfiles causes line data to be read in addition to file data.
> This is wasteful for programs which only need file or directory names.
> Debuginfod server is one such example.
> 
> Fix this by moving the srcfile reading in read_srclines into a separate
> function read_srcfiles.  This change improves debuginfod server's max
> resident set size by up to 75% during rpm indexing.
> 
> 	* libdw/dwarf_getsrcfiles.c (dwarf_getsrcfiles): Replace
> 	dwarf_getsrclines and __libdw_getsrclines with
> 	__libdw_getsrcfiles.
> 	* libdw/dwarf_getsrclines.c (read_line_header): New function.
> 	(read_srcfiles): New function.
> 	(read_srclines): Move srcfile reading into read_srcfiles.
> 	Add parameter to use cached srcfiles if available.
> 	Also merge srcfiles with any files from	DW_LNE_define_file.
> 	(__libdw_getsrclines): Changed to call get_lines_or_files.
> 	(__libdw_getsrcfiles): New function.  Calls get_lines_or_files.
> 	(get_lines_or_files): New function based on the old
> 	__libdw_getsrclines.  Call read_srcfiles if linesp is NULL,
> 	otherwise call read_srclines.  Pass previously read srcfiles
> 	to read_srclines if available.
> 	* libdw/dwarf_macro_getsrcfiles.c (dwarf_macro_getsrcfiles):
> 	Replace __libdw_getsrclines with __libdw_getsrcfiles.
> 	* libdw/libdwP.h (__libdw_getsrcfiles): New declaration.
> 	* tests/.gitignore: Add new test binary.
> 	* tests/get-files.c: Verify that dwarf_getsrcfiles does
> 	not cause srclines to be read.
> 	* tests/get-lines.c: Verify that srclines can be read
> 	after srcfiles have been read.
> 	* tests/Makefile.am: Add new testfiles.
> 	* tests/get-files-define-file.c: Print file names before
> 	and after reading DW_LNE_define_file.
> 	* tests/run-get-files.sh: Add get-files-define-file test.
> 	* tests/testfile-define-file.bz2: New testfile.  Copy of
> 	testfile36.debug but with a line program consisting of two
> 	DW_LNE_define_file opcodes.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=27405
> 
> Signed-off-by: Aaron Merey <amerey@redhat.com>
> ---
> 
> v2 changes:
> Restored support for DW_LNE_define_file.

Great. And sorry I first suggested to just drop it and then said I
would like it back. This was more work than I though.

> Added DW_LNE_define_file testcase and testfile.

Nice. How did you edit this file?

> Added new function get_lines_or_files which is based on the old
> __libdw_getsrclines.  __libdw_getsrclines and __libdw_getsrcfiles now
> call get_lines_or_files.

It is just a simple rename, but so much nicer to my eyes. Thanks.

> Mentioned max resident set size improvement in the commit message.  I
> did not mention a 5% speed up to rpm indexing that I have previously
> talked about since I could not reliably reproduce it with v2 of the patch.
> I suspect that speed up may have been due to the smaller max RSS reducing
> swapping during that round of testing.

Makes sense. And a reduction of 75% of the max resident size it pretty
huge in itself.

> +            case DW_LNE_define_file:
> +              {
> +                char *fname = (char *) linep;
> +                uint8_t *endp = memchr (linep, '\0', lineendp - linep);
> +                if (endp == NULL)
> +                  goto invalid_data;
> +                size_t fnamelen = endp - linep;
> +                linep = endp + 1;
> +
> +                unsigned int diridx;
> +                if (unlikely (linep >= lineendp))
> +                  goto invalid_data;
> +                get_uleb128 (diridx, linep, lineendp);
> +
> +		size_t ndirs = (*filesp)->ndirs;

This one line uses tabs for indentation, all others spaces.

> +                if (unlikely (diridx >= ndirs))
> +                  {
> +                    __libdw_seterrno (DWARF_E_INVALID_DIR_IDX);
> +                    goto invalid_data;
> +                  }
> +                Dwarf_Word mtime;
> +                if (unlikely (linep >= lineendp))
> +                  goto invalid_data;
> +                get_uleb128 (mtime, linep, lineendp);
> +                Dwarf_Word filelength;
> +                if (unlikely (linep >= lineendp))
> +                  goto invalid_data;
> +                get_uleb128 (filelength, linep, lineendp);
> +
> +		/* Add new_file to filelist that will be merged with filesp.  */
> +                struct filelist *new_file = malloc (sizeof (struct filelist));
> +		if (unlikely (new_file == NULL))
>  		  {
> -		    __libdw_seterrno (DWARF_E_INVALID_DIR_IDX);
> -		    goto invalid_data;
> +		    __libdw_seterrno (DWARF_E_NOMEM);
> +		    goto out;
>  		  }

And this last block uses tabs again.

> -		Dwarf_Word mtime;
> -		if (unlikely (linep >= lineendp))
> -		  goto invalid_data;
> -		get_uleb128 (mtime, linep, lineendp);
> -		Dwarf_Word filelength;
> -		if (unlikely (linep >= lineendp))
> -		  goto invalid_data;
> -		get_uleb128 (filelength, linep, lineendp);
> -
> -		struct filelist *new_file = NEW_FILE ();
> -		if (fname[0] == '/')
> -		  new_file->info.name = fname;
> -		else
> -		  {
> -		    new_file->info.name =
> -		      libdw_alloc (dbg, char, 1, (dirarray[diridx].len + 1
> -						  + fnamelen + 1));
> -		    char *cp = new_file->info.name;
> -
> -		    if (dirarray[diridx].dir != NULL)
> -		      /* This value could be NULL in case the
> -			 DW_AT_comp_dir was not present.  We
> -			 cannot do much in this case.  Just
> -			 keep the file relative.  */
> -		      {
> -			cp = stpcpy (cp, dirarray[diridx].dir);
> -			*cp++ = '/';
> -		      }
> -		    strcpy (cp, fname);
> -		  }
> -
> -		new_file->info.mtime = mtime;
> -		new_file->info.length = filelength;
> -	      }
> -	      break;
> +		nfilelist++;
> +		new_file->next = filelist;
> +		filelist = new_file;

Here tabs again, then the rest spaces.

> +                if (fname[0] == '/')
> +                  new_file->info.name = fname;
> +                else
> +                  {
> +		    /* Directory names are stored in a char *[ndirs] located
> +		       after the last Dwarf_Fileinfo_s.  */
> +		    size_t nfiles = (*filesp)->nfiles;
> +		    const char **dirarray
> +		      = (const char **) &((*filesp)->info[nfiles]);
> +
> +		    const char *dname = dirarray[diridx];
> +		    size_t dnamelen = strlen (dname);
> +
> +                    new_file->info.name =
> +                      libdw_alloc (dbg, char, 1, (dnamelen + fnamelen + 2));
> +                    char *cp = new_file->info.name;
> +
> +                    if (dname != NULL)
> +
> +                      /* This value could be NULL in case the
> +                         DW_AT_comp_dir was not present.  We
> +                         cannot do much in this case.  Just
> +                         keep the file relative.  */
> +
> +                      {
> +                        cp = stpcpy (cp, dname);
> +                        *cp++ = '/';
> +                      }
> +                    strcpy (cp, fname);
> +                  }
> +
> +                new_file->info.mtime = mtime;
> +                new_file->info.length = filelength;
> +              }
> +              break;
> 

So inconsistent indentation, but the code looks good.

> @@ -1027,32 +1183,49 @@ read_srclines (Dwarf *dbg,
>  	}
>      }
>  
> -  /* Put all the files in an array.  */
> -  Dwarf_Files *files = libdw_alloc (dbg, Dwarf_Files,
> -				    sizeof (Dwarf_Files)
> -				    + nfilelist * sizeof (Dwarf_Fileinfo)
> -				    + (ndirlist + 1) * sizeof (char *),
> -				    1);
> -  const char **dirs = (void *) &files->info[nfilelist];
> -
> -  struct filelist *fileslist = filelist;
> -  files->nfiles = nfilelist;
> -  for (size_t n = nfilelist; n > 0; n--)
> +  /* Merge filesp with the files from DW_LNE_define_file, if any.  */
> +  if (unlikely (filelist != NULL))
>      {
> -      files->info[n - 1] = fileslist->info;
> -      fileslist = fileslist->next;
> -    }
> -  assert (fileslist == NULL);
> +      Dwarf_Files *prevfiles = *filesp;
> +      size_t ndirs = prevfiles->ndirs;
> +      size_t nprevfiles = prevfiles->nfiles;
> +      size_t nnewfiles = nprevfiles + nfilelist;
> +
> +      Dwarf_Files *newfiles
> +	= libdw_alloc (dbg, Dwarf_Files,
> +		       sizeof (Dwarf_Files)
> +                       + nnewfiles * sizeof (Dwarf_Fileinfo)
> +                       + (ndirs + 1) * sizeof (char *),
> +                       1);
> +
> +
> +      /* Copy prevfiles to newfiles.  */
> +      for (size_t n = 0; n < nprevfiles; n++)
> +	newfiles->info[n] = prevfiles->info[n];
> +
> +      /* Add files from DW_LNE_define_file to newfiles.  */
> +      struct filelist *fileslist = filelist;
> +      for (size_t n = nfilelist; n > 0; n--)
> +	{
> +	  newfiles->info[nprevfiles + n - 1] = fileslist->info;
> +	  fileslist = fileslist->next;
> +	}
>  
> -  /* Put all the directory strings in an array.  */
> -  files->ndirs = ndirlist;
> -  for (unsigned int i = 0; i < ndirlist; ++i)
> -    dirs[i] = dirarray[i].dir;
> -  dirs[ndirlist] = NULL;
> +      if (fileslist != NULL)
> +	goto invalid_data;
>  
> -  /* Pass the file data structure to the caller.  */
> -  if (filesp != NULL)
> -    *filesp = files;
> +      const char **newdirs = (void *) &newfiles->info[nnewfiles];
> +      const char **prevdirs = (void *) &prevfiles->info[nprevfiles];
> +
> +      /* Copy prevdirs to newdirs.  */
> +      for (size_t n = 0; n < ndirs; n++)
> +	newdirs[n] = prevdirs[n];

Again slightly off indentation.
But I also don't fully follow the prevdirs/newdirs copying.
Why is this? No newdirs are defined here, are there?
Maybe I don't understand the data-structure used here.


> diff --git a/tests/.gitignore b/tests/.gitignore
> index 0289959d..e40ad238 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -74,6 +74,7 @@
>  /funcscopes
>  /get-aranges
>  /get-files
> +/get-files-define_file
>  /get-lines
>  /get-pubnames
>  /get-units-invalid
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 9141074f..7b016dc8 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -35,7 +35,7 @@ endif
>  check_PROGRAMS = arextract arsymtest newfile saridx scnnames sectiondump \
>  		  showptable update1 update2 update3 update4 test-nlist \
>  		  show-die-info get-files next-files get-lines next-lines \
> -		  get-pubnames \
> +		  get-pubnames get-files-define-file \
>  		  get-aranges allfcts line2addr addrscopes funcscopes \
>  		  show-abbrev hash newscn ecp dwflmodtest \
>  		  find-prologues funcretval allregs rdwrmmap \
> @@ -646,7 +646,8 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
>  	     testfile-dwp-5-cu-index-overflow.dwp.bz2 \
>  	     testfile-dwp-4-cu-index-overflow.bz2 \
>  	     testfile-dwp-4-cu-index-overflow.dwp.bz2 \
> -	     testfile-dwp-cu-index-overflow.source
> +	     testfile-dwp-cu-index-overflow.source \
> +	     testfile-define-file.bz2
>  
>  
>  if USE_VALGRIND
> @@ -725,6 +726,7 @@ show_abbrev_LDADD = $(libdw) $(libelf)
>  get_lines_LDADD = $(libdw) $(libelf)
>  next_lines_LDADD = $(libdw) $(libelf)
>  get_files_LDADD = $(libdw) $(libelf)
> +get_files_define_file_LDADD = $(libdw) $(libelf)
>  next_files_LDADD = $(libdw) $(libelf)
>  get_aranges_LDADD = $(libdw) $(libelf)
>  allfcts_LDADD = $(libdw) $(libelf)
> diff --git a/tests/get-files-define-file.c b/tests/get-files-define-file.c
> new file mode 100644
> index 00000000..583f9852
> --- /dev/null
> +++ b/tests/get-files-define-file.c
> @@ -0,0 +1,162 @@
> +/* Copyright (C) 2002, 2004, 2005, 2007 Red Hat, Inc.
> +   This file is part of elfutils.
> +   Written by Ulrich Drepper <drepper@redhat.com>, 2002.
> +
> +   This file is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   elfutils is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef HAVE_CONFIG_H
> +# include <config.h>
> +#endif
> +
> +#include <fcntl.h>
> +#include <libelf.h>
> +#include ELFUTILS_HEADER(dw)
> +#include <stdio.h>
> +#include <unistd.h>
> +#include "../libdw/libdwP.h"
> +
> +static void
> +print_dirs_and_files (Dwarf_Files *files, const char *const *dirs,
> +		      size_t nfiles, size_t ndirs)
> +{
> +  if (dirs[0] == NULL)
> +    puts (" dirs[0] = (null)");
> +  else
> +    printf (" dirs[0] = \"%s\"\n", dirs[0]);
> +  for (size_t i = 1; i < ndirs; ++i)
> +    printf (" dirs[%zu] = \"%s\"\n", i, dirs[i]);
> +
> +  for (size_t i = 0; i < nfiles; ++i)
> +    printf (" file[%zu] = \"%s\"\n", i,
> +	    dwarf_filesrc (files, i, NULL, NULL));
> +}
> +
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  int result = 0;
> +  int cnt = argc - 1;
> +
> +  int fd = open (argv[cnt], O_RDONLY);
> +
> +  Dwarf *dbg = dwarf_begin (fd, DWARF_C_READ);
> +  if (dbg == NULL)
> +    {
> +      printf ("%s not usable\n", argv[cnt]);
> +      result = 1;
> +      if (fd != -1)
> +	close (fd);
> +      goto out;
> +    }
> +
> +  Dwarf_Off o = 0;
> +  Dwarf_Off ncu;
> +  size_t cuhl;
> +
> +  /* Just inspect the first CU.  */
> +  if (dwarf_nextcu (dbg, o, &ncu, &cuhl, NULL, NULL, NULL) != 0)
> +    {
> +      printf ("%s: cannot get CU\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  Dwarf_Die die_mem;
> +  Dwarf_Die *die = dwarf_offdie (dbg, o + cuhl, &die_mem);
> +
> +  if (die == NULL)
> +    {
> +      printf ("%s: cannot get CU die\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  Dwarf_Files *files;
> +  size_t nfiles;
> +
> +  /* The files from DW_LNE_define_file should not be included
> +     until dwarf_getsrclines is called.  */
> +  if (dwarf_getsrcfiles (die, &files, &nfiles) != 0)
> +    {
> +      printf ("%s: cannot get files\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  if (die->cu->lines != NULL)
> +    {
> +      printf ("%s: dwarf_getsrcfiles should not get lines\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  const char *const *dirs;
> +  size_t ndirs;
> +  if (dwarf_getsrcdirs (files, &dirs, &ndirs) != 0)
> +    {
> +      printf ("%s: cannot get include directories\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  /* Print file info without files from DW_LNE_define_file.  */
> +  print_dirs_and_files (files, dirs, nfiles, ndirs);
> +
> +  Dwarf_Lines *lines;
> +  size_t nlines;
> +
> +  /* Reading the line program should add the new files.  */
> +  if (dwarf_getsrclines (die, &lines, &nlines) != 0)
> +    {
> +      printf ("%s: cannot get lines\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  Dwarf_Files *updated_files;
> +  size_t num_updated_files;
> +
> +  /* Get the new files.  */
> +  if (dwarf_getsrcfiles (die, &updated_files, &num_updated_files) != 0)
> +    {
> +      printf ("%s: cannot get files\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  const char *const *updated_dirs;
> +  size_t num_updated_dirs;
> +
> +  /* The dirs shouldn't change but verify that getsrcdirs still works.  */
> +  if (dwarf_getsrcdirs (updated_files, &updated_dirs, &num_updated_dirs) != 0)
> +    {
> +      printf ("%s: cannot get include directories\n", argv[cnt]);
> +      result = 1;
> +      goto out;
> +    }
> +
> +  /* Verify that we didn't invalidate the old file info.  */
> +  print_dirs_and_files (files, dirs, nfiles, ndirs);
> +
> +  /* Print all files including those from DW_LNE_define_file.  */
> +  print_dirs_and_files (updated_files, updated_dirs,
> +			num_updated_files, num_updated_dirs);
> +
> +out:
> +  dwarf_end (dbg);
> +  close (fd);
> +
> +  return result;
> +}
> 
> [...]
> diff --git a/tests/run-get-files.sh b/tests/run-get-files.sh
> index 1306544d..c2bc01bf 100755
> --- a/tests/run-get-files.sh
> +++ b/tests/run-get-files.sh
> @@ -18,7 +18,7 @@
>  
>  . $srcdir/test-subr.sh
>  
> -testfiles testfile testfile2
> +testfiles testfile testfile2 testfile-define-file
>  
>  testrun_compare ${abs_builddir}/get-files testfile testfile2 <<\EOF
>  cuhl = 11, o = 0, asz = 4, osz = 4, ncu = 191
> @@ -245,4 +245,67 @@ cuhl = 11, o = 0, asz = 8, osz = 4, ncu = 857
>   file[3] = "/usr/include/stdc-predef.h"
>  EOF
>  
> +tempfiles files define-files.out get-files-define-file.out
> +
> +cat > files <<\EOF
> + dirs[0] = "session"
> + dirs[1] = "/home/wcohen/minimal_mod"
> + dirs[2] = "include/asm"
> + dirs[3] = "include/linux"
> + dirs[4] = "include/asm-generic"
> + file[0] = "???"
> + file[1] = "/home/wcohen/minimal_mod/minimal_mod.c"
> + file[2] = "include/asm/gcc_intrin.h"
> + file[3] = "include/linux/kernel.h"
> + file[4] = "include/asm/processor.h"
> + file[5] = "include/asm/types.h"
> + file[6] = "include/asm/ptrace.h"
> + file[7] = "include/linux/sched.h"
> + file[8] = "include/asm/thread_info.h"
> + file[9] = "include/linux/thread_info.h"
> + file[10] = "include/asm/atomic.h"
> + file[11] = "include/linux/list.h"
> + file[12] = "include/linux/cpumask.h"
> + file[13] = "include/linux/rbtree.h"
> + file[14] = "include/asm/page.h"
> + file[15] = "include/linux/rwsem.h"
> + file[16] = "include/asm/rwsem.h"
> + file[17] = "include/asm/spinlock.h"
> + file[18] = "include/linux/completion.h"
> + file[19] = "include/linux/wait.h"
> + file[20] = "include/linux/aio.h"
> + file[21] = "include/linux/workqueue.h"
> + file[22] = "include/linux/timer.h"
> + file[23] = "include/linux/types.h"
> + file[24] = "include/asm/posix_types.h"
> + file[25] = "include/linux/pid.h"
> + file[26] = "include/linux/time.h"
> + file[27] = "include/linux/capability.h"
> + file[28] = "include/linux/signal.h"
> + file[29] = "include/linux/resource.h"
> + file[30] = "include/linux/sem.h"
> + file[31] = "include/asm/fpu.h"
> + file[32] = "include/linux/fs_struct.h"
> + file[33] = "include/asm/signal.h"
> + file[34] = "include/asm/siginfo.h"
> + file[35] = "include/asm-generic/siginfo.h"
> + file[36] = "include/asm/nodedata.h"
> + file[37] = "include/linux/mmzone.h"
> + file[38] = "include/linux/jiffies.h"
> + file[39] = "include/asm/io.h"
> + file[40] = "include/asm/machvec.h"
> + file[41] = "include/asm/smp.h"
> + file[42] = "include/asm/numa.h"
> + file[43] = "include/linux/slab.h"
> +EOF
> +
> +# Files should be printed 3 times, followed by the files from DW_LNE_define_file
> +cat files > define-files.out
> +cat files >> define-files.out
> +cat files >> define-files.out
> +echo ' file[44] = "include/asm/abc.c"' >> define-files.out
> +echo ' file[45] = "include/linux/01.c"' >> define-files.out
> +
> +cat define-files.out | testrun_compare ${abs_builddir}/get-files-define-file testfile-define-file
> +
>  exit 0

Nice.

So testfile-define-file is actually testfile36.debug but with a new
line program? How did you edit/insert that one?

Cheers,

Mark

  reply	other threads:[~2024-04-10 19:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-10  3:45 Aaron Merey
2024-04-10 19:47 ` Mark Wielaard [this message]
2024-04-10 20:43   ` Aaron Merey
2024-04-11  9:55     ` Mark Wielaard
2024-04-11 16:41       ` Aaron Merey
  -- strict thread matches above, loose matches on Subject: below --
2024-03-29  1:12 Aaron Merey
2024-04-02 14:47 ` Mark Wielaard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a6f04357e6cb900fe055939673a7abc542ae333.camel@klomp.org \
    --to=mark@klomp.org \
    --cc=amerey@redhat.com \
    --cc=elfutils-devel@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).