public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
From: Aaron Merey <amerey@redhat.com>
To: Housam Alamour <halamour@redhat.com>
Cc: elfutils-devel@sourceware.org
Subject: Re: [PATCH] PR 30991: srcfiles tarball feature
Date: Mon, 22 Jan 2024 22:02:44 -0500	[thread overview]
Message-ID: <CAJDtP-Rdh=qj2x8Nmn0jmOO_VzCaGb+jcWq3WwOYch9RcgLNmg@mail.gmail.com> (raw)
In-Reply-To: <20240119164716.261781-1-halamour@redhat.com>

Hi Housam,

This is a very cool feature.  Thanks again for working on this!

On Fri, Jan 19, 2024 at 11:47 AM Housam Alamour <halamour@redhat.com> wrote:
>
> * srcfiles.cxx: Introduce new --zip option that places all the
>     source files associated with a specified dwarf/elf file
>     into a zip file and sends it to stdout. Files may be
>     fetched from debuginfod (if applicable) or locally as
>     a backup.
>     Added -b option to disable the backup of checking
>     for files locally in -z mode.
>
> * run-srcfiles-self.sh: Added test-case for the new zip
>     feature that archives the source files of the srcfiles
>     tool and checks archive integrity. An additional test
>     ensures that if debuginfod is enabled, the files are
>     fetched and archived properly while maintaing integrity.

Should be 'maintaining'.

>
> * debuginfod-subr.sh: On very slow/remote storage, it can
>     take O(minute) to finish indexing the entire elfutils
>     build tree, so a wait_ready4 shell function is one
>     way to let a longer debuginfod wait operation work.
>
> * srcfiles.1, NEWS: Added documentation for the new zip feature.
>
> * configure.ac: Simplify check for libarchive for srcfiles.cxx
>     by integrating it into the same check for debuginfod.
>
> * Makefile.am: build with local copy of debuginfod-client.
>
> Example:
> % ./src/srcfiles -z -e /bin/ls > output.zip
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=30991
>
> Signed-off-by: Housam Alamour <halamour@redhat.com>
> ---
>  NEWS                       |   8 +
>  configure.ac               |   5 +-
>  debuginfod/debuginfod.cxx  |   2 -
>  doc/srcfiles.1             |  26 +++-
>  src/Makefile.am            |   9 +-
>  src/srcfiles.cxx           | 307 ++++++++++++++++++++++++++++++++-----
>  tests/debuginfod-subr.sh   |  14 +-
>  tests/run-srcfiles-self.sh |  77 +++++++++-
>  8 files changed, 394 insertions(+), 54 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 0420d3b8..3391d6a1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,8 @@
> +Version 0.191 (after 0.189)
> +
> +srcfiles: Can now fetch the source files of a DWARF/ELF file and
> +          place them into a zip.
> +
>  Version 0.190 "Woke!"
>
>  CONTRIBUTING: Switch from real name policy to known identity policy.
> @@ -9,6 +14,9 @@ libelf: Add RELR support.
>
>  libdw: Recognize .debug_[ct]u_index sections
>
> +srcfiles: added srcfiles tool that lists all the source files of a given
> +          DWARF/ELF file.
> +
>  readelf: Support readelf -Ds, --use-dynamic --symbol.
>           Support .gdb_index version 9
>
> diff --git a/configure.ac b/configure.ac
> index af5b6bf7..ddb79b83 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -841,7 +841,7 @@ AM_CONDITIONAL([LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xyes" || test "
>  AM_CONDITIONAL([DUMMY_LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xdummy"])
>  AC_CHECK_HEADERS([execinfo.h])
>
> -# Look for libmicrohttpd, libarchive, sqlite for debuginfo server
> +# Look for libmicrohttpd, libarchive, sqlite for debuginfo server and srcfiles tool
>  # minimum versions as per rhel7.
>  AC_ARG_ENABLE([debuginfod],AS_HELP_STRING([--enable-debuginfod], [Build debuginfod server]))
>  AS_IF([test "x$enable_debuginfod" != "xno"], [
> @@ -853,11 +853,12 @@ AS_IF([test "x$enable_debuginfod" != "xno"], [
>        AC_MSG_ERROR([need libdebuginfod (or dummy), use --disable-debuginfod to disable.])
>      fi
>      enable_debuginfod=yes # presume success
> +    AC_DEFINE([HAVE_LIBARCHIVE], [1], [Define to 1 if libarchive is available]) # presume success
>      PKG_PROG_PKG_CONFIG
>      PKG_CHECK_MODULES([libmicrohttpd],[libmicrohttpd >= 0.9.33],[],[enable_debuginfod=no])
>      PKG_CHECK_MODULES([oldlibmicrohttpd],[libmicrohttpd < 0.9.51],[old_libmicrohttpd=yes],[old_libmicrohttpd=no])
>      PKG_CHECK_MODULES([sqlite3],[sqlite3 >= 3.7.17],[],[enable_debuginfod=no])
> -    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no])
> +    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no], AC_DEFINE([HAVE_LIBARCHIVE], [0], [Define to 0 if libarchive is not available]))
>      if test "x$enable_debuginfod" = "xno"; then
>        AC_MSG_ERROR([dependencies not found, use --disable-debuginfod to disable.])
>      fi
> diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> index 524be948..6b21f46f 100644
> --- a/debuginfod/debuginfod.cxx
> +++ b/debuginfod/debuginfod.cxx
> @@ -2996,8 +2996,6 @@ dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
>
>        if (comp_dir[0] == '\0' && cuname[0] != '/')
>          {
> -          // This is a common symptom for dwz-compressed debug files,
> -          // where the altdebug file cannot be resolved.
>            if (verbose > 3)
>              obatched(clog) << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
>            continue;
> diff --git a/doc/srcfiles.1 b/doc/srcfiles.1
> index 6149c21b..a7cde664 100644
> --- a/doc/srcfiles.1
> +++ b/doc/srcfiles.1
> @@ -21,15 +21,18 @@
>  eu-srcfiles \- Lists the source files of a DWARF/ELF file.
>
>  .SH "SYNOPSIS"
> -eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] INPUT
> +eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] [\fB\-z\fR|\fB\-\-zip\fR] INPUT
>
>  .SH "DESCRIPTION"
> -\fBeu-srcfiles\fR lists the source files of a given \s-DWARF/ELF\s0
> +\fBeu-srcfiles\fR lists all the source files of a given DWARF/ELF
>  file.  This list is based on a search of the DWARF debuginfo, which
>  may be automatically fetched by debuginfod if applicable.  The target
>  file may be an executable, a coredump, a process, or even the running
> -kernel.  The default is the file 'a.out'.  The source file names are
> -made unique and printed to standard output.
> +kernel.  The default input is the file 'a.out'.  The source file names are
> +made unique by prepending the full path name and then printed to standard output. The source files can be
> +placed in a zip file that is sent to stdout.
> +
> +Note that all occurrences of '/./' and '/../' in the path name are canonicalized.
>
>  .SH "INPUT OPTIONS"
>  The long and short forms of options, shown here as alternatives, are
> @@ -82,12 +85,20 @@ Separate items by a null instead of a newline.
>
>  .TP
>  \fB\-c, \-\-cu\-only\fR
> -Only list the CU names.
> +Only list the (compilation unit) CU names.

I think this should be either "compilation unit (CU)" or CU (compilation unit)".

>  .TP
>  \fB\-v, \-\-verbose\fR
>  Increase verbosity of logging messages.
>
> +.TP
> +\fB\-z, \-\-zip\fR
> +Zip all the source files and send to stdout.
> +Files may be automatically fetched by
> +debuginfod (if applicable) or locally as a
> +backup. Any source files that were not found
> +will not be archived.
> +
>
>  .SH EXAMPLES
>
> @@ -119,6 +130,11 @@ List the source files of a kernel image.
>  eu-srcfiles -e /boot/vmlinuz-`uname -r`
>  .ESAMPLE
>
> +Zip all the source files for a binary.
> +.SAMPLE
> +eu-srcfiles -z -e /bin/ls > ls.zip
> +.ESAMPLE
> +
>
>  .SH "AUTHOR"
>  Written by Housam Alamour.
> diff --git a/src/Makefile.am b/src/Makefile.am
> index d3d9d408..5fa8df3d 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -21,7 +21,7 @@ DEFS += $(YYDEBUG) -DDEBUGPRED=@DEBUGPRED@ \
>         -DSRCDIR=\"$(shell cd $(srcdir);pwd)\" -DOBJDIR=\"$(shell pwd)\"
>  AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
>             -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
> -           -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm
> +           -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm -I../debuginfod
>
>  AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw $(STACK_USAGE_NO_ERROR)
>
> @@ -50,6 +50,11 @@ libelf = ../libelf/libelf.so
>  endif
>  libebl = ../libebl/libebl.a ../backends/libebl_backends.a ../libcpu/libcpu.a
>  libeu = ../lib/libeu.a
> +if LIBDEBUGINFOD
> +libdebuginfod = ../debuginfod/libdebuginfod.so
> +else
> +libdebuginfod =
> +endif
>
>  if DEMANGLE
>  demanglelib = -lstdc++
> @@ -85,7 +90,7 @@ stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) $(demanglelib)
>  elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
>  elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
>  srcfiles_SOURCES = srcfiles.cxx
> -srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD)
> +srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD) $(libarchive_LIBS) $(libdebuginfod)
>
>  installcheck-binPROGRAMS: $(bin_PROGRAMS)
>         bad=0; pid=$$$$; list="$(bin_PROGRAMS)"; for p in $$list; do \
> diff --git a/src/srcfiles.cxx b/src/srcfiles.cxx
> index 3c7afdc4..0e4f756f 100644
> --- a/src/srcfiles.cxx
> +++ b/src/srcfiles.cxx
> @@ -15,6 +15,21 @@
>
>     You should have received a copy of the GNU General Public License
>     along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +
> +/* In case we have a bad fts we include this before config.h because it
> +   can't handle _FILE_OFFSET_BITS.
> +   Everything we need here is fine if its declarations just come first.
> +   Also, include sys/types.h before fts. On some systems fts.h is not self
> +   contained. */
> +#ifdef BAD_FTS
> +  #include <sys/types.h>
> +  #include <fts.h>
> +#endif
> +
> +#ifdef HAVE_CONFIG_H
> +# include <config.h>
> +#endif
>
>  #include "printversion.h"
>  #include <dwarf.h>
> @@ -23,23 +38,50 @@
>  #include <set>
>  #include <string>
>  #include <cassert>
> -#include <config.h>
> +#include <gelf.h>
> +#include <memory>
> +
> +#ifdef ENABLE_LIBDEBUGINFOD
> +  #include "debuginfod.h"
> +#endif
>
>  #include <libdwfl.h>
>  #include <fcntl.h>
>  #include <iostream>
>  #include <libdw.h>
> +#include <sstream>
> +#include <vector>
> +
> +/* Libraries for use by the --zip option */
> +#ifdef HAVE_LIBARCHIVE
> +  #include <archive.h>
> +  #include <archive_entry.h>
> +#endif
> +
> +/* If fts.h is included before config.h, its indirect inclusions may not
> +   give us the right LFS aliases of these functions, so map them manually. */
> +#ifdef BAD_FTS
> +  #ifdef _FILE_OFFSET_BITS
> +    #define open open64
> +    #define fopen fopen64
> +  #endif
> +#else
> +  #include <sys/types.h>
> +  #include <fts.h>
> +#endif
>
>  using namespace std;
>
> -/* Name and version of program.  */
> +/* Name and version of program. */

FYI it's common to leave two spaces after a period.  But if you
prefer just one space that's fine too.

>  ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
>
> -/* Bug report address.  */
> +/* Bug report address. */
>  ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
>
> -/* Definitions of arguments for argp functions.  */
> -static const struct argp_option options[] =
> +constexpr size_t BUFFER_SIZE = 8192;
> +
> +/* Definitions of arguments for argp functions. */
> +static const struct argp_option options[] =
>  {
>    { NULL, 0, NULL, OPTION_DOC, N_("Output options:"), 1 },
>    { "null", '0', NULL, 0,
> @@ -47,21 +89,30 @@ static const struct argp_option options[] =
>    { "verbose", 'v', NULL, 0,
>      N_ ("Increase verbosity of logging messages."), 0 },
>    { "cu-only", 'c', NULL, 0, N_ ("Only list the CU names."), 0 },
> -  { NULL, 0, NULL, 0, NULL, 0 }
> +  #ifdef HAVE_LIBARCHIVE
> +  { "zip", 'z', NULL, 0, N_ ("Zip all the source files and send to stdout."
> +    "Cannot be used with the null option"), 0 },
> +    #ifdef ENABLE_LIBDEBUGINFOD
> +    { "no-backup", 'b', NULL, 0, N_ ("Disables local source file search when "
> +      "debuginfod fails to fetch files. This option is only applicable"

There should be a space after "send to stdout." and "is only
applicable".

I would emphasize in the -z help text and man page entry that
debuginfod queries, if enabled, take priority over local source
files by default.

I would also keep -z and -b options even when srcfiles isn't
built with libarchive/libdebuginfod.  If -z or -b is given
in these cases, we can just emit a warning that the option
isn't supported and will be ignored.  This helps prevent
immediate error exits due to unrecognized options, which
can be inconvenient when srcfiles is used in a script.

Also it looks like the man page is missing an entry for -b.

> +      "when fetching and zipping files."), 0 },
> +    #endif
> +  #endif
> +  { NULL, 0, NULL, 0, NULL, 0 }
>  };
>
> -/* Short description of program.  */
> +/* Short description of program. */
>  static const char doc[] = N_("Lists the source files of a DWARF/ELF file. The default input is the file 'a.out'.");
>
> -/* Strings for arguments in help texts.  */
> +/* Strings for arguments in help texts. */
>  static const char args_doc[] = N_("INPUT");
>
> -/* Prototype for option handler.  */
> +/* Prototype for option handler. */
>  static error_t parse_opt (int key, char *arg, struct argp_state *state);
>
> -static struct argp_child argp_children[2]; /* [0] is set in main.  */
> +static struct argp_child argp_children[2]; /* [0] is set in main. */
>
> -/* Data structure to communicate with argp functions.  */
> +/* Data structure to communicate with argp functions. */
>  static const struct argp argp =
>  {
>    options, parse_opt, args_doc, doc, argp_children, NULL, NULL
> @@ -73,8 +124,19 @@ static bool verbose;
>  static bool null_arg;
>  /* Only print compilation unit names. */
>  static bool CU_only;
> -
> -/* Handle program arguments. */
> +#ifdef HAVE_LIBARCHIVE
> +  /* Zip all the source files and send to stdout. */
> +  static bool zip;
> +
> +  #ifdef ENABLE_LIBDEBUGINFOD
> +    /* Disables local source file search when debuginfod fails to fetch them.
> +       This option is only applicable when fetching and zipping files.*/
> +    static bool no_backup;
> +  #endif
> +#endif
> +
> +/* Handle program arguments. Note null arg and zip
> +    cannot be combined due to warnings raised when unzipping. */
>  static error_t
>  parse_opt (int key, char *arg, struct argp_state *state)
>  {
> @@ -97,6 +159,18 @@ parse_opt (int key, char *arg, struct argp_state *state)
>      case 'c':
>        CU_only = true;
>        break;
> +
> +    #ifdef HAVE_LIBARCHIVE
> +      case 'z':
> +      zip = true;
> +      break;
> +
> +      #ifdef ENABLE_LIBDEBUGINFOD
> +        case 'b':
> +        no_backup = true;
> +        break;
> +      #endif
> +    #endif
>
>      default:
>        return ARGP_ERR_UNKNOWN;
> @@ -104,11 +178,40 @@ parse_opt (int key, char *arg, struct argp_state *state)
>    return 0;
>  }
>
> +/* Remove the "/./" , "../" and the preceding directory
> +    that some paths include which raise errors during unzip. */
> +string canonicalize_path(string path)
> +{
> +    stringstream ss(path);
> +    string token;
> +    vector<string> tokens;
> +    /* Extract each directory of the path and place into a vector. */
> +    while (getline(ss, token, '/')) {
> +      /* Ignore any empty //, or /./ dirs. */
> +        if (token == "" || token == ".")
> +            continue;
> +      /* When /.. is encountered, remove the most recent directory from the vector. */
> +        else if (token == "..") {
> +            if (!tokens.empty())
> +                tokens.pop_back();
> +        } else
> +            tokens.push_back(token);
> +    }
> +    stringstream result;
> +    if (tokens.empty())
> +        return "/";
> +    /* Reconstruct the path from the extracted directories. */
> +    for (const string &t : tokens) {
> +        result << '/' << t;
> +    }
> +    return result.str();
> +}
>
> -/* Global list of collected source files.  Normally, it'll contain
> -   the sources of just one named binary, but the '-K' option can cause
> -   multiple dwfl modules to be loaded, thus listed.   */
> -   set<string> debug_sourcefiles;
> +/* Global list of collected source files and their respective module.
> +   Normally, it'll contain the sources of just one named binary, but
> +   the '-K' option can cause multiple dwfl modules to be loaded, thus
> +   listed. */
> +set<pair<string, Dwfl_Module*>> debug_sourcefiles;
>
>  static int
>  collect_sourcefiles (Dwfl_Module *dwflmod,
> @@ -118,15 +221,14 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>                       void *arg __attribute__ ((unused)))
>  {
>    Dwarf *dbg;
> -  Dwarf_Addr bias; /* ignored - for addressing purposes only  */
> -
> +  Dwarf_Addr bias; /* ignored - for addressing purposes only. */
> +
>    dbg = dwfl_module_getdwarf (dwflmod, &bias);
>
>    Dwarf_Off offset = 0;
>    Dwarf_Off old_offset;
>    size_t hsize;
> -
> -  /* Traverse all CUs of this module.  */
> +  /* Traverse all CUs of this module. */
>    while (dwarf_nextcu (dbg, old_offset = offset, &offset, &hsize, NULL, NULL, NULL) == 0)
>      {
>        Dwarf_Die cudie_mem;
> @@ -141,7 +243,7 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>        if (dwarf_getsrcfiles (cudie, &files, &nfiles) != 0)
>          continue;
>
> -      /* extract DW_AT_comp_dir to resolve relative file names  */
> +      /* extract DW_AT_comp_dir to resolve relative file names. */
>        const char *comp_dir = "";
>        const char *const *dirs;
>        size_t ndirs;
> @@ -152,11 +254,19 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>          comp_dir = "";
>
>        if (verbose)
> -        std::clog << "searching for sources for cu=" << cuname
> +        clog << "searching for sources for cu=" << cuname
>                    << " comp_dir=" << comp_dir << " #files=" << nfiles
>                    << " #dirs=" << ndirs << endl;
> -
> -      for (size_t f = 1; f < nfiles; f++)
> +
> +      if (comp_dir[0] == '\0' && cuname[0] != '/')
> +        {
> +          /* This is a common symptom for dwz-compressed debug files,
> +             where the altdebug file cannot be resolved. */
> +          if (verbose)
> +            clog << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
> +          continue;
> +        }
> +      for (size_t f = 1; f < nfiles; ++f)
>          {
>            const char *hat;
>            if (CU_only)
> @@ -172,7 +282,7 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>              continue;
>
>            if (string(hat).find("<built-in>")
> -              != std::string::npos) /* gcc intrinsics, don't bother record  */
> +              != string::npos) /* gcc intrinsics, don't bother recording */
>              continue;
>
>            string waldo;
> @@ -180,23 +290,137 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>              waldo = (string (hat));
>            else if (comp_dir[0] != '\0') /* comp_dir relative */
>              waldo = (string (comp_dir) + string ("/") + string (hat));
> -          debug_sourcefiles.insert (waldo);
> +          else
> +           {
> +             if (verbose)
> +              clog << "skipping file=" << hat << " due to empty comp_dir" << endl;
> +             continue;
> +           }
> +          waldo = canonicalize_path (waldo);
> +          debug_sourcefiles.insert (make_pair(waldo, dwflmod));
>          }
>      }
> -
>    return DWARF_CB_OK;
>  }
>
> +#ifdef HAVE_LIBARCHIVE
> +  void zip_files()

You don't have to indent within preprocessor conditionals. Especially
since the lines of code in this function are already quite long.

> +  {
> +    struct archive *a = archive_write_new();
> +    struct stat st;
> +    char buff[BUFFER_SIZE];
> +    int len;
> +    int fd;
> +    #ifdef ENABLE_LIBDEBUGINFOD
> +      /* Initialize a debuginfod client. */
> +      static unique_ptr <debuginfod_client, void (*)(debuginfod_client*)>
> +        client (debuginfod_begin(), &debuginfod_end);
> +    #endif
> +
> +    archive_write_set_format_zip(a);
> +    archive_write_open_fd(a, STDOUT_FILENO);
> +
> +    int missing_files = 0;
> +    for (const auto &pair : debug_sourcefiles)
> +    {
> +      fd = -1;
> +      const std::string &file_path = pair.first;
> +
> +      /* Attempt to query debuginfod client to fetch source files. */
> +      #ifdef ENABLE_LIBDEBUGINFOD
> +        Dwfl_Module* dwflmod = pair.second;
> +        /* Obtain source file's build ID. */
> +        const unsigned char *bits;
> +        GElf_Addr vaddr;
> +        int bits_length = dwfl_module_build_id(dwflmod, &bits, &vaddr);
> +        /* Ensure successful client and build ID acquisition. */
> +        if (client.get() != NULL && bits_length > 0)
> +        {
> +          fd = debuginfod_find_source(client.get(),
> +                                        bits, bits_length,
> +                                        file_path.c_str(), NULL);
> +        }
> +        else
> +        {
> +            if (client.get() == NULL)
> +                cerr << "Error: Failed to initialize debuginfod client." << endl;
> +            else
> +                cerr << "Error: Invalid build ID length (" << bits_length << ")." << endl;
> +        }
> +
> +      #endif
> +      if (!no_backup)
> +        /* Files could not be located using debuginfod, search locally */
> +        if (fd < 0)
> +          fd = open(file_path.c_str(), O_RDONLY);
> +      if (fd < 0)
> +      {
> +        missing_files++;
> +        continue;
> +      }
> +
> +      /* Create an entry for each file including file information to be placed in the zip. */
> +      if (fstat(fd, &st) == -1)
> +      {
> +        missing_files++;
> +        if (verbose)
> +          cerr << "Error: Failed to get file status for " << file_path << ": " << strerror(errno) << endl;
> +        continue;
> +      }
> +      struct archive_entry *entry = archive_entry_new();
> +      /* Removing first "/"" to make the path "relative" before zipping, otherwise warnings are raised when unzipping. */
> +      string entry_name = file_path.substr(file_path.find_first_of('/') + 1);
> +      archive_entry_set_pathname(entry, entry_name.c_str());
> +      archive_entry_copy_stat(entry, &st);
> +      if (archive_write_header(a, entry) != ARCHIVE_OK)
> +      {
> +        missing_files++;
> +        if (verbose)
> +          cerr << "Error: failed to write header for " << file_path << ": " << archive_error_string(a) << endl;
> +        continue;
> +      }
> +
> +      /* Write the file to the zip. */
> +      len = read(fd, buff, sizeof(buff));
> +      if (len == -1)
> +      {
> +        missing_files++;
> +        if (verbose)
> +          cerr << "Error: Failed to open file: " << file_path << ": " << strerror(errno) <<endl;
> +        continue;
> +      }
> +      if (verbose && len > 0)
> +        clog << "Writing to zip: " << file_path << endl;
> +      while (len > 0)
> +      {
> +        if (archive_write_data(a, buff, len) < ARCHIVE_OK)
> +        {
> +          if (verbose)
> +            cerr << "Error: Failed to read from the file: " << file_path << ": " << strerror(errno) << endl;
> +          break;
> +        }
> +        len = read(fd, buff, sizeof(buff));
> +      }
> +      close(fd);
> +      archive_entry_free(entry);
> +    }
> +    if (verbose && missing_files > 0 )
> +      cerr << "WARNING: " << missing_files << " files could not be found." << endl;

I think we should also print the names of the missing source files.
This information could help the user decide if the missing files are a
problem for them.

Also just a small detail, but I'd remove "WARNING" and just say:

    N files could not be found:
    file1
    file2
    ...

A file might not be found for a variety of mundane reasons but the
all-caps warning suggests to me that something is critically wrong.

> +
> +    archive_write_close(a);
> +    archive_write_free(a);
> +  }
> +#endif
>
>  int
>  main (int argc, char *argv[])
>  {
>    int remaining;
> -
> -  /* Parse and process arguments.  This includes opening the modules.  */
> +
> +  /* Parse and process arguments.  This includes opening the modules. */
>    argp_children[0].argp = dwfl_standard_argp ();
>    argp_children[0].group = 1;
> -
> +
>    Dwfl *dwfl = NULL;
>    (void) argp_parse (&argp, argc, argv, 0, &remaining, &dwfl);
>    assert (dwfl != NULL);
> @@ -204,14 +428,23 @@ main (int argc, char *argv[])
>    (void) dwfl_getmodules (dwfl, &collect_sourcefiles, NULL, 0);
>
>    if (!debug_sourcefiles.empty ())
> -    for (const string &element : debug_sourcefiles)
> +  {
> +    #ifdef HAVE_LIBARCHIVE
> +      if (zip)
> +        zip_files();
> +      else
> +    #endif
>        {
> -        std::cout << element;
> -        if (null_arg)
> -          std::cout << '\0';
> -        else
> -          std::cout << '\n';
> +        for (const auto &pair : debug_sourcefiles)
> +          {
> +            cout << pair.first;
> +            if (null_arg)
> +              cout << '\0';
> +            else
> +              cout << '\n';
> +          }
>        }
> +  }
>
>    dwfl_end (dwfl);
>    return 0;
> diff --git a/tests/debuginfod-subr.sh b/tests/debuginfod-subr.sh
> index 108dff74..c3b0603d 100755
> --- a/tests/debuginfod-subr.sh
> +++ b/tests/debuginfod-subr.sh
> @@ -79,12 +79,12 @@ errfiles() {
>  # So we gather the LD_LIBRARY_PATH with this cunning trick:
>  ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
>
> -wait_ready()
> +wait_ready4()
>  {
>    port=$1;
>    what=$2;
>    value=$3;
> -  timeout=20;
> +  timeout=$4;
>
>    echo "Wait $timeout seconds on $port for metric $what to change to $value"
>    while [ $timeout -gt 0 ]; do
> @@ -105,6 +105,16 @@ wait_ready()
>    fi
>  }
>
> +wait_ready()
> +{
> +  port=$1;
> +  what=$2;
> +  value=$3;
> +  timeout=20;
> +  wait_ready4 "$port" "$what" "$value" "$timeout"
> +}
> +
> +
>  archive_test() {
>      __BUILDID=$1
>      __SOURCEPATH=$2
> diff --git a/tests/run-srcfiles-self.sh b/tests/run-srcfiles-self.sh
> index 0e64dd2b..1ee460f6 100755
> --- a/tests/run-srcfiles-self.sh
> +++ b/tests/run-srcfiles-self.sh
> @@ -1,4 +1,4 @@
> -#! /bin/sh
> +#! /usr/bin/env bash
>  # Copyright (C) 2023 Red Hat, Inc.
>  # This file is part of elfutils.
>  #
> @@ -15,7 +15,16 @@
>  # You should have received a copy of the GNU General Public License
>  # along with this program.  If not, see <http://www.gnu.org/licenses/>.
>
> -. $srcdir/test-subr.sh
> +. $srcdir/debuginfod-subr.sh
> +
> +# prereqs
> +type unzip 2>/dev/null || exit 77

We could check for unzip when it's actually needed. This way
we can at least run the tests that don't rely on unzip.

> +
> +# for test case debugging, uncomment:
> +# set -x
> +set -e
> +base=14000
> +get_ports
>
>  # Test different command line combinations on the srcfiles binary itself.
>  ET_EXEC="${abs_top_builddir}/src/srcfiles"
> @@ -26,11 +35,16 @@ SRC_NAME="srcfiles.cxx"
>  # Ensure the output contains the expected source file srcfiles.cxx
>  testrun $ET_EXEC -e $ET_EXEC | grep $SRC_NAME > /dev/null
>
> +# Check if zip option is available (only available if libarchive is available.
> +#   Debuginfod optional to fetch source files from debuginfod federation.)
> +$ET_EXEC --help | grep -q zip && zip=true || zip=false

If -z is changed to be always a valid option even when support
is missing (as suggested above), then this check for -z support
will have to change.

> +
>  for null_arg in --null ""; do
>    for verbose_arg in --verbose ""; do
> +    echo "Test with options $null_arg $verbose_arg"
>      testrun $ET_EXEC $null_arg $verbose_arg -p $ET_PID > /dev/null
>
> -    # Ensure that the output contains srclines.cxx
> +    # Ensure that the output contains srcfiles.cxx
>      cu_only=$(testrun $ET_EXEC $null_arg $verbose_arg -c -e $ET_EXEC)
>      default=$(testrun $ET_EXEC $null_arg $verbose_arg -e $ET_EXEC)
>      result1=$(echo "$cu_only" | grep "$SRC_NAME")
> @@ -40,9 +54,64 @@ for null_arg in --null ""; do
>        exit 1
>      fi
>
> -    # Ensure that the output with the cu-only option contains less source files
> +    # Ensure that the output with the cu-only option contains fewer source files
>      if [ $(echo "$cu_only" | wc -m) -gt $(echo "$default" | wc -m) ]; then
>        exit 1
>      fi
> +
> +    if $zip; then
> +      # Zip option tests
> +        testrun $ET_EXEC $verbose_arg -z -e $ET_EXEC > test.zip
> +        tempfiles test.zip
> +
> +        unzip -v test.zip
> +        unzip -t test.zip
> +
> +        # Ensure unzipped srcfiles.cxx and its contents are the same as the original source file
> +        unzip -j test.zip "*/$SRC_NAME"
> +        diff "$SRC_NAME" $abs_srcdir/../src/$SRC_NAME
> +        rm -f test.zip $SRC_NAME
> +    fi
>    done
>  done
> +
> +# Debuginfod source file downloading test.
> +# Start debuginfod server on the elfutils build directory.
> +if [ -x ${abs_builddir}/../debuginfod/debuginfod ] && $zip; then
> +  LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod -vvvv -d debuginfod.sqlite3 -F -p $PORT1 ${abs_top_builddir}/src > debuginfod.log 2>&1 &
> +  PID1=$!
> +  tempfiles debuginfod.sqlite3 debuginfod.log
> +  wait_ready $PORT1 'ready' 1
> +  wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
> +  wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +  wait_ready4 $PORT1 'thread_busy{role="scan"}' 0 300 # lots of source files may be slow to index with $db on NFS
> +
> +  export DEBUGINFOD_URLS="http://localhost:${PORT1}/"
> +  export DEBUGINFOD_VERBOSE=1
> +  testrun $ET_EXEC -z -b -e $ET_EXEC > test.zip
> +  tempfiles test.zip
> +
> +  unzip -v test.zip
> +  unzip -t test.zip
> +
> +  # Extract the zip.
> +  mkdir extracted
> +  unzip test.zip -d extracted
> +
> +  # Ensure that source files for this tool have been archived.
> +  source_files="srcfiles.cxx libdwfl.h gelf.h"
> +  extracted_files=$(find extracted -type f)
> +  for file in $source_files; do
> +      echo "$extracted_files" | grep -q "$file" > /dev/null
> +  done
> +
> +  # Compare between the extracted file and the actual source file srcfiles.cxx.
> +  extracted_file=$(find extracted -name $SRC_NAME)
> +  diff "$extracted_file" $abs_srcdir/../src/$SRC_NAME
> +
> +  rm -rf extracted
> +
> +  kill $PID1
> +  wait
> +  PID1=0
> +fi
> --
> 2.43.0
>

Nice, the tests pass for me even if libarchive and/or debuginfod is missing.

Aaron


  reply	other threads:[~2024-01-23  3:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-19 16:47 Housam Alamour
2024-01-23  3:02 ` Aaron Merey [this message]
2024-01-26 21:00   ` [PATCH] srcfiles.cxx: Addressed all patch review notes. typos, formatting, documentation, show -z and -b option even if libarchive is not available (and adjust test script accordingly), show source files missing from -z option Housam Alamour
2024-01-29 23:45     ` Aaron Merey
     [not found] <CAF=UuGOz9WjpHRdbHbwgcrETmUZQqYQFgQrJd0TOBT3A_J0JYw@mail.gmail.com>
2024-02-02 15:49 ` [PATCH] PR 30991: srcfiles tarball feature Housam Alamour
2024-02-06  0:24   ` Aaron Merey
2024-02-06 12:03     ` Mark Wielaard
2024-02-06 14:47       ` Frank Ch. Eigler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJDtP-Rdh=qj2x8Nmn0jmOO_VzCaGb+jcWq3WwOYch9RcgLNmg@mail.gmail.com' \
    --to=amerey@redhat.com \
    --cc=elfutils-devel@sourceware.org \
    --cc=halamour@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).