From: "Frank Ch. Eigler"
To: elfutils-devel@sourceware.org
Subject: patch rfc: debuginfod -Z (generalized archive) support
Date: Wed, 05 Feb 2020 20:09:00 -0000
Message-ID: <20200205200918.GA1336@redhat.com>

Hi -

A little extension lets us process arch-linux archives.  Awaiting
some small test .pkg's from the arch folks for the elfutils testsuite.
However, hand-testing on several larger files works!

commit b51ae89befeb81c8b51b15b7168c6e616255b486 (fche/pacman-Z)
Author: Frank Ch. Eigler
Date:   Wed Feb 5 15:04:18 2020 -0500

    debuginfod: generalized archive support

    Add a '-Z EXT=CMD' option to debuginfod, which lets it scan any given
    extension and run CMD on it to unwrap distro archives.  For example,
    for arch-linux pacman files, -Z '.tar.zst=zstdcat' lets debuginfod
    grok debug and source content in split-debuginfo files.

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 8c97fdcf7085..d812e6d71ff0 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,9 @@
+2020-02-05  Frank Ch. Eigler
+
+	* debuginfod.cxx (argp options): Add -Z option.
+	(canonicalized_archive_entry_pathname): New function for
+	distro-agnostic file name matching/storage.
+
 2020-01-22  Frank Ch. Eigler
 
 	* debuginfod.cxx (dwarf_extract_source_paths): Don't print
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 623dbc593c70..0de6bbaea0ee 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -333,9 +333,10 @@ ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
 static const struct argp_option options[] =
   {
    { NULL, 0, NULL, 0, "Scanners:", 1 },
-   { "scan-file-dir", 'F', NULL, 0, "Enable ELF/DWARF file scanning threads.", 0 },
-   { "scan-rpm-dir", 'R', NULL, 0, "Enable RPM scanning threads.", 0 },
-   { "scan-deb-dir", 'U', NULL, 0, "Enable DEB scanning threads.", 0 },
+   { "scan-file-dir", 'F', NULL, 0, "Enable ELF/DWARF file scanning.", 0 },
+   { "scan-rpm-dir", 'R', NULL, 0, "Enable RPM scanning.", 0 },
+   { "scan-deb-dir", 'U', NULL, 0, "Enable DEB scanning.", 0 },
+   { "scan-archive", 'Z', "EXT=CMD", 0, "Enable arbitrary archive scanning.", 0 },
    // "source-oci-imageregistry" ...
 
    { NULL, 0, NULL, 0, "Options:", 2 },
@@ -428,6 +429,15 @@ parse_opt (int key, char *arg,
       scan_archives[".deb"]="dpkg-deb --fsys-tarfile";
       scan_archives[".ddeb"]="dpkg-deb --fsys-tarfile";
       break;
+    case 'Z':
+      {
+        char* extension = strchr(arg, '=');
+        if (extension)
+          scan_archives[string(arg, (extension-arg))]=string(extension+1);
+        else
+          argp_failure(state, 1, EINVAL, "bad EXT=CMD format");
+      }
+      break;
     case 'L':
       traverse_logical = true;
       break;
@@ -1068,6 +1078,25 @@ class libarchive_fdcache
 static libarchive_fdcache fdcache;
 
 
+// For security/portability reasons, many distro-package archives have
+// a "./" in front of path names; others have nothing, others have
+// "/".  Canonicalize them all to a single leading "/", with the
+// assumption that this matches the dwarf-derived file names too.
+string canonicalized_archive_entry_pathname(struct archive_entry *e)
+{
+  string fn = archive_entry_pathname(e);
+  if (fn.size() == 0)
+    return fn;
+  if (fn[0] == '/')
+    return fn;
+  if (fn[0] == '.')
+    return fn.substr(1);
+  else
+    return string("/")+fn;
+}
+
+
+
 static struct MHD_Response*
 handle_buildid_r_match (int64_t b_mtime,
                         const string& b_source0,
@@ -1162,8 +1191,8 @@ handle_buildid_r_match (int64_t b_mtime,
           if (! S_ISREG(archive_entry_mode (e))) // skip non-files completely
             continue;
 
-          string fn = archive_entry_pathname (e);
-          if (fn != string(".")+b_source1)
+          string fn = canonicalized_archive_entry_pathname (e);
+          if (fn != b_source1)
             continue;
 
           // extract this file to a temporary file
@@ -2055,9 +2084,7 @@ archive_classify (const string& rps, string& archive_extension,
           if (! S_ISREG(archive_entry_mode (e))) // skip non-files completely
             continue;
 
-          string fn = archive_entry_pathname (e);
-          if (fn.size() > 1 && fn[0] == '.')
-            fn = fn.substr(1); // trim off the leading '.'
+          string fn = canonicalized_archive_entry_pathname (e);
 
           if (verbose > 3)
             obatched(clog) << "libarchive checking " << fn << endl;
@@ -2764,7 +2791,7 @@ main (int argc, char *argv[])
                    "unexpected argument: %s", argv[remaining]);
 
   if (scan_archives.size()==0 && !scan_files && source_paths.size()>0)
-    obatched(clog) << "warning: without -F -R -U, ignoring PATHs" << endl;
+    obatched(clog) << "warning: without -F -R -U -Z, ignoring PATHs" << endl;
 
   fdcache.limit(fdcache_fds, fdcache_mbs);
 
@@ -2894,7 +2921,7 @@ main (int argc, char *argv[])
       obatched ob(clog);
       auto& o = ob << "scanning archive types ";
       for (auto&& arch : scan_archives)
-        o << arch.first << " ";
+        o << arch.first << "(" << arch.second << ") ";
       o << endl;
     }
   const char* du = getenv(DEBUGINFOD_URLS_ENV_VAR);
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 651ea33d4106..36094d002f75 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2020-02-05  Frank Ch. Eigler
+
+	* debuginfod.8: Document new -Z flag and tweak other bits.
+
 2020-01-10  Mark Wielaard
 
 	* debuginfod_find_debuginfo.3 (DEBUGINFOD_PROGRESS): Mention progress
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index 166c7c4590ed..d6561edf7159 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -61,20 +61,22 @@ or
 ^C
 .ESAMPLE
 
-If the \fB\-R\fP and/or \fB-U\fP option is given, each file is scanned
-as an archive file that may contain ELF/DWARF/source files.  If \-R is
-given, the will scan RPMs; and/or if \-U is given, they will scan DEB
-/ DDEB files.  (The terms RPM and DEB and DDEB are used synonymously
-as "archives" in diagnostic messages.)  Because of complications such
-as DWZ-compressed debuginfo, may require \fItwo\fP traversal passes to
-identify all source code.  Source files for RPMs are only served from
-other RPMs, so the caution for \-F does not apply.  Note that due to
-Debian/Ubuntu packaging policies & mechanisms, debuginfod cannot
-resolve source files for DEB/DDEB at all.
-
-If no PATH is listed, or neither \fB\-F\fP nor \fB\-R\fP nor \fB\-U\fP
-option is given, then \fBdebuginfod\fP will simply serve content that
-it accumulated into its index in all previous runs.
+If any of the \fB\-R\fP, \fB-U\fP, or \fB-Z\fP options is given, each
+file is scanned as an archive file that may contain ELF/DWARF/source
+files.  Archive files are recognized by extension.  If \-R is given,
+".rpm" files are scanned; if \-U is given, ".deb" and ".ddeb" files
+are scanned; if \-Z is given, the listed extensions are scanned.
+Because of complications such as DWZ-compressed debuginfo, debuginfod may require
+\fItwo\fP traversal passes to identify all source code.  Source files
+for RPMs are only served from other RPMs, so the caution for \-F does
+not apply.  Note that due to Debian/Ubuntu packaging policies &
+mechanisms, debuginfod cannot resolve source files for DEB/DDEB at
+all.
+
+If no PATH is listed, or none of the scanning options is given, then
+\fBdebuginfod\fP will simply serve content that it accumulated into
+its index in all previous runs, and federate to any upstream
+debuginfod servers.
 
 
 .SH OPTIONS
@@ -91,6 +93,16 @@ Activate RPM patterns in archive scanning.  The default is off.
 .B "\-U"
 Activate DEB/DDEB patterns in archive scanning.  The default is off.
 
+.TP
+.B "\-Z EXT=CMD"
+Activate an additional pattern in archive scanning.  Files with name
+extension EXT will be processed with CMD.  CMD is invoked with the
+file name added to its argument list, and should produce the
+archive on its standard output.  debuginfod uses libarchive to consume
+the result, so it can accept a wide range of archive formats and
+compression.  (Include the dot in EXT.)  The default is no additional
+patterns.  This option may be repeated.
+
 .TP
 .B "\-d FILE" "\-\-database=FILE"
 Set the path of the sqlite database used to store the index.  This
@@ -123,7 +135,8 @@ against the full path of each file, based on its \fBrealpath(3)\fP
 canonicalization.  By default, all files are included and none are
 excluded.  A file that matches both include and exclude REGEX is
 excluded.  (The \fIcontents\fP of archive files are not subject to
-inclusion or exclusion filtering: they are all processed.)
+inclusion or exclusion filtering: they are all processed.)  Only the
+last of each type of regular expression given is used.
 
 .TP
 .B "\-t SECONDS" "\-\-rescan\-time=SECONDS"
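
For anyone who wants to poke at the two small pieces of logic in this
patch without building debuginfod, here is a minimal standalone sketch.
It is an illustration only, not part of the patch: the names
canonicalize_entry_name, split_ext_cmd and demo are hypothetical, the
path rule takes a plain string instead of a libarchive entry, and C++17
is assumed for std::optional.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Same prefix rules as canonicalized_archive_entry_pathname in the patch,
// applied to a plain string (assumption: no libarchive needed for a demo).
std::string canonicalize_entry_name(const std::string& fn)
{
  if (fn.empty() || fn[0] == '/')
    return fn;                 // empty or already rooted: unchanged
  if (fn[0] == '.')
    return fn.substr(1);       // "./usr/bin/ls" -> "/usr/bin/ls"
  return "/" + fn;             // "usr/bin/ls"   -> "/usr/bin/ls"
}

// Split a -Z style argument at its first '='; nullopt if there is none.
std::optional<std::pair<std::string, std::string>>
split_ext_cmd(const std::string& arg)
{
  std::string::size_type eq = arg.find('=');
  if (eq == std::string::npos)
    return std::nullopt;
  return std::make_pair(arg.substr(0, eq), arg.substr(eq + 1));
}

// Hypothetical usage: collect one EXT=CMD argument into a lookup table
// and check that two common archive path spellings end up identical.
void demo()
{
  std::map<std::string, std::string> table;
  if (auto kv = split_ext_cmd(".tar.zst=zstdcat"))
    table[kv->first] = kv->second;
  assert(table.at(".tar.zst") == "zstdcat");
  assert(canonicalize_entry_name("./usr/bin/ls")
         == canonicalize_entry_name("usr/bin/ls"));
}
```

Note that the '.' rule strips only the first character, so it relies on
the name starting with "./"; an entry named ".config" with no slash
would come out as "config" under the same rule.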