From: Mark Wielaard <mark@klomp.org>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: elfutils-devel@sourceware.org, amerey@redhat.com
Subject: Re: patch 2/2 debuginfod server etc.
Date: Thu, 21 Nov 2019 14:16:00 -0000
Message-ID: <770d563fc3cb0681678c919024edcd5ce13d874a.camel@klomp.org>
In-Reply-To: <7f1273e6dbfd52b95e9f8e86f6096fe46e800745.camel@klomp.org>

Hi,

On Wed, 2019-11-20 at 12:53 +0100, Mark Wielaard wrote:
> Sure, you could use that if you wanted to share your whole build/source
> trees and don't mind serving any other files on some local network. I
> just think it shouldn't be the default. If you go look for odd paths in
> .debug files you probably will find them. We already know some builds
> generate and/or build files in /tmp or outside the src/builddir.
> 
> I'll look to see what is necessary to make sure none of those leak out
> by default.

The attached is what I came up with.

It simply splits the paths into those scanned for rpms, those scanned
for files, and (optional) paths that are extra trusted prefixes for
source files. The paths that are scanned for files are trusted source
prefixes by default. There is a new option to drop them from the
trusted prefixes (-N, --no-files-sources), and you can switch back to
allowing all files on the file system with -A, --all-sources.
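
To make the intended defaults concrete, here is a minimal stand-alone
sketch of the decision described above. It is not the code from the
patch: the SourcePolicy struct and the function names are hypothetical,
and the realpath() handling that the patch does is left out for
brevity.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical container for the option-controlled inputs.
struct SourcePolicy {
  std::set<std::string> files_paths;    // directories given with -F
  std::set<std::string> sources_paths;  // extra prefixes given with -S
  bool files_are_sources = true;        // cleared by -N
  bool all_sources = false;             // set by -A
};

// Prefixes trusted for source indexing once the defaults are applied.
std::set<std::string> trusted_prefixes(const SourcePolicy& p) {
  std::set<std::string> out = p.sources_paths;
  if (!p.all_sources && p.files_are_sources)
    out.insert(p.files_paths.begin(), p.files_paths.end());
  return out;
}

// True when a source file path may be included in the index.
bool source_allowed(const SourcePolicy& p, const std::string& file) {
  if (p.all_sources)
    return true;
  for (const auto& prefix : trusted_prefixes(p))
    if (file.compare(0, prefix.size(), prefix) == 0)  // plain string prefix
      return true;
  return false;
}

// Usage: walk through the default, -N and -A cases.
static void example_checks() {
  SourcePolicy p;
  p.files_paths = {"/usr/lib/debug"};
  p.sources_paths = {"/usr/src/debug"};
  assert(source_allowed(p, "/usr/src/debug/foo-1.0/foo.c"));
  assert(source_allowed(p, "/usr/lib/debug/x.c"));  // -F dir trusted by default
  assert(!source_allowed(p, "/tmp/build/gen.c"));
  p.files_are_sources = false;                      // as with -N
  assert(!source_allowed(p, "/usr/lib/debug/x.c"));
  p.all_sources = true;                             // as with -A
  assert(source_allowed(p, "/tmp/build/gen.c"));
}
```

Note that this is a plain string prefix comparison, so a prefix of
/usr/src/debug would also accept /usr/src/debug2/x.c. Whether a
directory-boundary check is wanted is a separate question.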

I think this provides a way to do what we both want; it just makes
things a little bit more explicit.

As a bonus it separates scanning trees for files and rpms, so no
unnecessary work is done.
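
As a rough illustration of the saved work, the sketch below compares
the scan jobs generated when every path is handed to every enabled
scanner with the jobs generated when each scanner gets its own path
list. The Scanner, Job, plan_shared and plan_split names are made up
for this example; the real server spawns one scanner thread per path.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Scanner { Files, Rpms };
using Job = std::pair<Scanner, std::string>;
using Paths = std::set<std::string>;

// Shared list: every path is handed to every enabled scanner.
std::vector<Job> plan_shared(const Paths& paths, bool scan_files,
                             bool scan_rpms) {
  std::vector<Job> jobs;
  if (scan_files)
    for (const auto& p : paths) jobs.emplace_back(Scanner::Files, p);
  if (scan_rpms)
    for (const auto& p : paths) jobs.emplace_back(Scanner::Rpms, p);
  return jobs;
}

// Split lists: each scanner only visits the paths meant for it.
std::vector<Job> plan_split(const Paths& files_paths,
                            const Paths& rpms_paths) {
  std::vector<Job> jobs;
  for (const auto& p : files_paths) jobs.emplace_back(Scanner::Files, p);
  for (const auto& p : rpms_paths) jobs.emplace_back(Scanner::Rpms, p);
  return jobs;
}

// Usage: three directories, two with ELF/DWARF files and one with rpms.
static void example_checks() {
  const Paths all = {"/usr/bin", "/usr/lib/debug", "/var/cache/dnf"};
  const Paths files = {"/usr/bin", "/usr/lib/debug"};
  const Paths rpms = {"/var/cache/dnf"};
  assert(plan_shared(all, true, true).size() == 6);  // 3 paths x 2 scanners
  assert(plan_split(files, rpms).size() == 3);       // one job per path
}
```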

I haven't updated the documentation yet. Let me know what you think
about the patch and I can update the documentation if we agree on the
options/defaults.

Cheers,

Mark

[-- Attachment #2: 0001-debuginfod-Separate-files-rpms-and-source-paths.patch --]
[-- Type: text/x-patch, Size: 9810 bytes --]

From 3755f47f65b8fd2b5d59ecaa9c14319b9ffa3487 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Thu, 21 Nov 2019 14:17:12 +0100
Subject: [PATCH] debuginfod: Separate files, rpms and source paths.

Process paths for rpms, files and sources separately. This makes it
possible to scan just rpms, just executable and debug files and index
source prefixes independently.

-F, --scan-file-dir and -R, --scan-rpm-dir now take a PATH.
A new option -S, --sources=PATH has been added to add extra source
prefixes that are allowed to be indexed. The -F file dirs are added
automatically unless -N, --no-files-sources is given. To allow all
files reachable on the file system to be indexed use -A, --all-sources.

Signed-off-by: Mark Wielaard <mark@klomp.org>
---
 config/debuginfod.sysconfig  |   2 +-
 debuginfod/debuginfod.cxx    | 115 +++++++++++++++++++++++++++++------
 tests/run-debuginfod-find.sh |   4 +-
 3 files changed, 101 insertions(+), 20 deletions(-)

diff --git a/config/debuginfod.sysconfig b/config/debuginfod.sysconfig
index 2de067d6..4476d90c 100644
--- a/config/debuginfod.sysconfig
+++ b/config/debuginfod.sysconfig
@@ -3,7 +3,7 @@ DEBUGINFOD_PORT="8002"
 #DEBUGINFOD_VERBOSE="-v"
 
 # some common places to find trustworthy ELF/DWARF files and RPMs
-DEBUGINFOD_PATHS="-t43200 -F -R /usr/lib/debug /usr/bin /usr/libexec /usr/sbin /usr/lib /usr/lib64 /usr/local /opt /var/cache/yum /var/cache/dnf"
+DEBUGINFOD_PATHS="-t43200 -F /usr/lib/debug -F /usr/bin -F /usr/libexec -F /usr/sbin -F /usr/lib -F /usr/lib64 -R /var/cache/yum -R /var/cache/dnf -N -S /usr/src/debug"
 
 # prefer reliability/durability over performance
 #DEBUGINFOD_PRAGMAS="-D 'pragma synchronous=full;'"
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 7baf18ec..5a08b134 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -324,11 +324,22 @@ ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
 static const struct argp_option options[] =
   {
    { NULL, 0, NULL, 0, "Scanners:", 1 },
-   { "scan-file-dir", 'F', NULL, 0, "Enable ELF/DWARF file scanning threads.", 0 },
-   { "scan-rpm-dir", 'R', NULL, 0, "Enable RPM scanning threads.", 0 },
+   { "scan-file-dir", 'F', "PATH", 0,
+     "Scan ELF/DWARF files under PATH.", 0 },
+   { "scan-rpm-dir", 'R', "PATH", 0,
+     "Scan RPMs under PATH.", 0 },
    // "source-oci-imageregistry"  ...
 
-   { NULL, 0, NULL, 0, "Options:", 2 },
+   { NULL, 0, NULL, 0, "Sources:", 2 },
+   { "sources", 'S', "PATH", 0,
+     "Index sources under PATH.", 0 },
+   { "all-sources", 'A', NULL, 0,
+     "Index all sources accessible on the file system.", 0 },
+   { "no-files-sources", 'N', NULL, 0,
+     "Do not include the scan-file-dir PATHs in the default sources PATHs.",
+     0 },
+
+   { NULL, 0, NULL, 0, "Options:", 3 },
    { "logical", 'L', NULL, 0, "Follow symlinks, default=ignore.", 0 },
    { "rescan-time", 't', "SECONDS", 0, "Number of seconds to wait between rescans, 0=disable.", 0 },
    { "groom-time", 'g', "SECONDS", 0, "Number of seconds to wait between database grooming, 0=disable.", 0 },
@@ -345,7 +356,7 @@ static const struct argp_option options[] =
   };
 
 /* Short description of program.  */
-static const char doc[] = "Serve debuginfo-related content across HTTP from files under PATHs.";
+static const char doc[] = "Serve debuginfo-related content across HTTP from files under rpms/files/sources PATHs.";
 
 /* Strings for arguments in help texts.  */
 static const char args_doc[] = "[PATH ...]";
@@ -371,9 +382,11 @@ static unsigned rescan_s = 300;
 static unsigned groom_s = 86400;
 static unsigned maxigroom = false;
 static unsigned concurrency = std::thread::hardware_concurrency() ?: 1;
-static set<string> source_paths;
-static bool scan_files = false;
-static bool scan_rpms = false;
+static set<string> rpms_paths;
+static set<string> files_paths;
+static set<string> sources_paths;
+static bool sources_files_ok = true;
+static bool sources_all_ok = false;
 static vector<string> extra_ddl;
 static regex_t file_include_regex;
 static regex_t file_exclude_regex;
@@ -401,8 +414,12 @@ parse_opt (int key, char *arg,
     case 'p': http_port = (unsigned) atoi(arg);
       if (http_port > 65535) argp_failure(state, 1, EINVAL, "port number");
       break;
-    case 'F': scan_files = true; break;
-    case 'R': scan_rpms = true; break;
+    case 'F':
+      files_paths.insert(string(arg));
+      break;
+    case 'R':
+      rpms_paths.insert(string(arg));
+      break;
     case 'L':
       traverse_logical = true;
       break;
@@ -433,8 +450,24 @@ parse_opt (int key, char *arg,
       if (rc != 0)
         argp_failure(state, 1, EINVAL, "regular expession");
       break;
-    case ARGP_KEY_ARG:
-      source_paths.insert(string(arg));
+    case 'S':
+      {
+	sources_paths.insert(string(arg));
+	/* Also include resolved/real path in case it is different.
+	   set.insert will deduplicate when they are the same. */
+	char *realp = realpath(arg, NULL);
+	if (realp != NULL)
+	  {
+	    sources_paths.insert(string(realp));
+	    free(realp);
+	  }
+      }
+      break;
+    case 'N':
+      sources_files_ok = false;
+      break;
+    case 'A':
+      sources_all_ok = true;
       break;
       // case 'h': argp_state_help (state, stderr, ARGP_HELP_LONG|ARGP_HELP_EXIT_OK);
     default: return ARGP_ERR_UNKNOWN;
@@ -1707,7 +1740,8 @@ scan_source_file_path (const string& dir)
                       .step_ok_done();
                   }
 
-                if (sourcefiles.size() && buildid != "")
+                if (sourcefiles.size() && buildid != ""
+		    && (sources_all_ok || sources_paths.size () > 0))
                   {
                     fts_sourcefiles += sourcefiles.size();
 
@@ -1720,6 +1754,23 @@ scan_source_file_path (const string& dir)
                         string srps = string(srp);
                         free (srp);
 
+			/* Make sure the source file (real)path is
+			   acceptable to include in the index by
+			   checking against the allowed sources_path
+			   dir prefixes.  */
+			bool ok = sources_all_ok;
+			if (! ok)
+			  for (auto&& path : sources_paths)
+			    if (dwarfsrc.find(path) == 0
+				|| srps.find(path) == 0)
+			      {
+				ok = true;
+				break;
+			      }
+
+			if (! ok)
+			  continue;
+
                         struct stat sfs;
                         rc = stat(srps.c_str(), &sfs);
                         if (rc != 0)
@@ -2472,9 +2523,39 @@ main (int argc, char *argv[])
       error (EXIT_FAILURE, 0,
              "unexpected argument: %s", argv[remaining]);
 
-  if (!scan_rpms && !scan_files && source_paths.size()>0)
-    obatched(clog) << "warning: without -F and/or -R, ignoring PATHs" << endl;
-  
+  /* By default the dirs that we scan for files are also ok as sources
+     index prefix.  */
+  if (! sources_all_ok && sources_files_ok)
+    for (auto&& file_path : files_paths)
+      {
+	sources_paths.insert(file_path);
+	char *rpath = realpath(file_path.c_str(), NULL);
+	if (rpath != NULL)
+	  {
+	    sources_paths.insert(string(rpath));
+	    free(rpath);
+	  }
+      }
+
+  if (verbose > 1)
+    {
+      for (auto&& p : rpms_paths)
+	obatched(clog) << "Scanning rpms from " << p << endl;
+
+      for (auto&& p : files_paths)
+	obatched(clog) << "Scanning files from " << p << endl;
+
+      if (sources_all_ok)
+	obatched(clog)
+	  << "All sources found on the file system are OK" << endl;
+      else
+	for (auto&& p : sources_paths)
+	  obatched(clog) << "Indexing sources from " << p << endl;
+    }
+
+  if (! sources_all_ok && sources_paths.size() == 0)
+    obatched(clog) << "warning: no sources will be indexed" << endl;
+
   (void) signal (SIGPIPE, SIG_IGN); // microhttpd can generate it incidentally, ignore
   (void) signal (SIGINT, signal_handler); // ^C
   (void) signal (SIGHUP, signal_handler); // EOF
@@ -2607,7 +2688,7 @@ main (int argc, char *argv[])
   if (rc < 0)
     error (0, 0, "warning: cannot spawn thread (%d) to groom database\n", rc);
 
-  if (scan_files) for (auto&& it : source_paths)
+  for (auto&& it : files_paths)
     {
       pthread_t pt;
       rc = pthread_create (& pt, NULL, thread_main_scan_source_file_path, (void*) it.c_str());
@@ -2617,7 +2698,7 @@ main (int argc, char *argv[])
         source_file_scanner_threads.push_back(pt);
     }
 
-  if (scan_rpms) for (auto&& it : source_paths)
+  for (auto&& it : rpms_paths)
     {
       pthread_t pt;
       rc = pthread_create (& pt, NULL, thread_main_scan_source_rpm_path, (void*) it.c_str());
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index d240257c..5c6e3575 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -45,7 +45,7 @@ mkdir F R L
 # not tempfiles F R L - they are directories which we clean up manually
 ln -s ${abs_builddir}/dwfllines L/foo   # any program not used elsewhere in this test
 
-env DEBUGINFOD_TEST_WEBAPI_SLEEP=3 LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod -F -R -vvvv -d $DB -p $PORT1 -t0 -g0 R F L &
+env DEBUGINFOD_TEST_WEBAPI_SLEEP=3 LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod -vvvv -d $DB -p $PORT1 -t0 -g0 -R R -F F -F L -S ${PWD}&
 PID1=$!
 sleep 3
 export DEBUGINFOD_URLS=http://localhost:$PORT1/   # or without trailing /
@@ -197,7 +197,7 @@ export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache2
 mkdir -p $DEBUGINFOD_CACHE_PATH
 # NB: inherits the DEBUGINFOD_URLS to the first server
 # NB: run in -L symlink-following mode for the L subdir
-env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod -F -vvvv -d ${DB}_2 -p $PORT2 -L L &
+env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod  -vvvv -d ${DB}_2 -p $PORT2 -L -F L &
 PID2=$!
 tempfiles ${DB}_2
 sleep 3
-- 
2.18.1

