PR25394 patch: debuginfod groom vs. scan race condition

public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed

* PR25394 patch: debuginfod groom vs. scan race condition
@ 2020-01-20  1:43 Frank Ch. Eigler
  2020-01-20 17:03 ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Frank Ch. Eigler @ 2020-01-20  1:43 UTC (permalink / raw)
  To: elfutils-devel

Hi -

This was a doozie.  As the commit text hints, I haven't found a way of
testing this on an elfutils/tests snack-sized data set (except perhaps
by injecting long sleeps?), but the changes work on the larger
archives such as the debuginfod.systemtap.org set.


commit 34e67018914cf9ebbef07065965755b6554fd66e (HEAD -> master, origin/fche/debuginfod-PR25394)
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Sun Jan 19 20:33:32 2020 -0500

    PR25394: debuginfod mutex between grooming and scanning
    
    Extended the work-queue concept with "idlers" - other threads that
    block on the work queue until it becomes empty (rather than normal
    consumers that block on it until it becomes non-empty).  Use this
    facility for the groomer thread to avoid working at the same time as
    the scanner threads.  Use this for the fts traversal thread for
    similar reasons.  One user-visible effect: response to SIGUSR1 and
    SIGUSR2 will wait until the work queue runs empty, but the man page
    was unspecific so does not need changing.
    
    It's not obvious how to test this with a tests/ dataset so small that
    scanning takes negligible time, so the former races are very tight.
    
    P.S. We also evaluated using sqlite level transactions to isolate the
    scanner thread groups-of-operations from the groomer.  These
    experiments failed to produce a nominally concurrent debuginfod,
    having triggered "database locked" type errors.  So we remain
    single-threaded (fully serialized) at the sqlite API level.

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 35130b2a5d85..83b94a1108a6 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,14 @@
+2020-01-19  Frank Ch. Eigler  <fche@redhat.com>
+
+	* debuginfod.cxx (scanq): Rework to let groomer/fts threads
+	synchronize with an empty workqueue, and lock out workqueue
+	consumers.
+	(thread_groom): Adopt new scanq idle APIs to lock out scanners.
+	(thread_main_fts_source_paths): Adopt new scanq idler API to
+	avoid being restarted while scanners haven't even finished yet.
+	(thread_main_*): Increment thread_work_total metric only after
+	a work cycle is completed, not when it begins.
+
 2020-01-18  Frank Ch. Eigler  <fche@redhat.com>
 
 	* debuginfod.cxx (thread_main_scanner): Handle empty source_paths[].
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index bf680048d44d..b93e29ff117d 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -374,7 +374,7 @@ static struct argp argp =
 
 
 static string db_path;
-static sqlite3 *db;
+static sqlite3 *db; // single connection, serialized across all our threads!
 static unsigned verbose;
 static volatile sig_atomic_t interrupted = 0;
 static volatile sig_atomic_t sigusr1 = 0;
@@ -537,23 +537,25 @@ struct elfutils_exception: public reportable_exception
 template <typename Payload>
 class workq
 {
-  deque<Payload> q;
+  set<Payload> q; // eliminate duplicates
   mutex mtx;
   condition_variable cv;
   bool dead;
+  unsigned idlers;
 
 public:
-  workq() { dead = false;}
+  workq() { dead = false; idlers = 0; }
   ~workq() {}
 
   void push_back(const Payload& p)
   {
     unique_lock<mutex> lock(mtx);
-    q.push_back(p);
+    q.insert (p);
     set_metric("thread_work_pending","role","scan", q.size());
-    cv.notify_one();
+    cv.notify_all();
   }
 
+  // kill this workqueue, wake up all idlers / scanners
   void nuke() {
     unique_lock<mutex> lock(mtx);
     // optional: q.clear();
@@ -561,28 +563,53 @@ class workq
     cv.notify_all();
   }
 
+  // block this scanner thread until there is work to do and no active
   bool wait_front (Payload& p)
   {
     unique_lock<mutex> lock(mtx);
-    while (q.size() == 0 && !dead)
+    while (!dead && (q.size() == 0 || idlers > 0))
       cv.wait(lock);
     if (dead)
       return false;
     else
       {
-        p = q.front();
-        q.pop_front();
+        p = * q.begin();
+        q.erase (q.begin());
         set_metric("thread_work_pending","role","scan", q.size());
+        if (q.size() == 0)
+          cv.notify_all(); // maybe wake up waiting idlers
         return true;
       }
   }
+
+  // block this idler thread until there is no work to do
+  void wait_idle ()
+  {
+    unique_lock<mutex> lock(mtx);
+    cv.notify_all(); // maybe wake up waiting scanners
+    while (!dead && (q.size() != 0))
+      cv.wait(lock);
+    idlers ++;
+  }
+
+  void done_idle ()
+  {
+    unique_lock<mutex> lock(mtx);
+    idlers --;
+    cv.notify_all(); // maybe wake up waiting scanners, but probably not (shutting down)
+  }
 };
 
 typedef struct stat stat_t;
 typedef pair<string,stat_t> scan_payload;
+inline bool operator< (const scan_payload& a, const scan_payload& b)
+{
+  return a.first < b.first; // don't bother compare the stat fields
+}
 static workq<scan_payload> scanq; // just a single one
-// producer: thread_main_fts_source_paths()
+// producer & idler: thread_main_fts_source_paths()
 // consumer: thread_main_scanner()
+// idler: thread_main_groom()
 
 
 
@@ -2309,7 +2336,6 @@ thread_main_scanner (void* arg)
       bool gotone = scanq.wait_front(p);
       add_metric("thread_busy", "role", "scan", 1);
       if (! gotone) continue; // or break
-      inc_metric("thread_work_total", "role","scan");
 
       try
         {
@@ -2347,6 +2373,8 @@ thread_main_scanner (void* arg)
         {
           e.report(cerr);
         }
+
+      inc_metric("thread_work_total", "role","scan");
     }
 
   add_metric("thread_busy", "role", "scan", -1);
@@ -2448,41 +2476,44 @@ thread_main_fts_source_paths (void* arg)
 {
   (void) arg; // ignore; we operate on global data
 
-  unsigned rescan_timer = 0;
   sig_atomic_t forced_rescan_count = 0;
-  set_metric("thread_timer_max", "role","traverse", rescan_s);
   set_metric("thread_tid", "role","traverse", tid());
   add_metric("thread_count", "role", "traverse", 1);
+
+  time_t last_rescan = 0;
+
   while (! interrupted)
     {
-      set_metric("thread_timer", "role","traverse", rescan_timer);
-      // set_metric("thread_forced_total", "role","traverse", forced_rescan_count);
-      if (rescan_s && rescan_timer > rescan_s)
-        rescan_timer = 0;
+      sleep (1);
+      scanq.wait_idle(); // don't start a new traversal while scanners haven't finished the job
+      scanq.done_idle(); // release the hounds
+      if (interrupted) break;
+
+      time_t now = time(NULL);
+      bool rescan_now = false;
+      if (last_rescan == 0) // at least one initial rescan is documented even for -t0
+        rescan_now = true;
+      if (rescan_s > 0 && now > last_rescan + rescan_s)
+        rescan_now = true;
       if (sigusr1 != forced_rescan_count)
         {
           forced_rescan_count = sigusr1;
-          rescan_timer = 0;
+          rescan_now = true;
         }
-      if (rescan_timer == 0)
+      if (rescan_now)
         try
           {
             set_metric("thread_busy", "role","traverse", 1);
-            inc_metric("thread_work_total", "role","traverse");
             scan_source_paths();
+            inc_metric("thread_work_total", "role","traverse");
             set_metric("thread_busy", "role","traverse", 0);
           }
         catch (const reportable_exception& e)
           {
             e.report(cerr);
           }
-      sleep (1);
-      rescan_timer ++;
     }
 
-  // wake up any blocked scanning threads so they can check $interrupted and kill themselves
-  scanq.nuke();
-
   return 0;
 }
 
@@ -2589,36 +2620,44 @@ void groom()
 static void*
 thread_main_groom (void* /*arg*/)
 {
-  unsigned groom_timer = 0;
   sig_atomic_t forced_groom_count = 0;
-  set_metric("thread_timer_max", "role", "groom", groom_s);
   set_metric("thread_tid", "role", "groom", tid());
   add_metric("thread_count", "role", "groom", 1);
-  while (! interrupted)
+
+  time_t last_groom = 0;
+
+  while (1)
     {
-      set_metric("thread_timer", "role", "groom", groom_timer);
-      // set_metric("thread_forced_total", "role", "groom", forced_groom_count);
-      if (groom_s && groom_timer > groom_s)
-        groom_timer = 0;
+      sleep (1);
+      scanq.wait_idle(); // PR25394: block scanners during grooming!
+      if (interrupted) break;
+
+      time_t now = time(NULL);
+      bool groom_now = false;
+      if (last_groom == 0) // at least one initial groom is documented even for -g0
+        groom_now = true;
+      if (groom_s > 0 && now > last_groom + groom_s)
+        groom_now = true;
       if (sigusr2 != forced_groom_count)
         {
           forced_groom_count = sigusr2;
-          groom_timer = 0;
+          groom_now = true;
         }
-      if (groom_timer == 0)
+      if (groom_now)
         try
           {
             set_metric("thread_busy", "role", "groom", 1);
-            inc_metric("thread_work_total", "role", "groom");
             groom ();
+            last_groom = time(NULL); // NB: now was before grooming
+            inc_metric("thread_work_total", "role", "groom");
             set_metric("thread_busy", "role", "groom", 0);
           }
         catch (const sqlite_exception& e)
           {
             obatched(cerr) << e.message << endl;
           }
-      sleep (1);
-      groom_timer ++;
+
+      scanq.done_idle();
     }
 
   return 0;
@@ -2860,26 +2899,27 @@ main (int argc, char *argv[])
   if (du && du[0] != '\0') // set to non-empty string?
     obatched(clog) << "upstream debuginfod servers: " << du << endl;
 
-  vector<pthread_t> scanner_threads;
-  pthread_t groom_thread;
+  vector<pthread_t> all_threads;
 
-  rc = pthread_create (& groom_thread, NULL, thread_main_groom, NULL);
+  pthread_t pt;
+  rc = pthread_create (& pt, NULL, thread_main_groom, NULL);
   if (rc < 0)
     error (0, 0, "warning: cannot spawn thread (%d) to groom database\n", rc);
+  else
+    all_threads.push_back(pt);
 
   if (scan_files || scan_archives.size() > 0)
     {
-      pthread_t pt;
       pthread_create (& pt, NULL, thread_main_fts_source_paths, NULL);
       if (rc < 0)
         error (0, 0, "warning: cannot spawn thread (%d) to traverse source paths\n", rc);
-      scanner_threads.push_back(pt);
+      all_threads.push_back(pt);
       for (unsigned i=0; i<concurrency; i++)
         {
           pthread_create (& pt, NULL, thread_main_scanner, NULL);
           if (rc < 0)
             error (0, 0, "warning: cannot spawn thread (%d) to scan source files / archives\n", rc);
-          scanner_threads.push_back(pt);
+          all_threads.push_back(pt);
         }
     }
 
@@ -2887,15 +2927,15 @@ main (int argc, char *argv[])
   set_metric("ready", 1);
   while (! interrupted)
     pause ();
+  scanq.nuke(); // wake up any remaining scanq-related threads, let them die
   set_metric("ready", 0);
 
   if (verbose)
     obatched(clog) << "stopping" << endl;
 
   /* Join all our threads. */
-  for (auto&& it : scanner_threads)
+  for (auto&& it : all_threads)
     pthread_join (it, NULL);
-  pthread_join (groom_thread, NULL);
 
   /* Stop all the web service threads. */
   if (d4) MHD_stop_daemon (d4);
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 8936e410b3d6..8b03204182af 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,7 @@
+2020-01-19  Frank Ch. Eigler  <fche@redhat.com>
+
+	* run-debuginfod-find.sh: Check for proper groom completion count.
+
 2020-01-18  Frank Ch. Eigler  <fche@redhat.com>
 
 	* run-debuginfod-find.sh: Test empty source_paths[].
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 2a3d591b4e77..33bbd1a6dafe 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -257,6 +257,7 @@ RPM_BUILDID=d44d42cbd7d915bc938c81333a21e355a6022fb7 # in rhel6/ subdir, for a l
 rm -r R/debuginfod-rpms/rhel6/*
 kill -USR2 $PID1  # groom cycle
 # Expect 3 rpms to be deleted by the groom
+wait_ready $PORT1 'thread_work_total{role="groom"}' 1
 wait_ready $PORT1 'groom{statistic="file d/e"}' 3
 
 rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PR25394 patch: debuginfod groom vs. scan race condition
  2020-01-20  1:43 PR25394 patch: debuginfod groom vs. scan race condition Frank Ch. Eigler
@ 2020-01-20 17:03 ` Mark Wielaard
  2020-01-20 17:10   ` Frank Ch. Eigler
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Wielaard @ 2020-01-20 17:03 UTC (permalink / raw)
  To: Frank Ch. Eigler, elfutils-devel

Hi Frank,

On Sun, 2020-01-19 at 20:43 -0500, Frank Ch. Eigler wrote:
> This was a doozie.  As the commit text hints, I haven't found a way
> of
> testing this on an elfutils/tests snack-sized data set (except perhaps
> by injecting long sleeps?), but the changes work on the larger
> archives such as the debuginfod.systemtap.org set.
> 
> 
> commit 34e67018914cf9ebbef07065965755b6554fd66e (HEAD -> master, origin/fche/debuginfod-PR25394)
> Author: Frank Ch. Eigler <fche@redhat.com>
> Date:   Sun Jan 19 20:33:32 2020 -0500
> 
>     PR25394: debuginfod mutex between grooming and scanning
>     
>     Extended the work-queue concept with "idlers" - other threads that
>     block on the work queue until it becomes empty (rather than normal
>     consumers that block on it until it becomes non-empty).  Use this
>     facility for the groomer thread to avoid working at the same time as
>     the scanner threads.  Use this for the fts traversal thread for
>     similar reasons.  One user-visible effect: response to SIGUSR1 and
>     SIGUSR2 will wait until the work queue runs empty, but the man page
>     was unspecific so does not need changing.
>     
>     It's not obvious how to test this with a tests/ dataset so small that
>     scanning takes negligible time, so the former races are very tight.
>     
>     P.S. We also evaluated using sqlite level transactions to isolate the
>     scanner thread groups-of-operations from the groomer.  These
>     experiments failed to produce a nominally concurrent debuginfod,
>     having triggered "database locked" type errors.  So we remain
>     single-threaded (fully serialized) at the sqlite API level.
> 
> diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
> index 35130b2a5d85..83b94a1108a6 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -1,3 +1,14 @@
> +2020-01-19  Frank Ch. Eigler  <fche@redhat.com>
> +
> +	* debuginfod.cxx (scanq): Rework to let groomer/fts threads
> +	synchronize with an empty workqueue, and lock out workqueue
> +	consumers.
> +	(thread_groom): Adopt new scanq idle APIs to lock out scanners.
> +	(thread_main_fts_source_paths): Adopt new scanq idler API to
> +	avoid being restarted while scanners haven't even finished yet.
> +	(thread_main_*): Increment thread_work_total metric only after
> +	a work cycle is completed, not when it begins.

Nice work. And I think I almost understand it. I am just a little
confused about thread_main_fts_source_paths also being an "idler":

>   while (! interrupted)
>      {
> -      set_metric("thread_timer", "role","traverse", rescan_timer);
> -      // set_metric("thread_forced_total", "role","traverse", forced_rescan_count);
> -      if (rescan_s && rescan_timer > rescan_s)
> -        rescan_timer = 0;
> +      sleep (1);
> +      scanq.wait_idle(); // don't start a new traversal while scanners haven't finished the job
> +      scanq.done_idle(); // release the hounds
> +      if (interrupted) break;

I guess the comment "release the hounds" gets me confused. I don't know precisely what that means. Is this needed for correctness, or is it simply to have an hint that the scanners have ran because you register as an idler and so at this point you could only run if no scanner is running? But given you immediately tell them you are done a scanner could in theory start their work up again while you are doing your own traversal?

Thanks,

Mark

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PR25394 patch: debuginfod groom vs. scan race condition
  2020-01-20 17:03 ` Mark Wielaard
@ 2020-01-20 17:10   ` Frank Ch. Eigler
  2020-01-20 17:31     ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Frank Ch. Eigler @ 2020-01-20 17:10 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: elfutils-devel

Hi -

> Nice work. And I think I almost understand it. I am just a little
> confused about thread_main_fts_source_paths also being an "idler":

Yup, it's a bit subtle.

> >   while (! interrupted)
> >      {
> > -      set_metric("thread_timer", "role","traverse", rescan_timer);
> > -      // set_metric("thread_forced_total", "role","traverse", forced_rescan_count);
> > -      if (rescan_s && rescan_timer > rescan_s)
> > -        rescan_timer = 0;
> > +      sleep (1);
> > +      scanq.wait_idle(); // don't start a new traversal while scanners haven't finished the job
> > +      scanq.done_idle(); // release the hounds
> > +      if (interrupted) break;

> I guess the comment "release the hounds" gets me confused. I don't
> know precisely what that means. 

https://www.youtube.com/watch?time_continue=28&v=ORvkElqw0QM

> Is this needed for correctness, or is it simply to have an hint that
> the scanners have ran because you register as an idler and so at
> this point you could only run if no scanner is running?

It's because we don't want to start a new fts traversal pass while the
scanners haven't even finished the last one.  I've seen a case where
the rescan interval was so short that the scanner backlog just kept
growing and growing and growing.  (Changing the work queue to a set
from a dequeue also helps with duplicate elimination.)  So in this
way, it just returns to the timing behaviour of the pre-workqueue
implementation: we wait a certain amount of time AFTER the scanning is
ALL DONE, before starting a new scan.

> But given you immediately tell them you are done a scanner could in
> theory start their work up again while you are doing your own
> traversal?

Yes, because that's ideal for concurrency: while one thread fts(3)'s
and enumerates the results, the scanner threads can process the
already enumerated files.

- FChE

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PR25394 patch: debuginfod groom vs. scan race condition
  2020-01-20 17:10   ` Frank Ch. Eigler
@ 2020-01-20 17:31     ` Mark Wielaard
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Wielaard @ 2020-01-20 17:31 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

Hi Frank,

On Mon, 2020-01-20 at 12:10 -0500, Frank Ch. Eigler wrote:
> > Nice work. And I think I almost understand it. I am just a little
> > confused about thread_main_fts_source_paths also being an "idler":
> 
> Yup, it's a bit subtle.
> 
> > >   while (! interrupted)
> > >      {
> > > -      set_metric("thread_timer", "role","traverse", rescan_timer);
> > > -      // set_metric("thread_forced_total", "role","traverse", forced_rescan_count);
> > > -      if (rescan_s && rescan_timer > rescan_s)
> > > -        rescan_timer = 0;
> > > +      sleep (1);
> > > +      scanq.wait_idle(); // don't start a new traversal while scanners haven't finished the job
> > > +      scanq.done_idle(); // release the hounds
> > > +      if (interrupted) break;
> > I guess the comment "release the hounds" gets me confused. I don't
> > know precisely what that means. 
> 
> https://www.youtube.com/watch?time_continue=28&v=ORvkElqw0QM

Aha, so it is as if we are going to start to run (chasing files),
except there is also the rescan timeout which might still stop the
hounds from really being released :)

> > Is this needed for correctness, or is it simply to have an hint that
> > the scanners have ran because you register as an idler and so at
> > this point you could only run if no scanner is running?
> 
> It's because we don't want to start a new fts traversal pass while the
> scanners haven't even finished the last one.  I've seen a case where
> the rescan interval was so short that the scanner backlog just kept
> growing and growing and growing.  (Changing the work queue to a set
> from a dequeue also helps with duplicate elimination.)  So in this
> way, it just returns to the timing behaviour of the pre-workqueue
> implementation: we wait a certain amount of time AFTER the scanning is
> ALL DONE, before starting a new scan.

That all makes sense. Thanks.

> > But given you immediately tell them you are done a scanner could in
> > theory start their work up again while you are doing your own
> > traversal?
> 
> Yes, because that's ideal for concurrency: while one thread fts(3)'s
> and enumerates the results, the scanner threads can process the
> already enumerated files.

OK. And if any idlers (grooming) would wake up that is no problem,
because that simply means the work queue was empty and adding work to
the queue can be done whether either the grooming thread (x)or scanner
threads are running concurrently.

Please do push if you feel confident about it being correct. I think it
is now that I understand how it actually works.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-01-20 17:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-20  1:43 PR25394 patch: debuginfod groom vs. scan race condition Frank Ch. Eigler
2020-01-20 17:03 ` Mark Wielaard
2020-01-20 17:10   ` Frank Ch. Eigler
2020-01-20 17:31     ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).