public inbox for elfutils@sourceware.org
* [Bug debuginfod/25394] New: groom vs. scan race condition
@ 2020-01-15 23:10 fche at redhat dot com
  2020-01-21  3:26 ` [Bug debuginfod/25394] " fche at redhat dot com
  2020-01-21  9:05 ` mark at klomp dot org
  0 siblings, 2 replies; 3+ messages in thread
From: fche at redhat dot com @ 2020-01-15 23:10 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25394

            Bug ID: 25394
           Summary: groom vs. scan race condition
           Product: elfutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: debuginfod
          Assignee: fche at redhat dot com
          Reporter: fche at redhat dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

The grooming thread tries to nuke orphan buildid records (those with no d/e/s
type payload records pointing to them).  This can race with the stages of
creation of a payload record in the scanner threads, wherein a buildid record
is interned first, then the referring d/e/s record is written.  If these are
interleaved just right, the d/e/s record will be disallowed, so the file or
archive data will be incomplete.  Worse, since the size/mtime payload record is
still written (because the scanner threads think d/e/s was successful), a later
scan pass will not try to rescan the affected file/archive either.

We need some A(tomicity) in the scanner threads (a transaction that includes
the interning and the payload inserts).  And we probably need a schema reset,
just so possibly incomplete databases are regenerated correctly.
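
The transaction idea above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not debuginfod's code and not
necessarily what the eventual fix does: the table names (buildids, payloads,
scanned) and the functions record_file and groom are hypothetical, and Python's
sqlite3 module stands in for the real database layer. The point is that the
buildid intern, the d/e/s payload insert, and the size/mtime record either all
commit or all roll back, so a groom pass can never run between them.

```python
import sqlite3

def open_db(path=":memory:"):
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are issued explicitly.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema, much simpler than the real one.
    db.executescript("""
        CREATE TABLE IF NOT EXISTS buildids (
            id INTEGER PRIMARY KEY, hex TEXT UNIQUE NOT NULL);
        CREATE TABLE IF NOT EXISTS payloads (
            buildid INTEGER NOT NULL REFERENCES buildids(id),
            kind TEXT NOT NULL,          -- 'd', 'e' or 's'
            path TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS scanned (
            path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER);
    """)
    return db

def record_file(db, hex_id, kind, path, size, mtime):
    """Intern the buildid and write both dependent records atomically."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before the first insert
    try:
        db.execute("INSERT OR IGNORE INTO buildids (hex) VALUES (?)", (hex_id,))
        (bid,) = db.execute(
            "SELECT id FROM buildids WHERE hex = ?", (hex_id,)).fetchone()
        db.execute(
            "INSERT INTO payloads (buildid, kind, path) VALUES (?, ?, ?)",
            (bid, kind, path))
        # Only written if the payload insert above succeeded.
        db.execute(
            "INSERT OR REPLACE INTO scanned (path, size, mtime) VALUES (?, ?, ?)",
            (path, size, mtime))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise

def groom(db):
    """Delete buildid rows that no payload row refers to; return the count."""
    db.execute("BEGIN IMMEDIATE")
    try:
        cur = db.execute(
            "DELETE FROM buildids WHERE id NOT IN (SELECT buildid FROM payloads)")
        removed = cur.rowcount
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return removed
```

With this shape, a failed payload insert also discards the size/mtime record,
so a later scan pass would still see the file as unscanned and retry it.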

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug debuginfod/25394] groom vs. scan race condition
  2020-01-15 23:10 [Bug debuginfod/25394] New: groom vs. scan race condition fche at redhat dot com
@ 2020-01-21  3:26 ` fche at redhat dot com
  2020-01-21  9:05 ` mark at klomp dot org
  1 sibling, 0 replies; 3+ messages in thread
From: fche at redhat dot com @ 2020-01-21  3:26 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25394

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Frank Ch. Eigler <fche at redhat dot com> ---
commit 34e67018914cf9ebbef07065965755b6554fd66e
let's try to put out of our minds the four subsequent cleanup patches


* [Bug debuginfod/25394] groom vs. scan race condition
  2020-01-15 23:10 [Bug debuginfod/25394] New: groom vs. scan race condition fche at redhat dot com
  2020-01-21  3:26 ` [Bug debuginfod/25394] " fche at redhat dot com
@ 2020-01-21  9:05 ` mark at klomp dot org
  1 sibling, 0 replies; 3+ messages in thread
From: mark at klomp dot org @ 2020-01-21  9:05 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25394

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at klomp dot org

--- Comment #2 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Frank Ch. Eigler from comment #1)
> commit 34e67018914cf9ebbef07065965755b6554fd66e
> let's try to put out of our minds the four subsequent cleanup patches

Thanks!
And the cleanups are important. No worries, that is what the buildbots are for.

In this case it was actually a couple of separate things:

- On some arches the size and signedness of time_t are different,
  leading to some fixes/casts when doing (long) time calculations:

commit 91b7beaef91b60fbde13dadf86091e57c8245008
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 14:44:15 2020 -0500

    PR25394 followup: debuginfod casting fixes

    Buildbot reports type warnings in time_t arithmetic.
    Explicit (long) cast pushed as obvious.

commit 09d76c1dd5e45c5512db997e52234dd2ddab8c2d
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 14:44:15 2020 -0500

    PR25394 followup#2: debuginfod casting fixes

    Buildbot still reports type warnings in time_t arithmetic.
    Explicit (long)er cast pushed as obvious ... or is it? :-)

- On the buildbot workers /usr/sbin might not have been in the PATH
  (when the worker was (re)started through cron).
  This caused the ss binary not to be found.
  Fixed on the workers.

- last_rescan timestamp was lost:

commit c351734f4feff176b3e0ca8fbbc8353053c3ab6d
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 15:37:33 2020 -0500

    PR25394 cont'd: debuginfod timing fix for fts-traversal thread

    The new code neglected to set the last_rescan timestamp, leading
    to overly frequent rescanning.

- As mentioned in the original commit, acting on the USR1 signal might be
delayed a bit now. But the testcase actually depended on the current immediate
timing.

Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 15:37:33 2020 -0500

    PR25394 cont'd: debuginfod testsuite fix for -USR1 timing

    If a SIGUSR1 is sent before the initial traversal, it no longer
    results in an extra traversal.  That's a sensible effect.  The
    test case just needs to wait before the kill -USR1.

Just adding a wait_ready $PORT1 'thread_work_total{role="traverse"}' 1 in the
testcase fixes that.
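
For readers unfamiliar with the pattern, the wait-before-signal step can be
sketched as a small polling loop. This is a hypothetical Python rendering, not
the testsuite's actual wait_ready helper: how the real helper reads the metric
is not shown in this thread, so read_metric is an assumed callback that returns
the current value for a metric name, and the tries/delay parameters are
illustrative.

```python
import time

def wait_ready(read_metric, name, expected, tries=50, delay=0.1):
    """Poll read_metric(name) until it equals expected.

    Returns True once the value matches, False if all tries are used up.
    """
    for _ in range(tries):
        if read_metric(name) == expected:
            return True
        time.sleep(delay)
    return False
```

A test would call something like
wait_ready(read_metric, 'thread_work_total{role="traverse"}', 1) and send the
USR1 signal only after it returns True, so the signal cannot arrive before the
initial traversal has finished.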

It is actually amazing that the full testsuite was GREEN on our local setups.
Thanks buildbot for having weird arches and timings :)

P.S. Please don't forget the signed-off-by line on your commits.
See
https://sourceware.org/git/?p=elfutils.git;a=blob_plain;f=CONTRIBUTING;hb=HEAD

