public inbox for systemtap@sourceware.org
* [Bug runtime/19799] New: deleting from array of aggregate unreliable
@ 2016-03-09 17:21 raeburn at permabit dot com
  2016-03-09 19:05 ` [Bug runtime/19799] " fche at redhat dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: raeburn at permabit dot com @ 2016-03-09 17:21 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19799

            Bug ID: 19799
           Summary: deleting from array of aggregate unreliable
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: raeburn at permabit dot com
  Target Milestone: ---

I've got a SystemTap script that updates entries in an array of aggregates and
occasionally deletes entries, but the deletion doesn't seem to work reliably.

I'm using a function probe that collects some timing data and updates stats in
an array. Periodically (with a "timer.ms" probe) we pick a range of indices and
print out the values accumulated so far, and (try to) clear them.

  global stats[500000];
  probe module(...).function(...) {
    ...stats[a,1] <<< value1;...stats[a,2] <<< value2;...etc...
  }
  probe timer.ms(NNN) {
    for (...) {
      printf(...stats[x,y]...);
      delete stats[x,y];
    }
  }

As I understand it, the delete should get rid of the array entry, effectively
resetting the counter for the key-pair to zero. What I'm seeing instead is that
often the array entry doesn't get deleted; if I use:

            delete stats[thisIndex,1];
            if ([thisIndex,1] in stats) {
                printf("eek! [%d,1] in stats after deletion??\n",
                       thisIndex);
            }

then the error message fires pretty often, but not always, with my script. And
the values output are clearly continuing to accumulate data from one report to
the next.

This happens with "version 2.7/0.161, rpm 2.7-2.el6" on RHEL6, version 2.9 from
the web site, and git rev d3aa622.

Looking at pmap-gen.c in git (which could use a few more comments maybe?), it
looks to me like the data is stored in per-CPU maps, and collected from all of
them when read out, but _stp_pmap_del appears to operate only on the per-CPU
map for the current CPU.
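
To make the suspected behavior concrete, here is a toy Python model of it. This
is only a sketch of the description above, not the SystemTap runtime: the class
and method names (PerCpuMap, delete_current_cpu_only, and so on) are
hypothetical, and plain dicts stand in for the per-CPU hash maps.

```python
class PerCpuMap:
    """Toy model of a statistics array kept as one map per CPU."""

    def __init__(self, ncpus):
        # One independent dict per CPU; probes only touch their own CPU's dict.
        self.maps = [{} for _ in range(ncpus)]

    def add(self, cpu, key, value):
        """Model of 'stats[key] <<< value' running on the given CPU."""
        self.maps[cpu].setdefault(key, []).append(value)

    def contains(self, key):
        """Model of '[key] in stats': true if any CPU's map has the key."""
        return any(key in m for m in self.maps)

    def total(self, key):
        """Model of reading the aggregate: combine all per-CPU values."""
        return sum(sum(m.get(key, [])) for m in self.maps)

    def delete_current_cpu_only(self, cpu, key):
        """Model of the suspected bug: only the current CPU's map is touched."""
        self.maps[cpu].pop(key, None)


if __name__ == "__main__":
    stats = PerCpuMap(ncpus=2)
    stats.add(0, (7, 1), 10)   # function probe fired on CPU 0
    stats.add(1, (7, 1), 32)   # same key updated from CPU 1
    stats.delete_current_cpu_only(0, (7, 1))  # timer probe ran on CPU 0
    print(stats.contains((7, 1)), stats.total((7, 1)))
```

In this model the key is still present after the delete whenever another CPU
has also recorded a value for it, which would match the intermittent "eek!"
message and the counters that keep growing between reports.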

A quick experiment putting a for_each_possible_cpu loop into _stp_pmap_del
seems to fix the problem for me, on initial testing; the error message above
doesn't fire, and the counters reported are often smaller than on the previous
iteration. I won't bother sending my patch, as it seems to be functional but
isn't very good -- it recomputes the hash value for every per-CPU map, and I
overlooked the aggregate map, but I assume the entry should probably be removed
there too.
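
The idea behind that experiment can be sketched with a toy Python model. This
is not the patch itself: the names (PerCpuStats, delete_all_cpus) are
hypothetical, dicts stand in for the runtime's maps, and the model assumes the
aggregate map simply holds the merged result of the most recent read.

```python
class PerCpuStats:
    """Toy model: one dict per CPU plus an aggregate dict."""

    def __init__(self, ncpus):
        self.per_cpu = [{} for _ in range(ncpus)]
        self.agg = {}  # assumed to cache the merged value from the last read

    def add(self, cpu, key, value):
        """Accumulate a value into the given CPU's partial sum."""
        self.per_cpu[cpu][key] = self.per_cpu[cpu].get(key, 0) + value

    def read(self, key):
        """Merge every CPU's partial sum into the aggregate map."""
        parts = [m[key] for m in self.per_cpu if key in m]
        if not parts:
            self.agg.pop(key, None)
            return None
        self.agg[key] = sum(parts)
        return self.agg[key]

    def delete_all_cpus(self, key):
        """Delete the key from every CPU's map, then from the aggregate map."""
        for m in self.per_cpu:      # stands in for a loop over all possible CPUs
            m.pop(key, None)
        self.agg.pop(key, None)


if __name__ == "__main__":
    s = PerCpuStats(ncpus=4)
    s.add(0, (3, 1), 5)
    s.add(2, (3, 1), 7)
    print(s.read((3, 1)))      # merged value before deletion
    s.delete_all_cpus((3, 1))
    print(s.read((3, 1)))      # nothing left on any CPU
```

With the delete applied to every per-CPU map, later updates to the same key
start from an empty entry, so a report can show a smaller value than the
previous one.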

-- 
You are receiving this mail because:
You are the assignee for the bug.


* [Bug runtime/19799] deleting from array of aggregate unreliable
From: fche at redhat dot com @ 2016-03-09 19:05 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19799

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fche at redhat dot com

--- Comment #1 from Frank Ch. Eigler <fche at redhat dot com> ---
Thanks for your excellent bug report.  We should have this fixed in no time.


* [Bug runtime/19799] deleting from array of aggregate unreliable
From: dsmith at redhat dot com @ 2016-03-09 19:37 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19799

David Smith <dsmith at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dsmith at redhat dot com

--- Comment #2 from David Smith <dsmith at redhat dot com> ---
Created attachment 9080
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9080&action=edit
test script

Here's a small script that shows the problem based on the description in
comment #0. Here's a short run of the script:

====
# stap stat_delete.stp
[0, 0]: 49905045
eek! [0, 0] in stats after deletion??
[0, 1]: 4286255610905
eek! [0, 1] in stats after deletion??
[0, 0]: 174856307
eek! [0, 0] in stats after deletion??
[0, 1]: 6433509015679
eek! [0, 1] in stats after deletion??
^C#
====


* [Bug runtime/19799] deleting from array of aggregate unreliable
From: dsmith at redhat dot com @ 2016-03-09 22:34 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19799

--- Comment #3 from David Smith <dsmith at redhat dot com> ---
Created attachment 9081
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9081&action=edit
simple patch

Here's the simple version of a patch to fix this bug. It deletes the value from
every cpu's map, then deletes the value from the aggregate map.

I've tested it with the test script I uploaded earlier, and it fixes the
problem there. I'm going to more fully test it and see how it goes.

I'm seeing some skipped probes when using this patch, so I'll try to work on a
more sophisticated version that only computes the hash once. I doubt it will
make much difference when the map indices are integers, but it could make a
difference when the map indices are strings.
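
As an illustration of the "compute the hash once" idea, here is a toy Python
sketch. It is not the runtime code: the helper names are hypothetical, the
hash is a stand-in (CRC32 of the key's repr), and it assumes every per-CPU map
shares the same bucket count and hash function so that one bucket index is
valid for all of them.

```python
import zlib


class CountingHasher:
    """Stand-in hash function that counts how many times it is called."""

    def __init__(self, nbuckets=16):
        self.nbuckets = nbuckets
        self.calls = 0

    def __call__(self, key):
        self.calls += 1
        return zlib.crc32(repr(key).encode()) % self.nbuckets


def new_table(nbuckets=16):
    """A tiny chained hash table: a list of buckets of [key, value] pairs."""
    return [[] for _ in range(nbuckets)]


def insert(table, hasher, key, value):
    """Add value to the key's entry, creating the entry if needed."""
    bucket = table[hasher(key)]
    for entry in bucket:
        if entry[0] == key:
            entry[1] += value
            return
    bucket.append([key, value])


def remove_from_bucket(table, index, key):
    """Drop the key from one bucket, given an already computed index."""
    table[index][:] = [e for e in table[index] if e[0] != key]


def delete_rehashing(tables, hasher, key):
    """Simple version: hash the key again for every map."""
    for table in tables:
        remove_from_bucket(table, hasher(key), key)


def delete_hash_once(tables, hasher, key):
    """Refined version: hash once, reuse the bucket index for every map."""
    index = hasher(key)
    for table in tables:
        remove_from_bucket(table, index, key)
```

For string keys the hash cost grows with the key length, so hashing once
instead of once per map is where the saving would come from; for integer keys
the difference is likely to be small.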

I also need to write up a test case for this issue so it remains fixed.


* [Bug runtime/19799] deleting from array of aggregate unreliable
From: dsmith at redhat dot com @ 2016-03-11 18:37 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19799

David Smith <dsmith at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from David Smith <dsmith at redhat dot com> ---
Fixed in commit 9926a71.


* [Bug runtime/19799] deleting from array of aggregate unreliable
From: raeburn at permabit dot com @ 2016-03-11 22:34 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19799

--- Comment #5 from Ken Raeburn <raeburn at permabit dot com> ---
I've been trying out the git version a bit with my test setup and ugly,
complicated script, and it seems to be working perfectly so far. Thanks!

