public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/11472] New: many skipped probes due lengthy printing delays
@ 2010-04-06 20:00 jistone at redhat dot com
  2010-04-16 19:32 ` [Bug runtime/11472] " fche at redhat dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: jistone at redhat dot com @ 2010-04-06 20:00 UTC (permalink / raw)
  To: systemtap

On IRC, soren reported many skipped probes, which we determined to be because
the collection probes were waiting for a lock held by a slow reporting probe. 
This simplified script can show the long delays:


global vfs_write_bytes

probe timer.s(1) {
  for (i=0; i<400; ++i)
    vfs_write_bytes[i, sprint(i), sprint(i*i)] <<< i
}

probe timer.s(1) {
  foreach ([uid,fsname,devname] in vfs_write_bytes) {
    printf("vfs_write_bytes[uid=%d,fsname=%s,devname=%s] = %d\n",
           uid, fsname, devname, @sum(vfs_write_bytes[uid,fsname,devname]))
    printf("vfs_write_calls[uid=%d,fsname=%s,devname=%s] = %d\n",
           uid, fsname, devname, @count(vfs_write_bytes[uid,fsname,devname]))
  }
  delete vfs_write_bytes
}


"stap -t" shows the reporting loop to take well over 1ms for just 400 entries
(thus 800 lines).

I experimented with commenting out printfs, and I estimate that the overhead is
about 20% in the foreach aggregation and 40% each for the printfs.

Changing the key tuple from (long,string,string) to (long,long,long) didn't seem
to have any measurable effect.

Bumping TRYLOCKDELAY to 20000 (20us) fixed the problem for soren, such that the
collection probes are now willing to wait long enough for the reporter for
finish.  I think in the general case this is too long, but at least our tunables
were able to make it work.

We bounced around many ideas on how we should improve, which I'll try to summarize:

= Make the stats array (pmap) lockless so the aggregator doesn't block writers.
- how would this work if multiple foreach loops access the same array? do we
lose atomicity?

= Consider bumping TRYLOCKDELAY and/or MAXTRYLOCK to higher defaults, or even
dynamically tune them within overhead bounds.

= Consider making MAXSKIPPED resettable in the script, or maybe just:
  function reset_skipped() %{ atomic_set(&skipped_count, 0); %}

= Consider making a MAXSKIPPED a rate rather than an absolute count, so
long-running scripts can choose to tolerate infrequent skips.

= And of course, figure out why we're taking so long in the first place.
  1ms / 800 printfs == 1.25us/printf ... can we do better?

-- 
           Summary: many skipped probes due lengthy printing delays
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: jistone at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=11472

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/11472] many skipped probes due lengthy printing delays
  2010-04-06 20:00 [Bug runtime/11472] New: many skipped probes due lengthy printing delays jistone at redhat dot com
@ 2010-04-16 19:32 ` fche at redhat dot com
  2010-04-20 14:38 ` jistone at redhat dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: fche at redhat dot com @ 2010-04-16 19:32 UTC (permalink / raw)
  To: systemtap



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |11179
              nThis|                            |


http://sourceware.org/bugzilla/show_bug.cgi?id=11472

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/11472] many skipped probes due lengthy printing delays
  2010-04-06 20:00 [Bug runtime/11472] New: many skipped probes due lengthy printing delays jistone at redhat dot com
  2010-04-16 19:32 ` [Bug runtime/11472] " fche at redhat dot com
@ 2010-04-20 14:38 ` jistone at redhat dot com
  2010-05-08 15:36 ` fche at redhat dot com
  2010-05-08 18:21 ` fche at redhat dot com
  3 siblings, 0 replies; 7+ messages in thread
From: jistone at redhat dot com @ 2010-04-20 14:38 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jistone at redhat dot com  2010-04-20 03:07 -------
See commit bdc8227, PR11473: Optimize references to iteration values

With this change we get rid of redundant map lookups, and I see 12-15% timing
improvement in the example above.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11472

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/11472] many skipped probes due lengthy printing delays
  2010-04-06 20:00 [Bug runtime/11472] New: many skipped probes due lengthy printing delays jistone at redhat dot com
  2010-04-16 19:32 ` [Bug runtime/11472] " fche at redhat dot com
  2010-04-20 14:38 ` jistone at redhat dot com
@ 2010-05-08 15:36 ` fche at redhat dot com
  2010-05-08 18:21 ` fche at redhat dot com
  3 siblings, 0 replies; 7+ messages in thread
From: fche at redhat dot com @ 2010-05-08 15:36 UTC (permalink / raw)
  To: systemtap



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |11528


http://sourceware.org/bugzilla/show_bug.cgi?id=11472

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/11472] many skipped probes due lengthy printing delays
  2010-04-06 20:00 [Bug runtime/11472] New: many skipped probes due lengthy printing delays jistone at redhat dot com
                   ` (2 preceding siblings ...)
  2010-05-08 15:36 ` fche at redhat dot com
@ 2010-05-08 18:21 ` fche at redhat dot com
  3 siblings, 0 replies; 7+ messages in thread
From: fche at redhat dot com @ 2010-05-08 18:21 UTC (permalink / raw)
  To: systemtap



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |3217


http://sourceware.org/bugzilla/show_bug.cgi?id=11472

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/11472] many skipped probes due lengthy printing delays
       [not found] <bug-11472-6586@http.sourceware.org/bugzilla/>
  2011-10-07 18:20 ` jistone at redhat dot com
@ 2014-10-10 15:04 ` ajakop at redhat dot com
  1 sibling, 0 replies; 7+ messages in thread
From: ajakop at redhat dot com @ 2014-10-10 15:04 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=11472
Bug 11472 depends on bug 11528, which changed state.

Bug 11528 Summary: Pass string values by reference where possible
https://sourceware.org/bugzilla/show_bug.cgi?id=11528

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/11472] many skipped probes due lengthy printing delays
       [not found] <bug-11472-6586@http.sourceware.org/bugzilla/>
@ 2011-10-07 18:20 ` jistone at redhat dot com
  2014-10-10 15:04 ` ajakop at redhat dot com
  1 sibling, 0 replies; 7+ messages in thread
From: jistone at redhat dot com @ 2011-10-07 18:20 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=11472

Bug 11472 depends on bug 3217, which changed state.

Bug 3217 Summary: compiled binary tracing
http://sourceware.org/bugzilla/show_bug.cgi?id=3217

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-10-10 15:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-06 20:00 [Bug runtime/11472] New: many skipped probes due lengthy printing delays jistone at redhat dot com
2010-04-16 19:32 ` [Bug runtime/11472] " fche at redhat dot com
2010-04-20 14:38 ` jistone at redhat dot com
2010-05-08 15:36 ` fche at redhat dot com
2010-05-08 18:21 ` fche at redhat dot com
     [not found] <bug-11472-6586@http.sourceware.org/bugzilla/>
2011-10-07 18:20 ` jistone at redhat dot com
2014-10-10 15:04 ` ajakop at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).