public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/2514] New: Sequence had xxx drops.
@ 2006-04-05 14:31 guanglei at cn dot ibm dot com
  2006-04-05 15:15 ` [Bug runtime/2514] " fche at redhat dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: guanglei at cn dot ibm dot com @ 2006-04-05 14:31 UTC (permalink / raw)
  To: systemtap

In the early overhead-testing stage of LKET we used only the syscall event hook to
trace the execution of some benchmark tools. At the end of each run SystemTap
reported "Sequence had xxx drops", but the number of dropped sequence records
was only a few hundred.

But once we began using more than one event hook to trace the benchmark tools,
the number of dropped sequence records increased greatly. For example, a dbench
run shows:
...
Sequence had 8667222 drops.
total hooks triggered is 11087355

Such a large number of drops is unacceptable, and it can render the final trace
data completely useless.

One reason, I guess, is that on such a heavily loaded system the probe handlers
fire at a very high frequency and generate more data than relayfs can handle:
data is pushed into the relayfs transport layer much faster than user space can
read it back out.

So is there any way to fix this? It is a real problem for a trace tool like
LKET, which generates very large trace data files.

-- 
           Summary: Sequence had xxx drops.
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: guanglei at cn dot ibm dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/2514] Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
@ 2006-04-05 15:15 ` fche at redhat dot com
  2006-04-05 15:23 ` guanglei at cn dot ibm dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: fche at redhat dot com @ 2006-04-05 15:15 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2006-04-05 15:15 -------
How large buffers have you tried?

Those hard-coded VT100 escape codes in stpd/librelay.c and elsewhere should come
out, btw.


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/2514] Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
  2006-04-05 15:15 ` [Bug runtime/2514] " fche at redhat dot com
@ 2006-04-05 15:23 ` guanglei at cn dot ibm dot com
  2006-04-05 16:19 ` zanussi at us dot ibm dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: guanglei at cn dot ibm dot com @ 2006-04-05 15:23 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From guanglei at cn dot ibm dot com  2006-04-05 15:23 -------
(In reply to comment #1)
> How large buffers have you tried?
Do you mean the buffer size specified by "stap -s"?
One of my colleagues (Liang Shanshan) said she tried 1MB, 2MB, 4MB, 8MB and 16MB
buffer sizes, but got almost the same result:

It seems that the "-s" option has no effect on sequence dropping.

first run (-s 1):
Sequence had 5189560 drops.
counthook average is 8342273

second run (-s 2):
Sequence had 6404573 drops.
counthook average is 8305578

...

fourth run (-s 8):
Sequence had 3859367 drops.
counthook average is 8326021
...

> 
> Those hard-coded VT100 escape codes in stpd/librelay.c and elsewhere should come
> out, btw.
> 

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/2514] Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
  2006-04-05 15:15 ` [Bug runtime/2514] " fche at redhat dot com
  2006-04-05 15:23 ` guanglei at cn dot ibm dot com
@ 2006-04-05 16:19 ` zanussi at us dot ibm dot com
  2006-04-06 19:48 ` [Bug runtime/2514] New: " Martin Hunt
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: zanussi at us dot ibm dot com @ 2006-04-05 16:19 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From zanussi at us dot ibm dot com  2006-04-05 16:19 -------
(In reply to comment #2)
> (In reply to comment #1)
> > How large buffers have you tried?
> Do you mean the buffer size specified by "stap -s"?
> One of my colleagues (Liang Shanshan) said she tried 1MB, 2MB, 4MB, 8MB and 16MB
> buffer sizes, but got almost the same result:
> 
> It seems that the "-s" option has no effect on sequence dropping.
> 
> first run (-s 1):
> Sequence had 5189560 drops.
> counthook average is 8342273
> 
> second run (-s 2):
> Sequence had 6404573 drops.
> counthook average is 8305578
> 
> ...
> 
> fourth run (-s 8):
> Sequence had 3859367 drops.
> counthook average is 8326021
> ...
> 
> > 
> > Those hard-coded VT100 escape codes in stpd/librelay.c and elsewhere should come
> > out, btw.
> > 

Well, it does look like it's having some effect - when you went to 8MB, you cut
the drops in half.

If you're logging huge amounts of data, though, you'll eventually reach a point
where stpd can't write the data to disk fast enough to keep up.

One thing that worked well for blktrace was to mmap the output buffers - using
this method, it seems to be able to log data at pretty high rates without
dropping any records and with relatively small buffer sizes.  Another thing that
seemed to work well was to keep a list of unwritten sub-buffers in the daemon
which would act as a 'buffer' against temporary backlogs; a relatively small
buffer size could then be used to handle normal expected traffic, but when data
came in too quickly to be written to disk, the list could temporarily absorb the
backlog.
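
Concretely, the sub-buffer list idea might look roughly like the sketch below
(hypothetical names, sizes, and threading on the userspace side only - not the
actual stpd/librelay.c code): the relay reader copies each full sub-buffer into
an in-memory FIFO and returns immediately, and a separate writer thread drains
the FIFO to disk at whatever rate the disk allows.

/* Sketch only: an in-memory queue of unwritten sub-buffers between the
 * relay reader and the disk writer, as described above.  Names and sizes
 * are hypothetical, not the real stpd code. */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SUBBUF_SIZE (256 * 1024)        /* assumed sub-buffer size */

struct subbuf {
        struct subbuf *next;
        size_t len;
        char data[SUBBUF_SIZE];
};

static struct subbuf *head, *tail;      /* FIFO of unwritten sub-buffers */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Called from the relay reader: copy the sub-buffer (len assumed to be
 * <= SUBBUF_SIZE) and return at once so relayfs can reuse it instead of
 * dropping records.  Allocation-failure handling omitted. */
void enqueue_subbuf(const void *buf, size_t len)
{
        struct subbuf *s = malloc(sizeof(*s));
        s->next = NULL;
        s->len = len;
        memcpy(s->data, buf, len);

        pthread_mutex_lock(&lock);
        if (tail)
                tail->next = s;
        else
                head = s;
        tail = s;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
}

/* Writer thread: drain the backlog to the output file at disk speed. */
void *writer_thread(void *arg)
{
        int out_fd = *(int *)arg;

        for (;;) {
                pthread_mutex_lock(&lock);
                while (!head)
                        pthread_cond_wait(&nonempty, &lock);
                struct subbuf *s = head;
                head = s->next;
                if (!head)
                        tail = NULL;
                pthread_mutex_unlock(&lock);

                write(out_fd, s->data, s->len);   /* error handling omitted */
                free(s);
        }
        return NULL;
}

Of course, if the disk is permanently too slow the list just grows without
bound, so this only helps with temporary backlogs, not the sustained rate.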

I've been meaning to try something like this with systemtap; sounds like it
would be a good time to do it - I'll play around with it as soon as I get the
chance...

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug runtime/2514] New: Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
                   ` (2 preceding siblings ...)
  2006-04-05 16:19 ` zanussi at us dot ibm dot com
@ 2006-04-06 19:48 ` Martin Hunt
  2006-04-06 19:49 ` [Bug runtime/2514] " hunt at redhat dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Martin Hunt @ 2006-04-06 19:48 UTC (permalink / raw)
  To: sourceware-bugzilla; +Cc: systemtap

On Wed, 2006-04-05 at 14:31 +0000, guanglei at cn dot ibm dot com wrote:
> One reason, I guess, is that on such a heavily loaded system the probe handlers
> fire at a very high frequency and generate more data than relayfs can handle:
> data is pushed into the relayfs transport layer much faster than user space can
> read it back out.
>
> So is there any way to fix this? It is a real problem for a trace tool like
> LKET, which generates very large trace data files.

The upper limit of both procfs and relayfs is the speed of the hard
disks, typically around 40MB/sec. If you are trying to log more data
than you can write to disk, you have a problem that cannot be fixed
without upgrading your hardware.

For bursts of data that temporarily exceed 40MB/sec, you can use "stap
-s" to allocate a larger buffer size.

It is possible to slow down the write process when the output buffer
starts getting full to avoid dropped packets. I have experimented with
this in procfs. However, that slows down the system and mainly just
reduces the need for large buffers. It cannot increase the sustained
logging rate which is limited by the drive speed.
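
For example, if the probes generate data 10MB/sec faster than the disk can
absorb, a 16MB buffer (-s 16) only covers about 1.6 seconds of burst before
drops start again. The throttling idea is roughly the following (a hypothetical
sketch with made-up helper names, not the actual transport code): when a probe
wants to reserve space and the buffer is nearly full, spin briefly to give the
userspace reader a chance to drain it, and only drop the record if space never
frees up.

/* Hypothetical sketch of throttling producers when the output buffer is
 * nearly full; the helpers are made-up placeholders, not the real
 * SystemTap transport API. */
#include <stddef.h>

#define HIGH_WATER_FREE (512 * 1024)    /* throttle below this much free space */
#define MAX_SPINS       1000

extern size_t buffer_free_bytes(void);  /* assumed: free space in output buffer */
extern void *buffer_reserve(size_t len);/* assumed: returns NULL when full */
extern void short_pause(void);          /* assumed: brief pause, e.g. cpu_relax() */

/* Reserve space for one record, slowing the producer down instead of
 * dropping immediately when the consumer is falling behind. */
void *reserve_or_throttle(size_t len)
{
        int spins = 0;

        /* Give the userspace reader a chance to catch up... */
        while (buffer_free_bytes() < HIGH_WATER_FREE && spins++ < MAX_SPINS)
                short_pause();

        /* ...then reserve; if this still fails, the caller counts it as a
         * dropped record, exactly as today. */
        return buffer_reserve(len);
}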

Martin




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/2514] Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
                   ` (3 preceding siblings ...)
  2006-04-06 19:48 ` [Bug runtime/2514] New: " Martin Hunt
@ 2006-04-06 19:49 ` hunt at redhat dot com
  2007-03-16 16:00 ` fche at redhat dot com
  2007-05-08 19:06 ` fche at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: hunt at redhat dot com @ 2006-04-06 19:49 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From hunt at redhat dot com  2006-04-06 19:48 -------
Subject: Re:  New: Sequence had xxx drops.

On Wed, 2006-04-05 at 14:31 +0000, guanglei at cn dot ibm dot com wrote:
> One reason, I guess, is that on such a heavily loaded system the probe handlers
> fire at a very high frequency and generate more data than relayfs can handle:
> data is pushed into the relayfs transport layer much faster than user space can
> read it back out.
>
> So is there any way to fix this? It is a real problem for a trace tool like
> LKET, which generates very large trace data files.

The upper limit of both procfs and relayfs is the speed of the hard
disks, typically around 40MB/sec. If you are trying to log more data
than you can write to disk, you have a problem that cannot be fixed
without upgrading your hardware.

For bursts of data that temporarily exceed 40MB/sec, you can use "stap
-s" to allocate a larger buffer size.

It is possible to slow down the write process when the output buffer
starts getting full to avoid dropped packets. I have experimented with
this in procfs. However, that slows down the system and mainly just
reduces the need for large buffers. It cannot increase the sustained
logging rate which is limited by the drive speed.

Martin






-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/2514] Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
                   ` (4 preceding siblings ...)
  2006-04-06 19:49 ` [Bug runtime/2514] " hunt at redhat dot com
@ 2007-03-16 16:00 ` fche at redhat dot com
  2007-05-08 19:06 ` fche at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: fche at redhat dot com @ 2007-03-16 16:00 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-03-16 16:00 -------
Is this still a problem with recent systemtap/runtime versions?
That is, are we unable to saturate the disk system before dropping records?


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/2514] Sequence had xxx drops.
  2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
                   ` (5 preceding siblings ...)
  2007-03-16 16:00 ` fche at redhat dot com
@ 2007-05-08 19:06 ` fche at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: fche at redhat dot com @ 2007-05-08 19:06 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-05-08 20:06 -------
Looks better now.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |WORKSFORME


http://sourceware.org/bugzilla/show_bug.cgi?id=2514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-05-08 19:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-05 14:31 [Bug runtime/2514] New: Sequence had xxx drops guanglei at cn dot ibm dot com
2006-04-05 15:15 ` [Bug runtime/2514] " fche at redhat dot com
2006-04-05 15:23 ` guanglei at cn dot ibm dot com
2006-04-05 16:19 ` zanussi at us dot ibm dot com
2006-04-06 19:48 ` [Bug runtime/2514] New: " Martin Hunt
2006-04-06 19:49 ` [Bug runtime/2514] " hunt at redhat dot com
2007-03-16 16:00 ` fche at redhat dot com
2007-05-08 19:06 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).