public inbox for systemtap@sourceware.org
* [Bug runtime/5194] New: IO problem on begin/end probes
From: hunt at redhat dot com @ 2007-10-18 14:01 UTC (permalink / raw)
  To: systemtap

Begin and end probes are now interruptible. This creates a problem with IO
because we use per-CPU buffers.

A possible solution would be to have the translator add a call to
smp_processor_id() to the probe prologue and modify the print functions to take
a cpu argument instead of each one calling smp_processor_id().

-- 
           Summary: IO problem on begin/end probes
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: critical
          Priority: P1
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: hunt at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=5194

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug runtime/5194] IO problem on begin/end probes
From: fche at redhat dot com @ 2007-10-18 14:10 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-10-18 14:10 -------
Can you spell out why you believe there is a problem?
begin/end probes are only run when no kprobes etc. are even registered.
They do not run in parallel with each other.
So I see no source of concurrency/reentrancy (other than the shared-buffer
guest/host case I mentioned in the other bug).


* [Bug runtime/5194] IO problem on begin/end probes
From: fche at redhat dot com @ 2007-10-18 14:11 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-10-18 14:11 -------
Plus remember, we still block preemption during begin/end probes.


* [Bug runtime/5194] IO problem on begin/end probes
From: hunt at redhat dot com @ 2007-10-18 14:25 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From hunt at redhat dot com  2007-10-18 14:25 -------
OK, this is not something we normally run into, but it's easy to force the
situation by calling a sleeping function in a begin probe. We don't normally
do that, but we might, so we should either handle it correctly or prevent it.

If the begin/end probe does sleep or get interrupted, it might not get the same
cpu when it resumes, which means the output buffer will change without the
previous buffer getting flushed and data will be lost.

IIRC, one of the reasons we enabled preemption was to handle situations where
large arrays are dumped during end probes. If interrupts are disabled, stapio
cannot process the output while it is arriving and the module runs out of
relayfs buffers.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |normal
           Priority|P1                          |P2



* [Bug runtime/5194] IO problem on begin/end probes
From: fche at redhat dot com @ 2007-10-18 15:19 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-10-18 15:18 -------
(In reply to comment #3)
> calling a sleeping function in a begin probe.

That remains invalid, even with the slight relaxing we did earlier.

> If the begin/end probe does sleep or get interrupted, it might
> not get the same cpu when it resumes,

Interrupts do not cause a reschedule or CPU reassignment if
preemption is blocked (as it is).

> which means the output buffer will change without the
> previous buffer getting flushed and data will be lost.

If this is the limit of what can happen when a user schemes to call
a sleeping function (which can only happen in "-g" mode anyway), then
this problem is a minor one.

> IIRC, one of the reasons we enabled preemption was to handle situations where
> large arrays are dumped during end probes.

That was only a side-effect.  We wanted to produce "long" reports, for "long"
meaning perhaps thousands of lines.  It's not megabytes.  With the status
quo, SMP machines may be able to keep up with even that.



* [Bug runtime/5194] IO problem on begin/end probes
From: fche at redhat dot com @ 2008-01-21 17:42 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-01-21 17:41 -------
I'm not clear about whether there actually exists a bug described
in comment #0, in that we do not enable sleeping or preemption
during begin/end probes, so we can't switch CPUs.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING



* [Bug runtime/5194] IO problem on begin/end probes
From: hunt at redhat dot com @ 2008-01-21 17:56 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From hunt at redhat dot com  2008-01-21 17:55 -------
(In reply to comment #5)
> I'm not clear about whether there actually exists a bug described
> in comment #0, in that we do not enable sleeping or preemption
> during begin/end probes, so we can't switch CPUs.

We don't?  News to me. It's easy to test.

>cat test.stp
function in_interrupt:long () %{ /* pure */
        THIS->__retvalue = in_interrupt();
%}
function irqs_disabled:long () %{ /* pure */
        THIS->__retvalue = irqs_disabled();
%}
function preempt_count:long () %{ /* pure */
        THIS->__retvalue = preempt_count();
%}
function print_info () {
	printf("%s\n", pp())
	printf("cpu: %d\n", cpu())
	printf("in_interrupt:%d, irqs_disabled:%d, preempt_count:%d\n",
	       in_interrupt(), irqs_disabled(), preempt_count())
	print("------------------\n\n")
}

probe begin {
	print_info()
}

>stap test.stp
begin
cpu: 2
in_interrupt:0, irqs_disabled:0, preempt_count:0




* [Bug runtime/5194] IO problem on begin/end probes; need
From: fche at redhat dot com @ 2008-01-21 18:13 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-01-21 18:12 -------
(In reply to comment #6)
> (In reply to comment #5)
> > I'm not clear about whether there actually exists a bug described
> > in comment #0, in that we do not enable sleeping or preemption
> > during begin/end probes, so we can't switch CPUs.
> 
> We don't?  News to me. It's easy to test.
> [...]
> in_interrupt:0, irqs_disabled:0, preempt_count:0

Thanks for running the test.
The preempt_count should not be 0.
The generated code in enter_begin_probe clearly contains
a preempt_disable(), so the question is why this does not
appear to have effect.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|IO problem on begin/end     |IO problem on begin/end
                   |probes                      |probes; need



* [Bug runtime/5194] IO problem on begin/end probes; need
From: fche at redhat dot com @ 2008-01-21 18:37 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-01-21 18:36 -------
Maybe your kernel was built without CONFIG_PREEMPT.
My F8 kernel has only CONFIG_PREEMPT_VOLUNTARY and
CONFIG_PREEMPT_BKL.


* [Bug runtime/5194] IO problem on begin/end probes; need
From: fche at redhat dot com @ 2008-01-29 16:40 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-01-29 16:40 -------
(In reply to comment #6)
> (In reply to comment #5)
> > I'm not clear about whether there actually exists a bug described
> > in comment #0, in that we do not enable sleeping or preemption
> > during begin/end probes, so we can't switch CPUs.
> 
> We don't?  News to me.

Unless you have a scenario that could explain cpu switching, or an
actual test case that demonstrates it occurring, let's please close
this bug.


* [Bug runtime/5194] IO problem on begin/end probes; need
From: hunt at redhat dot com @ 2008-01-29 17:08 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From hunt at redhat dot com  2008-01-29 17:07 -------

> Unless you have a scenario that could explain cpu switching, or an
> actual test case that demonstrates it occurring, let's please close
> this bug.

A CPU switch could now happen only if embedded C code (which requires "-g")
slept during a begin or end probe. The only reason I can think of for ever
doing so is if we decide to change IO to time out and retry on error during
end probes. We would do that because, in extreme cases on single-CPU systems,
the total data collected in arrays and dumped in the end probe exceeds the
free space in the relayfs buffer. The probe would then have to give up the CPU
and let stapio read from the relayfs buffer before it could continue. Such a
situation would be rare, and we could simply say that the proper fix is to use
"-s" to make the buffer larger.

I'm fine with that, at least for now. We have bigger fish to fry.



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |WONTFIX


