public inbox for systemtap@sourceware.org
* Backward compatibility for insn probe point
@ 2009-04-01 21:59 Maynard Johnson
  2009-04-27 21:01 ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: Maynard Johnson @ 2009-04-01 21:59 UTC
  To: systemtap; +Cc: Frank Ch. Eigler

Frank has already made the changes in runtime/itrace.c to support backward compatibility with older utrace.  I wanted to test the insn probe point on an older utrace, so I built and installed systemtap 0.9.5 on a ppc970 blade running RHEL 5.3.  I ran the following simple test:

# stap -c /bin/ls simple-test.stp /bin/ls -o simple-out -k -vvv 
     where simple-test.stp is . . .
=========simple-test.stp ========================
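global instrs = 0

# Runs once when the session starts.
probe begin {
	printf("systemtap starting probe\n")
}

# Fires on every single-stepped instruction of the target process (@1).
probe process(@1).insn {
	instrs += 1
}

# Runs once at shutdown and reports the instruction count.
probe end {
	printf("systemtap ending probe\n")
	printf("itraced = %d\n", instrs)
}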
==================================

The result of the above test is that the stap command hangs at "stapio:start_cmd:195 execing target_cmd /bin/ls".  If I Ctrl-C the job, it finishes (i.e., I see "Pass 5: run completed ..."), but the output file contents indicate the insn probe was never hit (i.e., "itraced = 0").

Any suggestions on where to look for the problem?

Thanks.
-Maynard

* Re: Backward compatibility for insn probe point
  2009-04-01 21:59 Backward compatibility for insn probe point Maynard Johnson
@ 2009-04-27 21:01 ` David Smith
  2009-04-29 13:56   ` Maynard Johnson
  2009-04-29 21:08   ` Maynard Johnson
  0 siblings, 2 replies; 16+ messages in thread
From: David Smith @ 2009-04-27 21:01 UTC
  To: Maynard Johnson; +Cc: systemtap, Frank Ch. Eigler

Maynard Johnson wrote:
> Frank has already made the changes in runtime/itrace.c to support
> backward compatibility with older utrace.
> I wanted to test the insn probe point on an older utrace, so I built
> and installed systemtap 0.9.5 on a ppc970 blade running RHEL 5.3.  I
> ran the following simple test:
> 
> # stap -c /bin/ls simple-test.stp /bin/ls -o simple-out -k -vvv 
>      where simple-test.stp is . . .
> =========simple-test.stp ========================
> global instrs = 0
> 
> probe begin {
> 	printf("systemtap starting probe\n")
> }
> 
> probe process(@1).insn {
> 	instrs += 1
> }
> 
> probe end { printf("systemtap ending probe\n")
> printf("itraced = %d\n", instrs)
> }
> ==================================
> 
> The result of the above test is that the stap command hangs at
> "stapio:start_cmd:195 execing target_cmd /bin/ls".  If I Ctl-C the
> job, it finished (i.e, I see "Pass 5: run completed ..."), but the
> output file contents indicate the insn probe was not hit (i.e.,
> "itraced = 0").
> 
> Any suggestions on where to look for the problem?

I took a look at this and fixed it.  For more details, see
<http://sources.redhat.com/bugzilla/show_bug.cgi?id=10091>.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: Backward compatibility for insn probe point
  2009-04-27 21:01 ` David Smith
@ 2009-04-29 13:56   ` Maynard Johnson
  2009-04-29 21:08   ` Maynard Johnson
  1 sibling, 0 replies; 16+ messages in thread
From: Maynard Johnson @ 2009-04-29 13:56 UTC
  To: David Smith; +Cc: systemtap, Frank Ch. Eigler

David Smith wrote:
> Maynard Johnson wrote:
>> Frank has already made the changes in runtime/itrace.c to support
>> backward compatibility with older utrace.
>> I wanted to test the insn probe point on an older utrace, so I built
>> and installed systemtap 0.9.5 on a ppc970 blade running RHEL 5.3.  I
>> ran the following simple test:
>>
>> # stap -c /bin/ls simple-test.stp /bin/ls -o simple-out -k -vvv 
>>      where simple-test.stp is . . .
>> =========simple-test.stp ========================
>> global instrs = 0
>>
>> probe begin {
>> 	printf("systemtap starting probe\n")
>> }
>>
>> probe process(@1).insn {
>> 	instrs += 1
>> }
>>
>> probe end { printf("systemtap ending probe\n")
>> printf("itraced = %d\n", instrs)
>> }
>> ==================================
>>
>> The result of the above test is that the stap command hangs at
>> "stapio:start_cmd:195 execing target_cmd /bin/ls".  If I Ctl-C the
>> job, it finished (i.e, I see "Pass 5: run completed ..."), but the
>> output file contents indicate the insn probe was not hit (i.e.,
>> "itraced = 0").
>>
>> Any suggestions on where to look for the problem?
> 
> I took a look at this and fixed it.  For more details, see
> <http://sources.redhat.com/bugzilla/show_bug.cgi?id=10091>.

Excellent!  Thanks much.  I hadn't gotten very far debugging this (too many other irons in the fire, I'm afraid).  I will test it out on ppc64 RHEL 5 and Fedora 11 and let you know my results.

-Maynard

* Re: Backward compatibility for insn probe point
  2009-04-27 21:01 ` David Smith
  2009-04-29 13:56   ` Maynard Johnson
@ 2009-04-29 21:08   ` Maynard Johnson
  2009-04-30 18:00     ` David Smith
  1 sibling, 1 reply; 16+ messages in thread
From: Maynard Johnson @ 2009-04-29 21:08 UTC
  To: David Smith; +Cc: systemtap, Frank Ch. Eigler

David Smith wrote:
> Maynard Johnson wrote:
>> Frank has already made the changes in runtime/itrace.c to support
>> backward compatibility with older utrace.
>> I wanted to test the insn probe point on an older utrace, so I built
>> and installed systemtap 0.9.5 on a ppc970 blade running RHEL 5.3.  I
>> ran the following simple test:
>>
>> # stap -c /bin/ls simple-test.stp /bin/ls -o simple-out -k -vvv 
>>      where simple-test.stp is . . .
>> =========simple-test.stp ========================
>> global instrs = 0
>>
>> probe begin {
>> 	printf("systemtap starting probe\n")
>> }
>>
>> probe process(@1).insn {
>> 	instrs += 1
>> }
>>
>> probe end { printf("systemtap ending probe\n")
>> printf("itraced = %d\n", instrs)
>> }
>> ==================================
>>
>> The result of the above test is that the stap command hangs at
>> "stapio:start_cmd:195 execing target_cmd /bin/ls".  If I Ctl-C the
>> job, it finished (i.e, I see "Pass 5: run completed ..."), but the
>> output file contents indicate the insn probe was not hit (i.e.,
>> "itraced = 0").
>>
>> Any suggestions on where to look for the problem?
> 
> I took a look at this and fixed it.  For more details, see
> <http://sources.redhat.com/bugzilla/show_bug.cgi?id=10091>.

I've not tested the fix on x86 yet, but I'm afraid the results don't look right on ppc64/RHEL 5.  Test results of 0.9.7 with your patch on ppc64/F11 were good, though, with no regressions.  I used the following simple test script:

========================================================
global instrs = 0

# Report when a process matching the target path (@1) starts and exits.
probe process(@1).begin {
        printf("pid %d begins\n", pid())
}

probe process(@1).end {
        printf("pid %d ends\n", pid())
}

probe begin {
        printf("systemtap starting probe\n")
}

# Fires on every single-stepped instruction of the target process.
probe process(@1).insn {
        instrs += 1
}

# Runs once at shutdown and reports the instruction count.
probe end {
        printf("systemtap ending probe\n")
        printf("itraced = %d\n", instrs)
}
========================================================

I invoked the script as follows:
     stap  -c /usr/bin/whoami simple-test.stp /usr/bin/whoami -o simple-out
On Fedora 11, the simple-out file showed that I had nearly 330,000 insn probe hits.  On ppc64/RHEL 5.3, I had only 65 probe hits.  Can you try out the above script on an x86/RHEL 5.3 system?

Thanks.
-Maynard

* Re: Backward compatibility for insn probe point
  2009-04-29 21:08   ` Maynard Johnson
@ 2009-04-30 18:00     ` David Smith
  2009-04-30 20:48       ` Roland McGrath
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2009-04-30 18:00 UTC
  To: Maynard Johnson; +Cc: systemtap, Frank Ch. Eigler, Roland McGrath

Maynard Johnson wrote:
> David Smith wrote:
>> Maynard Johnson wrote:

... stuff deleted ...

>>> The result of the above test is that the stap command hangs at
>>> "stapio:start_cmd:195 execing target_cmd /bin/ls".  If I Ctl-C the
>>> job, it finished (i.e, I see "Pass 5: run completed ..."), but the
>>> output file contents indicate the insn probe was not hit (i.e.,
>>> "itraced = 0").
>>>
>>> Any suggestions on where to look for the problem?
>> I took a look at this and fixed it.  For more details, see
> I've not tested the fix on x86 yet, but I'm afraid the results don't
> look right on ppc64/RHEL 5.  But test results of 0.9.7 with your patch
> on ppc64/F11 were good -- no regressions.  I used the following simple
> test script:

... script deleted ...

> I invoked the script as follows:
>      stap  -c /usr/bin/whoami simple-test.stp /usr/bin/whoami -o simple-out
> On Fedora 11, the simple-out file showed that I had nearly 330,000 insn
> probe hits.  On ppc64/RHEL 5.3, I had only 65 probe hits.  Can you try
> out the above script on an x86/RHEL 5.3 system?

I've tested this on x86/RHEL5.3 (2.6.18-128.1.6.el5) and I get 226127
probe hits.  On x86_64/RHEL5.3 (2.6.18-128.1.6.el5) I get 198294 probe hits.

Off the top of my head that might mean there is a specific ppc64 utrace
problem.

Roland, do you know of any problems with original utrace doing
UTRACE_ACTION_SINGLESTEP on ppc64?

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: Backward compatibility for insn probe point
  2009-04-30 18:00     ` David Smith
@ 2009-04-30 20:48       ` Roland McGrath
  2009-05-12 16:06         ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: Roland McGrath @ 2009-04-30 20:48 UTC
  To: David Smith; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

> Off the top of my head that might mean there is a specific ppc64 utrace
> problem.
> 
> Roland, do you know of any problems with original utrace doing
> UTRACE_ACTION_SINGLESTEP on ppc64?

Sorry, I can't remember anything in particular.  There might well have been
some issue that I've forgotten.


Thanks,
Roland

* Re: Backward compatibility for insn probe point
  2009-04-30 20:48       ` Roland McGrath
@ 2009-05-12 16:06         ` David Smith
  2009-05-12 18:20           ` Roland McGrath
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2009-05-12 16:06 UTC
  To: Roland McGrath; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

Roland McGrath wrote:
>> Off the top of my head that might mean there is a specific ppc64 utrace
>> problem.
>>
>> Roland, do you know of any problems with original utrace doing
>> UTRACE_ACTION_SINGLESTEP on ppc64?
> 
> Sorry, I can't remember anything in particular.  There might well have been
> some issue that I've forgotten.

I've done some more debugging here, and I thought I'd let you know what
I've found.  The problem occurs on every 2.6.25 kernel I've found (these
use the original utrace).  On the earliest 2.6.26 kernel I've found
(2.6.26.3-17.fc9.ppc64), which uses the new utrace, this works
correctly.

Digging a bit deeper into the behavior on failing kernels, I've found
that if you set up an insn probe point and a syscall probe, you get only
one insn probe hit before each syscall.  Note that gdb is fully able to
do instruction stepping on failing kernels, so ptrace seems to be
working correctly.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: Backward compatibility for insn probe point
  2009-05-12 16:06         ` David Smith
@ 2009-05-12 18:20           ` Roland McGrath
  2009-05-13 15:04             ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: Roland McGrath @ 2009-05-12 18:20 UTC
  To: David Smith; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

Ok, that is on the verge of ringing a bell.

The single-step trap hit resets the single-step bit in the registers
(arch/powerpc/kernel/traps.c:single_step_exception).  That needs to be
turned on again before resuming.  The only place that this happens is
utrace_quiescent->tracehook_enable_single_step.  You should get there via
check_quiescent after each event report when UTRACE_ACTION_STATE_* is set.

It makes sense that ptrace does not see the same problem.  It always stops
after each step trap, so it surely goes into utrace_quiescent to stop;
there it will properly re-enable stepping when it gets resumed.  In the
itrace scenario, you don't stop, so it's only the (apparently broken)
bookkeeping that should ensure you get there.

In a reporting loop, update_action should be keeping UTRACE_ACTION_SINGLESTEP
in its return value, so that check_quiescent sees it and calls utrace_quiescent.
You can check whether some of that is going wrong.

(This is all entirely different in modern utrace.)


Thanks,
Roland
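
As an illustration only, the bookkeeping described in the message above can be
modelled with a toy Python sketch.  The class and function names are
hypothetical and do not correspond to kernel symbols, and the real old-utrace
code paths are more involved.  The model assumes only what the message states:
the step trap clears the single-step bit, and only the quiescent path sets it
again.

```python
class Task:
    """Toy stand-in for a traced task; not a kernel structure."""

    def __init__(self):
        self.step_bit = True  # models the single-step bit in the registers
        self.hits = 0         # number of step traps reported so far


def run(task, instructions, reenable_after_trap):
    """Execute `instructions` instructions and return the step-trap count.

    reenable_after_trap=True models a tracer that reaches the quiescent
    path after every trap (ptrace-like: stop, then resume re-armed).
    reenable_after_trap=False models the path where nothing re-arms it.
    """
    for _ in range(instructions):
        if not task.step_bit:
            continue              # instruction runs without trapping
        task.hits += 1            # the step trap is reported
        task.step_bit = False     # the trap handler clears the bit
        if reenable_after_trap:
            task.step_bit = True  # quiescent path turns stepping back on
    return task.hits


print(run(Task(), 1000, reenable_after_trap=True))   # 1000
print(run(Task(), 1000, reenable_after_trap=False))  # 1
```

In this model a tracer that never passes through the re-arming step sees a
single trap and then nothing, which is at least consistent in shape with the
very low hit counts reported earlier in the thread.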

* Re: Backward compatibility for insn probe point
  2009-05-12 18:20           ` Roland McGrath
@ 2009-05-13 15:04             ` David Smith
  2009-05-13 18:24               ` Roland McGrath
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2009-05-13 15:04 UTC
  To: Roland McGrath; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

Roland McGrath wrote:
> Ok, that is on the verge of ringing a bell.
> 
> The single-step trap hit resets the single-step bit in the registers
> (arch/powerpc/kernel/traps.c:single_step_exception).  That needs to be
> turned on again before resuming.  The only place that this happens is
> utrace_quiescent->tracehook_enable_single_step.  You should get there via
> check_quiescent after each event report when UTRACE_ACTION_STATE_* is set.
> 
> It makes sense that ptrace does not see the same problem.  It always stops
> after each step trap, so it surely goes into utrace_quiescent to stop;
> there it will properly re-enable stepping when it gets resumed.  In the
> itrace scenario, you don't stop, so it's only the (apparently broken)
> bookkeeping that should ensure you get there.
> 
> In a reporting loop, update_action should be keeping UTRACE_ACTION_SINGLESTEP
> in its return value, so that check_quiescent see it and calls utrace_quiescent.
> You can see if some of that is going wrong.
> 
> (This is all entirely different in modern utrace.)

I poked around the kernel source some more, but couldn't see what was
going wrong.  So, I decided to attack the problem from a different
angle, based on what you said above.

I've changed the itrace code to stop the task after each step trap (so
that it acts more like ptrace).  I've tested this on several kernels
(2.6.18-141.el5/ppc, 2.6.18-128.1.10.el5/x86_64/i686, and
2.6.25-14.fc9.ppc64) and it seems to work correctly.

Does this seem like a reasonable work-around?  Could there be problems
with this approach?

Thanks.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: Backward compatibility for insn probe point
  2009-05-13 15:04             ` David Smith
@ 2009-05-13 18:24               ` Roland McGrath
  2009-05-14 15:11                 ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: Roland McGrath @ 2009-05-13 18:24 UTC
  To: David Smith; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

> I poked around the kernel source some more, but couldn't see what was
> going wrong.  

I figured you'd use some stap probes to follow the code paths!

> I've changed the itrace code to stop the task after each step trap (so
> that it acts more like ptrace).  I've tested this on several kernels
> (2.6.18-141.el5/ppc, 2.6.18-128.1.10.el5/x86_64/i686, and
> 2.6.25-14.fc9.ppc64) and it seems to work correctly.
> 
> Does this seem like a reasonable work-around?  Could there be problems
> with this approach?

I presume it kills performance.  But what works works; that's what a
work-around is.  I'd hope that you don't make it use this "not really
right" mode for kernels with the modern utrace interface, which doesn't
have this bug.


Thanks,
Roland

* Re: Backward compatibility for insn probe point
  2009-05-13 18:24               ` Roland McGrath
@ 2009-05-14 15:11                 ` David Smith
  2009-05-14 18:41                   ` Roland McGrath
  2009-05-14 19:01                   ` Maynard Johnson
  0 siblings, 2 replies; 16+ messages in thread
From: David Smith @ 2009-05-14 15:11 UTC
  To: Roland McGrath; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

Roland McGrath wrote:
>> I poked around the kernel source some more, but couldn't see what was
>> going wrong.  
> 
> I figured you'd use some stap probes to follow the code paths!

Oh I did, but so much of it is inlined that it was incredibly difficult
to follow.

>> I've changed the itrace code to stop the task after each step trap (so
>> that it acts more like ptrace).  I've tested this on several kernels
>> (2.6.18-141.el5/ppc, 2.6.18-128.1.10.el5/x86_64/i686, and
>> 2.6.25-14.fc9.ppc64) and it seems to work correctly.
>>
>> Does this seem like a reasonable work-around?  Could there be problems
>> with this approach?
> 
> I presume it kills performance.  But what works works, that's a what a
> work-around is.  I'd hope that you don't make it use this "not really
> right" mode for kernels with the modern utrace interface that doesn't have
> this bug.

Yes, this is only for old utrace.  As far as performance goes, I've
benchmarked single-stepping '/bin/ls' on x86_64 with both approaches.
Here's what I saw ('time' output, averages of 5 runs):

- no stopping on each step trap:
real   0m3.735s
user   0m0.328s
sys    0m3.359s

- stopping on each step trap:
real   0m4.101s
user   0m0.336s
sys    0m3.692s

I could also limit this work-around to ppc-only, to not penalize other
architectures.
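
For reference, the relative cost implied by the two sets of timings above can
be worked out as follows.  This is only an illustrative calculation in Python;
the function name overhead_percent is made up here, and the inputs are the
'time' averages quoted above, converted to seconds.

```python
def overhead_percent(baseline_s, workaround_s):
    """Relative slowdown of the workaround over the baseline, in percent."""
    return (workaround_s - baseline_s) / baseline_s * 100.0


# 'time' averages from the two runs, in seconds
no_stop = {"real": 3.735, "user": 0.328, "sys": 3.359}
stop = {"real": 4.101, "user": 0.336, "sys": 3.692}

for key in ("real", "user", "sys"):
    print(f"{key}: {overhead_percent(no_stop[key], stop[key]):.1f}% slower")
```

On these figures, stopping on each step trap costs roughly 10% in wall-clock
and system time for this one x86_64 benchmark.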

One last thing.  I thought I'd try block stepping, so I got access to an
ia64 machine.  Unfortunately, using systemtap insn probes (either single
or block step) locks up the system with a spinlock lockup.  Sigh.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: Backward compatibility for insn probe point
  2009-05-14 15:11                 ` David Smith
@ 2009-05-14 18:41                   ` Roland McGrath
  2009-05-14 19:01                   ` Maynard Johnson
  1 sibling, 0 replies; 16+ messages in thread
From: Roland McGrath @ 2009-05-14 18:41 UTC
  To: David Smith; +Cc: Maynard Johnson, systemtap, Frank Ch. Eigler

> One last thing.  I thought I'd try block stepping, so I got access to an
> ia64 machine.  Unfortunately, using systemtap insn probes (either single
> or block step) lock up the system with a spinlock lockup.  Sigh.

Exciting!

* Re: Backward compatibility for insn probe point
  2009-05-14 15:11                 ` David Smith
  2009-05-14 18:41                   ` Roland McGrath
@ 2009-05-14 19:01                   ` Maynard Johnson
  2009-05-14 19:44                     ` David Smith
  1 sibling, 1 reply; 16+ messages in thread
From: Maynard Johnson @ 2009-05-14 19:01 UTC
  To: David Smith; +Cc: Roland McGrath, systemtap, Frank Ch. Eigler

David Smith wrote:
> Roland McGrath wrote:
>>> I poked around the kernel source some more, but couldn't see what was
>>> going wrong.  
>> I figured you'd use some stap probes to follow the code paths!
> 
> Oh I did, but so much of it is inlined that it was incredibly difficult
> to follow.
> 
>>> I've changed the itrace code to stop the task after each step trap (so
>>> that it acts more like ptrace).  I've tested this on several kernels
>>> (2.6.18-141.el5/ppc, 2.6.18-128.1.10.el5/x86_64/i686, and
>>> 2.6.25-14.fc9.ppc64) and it seems to work correctly.

David, thanks very much for doing this!  I was pretty much at a loss on how to 
debug this.

>>>
>>> Does this seem like a reasonable work-around?  Could there be problems
>>> with this approach?
>> I presume it kills performance.  But what works works, that's a what a
>> work-around is.  I'd hope that you don't make it use this "not really
>> right" mode for kernels with the modern utrace interface that doesn't have
>> this bug.
> 
> Yes, this is only for old utrace.  As far as performance goes, I've
> benchmarked single-stepping '/bin/ls' on x86_64 with both approaches.
> Here's what I saw ('time' output, averages of 5 runs):
> 
> - no stopping on each step trap:
> real   0m3.735s
> user   0m0.328s
> sys    0m3.359s
> 
> - stopping on each step trap:
> real   0m4.101s
> user   0m0.336s
> sys    0m3.692s
> 
> I could also limit this work-around to ppc-only, to not penalize other
> architectures.
> 
> One last thing.  I thought I'd try block stepping, so I got access to an
> ia64 machine.  Unfortunately, using systemtap insn probes (either single
> or block step) lock up the system with a spinlock lockup.  Sigh.
Does anyone know who maintains ia64/utrace?  David, was the above error on "old" 
utrace or "new"?

-Maynard
> 

* Re: Backward compatibility for insn probe point
  2009-05-14 19:01                   ` Maynard Johnson
@ 2009-05-14 19:44                     ` David Smith
  2009-05-14 21:36                       ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2009-05-14 19:44 UTC
  To: maynardj; +Cc: Roland McGrath, systemtap, Frank Ch. Eigler

Maynard Johnson wrote:
>> David Smith wrote:
>> One last thing.  I thought I'd try block stepping, so I got access to an
>> ia64 machine.  Unfortunately, using systemtap insn probes (either single
>> or block step) lock up the system with a spinlock lockup.  Sigh.
>
> Does anyone know who maintains ia64/utrace?  David, was the above error
> on "old" utrace or "new"?

The error is on "old" utrace.  I'm trying to look into the ia64 utrace
problem now.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: Backward compatibility for insn probe point
  2009-05-14 19:44                     ` David Smith
@ 2009-05-14 21:36                       ` David Smith
  2009-05-15 13:52                         ` ia64 hang when using itrace (was Re: Backward compatibility for insn probe point) David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2009-05-14 21:36 UTC
  To: maynardj; +Cc: Roland McGrath, systemtap, Frank Ch. Eigler

David Smith wrote:
> Maynard Johnson wrote:
>>> David Smith wrote:
>>> One last thing.  I thought I'd try block stepping, so I got access to an
>>> ia64 machine.  Unfortunately, using systemtap insn probes (either single
>>> or block step) lock up the system with a spinlock lockup.  Sigh.
>> Does anyone know who maintains ia64/utrace?  David, was the above error
>> on "old" utrace or "new"?
> 
> The error is on "old" utrace.  I'm trying to look into the ia64 utrace
> problem now.

Here's what I see on the console (running lockdep enabled
2.6.18-146.el5debug):

====
BUG: spinlock lockup on CPU#0, ls/2576, e0000040fe1092d8 (Tainted: G)

Call Trace:
 [<a000000100013b40>] show_stack+0x40/0xa0
                                sp=e0000003f640f870 bsp=e0000003f6409440
 [<a000000100013bd0>] dump_stack+0x30/0x60
                                sp=e0000003f640fa40 bsp=e0000003f6409428
 [<a0000001002de200>] _raw_spin_lock+0x200/0x260
                                sp=e0000003f640fa40 bsp=e0000003f64093e8
 [<a00000010065ff50>] _spin_lock_irqsave+0x30/0x60
                                sp=e0000003f640fa40 bsp=e0000003f64093c0
 [<a00000010009c730>] force_sig_info+0x30/0x160
                                sp=e0000003f640fa40 bsp=e0000003f6409380
 [<a000000100661450>] ia64_fault+0xff0/0x1280
                                sp=e0000003f640fa40 bsp=e0000003f6409328
 [<a00000010000bfe0>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000003f640fc60 bsp=e0000003f6409328
 [<a0000001002de0d0>] _raw_spin_lock+0xd0/0x260
                                sp=e0000003f640fe30 bsp=e0000003f64092c0
 [<a00000010065ff50>] _spin_lock_irqsave+0x30/0x60
                                sp=e0000003f640fe30 bsp=e0000003f6409298
 [<a00000010009c730>] force_sig_info+0x30/0x160
                                sp=e0000003f640fe30 bsp=e0000003f6409258
 [<a00000010009c890>] force_sig+0x30/0x60
                                sp=e0000003f640fe30 bsp=e0000003f6409230
 [<a00000010002cfe0>] syscall_trace_leave+0x100/0x140
                                sp=e0000003f640fe30 bsp=e0000003f64091d0
 [<a00000010000bda0>] __ia64_trace_syscall+0x100/0x110
                                sp=e0000003f640fe30 bsp=e0000003f64091d0
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000003f6410000 bsp=e0000003f64091d0
====

From what I can tell, the spinlock that is stuck is
current->sighand->siglock.  force_sig_info() (from kernel/signal.c:739)
grabs the spinlock, but we apparently take a fault somewhere and end up
in __ia64_leave_kernel() (from arch/ia64/kernel/entry.S:813).  The fault
handling in ia64_fault() calls force_sig_info() again, which tries to
grab the same spinlock again.

If anyone has a better understanding of this, I'd love to know how we
ended up in __ia64_leave_kernel().
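
The re-entry pattern suspected above can be illustrated with a toy Python
model.  It is only a sketch: threading.Lock stands in for the non-recursive
siglock, the function below merely borrows the name force_sig_info, and the
fault_while_locked flag is a made-up switch for the unexplained fault.  A real
spinlock would spin forever; the model uses a non-blocking acquire so that it
can report the self-deadlock instead of hanging.

```python
import threading

# Non-reentrant lock, standing in for current->sighand->siglock.
siglock = threading.Lock()


def force_sig_info(fault_while_locked):
    """Toy stand-in: take the lock, optionally 'fault' while holding it."""
    if not siglock.acquire(blocking=False):
        # The same context already holds the lock: the real code spins here.
        return "lockup: lock already held by this context"
    try:
        if fault_while_locked:
            # The fault handler calls back into the same function.
            return force_sig_info(fault_while_locked=False)
        return "signal queued"
    finally:
        siglock.release()


print(force_sig_info(fault_while_locked=False))  # signal queued
print(force_sig_info(fault_while_locked=True))   # lockup: lock already held by this context
```

The model says nothing about why the fault happens; it only shows why a second
force_sig_info() on the same CPU cannot make progress once the first holds the
lock.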

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* ia64 hang when using itrace (was Re: Backward compatibility for insn probe point)
  2009-05-14 21:36                       ` David Smith
@ 2009-05-15 13:52                         ` David Smith
  0 siblings, 0 replies; 16+ messages in thread
From: David Smith @ 2009-05-15 13:52 UTC
  To: maynardj; +Cc: Roland McGrath, systemtap, Frank Ch. Eigler

David Smith wrote:
> David Smith wrote:
>> Maynard Johnson wrote:
>>>> David Smith wrote:
>>>> One last thing.  I thought I'd try block stepping, so I got access to an
>>>> ia64 machine.  Unfortunately, using systemtap insn probes (either single
>>>> or block step) lock up the system with a spinlock lockup.  Sigh.
>>> Does anyone know who maintains ia64/utrace?  David, was the above error
>>> on "old" utrace or "new"?
>> The error is on "old" utrace.  I'm trying to look into the ia64 utrace
>> problem now.
> 
> Here's what I see on the console (running lockdep enabled
> 2.6.18-146.el5debug):
> 
> ====
> BUG: spinlock lockup on CPU#0, ls/2576, e0000040fe1092d8 (Tainted: G)
> 
> Call Trace:
>  [<a000000100013b40>] show_stack+0x40/0xa0
>                                 sp=e0000003f640f870 bsp=e0000003f6409440
>  [<a000000100013bd0>] dump_stack+0x30/0x60
>                                 sp=e0000003f640fa40 bsp=e0000003f6409428
>  [<a0000001002de200>] _raw_spin_lock+0x200/0x260
>                                 sp=e0000003f640fa40 bsp=e0000003f64093e8
>  [<a00000010065ff50>] _spin_lock_irqsave+0x30/0x60
>                                 sp=e0000003f640fa40 bsp=e0000003f64093c0
>  [<a00000010009c730>] force_sig_info+0x30/0x160
>                                 sp=e0000003f640fa40 bsp=e0000003f6409380
>  [<a000000100661450>] ia64_fault+0xff0/0x1280
>                                 sp=e0000003f640fa40 bsp=e0000003f6409328
>  [<a00000010000bfe0>] __ia64_leave_kernel+0x0/0x280
>                                 sp=e0000003f640fc60 bsp=e0000003f6409328
>  [<a0000001002de0d0>] _raw_spin_lock+0xd0/0x260
>                                 sp=e0000003f640fe30 bsp=e0000003f64092c0
>  [<a00000010065ff50>] _spin_lock_irqsave+0x30/0x60
>                                 sp=e0000003f640fe30 bsp=e0000003f6409298
>  [<a00000010009c730>] force_sig_info+0x30/0x160
>                                 sp=e0000003f640fe30 bsp=e0000003f6409258
>  [<a00000010009c890>] force_sig+0x30/0x60
>                                 sp=e0000003f640fe30 bsp=e0000003f6409230
>  [<a00000010002cfe0>] syscall_trace_leave+0x100/0x140
>                                 sp=e0000003f640fe30 bsp=e0000003f64091d0
>  [<a00000010000bda0>] __ia64_trace_syscall+0x100/0x110
>                                 sp=e0000003f640fe30 bsp=e0000003f64091d0
>  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
>                                 sp=e0000003f6410000 bsp=e0000003f64091d0
> ====
> 
> From what I can tell, the spinlock that is stuck is
> current->sighand->siglock.  force_sig_info() (from kernel/signal.c:739)
> grabs the spinlock, but we get a fault somewhere? and end up in
> __ia64_leave_kernel() (from arch/ia64/kernel/entry.S:813).  The fault
> handling in ia64_fault() calls force_sig_info() again, which tries to
> grab same spinlock again.
> 
> If anyone has a better understanding of this, I'd love to know how we
> ended up in __ia64_leave_kernel().

I should have included other information I know.  This always happens
after a call to set_tid_address(), which is the 79th syscall that 'ls'
runs.  By this point the insn probe has been hit at least
555391 times (my test script prints the number of instructions at every
syscall entry and exit).

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

Thread overview: 16+ messages
2009-04-01 21:59 Backward compatibility for insn probe point Maynard Johnson
2009-04-27 21:01 ` David Smith
2009-04-29 13:56   ` Maynard Johnson
2009-04-29 21:08   ` Maynard Johnson
2009-04-30 18:00     ` David Smith
2009-04-30 20:48       ` Roland McGrath
2009-05-12 16:06         ` David Smith
2009-05-12 18:20           ` Roland McGrath
2009-05-13 15:04             ` David Smith
2009-05-13 18:24               ` Roland McGrath
2009-05-14 15:11                 ` David Smith
2009-05-14 18:41                   ` Roland McGrath
2009-05-14 19:01                   ` Maynard Johnson
2009-05-14 19:44                     ` David Smith
2009-05-14 21:36                       ` David Smith
2009-05-15 13:52                         ` ia64 hang when using itrace (was Re: Backward compatibility for insn probe point) David Smith