* [Bug translator/2525] New: NMI Watchdog lockup with gettimeofday and do_timer probe
@ 2006-04-06 18:10 joshua dot i dot stone at intel dot com
  2006-04-06 18:14 ` [Bug translator/2525] " joshua dot i dot stone at intel dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: joshua dot i dot stone at intel dot com @ 2006-04-06 18:10 UTC (permalink / raw)
  To: systemtap

This one-liner:
  probe kernel.function("do_timer") { gettimeofday_us() }

causes "NMI Watchdog detected LOCKUP" on 2.6.16-1.2080_FC5 x86_64.  It also
hangs on RHEL4 2.6.9-34.ELsmp, without any message on the console.

-- 
           Summary: NMI Watchdog lockup with gettimeofday and do_timer probe
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: translator
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: joshua dot i dot stone at intel dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=2525

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug translator/2525] NMI Watchdog lockup with gettimeofday and do_timer probe
  2006-04-06 18:10 [Bug translator/2525] New: NMI Watchdog lockup with gettimeofday and do_timer probe joshua dot i dot stone at intel dot com
@ 2006-04-06 18:14 ` joshua dot i dot stone at intel dot com
  2006-04-07  3:43 ` bibo dot mao at intel dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: joshua dot i dot stone at intel dot com @ 2006-04-06 18:14 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From joshua dot i dot stone at intel dot com  2006-04-06 18:14 -------
Created an attachment (id=957)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=957&action=view)
Console dump of crash on 2.6.16-1.2080_FC5 x86_64



* [Bug translator/2525] NMI Watchdog lockup with gettimeofday and do_timer probe
  2006-04-06 18:10 [Bug translator/2525] New: NMI Watchdog lockup with gettimeofday and do_timer probe joshua dot i dot stone at intel dot com
  2006-04-06 18:14 ` [Bug translator/2525] " joshua dot i dot stone at intel dot com
@ 2006-04-07  3:43 ` bibo dot mao at intel dot com
  2006-04-07 16:11 ` fche at redhat dot com
  2006-04-07 17:51 ` joshua dot i dot stone at intel dot com
  3 siblings, 0 replies; 5+ messages in thread
From: bibo dot mao at intel dot com @ 2006-04-07  3:43 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From bibo dot mao at intel dot com  2006-04-07 03:43 -------
do_timer() is called from timer_interrupt(), which holds
write_seqlock(&xtime_lock) around the call.  do_gettimeofday() reads the time
in a loop like this:
  do {
       seq = read_seqbegin(&xtime_lock);
       ...
  } while (read_seqretry(&xtime_lock, seq));

write_seqlock() increments xtime_lock.sequence by 1, so the sequence stays odd
until the writer unlocks.  The probe handler runs inside that write section
with interrupts disabled, so the writer cannot finish, read_seqretry() never
returns 0, and do_gettimeofday() loops forever.
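
To make the dead loop concrete, here is a minimal user-space sketch of the
sequence-counter logic.  It is a single-threaded model, not the kernel's
seqlock code: the model_* names are hypothetical, and the retry limit exists
only so the demonstration terminates where the real reader would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a seqlock: only the sequence counter is kept. */
struct model_seqlock {
    unsigned sequence;   /* odd while a writer is inside its critical section */
};

/* Writer entry and exit each bump the counter by one. */
static void model_write_begin(struct model_seqlock *sl) { sl->sequence++; }
static void model_write_end(struct model_seqlock *sl)   { sl->sequence++; }

/* Reader samples the counter before copying the protected data. */
static unsigned model_read_begin(const struct model_seqlock *sl)
{
    return sl->sequence;
}

/* True means "retry": a writer was active (odd) or the counter moved. */
static bool model_read_retry(const struct model_seqlock *sl, unsigned start)
{
    return (start & 1u) != 0 || sl->sequence != start;
}

/*
 * Run the reader loop at most max_tries times.  Returns the number of
 * attempts needed, or -1 if the reader never saw a stable even sequence.
 */
static int model_reader_attempts(const struct model_seqlock *sl, int max_tries)
{
    assert(max_tries > 0);
    for (int tries = 1; tries <= max_tries; tries++) {
        unsigned seq = model_read_begin(sl);
        /* ... a real reader would copy the protected data here ... */
        if (!model_read_retry(sl, seq))
            return tries;
    }
    return -1;
}
```

With the counter even, model_reader_attempts() succeeds on the first attempt.
If it is called between model_write_begin() and model_write_end(), which is
the position the probe handler is in, it returns -1 no matter how large
max_tries is, because nothing on the same call stack can finish the write.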


* [Bug translator/2525] NMI Watchdog lockup with gettimeofday and do_timer probe
  2006-04-06 18:10 [Bug translator/2525] New: NMI Watchdog lockup with gettimeofday and do_timer probe joshua dot i dot stone at intel dot com
  2006-04-06 18:14 ` [Bug translator/2525] " joshua dot i dot stone at intel dot com
  2006-04-07  3:43 ` bibo dot mao at intel dot com
@ 2006-04-07 16:11 ` fche at redhat dot com
  2006-04-07 17:51 ` joshua dot i dot stone at intel dot com
  3 siblings, 0 replies; 5+ messages in thread
From: fche at redhat dot com @ 2006-04-07 16:11 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2006-04-07 16:11 -------
Sigh, another printk-like situation.  This would only be a partial solution, but
could the timer callback functions be made one more level indirect, by
interposing a level of work queuing?

Alternately, is there a lower-level approximate gettimeofday equivalent that
does not putz with locks?


* [Bug translator/2525] NMI Watchdog lockup with gettimeofday and do_timer probe
  2006-04-06 18:10 [Bug translator/2525] New: NMI Watchdog lockup with gettimeofday and do_timer probe joshua dot i dot stone at intel dot com
                   ` (2 preceding siblings ...)
  2006-04-07 16:11 ` fche at redhat dot com
@ 2006-04-07 17:51 ` joshua dot i dot stone at intel dot com
  3 siblings, 0 replies; 5+ messages in thread
From: joshua dot i dot stone at intel dot com @ 2006-04-07 17:51 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From joshua dot i dot stone at intel dot com  2006-04-07 17:51 -------
(In reply to comment #2)
Many thanks Bibo for the detailed analysis!

(In reply to comment #3)
> Sigh, another printk-like situation.  This would only be a partial solution, but
> could the timer callback functions be made one more level indirect, by
> interposing a level of work queuing?

The problem was discovered with a kprobe, not a timer probe - are you just
extrapolating?  Indeed, this will likely be a problem for timer.profile as well,
but I think timer.ms and timer.jiffies run in a different context.  I will try
each variant and make sure though.

Work queuing may be ok for ms and jiffies probes if needed, but for
timer.profile I don't think that will work.  Won't the trapframe be invalid if
we delay the probe execution?

> Alternately, is there a lower-level approximate gettimeofday equivalent that
> does not putz with locks?

I don't know of one - nothing in linux/time.h looks promising.

The xtime_lock is exported though, so we should be able to spin until read_seqretry
returns 0, and we can limit it to MAXTRYLOCK and throw an error, just like we do
for our own locks.  If someone else comes along and locks it after we determine
we're clear, that's ok because it won't be anyone in our callstack, so there
won't be a deadlock.

This should allow safely calling do_gettimeofday, and if we find a non-locking
alternative we can use that as a fallback.  I will implement this lock-test in
timestamp.stp...
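
As a rough illustration of the lock-test idea, here is a user-space sketch in
C.  It is not the timestamp.stp change itself: the model_* names are
hypothetical, MODEL_MAXTRYLOCK is an assumed bound standing in for MAXTRYLOCK,
and the time value is passed in rather than read from a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAXTRYLOCK 1000   /* assumed retry bound for this sketch */

/* Simplified model of a seqlock: only the sequence counter is kept. */
struct model_seqlock {
    unsigned sequence;   /* odd while a writer holds the lock */
};

/*
 * Poll until no writer is active, giving up after MODEL_MAXTRYLOCK tries.
 * Returns true if the sequence was seen even and unchanged.
 */
static bool model_lock_is_clear(const struct model_seqlock *sl)
{
    for (int tries = 0; tries < MODEL_MAXTRYLOCK; tries++) {
        unsigned seq = sl->sequence;
        if ((seq & 1u) == 0 && sl->sequence == seq)
            return true;
    }
    return false;
}

/*
 * Only "read the time" when the lock test passes.  now_us stands in for
 * the value the real time call would produce.  Returns 0 on success and
 * -1 when the caller should report an error instead of hanging.
 */
static int model_safe_gettimeofday(const struct model_seqlock *sl,
                                   long now_us, long *out_us)
{
    assert(out_us != NULL);
    if (!model_lock_is_clear(sl))
        return -1;
    *out_us = now_us;
    return 0;
}
```

The point of the design is that a writer in our own call stack shows up as a
permanently odd sequence, so the bounded test fails and the probe can raise an
error.  A writer that arrives after the test passes is on another CPU and will
eventually release the lock, so the normal retry loop still terminates.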

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|systemtap at sources dot    |joshua dot i dot stone at
                   |redhat dot com              |intel dot com
             Status|NEW                         |ASSIGNED


