public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/10651] New: very rare BUG_ON kernel/timer.c:619 due to runtime/time.c
@ 2009-09-17  2:28 fche at redhat dot com
  2009-09-18 20:06 ` [Bug runtime/10651] " jistone at redhat dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: fche at redhat dot com @ 2009-09-17  2:28 UTC (permalink / raw)
  To: systemtap

Something is calling mod_timer with a timer->function==NULL.

It appears as if the _stp_kill_time function is needlessly racy
(amongst the stp_timer_reregister flag, which should probably be
an atomic_t), and the del_timer_sync()'s.  It wouldn't hurt to
plop a synchronize_sched() in there too before the free_percpu
goo.

This was on an i686 rawhide 4cpu VM, 2.6.31-12.fc12.i686 kernel,
according to the systemtap.log, last attempted tes was bz5274.exp,
and here's the BUG message itself:

------------[ cut here ]------------
kernel BUG at kernel/timer.c:619!
invalid opcode: 0000 [#2] SMP 
last sysfs file: /sys/module/virtio_pci/sections/__mcount_loc
Modules linked in: stap_d91e5e64f72dff7c4c94d049e84aaaab_10830 uprobes
netconsole iptable_nat nf_nat nfsd exportfs configfs nfs lockd fscache nfs_acl
auth_rpcgss sunrpc ipv6 dm_multipath uinput ppdev parport_pc parport
virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core floppy virtio_pci [last
unloaded: stap_f28f0139d183e5474e8c3292a39d7c78_2115]

Pid: 10, comm: ksoftirqd/2 Tainted: G      D W  (2.6.31-12.fc12.i686 #1) 
EIP: 0060:[<c04503fc>] EFLAGS: 00010246 CPU: 2
EIP is at __mod_timer+0x35/0x110
EAX: 00000000 EBX: c3909f7c ECX: 00000000 EDX: 00000000
ESI: c3909f40 EDI: 00001ecc EBP: f6cdde8c ESP: f6cdde6c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process ksoftirqd/2 (pid: 10, ti=f6cdc000 task=f6ce0000 task.ti=f6cdc000)
Stack:
 00000000 00bdafbe c0605a00 42ad3970 42ad3970 00000246 c3909f40 00001ecc
<0> f6cdde9c c0450547 00000000 42ad3970 f6cdded8 f7d8c15b 9e2aa125 11641494
<0> 00000000 c04500d6 33171125 00000000 2e380e90 4ab18a28 33171125 42ad3970
Call Trace:
 [<c0605a00>] ? spin_bug+0x33/0xae
 [<c0450547>] ? mod_timer+0x2c/0x44
 [<f7d8c15b>] ? __stp_time_timer_callback+0xd4/0xed
[stap_d91e5e64f72dff7c4c94d049e84aaaab_10830]
 [<c04500d6>] ? run_timer_softirq+0x133/0x204
 [<c0450121>] ? run_timer_softirq+0x17e/0x204
 [<c04500d6>] ? run_timer_softirq+0x133/0x204
 [<f7d8c087>] ? __stp_time_timer_callback+0x0/0xed
[stap_d91e5e64f72dff7c4c94d049e84aaaab_10830]
 [<c044b4f5>] ? __do_softirq+0xc8/0x192
 [<c044b63e>] ? ksoftirqd+0x0/0xd6
 [<c044b608>] ? do_softirq+0x49/0x7f
 [<c044b696>] ? ksoftirqd+0x58/0xd6
 [<c045cef1>] ? kthread+0x7b/0x80
 [<c045ce76>] ? kthread+0x0/0x80
 [<c040463f>] ? kernel_thread_helper+0x7/0x10
Code: 1f 44 00 00 89 c3 8b 45 08 89 55 e4 88 4d e8 65 8b 15 14 00 00 00 89 55 f0
31 d2 89 45 e0 89 d8 e8 05 ff ff ff 83 7b 0c 00 75 04 <0f> 0b eb fe 8d 55 ec 89
d8 e8 59 ff ff ff 83 3b 00 89 c6 74 10 
EIP: [<c04503fc>] __mod_timer+0x35/0x110 SS:ESP 0068:f6cdde6c
---[ end trace 8eed33dc7b203ed4 ]---

-- 
           Summary: very rare BUG_ON kernel/timer.c:619 due to
                    runtime/time.c
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: fche at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=10651

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug runtime/10651] very rare BUG_ON kernel/timer.c:619 due to runtime/time.c
  2009-09-17  2:28 [Bug runtime/10651] New: very rare BUG_ON kernel/timer.c:619 due to runtime/time.c fche at redhat dot com
@ 2009-09-18 20:06 ` jistone at redhat dot com
  2009-11-05 14:47 ` fche at redhat dot com
  2010-07-08 14:57 ` fche at redhat dot com
  2 siblings, 0 replies; 6+ messages in thread
From: jistone at redhat dot com @ 2009-09-18 20:06 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jistone at redhat dot com  2009-09-18 20:06 -------
(In reply to comment #0)
> Something is calling mod_timer with a timer->function==NULL.

This seems to indicate either memory corruption, or the memory was freed and
cleared by the next owner.  It seems like particularly poor timing for the
function pointer to have been valid enough to enter the handler but invalid at
the end of the handler.

> It appears as if the _stp_kill_time function is needlessly racy
> (amongst the stp_timer_reregister flag, which should probably be
> an atomic_t), and the del_timer_sync()'s.  It wouldn't hurt to
> plop a synchronize_sched() in there too before the free_percpu
> goo.

Can you get a crash dump?  I'd like to confirm that _stp_kill_time was actually
attempted, possibly by looking at the backtraces on other cpus and checking if
stp_time==NULL.

The promises of del_timer_sync when it returns are that the handler is not
active and the timer is not queued.  I think this actually makes the reregister
flag superfluous.  It should then be perfectly safe to free the memory, unless
the for_each_online_cpu somehow missed one of the timers...


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10651

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug runtime/10651] very rare BUG_ON kernel/timer.c:619 due to runtime/time.c
  2009-09-17  2:28 [Bug runtime/10651] New: very rare BUG_ON kernel/timer.c:619 due to runtime/time.c fche at redhat dot com
  2009-09-18 20:06 ` [Bug runtime/10651] " jistone at redhat dot com
@ 2009-11-05 14:47 ` fche at redhat dot com
  2010-07-08 14:57 ` fche at redhat dot com
  2 siblings, 0 replies; 6+ messages in thread
From: fche at redhat dot com @ 2009-11-05 14:47 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2009-11-05 14:47 -------
> > Something is calling mod_timer with a timer->function==NULL.
> 
> This seems to indicate either memory corruption, or the memory was freed and
> cleared by the next owner.

I wonder if this was another case of bug #10854.


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10651

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug runtime/10651] very rare BUG_ON kernel/timer.c:619 due to runtime/time.c
  2009-09-17  2:28 [Bug runtime/10651] New: very rare BUG_ON kernel/timer.c:619 due to runtime/time.c fche at redhat dot com
  2009-09-18 20:06 ` [Bug runtime/10651] " jistone at redhat dot com
  2009-11-05 14:47 ` fche at redhat dot com
@ 2010-07-08 14:57 ` fche at redhat dot com
  2 siblings, 0 replies; 6+ messages in thread
From: fche at redhat dot com @ 2010-07-08 14:57 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2010-07-08 14:56 -------
Not recently seen.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME


http://sourceware.org/bugzilla/show_bug.cgi?id=10651

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug runtime/10651] very rare BUG_ON kernel/timer.c:619 due to runtime/time.c
       [not found] <bug-10651-1110@http.sourceware.org/bugzilla/>
  2010-10-14 20:47 ` fche at redhat dot com
@ 2010-11-27 23:55 ` fche at redhat dot com
  1 sibling, 0 replies; 6+ messages in thread
From: fche at redhat dot com @ 2010-11-27 23:55 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=10651

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #5 from Frank Ch. Eigler <fche at redhat dot com> 2010-11-27 23:54:44 UTC ---
commit 3fd1c490 switches to atomic_t flags to the timer callback functions
to defeat mod_timer() calls that might overlap a del_timer_sync.
commit f2b610b does likewise for the transport layer timers.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug runtime/10651] very rare BUG_ON kernel/timer.c:619 due to runtime/time.c
       [not found] <bug-10651-1110@http.sourceware.org/bugzilla/>
@ 2010-10-14 20:47 ` fche at redhat dot com
  2010-11-27 23:55 ` fche at redhat dot com
  1 sibling, 0 replies; 6+ messages in thread
From: fche at redhat dot com @ 2010-10-14 20:47 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=10651

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> 2010-10-14 20:47:17 UTC ---
http://sourceware.org/ml/systemtap/2010-q4/msg00034.html
appears to be a new sighting.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-11-27 23:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-17  2:28 [Bug runtime/10651] New: very rare BUG_ON kernel/timer.c:619 due to runtime/time.c fche at redhat dot com
2009-09-18 20:06 ` [Bug runtime/10651] " jistone at redhat dot com
2009-11-05 14:47 ` fche at redhat dot com
2010-07-08 14:57 ` fche at redhat dot com
     [not found] <bug-10651-1110@http.sourceware.org/bugzilla/>
2010-10-14 20:47 ` fche at redhat dot com
2010-11-27 23:55 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).