public inbox for systemtap@sourceware.org
* [Bug translator/4066] New: hist_linear() with large H value crashes system
From: mmlnx at us dot ibm dot com @ 2007-02-16 21:39 UTC (permalink / raw)
  To: systemtap

I was writing a simple script that uses hist_linear() and decided to pass it a
ridiculously large H value to see what would happen.  To my surprise, it crashed
the system.

Here's the script:

global reads
probe netdev.receive { reads <<< length }
probe end { print (@hist_linear(reads, 0, 300000000000, 50)) }

I tried the script on ppc64 and x86 systems running SLES10 SP1, and on x86_64
systems running RHEL5 RC1 or the latest FC5 kernel.  I tried the SLES10 SP1,
RHEL5 RC1 and CVS versions of systemtap.  All of them crashed.  I suspect
there's a problem in _stp_stat_init().  More details to follow.
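For scale: a linear histogram from L = 0 to H = 300000000000 with a bucket width
of 50 needs (300000000000 - 0) / 50 = 6,000,000,000 buckets, before counting any
underflow/overflow buckets.  Assuming one 8-byte counter per bucket, that is
about 48 GB of counters, so a failing sanity check or allocation inside
_stp_stat_init() is the expected outcome.  The sketch below is not the runtime's
code: linear_hist_buckets(), linear_hist_params_ok() and the
HYPOTHETICAL_MAX_BUCKETS cap are hypothetical names, and the real runtime limit
is not given in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket cap; the real systemtap runtime limit may differ. */
#define HYPOTHETICAL_MAX_BUCKETS 128

/* Buckets a linear histogram over [low, high) with the given width needs,
 * excluding any underflow/overflow buckets.  Returns -1 for a non-positive
 * width or an empty range. */
static int64_t linear_hist_buckets(int64_t low, int64_t high, int64_t width)
{
    if (width <= 0 || high <= low)
        return -1;
    return (high - low) / width;
}

/* Sketch of a parameter check an init routine could run before allocating:
 * nonzero if the request is sane, 0 if it should be rejected. */
static int linear_hist_params_ok(int64_t low, int64_t high, int64_t width)
{
    int64_t buckets = linear_hist_buckets(low, high, width);
    return buckets > 0 && buckets <= HYPOTHETICAL_MAX_BUCKETS;
}
```

With these helpers, linear_hist_buckets(0, 300000000000LL, 50) evaluates to
6000000000, and linear_hist_params_ok() rejects the request.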

-- 
           Summary: hist_linear() with large H value crashes system
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: critical
          Priority: P2
         Component: translator
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: mmlnx at us dot ibm dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=4066

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

* [Bug translator/4066] hist_linear() with large H value crashes system
From: mmlnx at us dot ibm dot com @ 2007-02-16 22:13 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mmlnx at us dot ibm dot com  2007-02-16 22:13 -------
Console output from the crash on an x86_64 system running kernel
2.6.19-1.2288.fc5 and systemtap CVS from Feb 15th:

Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
 [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /module/scsi_mod/sections/.text
CPU 0 
Modules linked in: stap_d084cedcd06497638f61939c59dd9ce0_807(U) ipv6 autofs4
hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_mod video sbs i2c_ec button
battery asus_acpi ac lp parport_pc parport snd_hda_intel snd_hda_codec
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
ehci_hcd uhci_hcd snd_mixer_oss sg ata_piix snd_pcm e1000 ide_cd serio_raw
snd_timer i2c_i801 snd soundcore cdrom snd_page_alloc i2c_core pcspkr shpchp
ext3 jbd ahci libata sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.19-1.2288.fc5 #1
RIP: 0010:[<ffffffff8833c2f0>]  [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
RSP: 0018:ffffffff806bae70  EFLAGS: 00010093
RAX: 0000000000000002 RBX: ffff810025d1e000 RCX: ffff810025d1e048
RDX: 0000000000000063 RSI: 0000000000000063 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff88345b60 R09: 0000000000000000
R10: ffff81003fc8fc80 R11: 0000000000000000 R12: ffffffff8022056b
R13: ffffffff806baf58 R14: ffffffff806a9800 R15: ffff81003fc7e580
FS:  0000000000000000(0000) GS:ffffffff805ff000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000037947000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80658000, task ffffffff80565640)
Stack:  ffffffff883406e4 0000000000000000 0000000000000082 ffff8100020445a0
 ffffffff883402b1 0000000000000000 ffff810025d1e000 ffffffff88345b60
 ffffffff80264376 0000000000000000 ffffffff806baf18 0000000000000002
Call Trace:
 [<ffffffff883406e4>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:probe_1495+0x128/0x1c4
 [<ffffffff883402b1>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:enter_kprobe_probe+0xf3/0x18d
 [<ffffffff80264376>] kprobe_handler+0x18f/0x1bf
 [<ffffffff802643e1>] kprobe_exceptions_notify+0x3b/0x72
 [<ffffffff80265094>] notifier_call_chain+0x20/0x32
 [<ffffffff80263c67>] do_int3+0x42/0x83
 [<ffffffff802633c3>] int3+0x93/0xb0
 [<ffffffff8022056c>] netif_receive_skb+0x1/0x3da
 [<ffffffff8810e361>] :e1000:e1000_clean_rx_irq+0x470/0x52f
 [<ffffffff8810d264>] :e1000:e1000_clean+0x8c/0x159
 [<ffffffff8020c37c>] net_rx_action+0xa4/0x1a7
 [<ffffffff80211ee5>] __do_softirq+0x55/0xc4
 [<ffffffff8025d24c>] call_softirq+0x1c/0x30
 [<ffffffff8026aa5a>] do_softirq+0x2c/0x97
 [<ffffffff8026abf5>] do_IRQ+0x130/0x151
 [<ffffffff8025c641>] ret_from_intr+0x0/0xa
 [<ffffffff8026911d>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff80255eee>] mwait_idle+0xc/0x20
 [<ffffffff80247ec6>] cpu_idle+0x8b/0xae
 [<ffffffff806627a0>] start_kernel+0x240/0x245
 [<ffffffff8066215a>] _sinittext+0x15a/0x15e


Code: 48 8b 47 18 65 8b 14 25 24 00 00 00 48 63 d2 48 f7 d0 4c 8b 
RIP  [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
 RSP <ffffffff806bae70>
CR2: 0000000000000018
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():1

Call Trace:
 [<ffffffff802699c5>] show_trace+0x34/0x47
 [<ffffffff802699ea>] dump_stack+0x12/0x17
 [<ffffffff8029cc94>] down_read+0x15/0x23
 [<ffffffff80294cb3>] blocking_notifier_call_chain+0x13/0x36
 [<ffffffff8021505d>] do_exit+0x20/0x97d
 [<ffffffff80264ff9>] do_page_fault+0x7a1/0x81c
 [<ffffffff8026307d>] error_exit+0x0/0x84
 [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
 [<ffffffff883406e4>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:probe_1495+0x128/0x1c4
 [<ffffffff883402b1>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:enter_kprobe_probe+0xf3/0x18d
 [<ffffffff80264376>] kprobe_handler+0x18f/0x1bf
 [<ffffffff802643e1>] kprobe_exceptions_notify+0x3b/0x72
 [<ffffffff80265094>] notifier_call_chain+0x20/0x32
 [<ffffffff80263c67>] do_int3+0x42/0x83
 [<ffffffff802633c3>] int3+0x93/0xb0
 [<ffffffff8022056c>] netif_receive_skb+0x1/0x3da
 [<ffffffff8810e361>] :e1000:e1000_clean_rx_irq+0x470/0x52f
 [<ffffffff8810d264>] :e1000:e1000_clean+0x8c/0x159
 [<ffffffff8020c37c>] net_rx_action+0xa4/0x1a7
 [<ffffffff80211ee5>] __do_softirq+0x55/0xc4
 [<ffffffff8025d24c>] call_softirq+0x1c/0x30
 [<ffffffff8026aa5a>] do_softirq+0x2c/0x97
 [<ffffffff8026abf5>] do_IRQ+0x130/0x151
 [<ffffffff8025c641>] ret_from_intr+0x0/0xa
 [<ffffffff8026911d>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff80255eee>] mwait_idle+0xc/0x20
 [<ffffffff80247ec6>] cpu_idle+0x8b/0xae
 [<ffffffff806627a0>] start_kernel+0x240/0x245
 [<ffffffff8066215a>] _sinittext+0x15a/0x15e

BUG: scheduling while atomic: swapper/0x10000100/0

Call Trace:
 [<ffffffff802699c5>] show_trace+0x34/0x47
 [<ffffffff802699ea>] dump_stack+0x12/0x17
 [<ffffffff802604ae>] __sched_text_start+0x5e/0xadc
 [<ffffffff802889fa>] __cond_resched+0x2d/0x55
 [<ffffffff8026104c>] cond_resched+0x2e/0x39
 [<ffffffff8029cc99>] down_read+0x1a/0x23
 [<ffffffff80294cb3>] blocking_notifier_call_chain+0x13/0x36
 [<ffffffff8021505d>] do_exit+0x20/0x97d
 [<ffffffff80264ff9>] do_page_fault+0x7a1/0x81c
 [<ffffffff8026307d>] error_exit+0x0/0x84
 [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
 [<ffffffff883406e4>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:probe_1495+0x128/0x1c4
 [<ffffffff883402b1>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:enter_kprobe_probe+0xf3/0x18d
 [<ffffffff80264376>] kprobe_handler+0x18f/0x1bf
 [<ffffffff802643e1>] kprobe_exceptions_notify+0x3b/0x72
 [<ffffffff80265094>] notifier_call_chain+0x20/0x32
 [<ffffffff80263c67>] do_int3+0x42/0x83
 [<ffffffff802633c3>] int3+0x93/0xb0
 [<ffffffff8022056c>] netif_receive_skb+0x1/0x3da
 [<ffffffff8810e361>] :e1000:e1000_clean_rx_irq+0x470/0x52f
 [<ffffffff8810d264>] :e1000:e1000_clean+0x8c/0x159
 [<ffffffff8020c37c>] net_rx_action+0xa4/0x1a7
 [<ffffffff80211ee5>] __do_softirq+0x55/0xc4
 [<ffffffff8025d24c>] call_softirq+0x1c/0x30
 [<ffffffff8026aa5a>] do_softirq+0x2c/0x97
 [<ffffffff8026abf5>] do_IRQ+0x130/0x151
 [<ffffffff8025c641>] ret_from_intr+0x0/0xa
 [<ffffffff8026911d>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff80255eee>] mwait_idle+0xc/0x20
 [<ffffffff80247ec6>] cpu_idle+0x8b/0xae
 [<ffffffff806627a0>] start_kernel+0x240/0x245
 [<ffffffff8066215a>] _sinittext+0x15a/0x15e

Kernel panic - not syncing: Aiee, killing interrupt handler!
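Reading the oops: CR2 (the faulting address) is 0x18, RDI is 0, and the first
code bytes at RIP, "48 8b 47 18", decode to mov 0x18(%rdi),%rax.  On x86_64 the
first function argument is passed in RDI, so _stp_stat_add() was handed a NULL
pointer and faulted while loading a field at offset 0x18.  The later "sleeping
function called from invalid context" and "scheduling while atomic" reports come
from do_exit() running after the oops in softirq context, and the panic follows
because the faulting context was an interrupt handler.  The sketch below shows
why a NULL base pointer yields a fault address equal to the field offset; the
four-field struct is hypothetical, since the real Stat layout is not shown in
this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the real Stat structure is not shown in this thread.
 * Three 8-byte fields precede `max`, so it sits at offset 0x18. */
struct stat_hyp {
    int64_t count;
    int64_t sum;
    int64_t min;
    int64_t max;
};

/* Address a load of st->max would touch if st were NULL: base 0 plus the
 * field offset.  Computed arithmetically; nothing is dereferenced here. */
static uintptr_t fault_address_if_null(void)
{
    return (uintptr_t)0 + offsetof(struct stat_hyp, max);
}
```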


* [Bug translator/4066] hist_linear() with large H value crashes system
From: joshua dot i dot stone at intel dot com @ 2007-02-16 22:41 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From joshua dot i dot stone at intel dot com  2007-02-16 22:41 -------
(In reply to comment #0)
> I suspect there's a problem in _stp_stat_init().

The problem is we're not checking the return value.  The initialization code for
"reads" goes:

  global_reads = _stp_stat_init (HIST_LINEAR, 0, 300000000000, 50);
  if (rc) {
    _stp_error ("global variable reads allocation failed");
    goto out;
  }
  rwlock_init (& global_reads_lock);

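_stp_stat_init returns NULL if a problem occurs, but the caller only checks
'rc', which this call never sets.  The NULL pointer therefore goes unnoticed
until the first probe hit passes it to _stp_stat_add(), which is where the
oops above occurs.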
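A minimal sketch of the corrected pattern: test the returned pointer itself
rather than an rc that the call never sets.  Everything here is a hypothetical
stand-in (struct stat_hyp for the runtime's Stat type, stat_init_hyp() for
_stp_stat_init(), init_globals() for the generated init code, fprintf() for
_stp_error()), and the 128-bucket limit is an assumption.  The committed patch
itself is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's Stat type. */
struct stat_hyp {
    long long buckets;
    long long *counts;
};
typedef struct stat_hyp *Stat;

/* Hypothetical stand-in for _stp_stat_init(): returns NULL on bad
 * parameters or allocation failure, as the bug report describes. */
static Stat stat_init_hyp(long long low, long long high, long long width)
{
    if (width <= 0 || high <= low || (high - low) / width > 128)
        return NULL;                       /* assumed 128-bucket cap */
    Stat st = malloc(sizeof *st);
    if (st == NULL)
        return NULL;
    st->buckets = (high - low) / width;
    st->counts = calloc((size_t)st->buckets, sizeof *st->counts);
    if (st->counts == NULL) {
        free(st);
        return NULL;
    }
    return st;
}

static Stat global_reads;

/* Mirrors the generated init sequence, but checks the pointer itself. */
static int init_globals(long long low, long long high, long long width)
{
    int rc = 0;

    global_reads = stat_init_hyp(low, high, width);
    if (global_reads == NULL) {
        fprintf(stderr, "global variable reads allocation failed\n");
        rc = -1;
        goto out;
    }
    /* rwlock_init(&global_reads_lock) and further setup would follow. */
out:
    return rc;
}
```

Returning early with rc set makes module initialization abort cleanly instead of
letting a probe later pass a NULL Stat to the add routine.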



* [Bug translator/4066] hist_linear() with large H value crashes system
From: fche at redhat dot com @ 2007-02-17 12:28 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-02-17 12:28 -------
Patch committed.  Thanks to Josh for the quick analysis.
This would be an appropriate sort of test to add to the suite coming for bug #3591.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


* Re: [Bug translator/4066] hist_linear() with large H value crashes  system
From: Mike Mason @ 2007-02-20 22:03 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: SystemTAP

Frank,

Thanks for the quick fix.  This bug appears in the RHEL 4.4, 4.5 and 5.0 versions of systemtap.  Have you filed bugs in Red Hat's bugzilla for this?  If not, do you want me to file them on our end and mirror them to Red Hat?

Thanks,
Mike

fche at redhat dot com wrote:
> ------- Additional Comments From fche at redhat dot com  2007-02-17 12:28 -------
> Patch committed.  Thanks to Josh for the quick analysis.
> This would be an appropriate sort of test to add to the suite coming for bug #3591.
> 

* Re: [Bug translator/4066] hist_linear() with large H value crashes system
From: Frank Ch. Eigler @ 2007-02-20 22:12 UTC (permalink / raw)
  To: Mike Mason; +Cc: systemtap

Hi -

> Thanks for the quick fix.  This bug appears in the RHEL 4.4, 4.5 and
> 5.0 versions of systemtap.  Have you filed bugs in Red Hat's
> bugzilla for this? [...]

No, and I'm not planning to.  We file routine "refresh systemtap for
RHEL*" bugs for each active branch around release time.  This is
another guilty pleasure of being a technology preview.

- FChE

* Re: [Bug translator/4066] hist_linear() with large H value crashes  system
From: Mike Mason @ 2007-02-20 22:14 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: SystemTAP

I should have asked this question in a more general way.  When we find bugs in the cvs version of systemtap, then discover it appears in specific distro releases as well, what's the preferred way of handling it?  I'm thinking we should file several bug reports: one in systemtap bugzilla and one for each distro release in which it appears.  Lots of work, I know, but I can't think of any other way to make sure it's tracked and fixed everywhere.  Can anyone think of a less cumbersome way to achieve the same result?

Mike

Mike Mason wrote:
> Frank,
> 
> Thanks for the quick fix.  This bug appears in the RHEL 4.4, 4.5 and 5.0 
> versions of systemtap.  Have you filed bugs in Red Hat's bugzilla for 
> this?  If not, do you want me to file them on our end and mirror them to 
> Red Hat?
> 
> Thanks,
> Mike
> 
> fche at redhat dot com wrote:
>> ------- Additional Comments From fche at redhat dot com  2007-02-17 12:28 -------
>> Patch committed.  Thanks to Josh for the quick analysis.
>> This would be an appropriate sort of test to add to the suite coming for bug #3591.
>>
> 

* Re: [Bug translator/4066] hist_linear() with large H value crashes system
From: Frank Ch. Eigler @ 2007-02-20 22:23 UTC (permalink / raw)
  To: Mike Mason; +Cc: systemtap

Hi -

On Tue, Feb 20, 2007 at 02:14:07PM -0800, Mike Mason wrote:

> I should have asked this question in a more general way.  When we
> find bugs in the cvs version of systemtap, then discover it appears
> in specific distro releases as well, what's the preferred way of
> handling it?  [...]

For the near future, RHEL and FC systemtap releases will each be
nearly-identical snapshots of the development tree.  This means that
if one gets a bug fix, they all will get the same bug fix at the same
time.  Tracking this with multiple bug reports in multiple systems
seems excessive to me.

- FChE

* Re: [Bug translator/4066] hist_linear() with large H value crashes  system
From: Vara Prasad @ 2007-02-20 23:57 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mike Mason, systemtap

Frank Ch. Eigler wrote:

>Hi -
>
>On Tue, Feb 20, 2007 at 02:14:07PM -0800, Mike Mason wrote:
>
>>I should have asked this question in a more general way.  When we
>>find bugs in the cvs version of systemtap, then discover it appears
>>in specific distro releases as well, what's the preferred way of
>>handling it?  [...]
>
>For the near future, RHEL and FC systemtap releases will each be
>nearly-identical snapshots of the development tree.  This means that
>if one gets a bug fix, they all will get the same bug fix at the same
>time.  Tracking this with multiple bug reports in multiple systems
>seems excessive to me.
>
>- FChE

I think I understand, but I wanted to make it more explicit based on your
above answer: in every version of RHEL we will update the entire SystemTap
RPM set, including tapsets.  Am I right?  What about elfutils?  Are we
doing a wholesale replacement of that as well?

Thanks,
Vara Prasad


* Re: [Bug translator/4066] hist_linear() with large H value crashes system
From: Frank Ch. Eigler @ 2007-02-21  2:22 UTC (permalink / raw)
  To: Vara Prasad; +Cc: systemtap

Hi -

varap wrote:
> [...]
> >For the near future, RHEL and FC systemtap releases will each be
> >nearly-identical snapshots of the development tree.  [...]
>
> I think I understand, but I wanted to make it more explicit based on your
> above answer: in every version of RHEL we will update the entire SystemTap
> RPM set, including tapsets.  Am I right?  What about elfutils?  Are we
> doing a wholesale replacement of that as well?

In simple terms, I expect the pattern of the last year to continue for
the near future (another release or two).  As the tapsets are part of
systemtap, yes, the same applies.  I expect we will continue to bundle
elfutils in those situations/platforms where the system elfutils is
expected to remain too old (e.g., RHEL4).

This will probably change after we exit "technology preview" status in
RHEL.

- FChE
