From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-18026-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 24556 invoked by alias); 22 Jun 2011 23:52:27 -0000
Received: (qmail 24547 invoked by uid 22791); 22 Jun 2011 23:52:27 -0000
X-SWARE-Spam-Status: No, hits=-6.2 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 22 Jun 2011 23:52:09 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p5MNq8PK007496	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)	for <systemtap@sourceware.org>; Wed, 22 Jun 2011 19:52:08 -0400
Received: from [10.3.113.54] (ovpn-113-54.phx2.redhat.com [10.3.113.54])	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p5MNq8m2026030;	Wed, 22 Jun 2011 19:52:08 -0400
Message-ID: <4E028028.4010603@redhat.com>
Date: Wed, 22 Jun 2011 23:52:00 -0000
From: Josh Stone <jistone@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Lightning/1.0b3pre Thunderbird/3.1.10
MIME-Version: 1.0
To: "Richard W.M. Jones" <rjones@redhat.com>
CC: systemtap@sourceware.org
Subject: Re: Rapidly running systemtap causing hangs or oops
References: <20110622230025.GG18438@amd.home.annexia.org>
In-Reply-To: <20110622230025.GG18438@amd.home.annexia.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <systemtap.sourceware.org>
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2011-q2/txt/msg00318.txt.bz2

On 06/22/2011 04:00 PM, Richard W.M. Jones wrote:
> Me again.  I can get something involving systemtap, ext2, the loop
> device, Linux 3.0 to oops very easily.  I'm not quite sure exactly
> what factor causes it, but here's an easy reproducer:
> 
> $ mkdir /tmp/mnt
> 
> $ truncate -s 1G /tmp/fs
> $ mkfs.ext2 -F /tmp/fs
> 
> $ cat > /tmp/test.sh 
> #!/bin/sh -
> echo mount
> mount -o loop /tmp/fs /tmp/mnt
> echo unmount
> umount /tmp/mnt
> 
> $ chmod +x /tmp/test.sh
> 
> $ while sudo stap -e 'probe module("ext2").statement ("*@*.c:*") { printf ("%s\n", pp()); }' -c /tmp/test.sh ; do : ; done
> 
> The final command usually either hangs the machine, or produces a long
> oops like the one attached, after just a few iterations.  It takes
> just a few seconds on my VM to get a hang or oops.

Can you try running stap with "-D STP_ALIBI"?  This alibi mode compiles
out most of stap's code: each probe handler is reduced to a single
atomic increment, and the final hit counts are reported on exit.
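
Roughly like this, as a sketch.  It reuses the probe script and paths
from your reproducer; the DRY_RUN switch and the variable names are only
illustrative (they are not stap features), and the only real change to
the stap invocation is the added "-D STP_ALIBI":

```shell
# Probe script copied unchanged from the reproducer.
PROBE='probe module("ext2").statement ("*@*.c:*") { printf ("%s\n", pp()); }'

# The same stap invocation with alibi mode enabled.
ALIBI_CMD="sudo stap -D STP_ALIBI -e '$PROBE' -c /tmp/test.sh"

# DRY_RUN is a hypothetical helper switch: by default only print the
# command; set DRY_RUN=0 to loop exactly as the reproducer does.
if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$ALIBI_CMD"
else
    while sudo stap -D STP_ALIBI -e "$PROBE" -c /tmp/test.sh; do :; done
fi
```

If the hang or oops still shows up with the handlers reduced this far,
that would point away from the handler bodies themselves.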

Another test would be to move the loop inside test.sh, so that stap is
left running the whole time.  That should tell us whether the issue is
tied to stap's probe registration or unregistration.
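
For instance, something along these lines.  This is only a sketch: the
iteration count of 20 is an arbitrary assumption, and the paths are the
ones from your reproducer:

```shell
# Rewrite test.sh so the mount/umount loop runs inside a single stap session.
# The count of 20 is arbitrary (an assumption); raise it if nothing happens.
cat > /tmp/test.sh <<'EOF'
#!/bin/sh -
i=0
while [ "$i" -lt 20 ]; do
    echo mount
    mount -o loop /tmp/fs /tmp/mnt || exit 1
    echo unmount
    umount /tmp/mnt || exit 1
    i=$((i + 1))
done
EOF
chmod +x /tmp/test.sh

# Then run stap once, by hand, so the probes stay registered throughout:
#   sudo stap -e 'probe module("ext2").statement ("*@*.c:*") { printf ("%s\n", pp()); }' -c /tmp/test.sh
```

If it survives all the iterations this way but still dies when stap is
restarted each time, startup or shutdown becomes the more likely suspect.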

> [  342.037017]  [<ffffffff8100b0ce>] show_registers+0xbd/0x206
> [  342.037017]  [<ffffffff814f6cba>] ? atomic_notifier_call_chain+0x14/0x16
> [  342.037017]  [<ffffffff814f4941>] __die+0x97/0xd8
> [  342.037017]  [<ffffffff8100be1c>] die+0x47/0x63
> [  342.037017]  [<ffffffff81009d79>] do_double_fault+0x65/0x67
> [  342.037017]  [<ffffffff814fb1aa>] double_fault+0x2a/0x30
> [  342.037017]  [<ffffffffa00ca6a6>] ? ext2_get_inode+0x6d/0x130 [ext2]

Is the Oops always this minimal?  Does it always point to the same
ext2_get_inode location (marked "?", so not a reliable frame)?

I'll play with this tomorrow and see if I can reproduce it myself...

Josh