From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14490 invoked by alias); 23 Jun 2011 07:51:47 -0000 Received: (qmail 14482 invoked by uid 22791); 23 Jun 2011 07:51:46 -0000 X-SWARE-Spam-Status: No, hits=-6.8 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 23 Jun 2011 07:51:28 +0000 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p5N7pR2u030387 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Thu, 23 Jun 2011 03:51:27 -0400 Received: from localhost (vpn1-5-79.ams2.redhat.com [10.36.5.79]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p5N7pQIB011892; Thu, 23 Jun 2011 03:51:27 -0400 Date: Thu, 23 Jun 2011 07:51:00 -0000 From: "Richard W.M. Jones" To: Josh Stone Cc: systemtap@sourceware.org Subject: Re: Rapidly running systemtap causing hangs or oops Message-ID: <20110623075126.GJ803@amd.home.annexia.org> References: <20110622230025.GG18438@amd.home.annexia.org> <4E028028.4010603@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E028028.4010603@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: systemtap-owner@sourceware.org X-SW-Source: 2011-q2/txt/msg00319.txt.bz2 On Wed, Jun 22, 2011 at 04:52:08PM -0700, Josh Stone wrote: > On 06/22/2011 04:00 PM, Richard W.M. Jones wrote: > > Me again. I can get something involving systemtap, ext2, the loop > > device, Linux 3.0 to oops very easily. I'm not quite sure exactly > > what factor causes it, but here's an easy reproducer: > > > > $ mkdir /tmp/mnt > > > > $ truncate -s 1G /tmp/fs > > $ mkfs.ext2 -F /tmp/fs > > > > $ cat > /tmp/test.sh > > #!/bin/sh - > > echo mount > > mount -o loop /tmp/fs /tmp/mnt > > echo unmount > > umount /tmp/mnt > > > > $ chmod +x /tmp/test.sh > > > > $ while sudo stap -e 'probe module("ext2").statement ("*@*.c:*") { printf ("%s\n", pp()); }' -c /tmp/test.sh ; do : ; done > > > > The final command usually either hangs the machine, or produces a long > > oops like the one attached, after just a few iterations. It takes > > just a few seconds on my VM to get a hang or oops. > > Can you try running stap with "-D STP_ALIBI"? This alibi mode compiles > out most of stap's code, so each probe handler is reduced to just an > atomic increment, then a final hit count is reported on exit. Adding -D STP_ALIBI to the command line changed the stap output a little bit, so I see lines that look like this: module("ext2").statement("ext2_xattr_put_super@fs/ext2/xattr.c:817"), (:1:1), hits: 1, from: module("ext2").statement("*@*.c:*") However it did not change the behaviour. The mount process crashed quickly with the oops below: [ 159.454020] [] ext2_fill_super+0x9b5/0xc3b [ext2] [ 159.454020] [] mount_bdev+0x155/0x1b7 [ 159.454020] [] ? ext2_error+0x112/0x112 [ext2] [ 159.454020] [] ext2_mount+0x15/0x17 [ext2] [ 159.454020] [] mount_fs+0x69/0x155 [ 159.454020] [] ? __alloc_percpu+0x10/0x12 [ 159.454020] [] vfs_kern_mount+0x63/0xa0 [ 159.454020] [] do_kern_mount+0x4d/0xdf [ 159.454020] [] do_mount+0x63c/0x69f [ 159.454020] [] ? memdup_user+0x42/0x6a [ 159.454020] [] ? strndup_user+0x3b/0x51 [ 159.454020] [] sys_mount+0x88/0xc2 [ 159.454020] [] system_call_fastpath+0x16/0x1b > Another test might be to move the loop inside test.sh, so stap is left > running the whole time, and we might tell if the issue is timed around > stap's probe registration or unregistration. I've left this running for 10 minutes, no crash. Unfortunately for the real program I'm writing, I really do need a way to box stap around each test. The problem I was having before was that there was quite a long delay between my test running and stap probes firing (or at least, seeing stap output). I need the stap output from one test to be clearly distinct from the stap output from the next test. If there was a way to run the test and then say to stap "now flush all your output" before running the next test, then that would be acceptable. I thought about using the process ID, but ideally my tests will all run as the same pid. > > [ 342.037017] [] show_registers+0xbd/0x206 > > [ 342.037017] [] ? atomic_notifier_call_chain+0x14/0x16 > > [ 342.037017] [] __die+0x97/0xd8 > > [ 342.037017] [] die+0x47/0x63 > > [ 342.037017] [] do_double_fault+0x65/0x67 > > [ 342.037017] [] double_fault+0x2a/0x30 > > [ 342.037017] [] ? ext2_get_inode+0x6d/0x130 [ext2] > > Is the Oops always this minimal? Does it always (questionably) point to > the same ext2_get_inode location? Different places, but all within the ext2 mounting / superblock code. > I'll play with this tomorrow and see if I can reproduce it myself... Thanks for looking at this. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top