Date: Tue, 22 Nov 2005 14:00:00 -0000
From: Mathieu Desnoyers
To: "Stone, Joshua I", Tom Zanussi, michel.dagenais@polymtl.ca
Cc: systemtap@sources.redhat.com
Subject: Re: double fault -> PAGE_KERNEL flagged memory
Message-ID: <20051122140021.GA3907@Krystal>

I suspect that your double fault may come from the SystemTAP logging code.
Do you have an instrumentation point in any fault handler?

For Tom: can you flag the RelayFS buffer memory PAGE_KERNEL instead of
GFP_KERNEL?
Otherwise, accessing those pages for the first time causes a page fault
(seen with LTTng). For instance, if you log an event in the page fault
handler and the logging code itself generates a page fault, you get a
double fault. The same could apply to unaligned memory accesses.

Make sure that the SystemTAP code is _always_ in contiguous memory that
cannot be swapped to disk. The Linux kernel module loader makes sure that
all module code is memory locked (see module.c) by first loading the whole
module into a vmap area (which is swappable) and then copying the code into
a region of memory flagged PAGE_KERNEL_EXEC (see vmalloc.c:vmalloc_exec()).

Furthermore, make sure that every data memory region is also non-swappable.
That includes the RelayFS buffer.

So:

- Memory in which the SystemTAP code is loaded should be allocated with
  vmalloc_exec() (or with the GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC
  flags).
- SystemTAP global data structures should be in memory protected from
  swap-out, with a flag like PAGE_KERNEL.
- RelayFS buffers should be PAGE_KERNEL too (not GFP_KERNEL).

Mathieu

* Stone, Joshua I (joshua.i.stone@intel.com) wrote:
> I am seeing sporadic double-faults when running tests on systemtap. I
> am trying to run systemtap.base/lt.exp, though others fail as well. It
> doesn't always fail, but if I run it four or five times in succession
> that's usually enough to trigger the fault.
> Below are manual copies of
> a couple of the faults dumped to the console:
>
> double fault, gdt at c0358000 [255 bytes]
> double fault, tss at c03dc000
> eip = ffffffff, esp = f4b6500c
> eax = ffffffff, ebx = ffffffff, ecx = 0000007b, edx = f4b65018
> esi = ffffffff, edi = ffffffff, ebp = 00000000
>
> double fault, gdt at c0358000 [255 bytes]
> double fault, tss at c03dc000
> eip = c011a799, esp = f5bd4f98
> eax = f959a380, ebx = f5bd5170, ecx = 0000007b, edx = f4bd505c
> esi = 00000000, edi = c011a785, ebp = 00000000
>
> The first dump doesn't tell much, but the edi and eip values in the
> second dump are interesting. 'c011a785' is the beginning of
> do_page_fault, and the instruction at 'c011a799' is a read from the
> stack. Methinks the stack runneth over?
>
> This is on RHEL4 U2, i686, kernel 2.6.9-22.EL. I verified this crash on
> two different machines with this kernel: an IBM T42 laptop (1.7GHz
> Pentium M, 1GB RAM), and a desktop (3.6GHz Pentium 4 HT/EM64T, 2GB RAM).
> I couldn't reproduce the problem with the 2.6.9-22.ELsmp kernel. I also
> tried the desktop in x86_64 mode, and could not reproduce the problem
> with the UP kernel nor the SMP kernel.
>
> Please let me know if there's any other information I can provide to
> help track this down...
>
> Thanks,
>
> Josh Stone
>

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68