From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 19997 invoked by alias); 1 Jun 2007 21:12:04 -0000
Received: (qmail 19987 invoked by uid 22791); 1 Jun 2007 21:12:03 -0000
X-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL,BAYES_00,DK_POLICY_SIGNSOME
X-Spam-Check-By: sourceware.org
Received: from mail119.messagelabs.com (HELO mail119.messagelabs.com) (216.82.241.179)
    by sourceware.org (qpsmtpd/0.31) with SMTP; Fri, 01 Jun 2007 21:12:01 +0000
X-VirusChecked: Checked
X-Env-Sender: qbarnes@urbana.css.mot.com
X-Msg-Ref: server-6.tower-119.messagelabs.com!1180732318!11993753!1
X-StarScan-Version: 5.5.12.11; banners=-,-,-
X-Originating-IP: [129.188.136.8]
Received: (qmail 8668 invoked from network); 1 Jun 2007 21:11:58 -0000
Received: from motgate8.mot.com (HELO motgate8.mot.com) (129.188.136.8)
    by server-6.tower-119.messagelabs.com with SMTP; 1 Jun 2007 21:11:58 -0000
Received: from il06exr04.mot.com (il06exr04.mot.com [129.188.137.134])
    by motgate8.mot.com (8.12.11/Motorola) with ESMTP id l51LBsjG024288;
    Fri, 1 Jun 2007 14:11:58 -0700 (MST)
Received: from il06vts01.mot.com (il06vts01.mot.com [129.188.137.141])
    by il06exr04.mot.com (8.13.1/Vontu) with SMTP id l51LBskT022798;
    Fri, 1 Jun 2007 16:11:54 -0500 (CDT)
Received: from udc.urbana.css.mot.com (udc.urbana.css.mot.com [10.12.0.51])
    by il06exr04.mot.com (8.13.1/8.13.0) with ESMTP id l51LBrVa022795;
    Fri, 1 Jun 2007 16:11:53 -0500 (CDT)
Received: from nova.urbana.css.mot.com (nova.urbana.css.mot.com [192.88.153.60])
    by udc.urbana.css.mot.com (8.13.8+Sun/8.13.8) with ESMTP id l51LBrNZ022036;
    Fri, 1 Jun 2007 16:11:53 -0500 (CDT)
Received: (from qbarnes@localhost)
    by nova.urbana.css.mot.com (8.13.8+Sun/8.13.7/Submit) id l51LBrgZ011590;
    Fri, 1 Jun 2007 16:11:53 -0500 (CDT)
Date: Fri, 01 Jun 2007 21:12:00 -0000
From: Quentin Barnes
To: Roland McGrath
Cc: systemtap@sources.redhat.com
Subject: Re: systemtap ARM port status
Message-ID: <20070601211153.GG5831@urbana.css.mot.com>
References:
    <20070601204637.GF5831@urbana.css.mot.com> <20070601204951.17EB61F8512@magilla.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To: <20070601204951.17EB61F8512@magilla.localdomain>
User-Agent: Mutt/1.4.2.1i
X-POPI: This message is Motorola General Business Information (MGBI).
X-Organization: Motorola Cellular Subscriber Group, Urbana Design Center
X-Phone: (217) 384-8726
X-Vontu: Pass
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2007-q2/txt/msg00441.txt.bz2

On Fri, Jun 01, 2007 at 01:49:51PM -0700, Roland McGrath wrote:
>> However, one of the changes I made to loc2c-runtime.h shouldn't be
>> necessary, if anything, it should cause things to fail, but instead
>> it makes some tests now pass. I sent some mail to Martin about it
>> to see if he has any ideas as to what's going on.
>
>Please use this mailing list for such discussion.

Hi Roland! I don't think we've swapped mail in some time, maybe
even going back to the gmake alpha devel mailing list back in the
'90s.

Well, gee, I think that's a real first for me. I'm often told to
take a discussion off a mailing list to e-mail, not the other way
around!

Okay, you asked for it; you got it. The two mails I sent to Martin
about these issues are below.

>Thanks,
>Roland

Quentin

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Date: Fri, 1 Jun 2007 00:17:24 -0500
From: Quentin Barnes
To: Martin Hunt
Subject: Strange failures with syscall testsuite

I've got almost all of the testsuite tests passing (or at least
understand why they fail) with one big, notable exception: all the
tests under systemtap.syscall fail.

I've been working on understanding the failures, but am starting to
hit a wall due to several anomalies I've encountered.

Here's a sample failure:
=====
Testing 32-bit access
FAIL: 32-bit access
access FAILED. output of "stap -c ../access /usr/src/systemtap-20070519/testsuite/systemtap.syscall/sys.stp" was:
------------------------------------------
staprun: getpid () = N/A
staprun: getrlimit (RLIMIT_STACK, 0xbec5caf0) = N/A
staprun: rt_sigaction (32, 0xbec5ca70, 0x00000000, 8) = N/A
staprun: rt_sigaction (33, 0xbec5ca70, 0x00000000, 8) = N/A
staprun: rt_sigaction (34, 0xbec5ca70, 0x00000000, 8) = N/A
WARNING: Number of errors: 1, skipped probes: 0
ERROR: kernel string copy fault at 0xc1068000 near identifier '$filename' at /usr/local/share/systemtap/tapset/syscalls.stp:49:48
------------------------------------------
RESULTS: ('*' = MATCHED EXPECTED)
--------- EXPECTED and NOT MATCHED ----------
access: access \("foobar1", F_OK\) = 0
access: access \("foobar1", R_OK\) = 0
access: access \("foobar1", W_OK\) = 0
access: access \("foobar1", X_OK\) = -[\-0-9]+ \(EACCES\)
access: access \("foobar1", W_OK \|R_OK\) = 0
access: access \("foobar1", X_OK \|W_OK \|R_OK\) = -[\-0-9]+ \(EACCES\)
=====

0xc1068000 is a legit kernel memory address on ARM.

When I look at line 49 in syscalls.stp, it is:
    argstr = sprintf("%s, %s", user_string_quoted($filename), mode_str)

$filename is the first argument to sys_access() which has type
"const char __user *filename". I dumped the regs as passed into
enter_kprobe_probe(). R0 matches the address (i.e. 0xc1068000), so
that seems fine, sort of.

The first anomaly is 0xc1068000 is a _kernel_ memory address.
Shouldn't the address passed into sys_access() be a user-space
address due to the "__user" qualifier? I don't know Linux kernel
programming nuances, but I thought that tag meant it was expected to
be a user-space address only.

I assume the error as reported is from expanding the
"user_string_quoted($filename)" expression. Having a user space
string routine expand a kernel address would certainly be fatal!
ARM uses an "ldrbt" instruction to do the copy -- that would fault on
trying to copy from a kernel memory address.

But then there's anomaly number two. The error message is "ERROR:
kernel string copy fault at ...". This is a message from
function_kernel_string(). This function is expecting to copy a
string at a kernel address. How did it get invoked from a service
called "user_string_quoted"?

In looking at the implementation of function_kernel_string(), it
calls deref_string(), a macro in loc2c-runtime.h. deref_string()
invokes deref(). For byte operations, deref() invokes a
platform-specific call. In the case of i386, it is called
"__get_user_asm()". Now I would expect such a routine to perform
user-space-only loading of data, so that accessing kernel data would
fault.

Anomaly number three: why doesn't calling such a function from
function_kernel_string() fault on kernel memory access on other
platforms like it does on ARM?

As a hack, I decided to modify ARM's call of __get_user_asm_byte()
by deref() to just do "*(char *)addr". Suddenly, all the test cases
that were previously failing with the "ERROR: kernel string copy
fault at ..." diagnostic are now making it past that point and much
further! (Most of the tests might actually even be passing now, but
debug messages I put into the stap output are throwing off the
Expect scripts.)

Could ARM be doing something unusual (or the systemtap test suite)
with providing a kernel data address to sys_access() and the other
system calls? What am I not understanding?

Any words of wisdom on what's going on and how to solve this?

Quentin

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Date: Fri, 1 Jun 2007 15:04:31 -0500
From: Quentin Barnes
To: Martin Hunt
Subject: Re: Basic ARM support for Systemtap

At the end of this mail are three additional files for ARM on top of
the ones I sent you. They are pretty straightforward. The patch
fixes the "N/A" in the output I had sent you (and one cast).

In the meantime, I hacked the deref macro in loc2c-runtime.h to be
this for ARM:
=====
#define deref(size, addr)                                       \
  ({                                                            \
    int _bad = 0;                                               \
    intptr_t _v=0;                                              \
    switch (size){                                              \
    case 1: _v = *(char *)addr; break;                          \
    case 2: _v = *(short *)addr; break;                         \
    case 4: _v = *(int *)addr; break;                           \
    default: __get_user_bad(); break;                           \
    }                                                           \
    if (_bad)                                                   \
      goto deref_fault;                                         \
    _v;                                                         \
  })
=====

With the hackery above and the patch below, for the first time ever,
I'm getting the "systemtap.syscall" testsuite to pass! I don't know
how many tests will pass. It will take another several hours for the
suite to finish running that section on my devel board, but I am very
curious to see.

If you can help me understand what's going on here with why the
above hackery "works" at all, it would be a big help. (My questions
are in the previous mail I sent in the wee hours earlier today.)

Quentin

Index: runtime/stack-arm.c
===================================================================
--- runtime/stack-arm.c (revision 195)
+++ runtime/stack-arm.c (working copy)
@@ -59,7 +59,7 @@ static void __stp_stack_print (struct pt
 			_stp_symbol_print((unsigned long)pc);
 			_stp_print_char('\n');
 		} else {
-			_stp_printf("%08lx ", pc);
+			_stp_printf("%08lx ", (unsigned long)pc);
 		}
 
 		/* Sanity check the next_fp. */
Index: runtime/regs.c
===================================================================
--- runtime/regs.c (revision 195)
+++ runtime/regs.c (working copy)
@@ -46,6 +46,8 @@ unsigned long _stp_ret_addr (struct pt_r
 	return regs->b0;
 #elif defined (__s390__) || defined (__s390x__)
 	return regs->gprs[14];
+#elif defined (__arm__)
+	return regs->ARM_r0;
 #else
 	#error Unimplemented architecture
 #endif
Index: tapset/errno.stp
===================================================================
--- tapset/errno.stp (revision 194)
+++ tapset/errno.stp (working copy)
@@ -370,6 +370,8 @@ function returnstr:string (returnp:long)
 	ret = CONTEXT->regs->u_regs[UREG_RETPC];
 #elif defined (__s390x__)
 	ret = CONTEXT->regs->gprs[2];
+#elif defined (__arm__)
+	ret = CONTEXT->regs->ARM_r0;
 #else
 	goto no_ret;
 #endif
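
One thing worth noting about the hacked deref macro quoted above: as
posted, nothing ever sets _bad, so the "goto deref_fault" branch cannot
be taken, and a bad address would no longer be reported as a "string
copy fault". The sketch below is a userspace model only, written to
illustrate the contract that the deref()/deref_string() chain described
in the first mail is expected to keep: every byte load either succeeds
or reports a fault to its caller. All names here (struct window,
checked_load_byte, checked_copy_string) are hypothetical and do not
come from loc2c-runtime.h. The range check stands in for what the MMU
and the kernel's exception handling would decide on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "readable memory": one contiguous window.
 * In the kernel this decision is made by the hardware, not a range check. */
struct window {
    const unsigned char *base;
    size_t len;
};

/* Model of a fault-checked byte load.
 * Returns 0 and stores the byte in *out, or returns -1 (the "fault")
 * when addr lies outside the window. */
static int checked_load_byte(const struct window *w,
                             const unsigned char *addr,
                             unsigned char *out)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t b = (uintptr_t)w->base;

    if (a < b || a - b >= w->len)
        return -1;
    *out = *addr;
    return 0;
}

/* Model of a deref_string-style copy built on the byte load.
 * Copies until a NUL byte or until dstlen - 1 bytes are stored.
 * Returns 0 on success and -1 on a fault; dst is NUL-terminated
 * in both cases whenever dstlen > 0. */
static int checked_copy_string(const struct window *w, char *dst,
                               const unsigned char *src, size_t dstlen)
{
    size_t i;
    unsigned char c;

    if (dstlen == 0)
        return -1;
    for (i = 0; i + 1 < dstlen; i++) {
        if (checked_load_byte(w, src + i, &c) != 0) {
            dst[i] = '\0';
            return -1;  /* caller would report the string copy fault */
        }
        dst[i] = (char)c;
        if (c == '\0')
            return 0;
    }
    dst[i] = '\0';      /* truncated: destination buffer is full */
    return 0;
}
```

In this model, a caller would pass the window describing valid memory
and check the return value before using dst. The point of the design is
that the failure is carried back through each layer instead of being
dropped, which is the behaviour the original macro's _bad flag and
deref_fault label appear to be there for.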