From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 29560 invoked by alias); 13 Jun 2016 13:42:21 -0000
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: systemtap-owner@sourceware.org
Received: (qmail 29551 invoked by uid 89); 13 Jun 2016 13:42:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.3 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=lie
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS;
 Mon, 13 Jun 2016 13:42:09 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id CBFCD8553B;
 Mon, 13 Jun 2016 13:42:08 +0000 (UTC)
Received: from [10.13.129.159] (dhcp129-159.rdu.redhat.com [10.13.129.159])
 by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u5DDg78d001643;
 Mon, 13 Jun 2016 09:42:08 -0400
Subject: Re: exercising current aarch64 kprobe support with systemtap
To: Pratyush Anand
References: <20160613042758.GB6344@dhcppc6>
Cc: systemtap@sourceware.org, Dave Long, Mark Brown, Jeremy Linton
From: William Cohen
Message-ID: <156bb7aa-1542-80a4-5585-6a5cec12f97f@redhat.com>
Date: Mon, 13 Jun 2016 13:42:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <20160613042758.GB6344@dhcppc6>
Content-Type: multipart/mixed; boundary="------------E07187EE792E0860C7A41D93"
X-IsSubscribed: yes
X-SW-Source: 2016-q2/txt/msg00220.txt.bz2

This is a multi-part message in MIME format.
--------------E07187EE792E0860C7A41D93
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Content-length: 2964

On 06/13/2016 12:27 AM, Pratyush Anand wrote:
> Hi Will,
>
> On 10/06/2016:05:28:36 PM, William Cohen wrote:
>> On 06/09/2016 12:17 PM, William Cohen wrote:
>>> I have been exercising the current kprobes and uprobe patches for
>>> arm64 that are in the test_upstream_arm64_devel branch of
>>> https://github.com/pratyushanand/linux with systemtap. There are
>>> two issues that I have seen on this kernel with systemtap. There are
>>> some cases where kprobes fail to register at places that appear to be
>>> reasonable places for a kprobe. The other issue is that the kernel
>>> starts having soft lockups when the hw_watch_addr.stp test runs. To
>>> get systemtap working with the newer kernels, the attached hack is
>>> needed because of changes in the aarch64 macro args.
>> ...
>>> Soft Lockup for the hw_watch_addr.stp
>>>
>>> When running the hw_watch_addr.stp test the machine gets a number of
>>> processes using a lot of sys time and eventually the kernel reports
>>> a soft lockup:
>>>
>>> http://paste.stg.fedoraproject.org/5323/
>>>
>>> The systemtap.base/overload.exp tests all pass, but maybe there is
>>> too much work being done to generate the backtraces for
>>> hw_watch_addr.stp and that is triggering the problem.
>>
>> I can reliably reproduce the soft lockup running a single test with:
>>
>> /root/systemtap_write/install/bin/stap --all-modules \
>> /root/systemtap_write/systemtap/testsuite/systemtap.examples/memory/hw_watch_addr.stp \
>> 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 > /dev/null
>>
>> paste of output and soft lockup at: http://paste.stg.fedoraproject.org/5324/
>>
>> One of the things that Jeremy Linton pointed to was:
>>
>> https://lkml.org/lkml/2016/3/21/198
>
> Now we have the following in arch_within_kprobe_blacklist(), so the above
> issue should not bite us.
>
> + !!search_exception_tables(addr))
> + return true;
>
>>
>> Could the aarch64 hardware watchpoint handler have an issue that is causing this problem with the soft lockup?
>> Or is it spending too much time doing the stack backtrace?
>
> Not sure. It could be that the locked-up CPU is waiting for a lock
> (spinlock) that is not being released. I just noticed that the backtrace
> of all active CPUs (`echo l > /proc/sysrq-trigger`) is not working for
> arm64, probably because we do not have arch_trigger_all_cpu_backtrace()
> defined for aarch64. Maybe we can add one, like the one for arm. A
> backtrace of the CPUs in this state might give us some input.
>
> ~Pratyush
>

Hi Pratyush,

I did some additional experimentation this weekend. A version of the
systemtap script with an empty probe body (the attached
hw_watch_addr_null2.stp) still caused the system to have a soft lockup.
However, the equivalent perf use of the hardware watchpoint worked fine
(it got counts and no soft lockup):

perf stat -a -e mem:0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'`/1 bash

So it looks like the issue might lie with something in systemtap.

-Will

--------------E07187EE792E0860C7A41D93
Content-Type: text/plain; charset=UTF-8;
 name="hw_watch_addr_null2.stp"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="hw_watch_addr_null2.stp"
Content-length: 111

#! /usr/bin/env stap

%( CONFIG_HAVE_HW_BREAKPOINT == "y" %?
probe kernel.data($1).rw { }
%:
probe never {}
%)

--------------E07187EE792E0860C7A41D93--
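
Both the stap and the perf invocations in this message take the watchpoint address from `grep "vm_dirty_ratio" /proc/kallsyms`, which matches any line containing that substring and would yield several addresses if more than one symbol matched. A minimal sketch of an exact-match lookup follows, assuming the usual "address type name [module]" column layout of /proc/kallsyms. The helper name `sym_addr` is hypothetical and is not part of systemtap or perf.

```shell
#!/bin/sh
# sym_addr is a hypothetical helper, not part of systemtap or perf.
# It prints the 0x-prefixed address of the first symbol whose name matches
# exactly, reading kallsyms-format lines ("address type name [module]").
#   $1 = symbol name
#   $2 = input file (assumed default: /proc/kallsyms)
# Returns non-zero when the symbol is not found.
sym_addr() {
    awk -v sym="$1" '
        $3 == sym { print "0x" $1; found = 1; exit }
        END       { exit !found }
    ' "${2:-/proc/kallsyms}"
}
```

With such a helper the perf test above could be written as `perf stat -a -e mem:$(sym_addr vm_dirty_ratio)/1 bash`, so that only the one intended symbol supplies the address.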