On 06/13/2016 12:27 AM, Pratyush Anand wrote:
> Hi Will,
>
> On 10/06/2016:05:28:36 PM, William Cohen wrote:
>> On 06/09/2016 12:17 PM, William Cohen wrote:
>>> I have been exercising the current kprobes and uprobe patches for
>>> arm64 that are in the test_upstream_arm64_devel branch of
>>> https://github.com/pratyushanand/linux with systemtap. There are a
>>> two issues that I have seen on this kernel with systemtap. There are
>>> some cases where kprobes fail to register at places that appear to be
>>> reasonable places for a kprobe. The other issue is that kernel starts
>>> having soft lockups when the hw_watch_addr.stp tests runs. To get
>>> systemtap with the newer kernels need the attached hack because of
>>> changes in the aarch64 macro args.
>> ...
>>> Soft Lookup for the hw_watch_addr.stp
>>>
>>> When running the hw_watch_addr.stp tests the machine gets a number of
>>> processes using a lot of sys time and eventually the kernel reports
>>> soft lockup:
>>>
>>> http://paste.stg.fedoraproject.org/5323/
>>>
>>> The systemtap.base/overload.exp tests all pass, but maybe there is
>>> much work being done to generate the backtraces for hw_watch_addr.stp
>>> and that is triggering the problem.
>>
>> I can reliably reproduce the soft lockup running a single test with:
>>
>> /root/systemtap_write/install/bin/stap --all-modules \
>> /root/systemtap_write/systemtap/testsuite/systemtap.examples/memory/hw_watch_addr.stp \
>> 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 > /dev/null
>>
>> paste of output and soft lockup at: http://paste.stg.fedoraproject.org/5324/
>>
>> One of the things that Jeremy Linton pointed to was:
>>
>> https://lkml.org/lkml/2016/3/21/198
>
> Now we have following in arch_within_kprobe_blacklist(). So above issue should
> not bite us.
>
> + !!search_exception_tables(addr))
> + return true;
>
>>
>> Could the aarch64 hardware watchpoint handler have an issue that is causing this problem with the soft lockup?
>> Or spending too much time doing the stack backtrace?
>
> Not sure, could be the locked up CPU waiting for a lock (spinlock), which is not
> being released. Just noticed that, backtrace of all active CPUs
> (`echo l > /proc/sysrq-trigger`) is not working for arm64. Probably because,
> we do not have arch_trigger_all_cpu_backtrace() defined for aarch64. May be we
> can have one, like that of arm. Backtrace of CPUs in this state might give us
> some input.
>
> ~Pratyush
>

Hi Pratyush,

I did some additional experimentation this weekend. The version of the
systemtap script with an empty body (the attached hw_watch_addr_null2.stp)
still caused the system to have a soft lockup. However, the equivalent perf
use of the hardware watchpoint worked fine (it got counts and no soft
lockup):

perf stat -a -e mem:0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'`/1 bash

So it looks like the issue might lie with something in systemtap.

-Will