From: Pratyush Anand
Date: Mon, 04 Jul 2016 12:46:00 -0000
Subject: Re: exercising current aarch64 kprobe support with systemtap
To: William Cohen
Cc: David Long, systemtap@sourceware.org, Mark Brown, Jeremy Linton, David Smith

Hi Will,

I did some more debugging, and this is my current understanding:

- While executing this test, page_counter_cancel() is called. Probably
  there is an out-of-memory scenario.
- page_counter_cancel() calls WARN_ON_ONCE(new < 0).
- WARN_ON_ONCE() executes a brk BUG_BRK_IMM (brk 0x800) instruction.
- Executing brk 0x800 invokes bug_handler().
- bug_handler() calls report_bug(), which calls __warn().
- __warn() does a lot of pr_warn(), which invokes print_worker_info(),
  where we have a kprobe instrumented.
- Therefore, we are encountering this issue.

~Pratyush

On Tue, Jun 28, 2016 at 8:50 AM, William Cohen wrote:
> On 06/27/2016 10:18 AM, Pratyush Anand wrote:
>> Hi Will,
>>
>> On 23/06/2016:03:22:44 PM, William Cohen wrote:
>>> On 06/23/2016 02:26 PM, David Long wrote:
>>>> On 06/23/2016 11:49 AM, William Cohen wrote:
>>>>> On 06/22/2016 11:18 PM, David Long wrote:
>>>>>> On 06/22/2016 04:24 PM, William Cohen wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> When running the current systemtap checked out from the git repository
>>>>>>> and a locally built kernel with the kprobes64-v13 patches (the
>>>>>>> test_upstream_arm64_devel branch of
>>>>>>> https://github.com/pratyushanand/linux) on a Fedora 23 machine, one of
>>>>>>> the kprobes_onthefly.exp tests is causing the machine to get in a
>>>>>>> state that requires rebooting to fix.
>>>>>>> This can be triggered by running a
>>>>>>> portion of the systemtap tests with:
>>>>>>>
>>>>>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>>>>>
>>>>>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>>>>>>> console starts spewing the following and needs to be rebooted:
>>>>>>>
>>>>>>> [23394.036860] Unexpected kernel single-step exception at EL1
>>>>>>> [23394.042434] Unexpected kernel single-step exception at EL1
>>>>>>> [23394.048008] Unexpected kernel single-step exception at EL1
>>>>>>> [23394.053541] Unexpected kernel single-step exception at EL1
>>>>>>> [23394.059053] Unexpected kernel single-step exception at EL1
>>>>>>> [23394.064545] Unexpected kernel single-step exception at EL1
>>>>>>>
>>>>>>> Sorry, I don't have the start of the failure; it scrolled off the
>>>>>>> screen very quickly.
>>>>>>>
>>>>>>> -Will
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I'll take a look and see what I can figure out.
>>>>>>
>>>>>> In the meantime I did just push a v14 branch. I'm doubtful that it
>>>>>> will address the above problem even though it contains a few bug fixes.
>>>>>>
>>>>>> -dl
>>>>>>
>>>>>
>>>>> Hi Dave and Pratyush,
>>>>>
>>>>> I tried the kprobes64-v13 kernel and it also seems to work, so it
>>>>> looks like the problem might be in the
>>>>> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
>>>>>
>>>>> -Will
>>>>>
>>>>
>>>> I'm going to interpret that as meaning you know of no problem in the
>>>> kprobes v14 patch that would give me pause to email it upstream. Do you
>>>> disagree?
>>>>
>>>> -dl
>>>>
>>>
>>> Hi Dave,
>>>
>>> Yes, the problem only seems to be in that other kernel from
>>> https://github.com/pratyushanand/linux with the kprobe and uprobe
>>> patches, so the arm64 patches do not appear to be the problem.
>>> I don't know what is causing the problem; maybe there is something
>>> going on with the porting of the patches to that kernel or with the
>>> other patches included in there (uprobes/kexec).
>>
>> Just to update:
>>
>> I confirm that the problem arises only after the uprobe patches, but I am
>> not yet sure that the actual culprit is the uprobe code.
>>
>> I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
>> seems that, when the problem happens, there was a kprobe at print_worker_info().
>>
>> Most likely a re-entrant kprobe is called when a kprobe is instrumented at
>> print_worker_info(). I guessed it could be show_regs() from the arm64/kprobe code,
>> but commenting out show_regs() did not make any difference. Even blacklisting
>> print_worker_info() did not resolve it; the problem reproduced in a different
>> way after blacklisting.
>>
>> So it is still vague and debugging continues.
>> If I can clearly understand the systemtap test code, then it will probably be
>> easier to debug. I mean, if I can get the names of the kernel and user space
>> symbols where this test is instrumenting probes, that would help a lot to
>> narrow it down.
>>
>> ~Pratyush
>>
>
> Hi Pratyush,
>
> My understanding is that the systemtap onthefly support enables/disables
> the probe as mentioned in the following systemtap bugzilla entry (and the
> ones that it depends on):
> https://sourceware.org/bugzilla/show_bug.cgi?id=10995. It would be handy
> to get things pared down to the systemtap script that triggers the
> problem. After putting in some diagnostic puts, it looks like the script
> that triggers the problem is something like the attached
> onthefly_trigger.stp (that was gathered on an x86_64 machine, so it might
> not be exactly what is causing the problem on aarch64). David Smith, any
> suggestions on how to debug based on your experiences from
> https://sourceware.org/bugzilla/show_bug.cgi?id=17126 where the ppc64 had
> a similar issue with onthefly testing?
>
> The "Unexpected kernel single-step exception at EL1" reminds me of the
> times when kprobes couldn't find a handler. Maybe there is some situation
> where the kprobe is being removed but the breakpoint is still around. Did
> you get a backtrace with the insertion of the "BUG()" where that message
> is printed out? I wonder if it might be triggered by the
> (thread_flags & _TIF_UPROBE) somehow being true and the aarch64
> do_notify_resume starts running.
>
> -Will
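
The call chain listed at the top of this message can be sketched as a toy model. The Python below is only an illustration and not kernel code: the Cpu class, brk_depth counter, and nested_probe_hits() helper are hypothetical names, the __warn() to print_worker_info() path is collapsed into a direct call, and treating the kprobe hit as a second brk taken inside the WARN brk handler is an assumption about why the nesting matters. It shows only that the probed function is reached while another brk handler is still active.

```python
"""Toy model of the call chain from the message; not kernel code."""

KPROBED = {"print_worker_info"}  # functions carrying a simulated kprobe


class Cpu:
    def __init__(self):
        self.brk_depth = 0   # nesting level of simulated brk handlers
        self.events = []     # ordered log of (reason, depth) pairs

    def brk(self, reason, handler):
        """Simulate taking a brk exception and running its handler."""
        self.brk_depth += 1
        self.events.append((reason, self.brk_depth))
        handler(self)
        self.brk_depth -= 1

    def call(self, name, body=None):
        """Call a function; a kprobed one first traps through a brk."""
        if name in KPROBED:
            self.brk("kprobe:" + name, lambda cpu: None)
        if body is not None:
            body(self)


def bug_handler(cpu):
    # bug_handler() -> report_bug() -> __warn() -> print_worker_info()
    # (the pr_warn() steps in between are collapsed in this sketch)
    cpu.call("report_bug",
             lambda c: c.call("__warn",
                              lambda c2: c2.call("print_worker_info")))


def page_counter_cancel(cpu, new):
    # WARN_ON_ONCE(new < 0) is modelled as a brk when the condition holds.
    if new < 0:
        cpu.brk("warn", bug_handler)


def nested_probe_hits(cpu):
    """Return kprobe hits that occurred inside another brk handler."""
    return [reason for reason, depth in cpu.events
            if reason.startswith("kprobe:") and depth > 1]


cpu = Cpu()
page_counter_cancel(cpu, new=-1)
print(cpu.events)
print(nested_probe_hits(cpu))
```

In this model, calling page_counter_cancel() with a non-negative value records no events at all, which matches the observation that the failure needs the warning path to fire before the probed function is reached.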