From: William Cohen <wcohen@redhat.com>
To: Pratyush Anand <panand@redhat.com>
Cc: David Long <dave.long@linaro.org>,
	systemtap@sourceware.org, Mark Brown <broonie@linaro.org>,
	Jeremy Linton <jlinton@redhat.com>,
	David Smith <dsmith@redhat.com>
Subject: Re: exercising current aarch64 kprobe support with systemtap
Date: Tue, 28 Jun 2016 03:20:00 -0000
Message-ID: <ff385049-3e0f-b3ee-8395-4cc3ab1b13d5@redhat.com>
In-Reply-To: <20160627141840.GB8139@dhcppc9>

[-- Attachment #1: Type: text/plain, Size: 4847 bytes --]

On 06/27/2016 10:18 AM, Pratyush Anand wrote:
> Hi Will,
> 
> On 23/06/2016:03:22:44 PM, William Cohen wrote:
>> On 06/23/2016 02:26 PM, David Long wrote:
>>> On 06/23/2016 11:49 AM, William Cohen wrote:
>>>> On 06/22/2016 11:18 PM, David Long wrote:
>>>>> On 06/22/2016 04:24 PM, William Cohen wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> When running the current systemtap checked out from the git repository
>>>>>> and a locally built kernel with the kprobes64-v13 patches (the
>>>>>> test_upstream_arm64_devel branch of
>>>>>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>>>>>> the kprobes_onthefly.exp tests is causing the machine to get in a
>>>>>> state that requires rebooting to fix.  This can be triggered by running a
>>>>>> portion of the systemtap tests with:
>>>>>>
>>>>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>>>>
>>>>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>>>>>> console starts spewing the following and needs to be rebooted:
>>>>>>
>>>>>> [23394.036860] Unexpected kernel single-step exception at EL1
>>>>>> [23394.042434] Unexpected kernel single-step exception at EL1
>>>>>> [23394.048008] Unexpected kernel single-step exception at EL1
>>>>>> [23394.053541] Unexpected kernel single-step exception at EL1
>>>>>> [23394.059053] Unexpected kernel single-step exception at EL1
>>>>>> [23394.064545] Unexpected kernel single-step exception at EL1
>>>>>>
>>>>>> Sorry, I don't have the start of the failure; it scrolled off the screen very quickly.
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>>
>>>>>
>>>>> I'll take a look and see what I can figure out.
>>>>>
>>>>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
>>>>>
>>>>> -dl
>>>>>
>>>>
>>>> Hi Dave and Pratyush,
>>>>
>>>> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
>>>> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
>>>>
>>>> -Will
>>>>
>>>
>>> I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?
>>>
>>> -dl
>>>
>>
>> Hi Dave,
>>
>> Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 patches do not appear to be the problem.  I don't know what is causing the problem; maybe there is something going on with the porting of the patches to that kernel, or with the other patches included in there (uprobes/kexec).
> 
> Just to update:
> 
> I confirm that problem arises after uprobe patches only, but not yet sure that
> actual culprit is uprobe code. 
> 
> I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
> seems, when problem happens, there was a kprobe at print_worker_info(). 
> 
> Most likely re-entrant kprobe is called when kprobe is instrumented at
> print_worker_info(). I guessed it could be show_regs() from arm64/kprobe code,
> but commenting show_regs() did not make any difference. Even blacklisting
> print_worker_info() did not resolve it; the problem reproduced in a different
> way after blacklisting.
> 
> So it is still vague, and debugging continues.
> If I can clearly understand the systemtap test code, then it will probably be
> easier to debug. I mean, if I can get the names of the kernel and user space
> symbols where this test is instrumenting probes, then that would help a lot to
> zero in on it.
> 
> ~Pratyush
> 

Hi Pratyush,

My understanding is that the systemtap onthefly support enables and disables probes as mentioned in the following systemtap bugzilla entry (and the ones that it depends on): https://sourceware.org/bugzilla/show_bug.cgi?id=10995.  It would be handy to get things pared down to the systemtap script that triggers the problem.  After adding some diagnostic puts to the test, it looks like the script that triggers the problem is something like the attached onthefly_trigger.stp (that was gathered on an x86_64 machine, so it might not be exactly what is causing the problem on aarch64).  David Smith, any suggestions on how to debug this based on your experience with https://sourceware.org/bugzilla/show_bug.cgi?id=17126, where ppc64 had a similar issue with onthefly testing?

The "Unexpected kernel single-step exception at EL1" reminds me of the times when kprobes couldn't find a handler.  Maybe there is some situation where the kprobe is being removed but the breakpoint is still around.  Did you get a backtrace by inserting a "BUG()" where that message is printed out?  I wonder if it might be triggered by (thread_flags & _TIF_UPROBE) somehow being true when the aarch64 do_notify_resume runs.

-Will

[-- Attachment #2: onthefly_trigger.stp --]
[-- Type: text/plain, Size: 2276 bytes --]

      # We want these probes to fire as deterministically as possible so that
      # their outputs can easily be predicted and compared. Unfortunately this
      # is complicated by multiple facts, including
      #     (1) the timer might be so fast that the probes don't have time to
      #         fire and print something,
      #     (2) the kprobe might print something after it is re-enabled, but
      #         before the kretprobe is re-enabled,
      #     (3) disabling a kretprobe won't stop an already running handler.
      #
      # To get around these issues, we use a simple state machine. The states
      # are as follows:
      #
      #     0 = cond disabled
      #     1 = cond enabled, but kprobe && kretprobe not yet enabled
      #     2 = cond enabled, kprobe enabled, kretprobe not yet enabled
      #     3 = cond enabled, kprobe && kretprobe enabled, nothing printed yet
      #     4 = cond enabled, 'hit' printed but not 'rethit'
      #     5 = cond enabled, 'rethit' printed

      global state = 1
      global toggles = 0

      probe kernel.function("vfs_read").call if (state > 0) {
         if (state == 3) {
            println("hit")
            state++;
         } else if (state == 1) {
            state++;
         }
      }

      probe kernel.function("vfs_read").return if (state > 0) {
         # ensure that nothing changed during the vfs_read body
         if (state != @entry(state))
            next
         if (state == 4) {
            println("rethit")
            state++;
         } else if (state == 2) {
            state++;
         }
      }

      probe begin, end, error,
            kernel.function("*@workqueue.c"),
            process.begin, process.end,
            kernel.trace("*"),
            process("echo").function("*"),
            netfilter.pf("NFPROTO_IPV4").hook("NF_INET_LOCAL_IN")?,
            netfilter.pf("NFPROTO_IPV4").hook("NF_INET_LOCAL_OUT")?,
            perf.sw.cpu_clock.sample(1000000)?,
            timer.profile.tick? {
         if (state != 0 && state != 5)
            next # give probes more time to move through the states
         toggles++
         if (toggles > 5000)
            exit()
         else {
            println("toggling")
            state = !state
         }
      }

      probe timer.s(360) {
         println("timed out")
         exit()
      }
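
To make the expected "hit" / "rethit" / "toggling" ordering easier to reason about, the state machine in the script above can be modelled outside the kernel. The following is only a rough Python sketch, not part of the systemtap testsuite. The function names (on_call, on_return, on_toggle, simulate) are made up for illustration. It assumes that the @entry(state) snapshot is taken after the .call handler has run, and that exactly one toggling probe fires after each vfs_read; the real test guarantees neither. The 5000-toggle limit and the timeout probe are left out.

```python
def on_call(state, out):
    """Model of the vfs_read .call probe body."""
    if state == 3:
        out.append("hit")
        return 4
    if state == 1:
        return 2
    return state


def on_return(state, entry_state, out):
    """Model of the vfs_read .return probe body."""
    if state != entry_state:      # state changed during vfs_read: skip ("next")
        return state
    if state == 4:
        out.append("rethit")
        return 5
    if state == 2:
        return 3
    return state


def on_toggle(state, out):
    """Model of the toggling probe; it only acts in states 0 and 5."""
    if state not in (0, 5):
        return state              # give probes more time to move through the states
    out.append("toggling")
    return 0 if state else 1      # mirrors `state = !state`


def simulate(n_reads):
    """Run n_reads idealized vfs_read calls, each followed by one toggle attempt."""
    state, out = 1, []
    for _ in range(n_reads):
        if state > 0:             # both vfs_read probes are conditional on state > 0
            state = on_call(state, out)
            entry_state = state   # assumed snapshot point for @entry(state)
            state = on_return(state, entry_state, out)
        state = on_toggle(state, out)
    return state, out


if __name__ == "__main__":
    final_state, lines = simulate(6)
    print(final_state)
    print("\n".join(lines))
```

In this idealized ordering the state never changes between entry and return, so the mismatch check in on_return is never taken. On a real system the toggling probes fire concurrently with vfs_read, and that is the case the check is there for.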

