public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug kprobes/22147] New: stopped to probe do_arch_ctrl_64
@ 2017-09-17 20:41 nlzdbg at gmail dot com
  2017-09-17 20:49 ` [Bug kprobes/22147] " nlzdbg at gmail dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: nlzdbg at gmail dot com @ 2017-09-17 20:41 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=22147

            Bug ID: 22147
           Summary: stopped to probe do_arch_ctrl_64
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: kprobes
          Assignee: systemtap at sourceware dot org
          Reporter: nlzdbg at gmail dot com
  Target Milestone: ---

fedora core 26 x64
kernel version either 4.12.9 or 4.12.12 x64.
gcc-7.1.1-3.fc26.x86_64
systemtap-3.1-5.fc26.x86_64

tap script 
nvidia.debug.stp

global cnt_fs_base_set

probe begin 
{
 print ("start systemtap for Xorg nvidia fs_base change debugging version\n")
 printf( "pid=%d, target=%d\n",pid(),target())
}

probe end {
 print ( "cnt_fs_base_set=%d\n", cnt_fs_base_set )
 print ( "end systemtap\n")

}

function my_get_fs_base:long () %{
unsigned int low,high;
__asm__ volatile (
  "movq $0xc0000100, %%rcx\n\t"
  "rdmsr\n\t" 
  :"=a"(low),"=d"(high)::"rcx"
);
STAP_RETURN( (unsigned long long) high<<32 | low ); 
%}


probe kernel.function("do_arch_prctl_64").call 
{
 if ( pid() == target() ) {
    if ( ulonglong_arg(2) ==  0x1002 ) {
      ++cnt_fs_base_set
      printf ("do_arch_prctl_64 ARCH_SET_FS fun %s option %x arg %x\n",
ppfunc(), ulonglong_arg(2), ulonglong_arg(3) )
      printf( " tid %d current fs base %x \n", tid(), my_get_fs_base() )

    }
 }
}

some times it did probe but it stopped to do it.
It is not clear  what is changed.
I did kernel and systemtap upgrade.
But return to previous 4.12.9 where I had it working
but behavour the same as for 4.12.12

dmesg

[17032.940742] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s!
[stapio:10314]
[17032.940744] Modules linked in: stap_nvidiadbg(OE) overlay razerkbd(OE)
vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) rfcomm fuse ccm
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc
vfat fat arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
iTCO_wdt iTCO_vendor_support kvm_intel iwlmvm kvm mac80211 snd_hda_codec_hdmi
[17032.940766]  nvidia_drm(POE) irqbypass nvidia_modeset(POE) crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel nvidia(POE) snd_hda_codec_realtek
snd_hda_codec_generic intel_cstate intel_uncore iwlwifi intel_rapl_perf
snd_hda_intel cfg80211 snd_hda_codec raid0 snd_hda_core snd_hwdep snd_seq
snd_seq_device joydev snd_pcm hci_uart razermouse(OE) btusb snd_timer btrtl
btbcm snd btqca btintel mei_me bluetooth soundcore i2c_i801 mei shpchp
ecdh_generic rfkill pinctrl_sunrisepoint intel_lpss_acpi pinctrl_intel
intel_lpss acpi_als tpm_tis kfifo_buf tpm_tis_core industrialio tpm acpi_pad
binfmt_misc i915 crc32c_intel e1000e ptp nvme pps_core nvme_core i2c_algo_bit
drm_kms_helper drm wmi video i2c_hid nct6775 hwmon_vid
[17032.940790] CPU: 7 PID: 10314 Comm: stapio Tainted: P           OE  
4.12.9-300.fc26.x86_64 #1
[17032.940790] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./Z270 Gaming-ITX/ac, BIOS P2.30 07/14/2017
[17032.940791] task: ffff8faa605c4c80 task.stack: ffffa2eec8948000
[17032.940805] RIP: 0010:strchr+0xc/0x30
[17032.940806] RSP: 0018:ffffa2eec894bc30 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff10
[17032.940807] RAX: 000000000000006e RBX: ffffffffc1f63460 RCX:
ffffffffc0d4fe90
[17032.940807] RDX: 000000000000003a RSI: 000000000000003a RDI:
ffffffffc1a52d69
[17032.940808] RBP: ffffa2eec894bc30 R08: 0000000000000075 R09:
ffffffffc1a52d69
[17032.940808] R10: ffffffff99c5f690 R11: ffffffff99c5fa20 R12:
ffffffffc12d3840
[17032.940809] R13: 0000000000151380 R14: 0000000000003d50 R15:
ffffa2eec894bda8
[17032.940809] FS:  00007f534c68a700(0000) GS:ffff8fab4edc0000(0000)
knlGS:0000000000000000
[17032.940810] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17032.940810] CR2: 00007f534409aff8 CR3: 000000076a233000 CR4:
00000000003406e0
[17032.940811] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[17032.940811] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[17032.940812] Call Trace:
[17032.940816]  stapkp_symbol_callback+0xa0/0x120 [stap_nvidiadbg]
[17032.940937]  ? _nv003965rm+0x30/0x30 [nvidia]
[17032.940940]  ? __stp_relay_wakeup_timer+0x90/0x90 [stap_nvidiadbg]
[17032.940941]  module_kallsyms_on_each_symbol+0x73/0xb0
[17032.940943]  ? __stp_relay_wakeup_timer+0x90/0x90 [stap_nvidiadbg]
[17032.940944]  kallsyms_on_each_symbol+0xb9/0xc0
[17032.940945]  ? probe_12123+0x3f/0x290 [stap_nvidiadbg]
[17032.940947]  _stp_ctl_write_cmd+0xa3e/0xcb0 [stap_nvidiadbg]
[17032.940949]  full_proxy_write+0x54/0x90
[17032.940951]  __vfs_write+0x37/0x160
[17032.940952]  ? selinux_file_permission+0xfb/0x120
[17032.940954]  ? security_file_permission+0x3b/0xc0
[17032.940955]  vfs_write+0xb1/0x1a0
[17032.940956]  SyS_write+0x55/0xc0
[17032.940959]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[17032.940959] RIP: 0033:0x7f534c28522d
[17032.940960] RSP: 002b:00007fff4cd4cad0 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[17032.940961] RAX: ffffffffffffffda RBX: 00007f534409b700 RCX:
00007f534c28522d
[17032.940961] RDX: 000000000000000c RSI: 00007fff4cd4cae0 RDI:
0000000000000004
[17032.940962] RBP: 00007fff4cd4bd90 R08: 0000000000000000 R09:
00007f534409b700
[17032.940962] R10: 00007f534409b9d0 R11: 0000000000000293 R12:
0000000000000000
[17032.940962] R13: 00007fff4cd4bd20 R14: 00007f534409b9c0 R15:
0000000000000000
[17032.940963] Code: 01 84 c9 74 05 48 39 d0 75 e5 31 c0 5d c3 44 38 c1 19 c0
83 c8 01 5d c3 0f 1f 80 00 00 00 00 0f b6 07 55 48 89 e5 40 38 f0 74 17 <84> c0
89 f2 75 06 eb 14 84 c0 74 10 48 83 c7 01 0f b6 07 38 d0 


Any clue what is wrong with systemstap?
Is it systemtap problem or problem of linux kernel?

Why pass 2 run ~25 minutes? Is it expected so long time.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug kprobes/22147] stopped to probe do_arch_ctrl_64
  2017-09-17 20:41 [Bug kprobes/22147] New: stopped to probe do_arch_ctrl_64 nlzdbg at gmail dot com
@ 2017-09-17 20:49 ` nlzdbg at gmail dot com
  2017-09-18 16:27 ` dsmith at redhat dot com
  2017-09-19 16:00 ` dsmith at redhat dot com
  2 siblings, 0 replies; 4+ messages in thread
From: nlzdbg at gmail dot com @ 2017-09-17 20:49 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=22147

--- Comment #1 from nlz <nlzdbg at gmail dot com> ---
I need this feature to localize bug in nvidia 384.69 kernel driver
becausefs_base is changed and Xorg crashed  in glib.c.
Issue
https://devtalk.nvidia.com/default/topic/1023538/384-69-libglx-xorg-strtol-sigsegv/
I need to figure out in which context do_arch_ctrl_64 is called.
While systemtap worked I found that fs_base is spolied due to
chain of calls
SyS_arch_prctl() -> do_arch_prctl_64()
but I want localize kernel and user stack more.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug kprobes/22147] stopped to probe do_arch_ctrl_64
  2017-09-17 20:41 [Bug kprobes/22147] New: stopped to probe do_arch_ctrl_64 nlzdbg at gmail dot com
  2017-09-17 20:49 ` [Bug kprobes/22147] " nlzdbg at gmail dot com
@ 2017-09-18 16:27 ` dsmith at redhat dot com
  2017-09-19 16:00 ` dsmith at redhat dot com
  2 siblings, 0 replies; 4+ messages in thread
From: dsmith at redhat dot com @ 2017-09-18 16:27 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=22147

David Smith <dsmith at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dsmith at redhat dot com

--- Comment #2 from David Smith <dsmith at redhat dot com> ---
(In reply to nlz from comment #0)

... stuff deleted ...

> some times it did probe but it stopped to do it.
> It is not clear  what is changed.
> I did kernel and systemtap upgrade.
> But return to previous 4.12.9 where I had it working
> but behavour the same as for 4.12.12

OK, let's try to clarify some things here to make sure I understand you.

- With systemtap version X and kernel 4.12.9, the above script worked for you.

- You then upgraded systemtap and the kernel

- With systemtap version Y and kernel 4.12.12, the above script causes a kernel
BUG.

The first thing I'll need to know is what version of systemtap did you start
with, and what version did you upgrade to.

Anytime a systemtap module causes a kernel BUG, that's a systemtap problem that
we'll need to fix.

Of course your function my_get_fs_base() is quite scary. If you comment out the
call to that function, do you still get the kernel BUG? Is that function based
on some code in the kernel or did you write it from scratch yourself?

Here's another oddity in your code. The following probe is quite mixed up:

====
probe kernel.function("do_arch_prctl_64").call 
{
 if ( pid() == target() ) {
    if ( ulonglong_arg(2) ==  0x1002 ) {
      ++cnt_fs_base_set
      printf ("do_arch_prctl_64 ARCH_SET_FS fun %s option %x arg %x\n",
ppfunc(), ulonglong_arg(2), ulonglong_arg(3) )
     printf( " tid %d current fs base %x \n", tid(), my_get_fs_base() )

    }
 }
}
====

You ask for a dwarf-based function probe ("kernel.function"), then you use
dwarfless argument handling (ulonglong_arg() calls instead of just using
dwarf-based argument handling - $option and $arg2 in this case). In theory this
should be OK for x86_64, but note that we never test it. I'd either switch to
dwarfless-based function probling (i.e. "kprobe.function") or dwarf-based
argument handling.

We can start looking at your DMESG output once you let us know what versions of
systemtap you were using.

Another thing, are you using a self-built kernel or a distro kernel? Are you
using self-built versions of systemtap or distro versions?

> Any clue what is wrong with systemstap?
> Is it systemtap problem or problem of linux kernel?
> 
> Why pass 2 run ~25 minutes? Is it expected so long time.

Pass 2 shouldn't take 25 minutes. It does parse the entire tapset library and
then your script, but it shouldn't take 25 minutes. I'm not sure what to think
here. Please answer my above questions and we'll try to help you figure this
out also.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug kprobes/22147] stopped to probe do_arch_ctrl_64
  2017-09-17 20:41 [Bug kprobes/22147] New: stopped to probe do_arch_ctrl_64 nlzdbg at gmail dot com
  2017-09-17 20:49 ` [Bug kprobes/22147] " nlzdbg at gmail dot com
  2017-09-18 16:27 ` dsmith at redhat dot com
@ 2017-09-19 16:00 ` dsmith at redhat dot com
  2 siblings, 0 replies; 4+ messages in thread
From: dsmith at redhat dot com @ 2017-09-19 16:00 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=22147

--- Comment #3 from David Smith <dsmith at redhat dot com> ---
(In reply to David Smith from comment #2)

Here's something else about this probe:

> ====
> probe kernel.function("do_arch_prctl_64").call 
> {
>  if ( pid() == target() ) {
>     if ( ulonglong_arg(2) ==  0x1002 ) {
>       ++cnt_fs_base_set
>       printf ("do_arch_prctl_64 ARCH_SET_FS fun %s option %x arg %x\n",
> ppfunc(), ulonglong_arg(2), ulonglong_arg(3) )
>      printf( " tid %d current fs base %x \n", tid(), my_get_fs_base() )
> 
>     }
>  }
> }
> ====

In the upstream kernel source, that function is defined like this:

====
long do_arch_prctl_64(struct task_struct *task, int option, unsigned long
arg2);
====

You are treating argument 2 as a ulonglong (8 bytes on x86_64), where it is
really an int (4 bytes on x86_64). So, you are probably getting incorrect
values for argument 2. So, you should use 'int_arg(2)', not 'ulonglong_arg(2)'.
If you aren't getting incorrect values, you are getting lucky.

(Or course argument 3 isn't really a ulonglong, but since a ulonglong takes the
same number of bytes as a long on x86_64 you should be OK there.)

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-09-19 16:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-17 20:41 [Bug kprobes/22147] New: stopped to probe do_arch_ctrl_64 nlzdbg at gmail dot com
2017-09-17 20:49 ` [Bug kprobes/22147] " nlzdbg at gmail dot com
2017-09-18 16:27 ` dsmith at redhat dot com
2017-09-19 16:00 ` dsmith at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).