public inbox for systemtap@sourceware.org
* Recent aarch64 kprobes and uprobes patch systemtap testing
@ 2015-12-10 20:24 William Cohen
  2015-12-10 21:12 ` David Long
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: William Cohen @ 2015-12-10 20:24 UTC (permalink / raw)
  To: systemtap; +Cc: Dave Long, Pratyush Anand

Hi All,

Dave Long and Pratyush Anand have been working on kprobe and uprobe patches for aarch64.  I have built a local version of the uprobe/upstream_arm64_devel branch of https://github.com/pratyushanand/linux, which includes those patches in a linux-4.4.0-rc3 kernel.

The tests seemed to run fairly well and the results have been uploaded to dejazilla:

https://web.elastic.org/~dejazilla/viewsummary.php?summary=%3D%27%3C56698DCC.3090207%40redhat.com%3E%27

		=== systemtap Summary ===

# of expected passes		6096
# of unexpected failures	111
# of unexpected successes	2
# of expected failures		333
# of unknown successes		2
# of known failures		89
# of untested testcases		97
# of unsupported tests		27
runtest completed at Thu Dec 10 05:54:32 2015
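
For a quick sense of scale, the counts above can be tallied with a short script. This is only an illustrative sketch: the `summary` text is copied from the runtest output, while the `parse_summary` helper and its regular expression are assumptions made for this example and are not part of the systemtap testsuite.

```python
import re

# Summary lines copied from the runtest output above.
summary = """\
# of expected passes		6096
# of unexpected failures	111
# of unexpected successes	2
# of expected failures		333
# of unknown successes		2
# of known failures		89
# of untested testcases		97
# of unsupported tests		27
"""


def parse_summary(text):
    """Return {category: count} for each '# of <category> <count>' line."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"#\s+of\s+(.+?)\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


counts = parse_summary(summary)
total = sum(counts.values())
unexpected = counts["unexpected failures"]
print(f"{total} results, {unexpected} unexpected failures "
      f"({100.0 * unexpected / total:.1f}%)")
```

With these numbers the unexpected failures come to 111 out of 6757 reported results.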


There are still some areas needing work for aarch64, such as stack backtrace support.

The following failure looks suspect because the child process died:


spawn stap -g ./systemtap.examples/process/threadstacks.stp -Gsize=65536 -c /root/systemtap_write/systemtap/testsuite/pthread_stacks.x 1024 0 -d /root/systemtap_write/systemtap/testsuite/pthread_stacks.x

pthread_stacks.x: ./systemtap.base/pthread_stacks.c:67: main: Assertion `rc == 0' failed.

WARNING: Child process exited with signal 6 (Aborted)

pthread_stacks.[3567] overwrote __default_stacksize@0x3ffb3be4338 (8388608->65536)

WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1

Pass 5: run failed.  [man error::pass5]

FAIL: pthread_stacks -Gsize (0 0)

The fslatency-nd and fsslower-nd tests need further investigation:

PASS: ./systemtap.examples/lwtools/fslatency-nd build
meta taglines 'test_installcheck: stap fslatency-nd.stp 1 1' tag 'test_installcheck' value 'stap fslatency-nd.stp 1 1'
attempting command stap fslatency-nd.stp 1 1
OUT ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at fslatency-nd.stp:66:15
Tracing FS sync reads and writes... Output every 1 secs.
WARNING: Number of errors: 1, skipped probes: 1
WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
child process exited abnormally
RC 1
FAIL: ./systemtap.examples/lwtools/fslatency-nd run

PASS: ./systemtap.examples/lwtools/fsslower-nd build
meta taglines 'test_installcheck: stap fsslower-nd.stp -c "sleep 1"' tag 'test_installcheck' value 'stap fsslower-nd.stp -c "sleep 1"'
attempting command stap fsslower-nd.stp -c "sleep 1"
OUT ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at fsslower-nd.stp:68:15
Tracing FS sync reads and writes slower than 10 ms... Hit Ctrl-C to end.
TIME     PID    COMM             FUNC           SIZE     LAT(ms)
WARNING: Number of errors: 1, skipped probes: 1
WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
child process exited abnormally
RC 1
FAIL: ./systemtap.examples/lwtools/fsslower-nd run
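
Both of these tests build cleanly and only fail once the module runs (Pass 5), which sets them apart from failures that occur at compile time. A small sketch of how the PASS/FAIL lines could be grouped to spot that pattern follows; the `results_by_test` and `built_but_failed_to_run` helpers are hypothetical names invented for this example, and the log lines are copied from the output above.

```python
import re
from collections import defaultdict

# Result lines copied from the test log above.
log = """\
PASS: ./systemtap.examples/lwtools/fslatency-nd build
FAIL: ./systemtap.examples/lwtools/fslatency-nd run
PASS: ./systemtap.examples/lwtools/fsslower-nd build
FAIL: ./systemtap.examples/lwtools/fsslower-nd run
"""


def results_by_test(text):
    """Map each test name to {stage: status}, e.g. {'build': 'PASS'}."""
    results = defaultdict(dict)
    for line in text.splitlines():
        m = re.match(r"(PASS|FAIL): (\S+) (\S+)$", line)
        if m:
            status, name, stage = m.groups()
            results[name][stage] = status
    return dict(results)


def built_but_failed_to_run(results):
    """Return tests whose build stage passed but whose run stage failed."""
    return sorted(
        name for name, stages in results.items()
        if stages.get("build") == "PASS" and stages.get("run") == "FAIL"
    )


for name in built_but_failed_to_run(results_by_test(log)):
    print(name)
```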


Also, a number of network tests failed, like the following:


TEST PWD=/root/systemtap_write/systemtap/testsuite/systemtap.examples/network
meta taglines 'test_check: stap -g -p4 netfilter_drop.stp TCP 1' tag 'test_check' value 'stap -g -p4 netfilter_drop.stp TCP 1'
attempting command stap -g -p4 netfilter_drop.stp TCP 1
OUT /tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2731:1: error: initialization from incompatible pointer type [-Werror]
 .hook = enter_netfilter_probe_0,
 ^
/tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2731:1: error: (near initialization for 'netfilter_opts_0.hook') [-Werror]
/tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2732:1: error: unknown field 'owner' specified in initializer
 .owner = THIS_MODULE,
 ^
/tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2732:1: error: initialization from incompatible pointer type [-Werror]
/tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2732:1: error: (near initialization for 'netfilter_opts_0.dev') [-Werror]
cc1: all warnings being treated as errors
make[4]: *** [/tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.o] Error 1
make[3]: *** [_module_/tmp/stapbIEqFl] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed.  [man error::pass4]
child process exited abnormally
RC 1
FAIL: ./systemtap.examples/network/netfilter_drop build

-Will


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-10 20:24 Recent aarch64 kprobes and uprobes patch systemtap testing William Cohen
@ 2015-12-10 21:12 ` David Long
  2015-12-11  4:19   ` Pratyush Anand
                     ` (2 more replies)
  2015-12-10 21:17 ` David Long
  2015-12-11 17:23 ` David Smith
  2 siblings, 3 replies; 13+ messages in thread
From: David Long @ 2015-12-10 21:12 UTC (permalink / raw)
  To: William Cohen, systemtap; +Cc: Pratyush Anand


Cool. Wish I could make sense of systemtap error messages.

  At Will Deacon's suggestion I tested probing the instruction in 
__copy_to_user that can cause a captured kernel exception when an 
application passes in a bad buffer address.  Unfortunately the result 
was a hang.  So copy_to/from user is going to have to be blacklisted for 
now, unless there turns out to be a simple fix. I'm worried there might 
be other places in the kernel where an otherwise probeable instruction 
might be expected to generate an exception.

-dl




* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-10 20:24 Recent aarch64 kprobes and uprobes patch systemtap testing William Cohen
  2015-12-10 21:12 ` David Long
@ 2015-12-10 21:17 ` David Long
  2015-12-11 17:23 ` David Smith
  2 siblings, 0 replies; 13+ messages in thread
From: David Long @ 2015-12-10 21:17 UTC (permalink / raw)
  To: William Cohen, systemtap; +Cc: Pratyush Anand

I'm slightly worried that the app abort mentions pthread stacks.  I hope 
there are no threading issues in the kprobes code.  Do we know how much 
systemtap makes use of multiple threads?

-dl



* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-10 21:12 ` David Long
@ 2015-12-11  4:19   ` Pratyush Anand
  2015-12-11  4:43     ` David Long
  2015-12-11 17:02   ` William Cohen
  2015-12-11 20:59   ` William Cohen
  2 siblings, 1 reply; 13+ messages in thread
From: Pratyush Anand @ 2015-12-11  4:19 UTC (permalink / raw)
  To: David Long; +Cc: William Cohen, systemtap

On 10/12/2015:04:12:30 PM, David Long wrote:
> Cool. Wish I could make sense of systemtap error messages.
> 
>  At Will Deacon's suggestion I tested probing the instruction in
> __copy_to_user that can cause a captured kernel exception when an
> application passes in a bad buffer address.  Unfortunately the result was a
> hang.  So copy_to/from user is going to have to be blacklisted for now,
> unless there turns out to be a simple fix. I'm worried there might be other
> places in the kernel where an otherwise probeable instruction might be
> expected to generate an exception.

There are many arm64-specific functions which need blacklisting. I have them
here.

https://github.com/pratyushanand/linux/commit/4098b5ad2c67bf4c375981fc68793f44af005eb9
https://github.com/pratyushanand/linux/commit/df3e76cbf70a8e1af42951d4b30587f022d25938

I think uprobe_pre/post_sstep_notifier() should also be blacklisted.

https://github.com/pratyushanand/linux/commit/99c89512931a46582d2f026b7288c895b8ef320c

Certainly, there could be more functions which need kprobe blacklisting.
I remember placing a kprobe on every function in kallsyms on an x86
platform, and it crashed.
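
On a kernel with kprobes enabled and debugfs mounted, the functions currently blacklisted can be read from /sys/kernel/debug/kprobes/blacklist, which is one way to check whether a given function is already covered. The sketch below assumes the file's usual "start-end<TAB>symbol" line format; the sample addresses, the `parse_blacklist` helper, and the `some_other_function` name are invented for illustration and do not reflect the state of any real kernel.

```python
import re

# Hypothetical sample in the assumed "start-end<TAB>symbol" format; the
# addresses are invented. On a live system the text would instead be read
# from /sys/kernel/debug/kprobes/blacklist (needs root and debugfs).
sample = (
    "0xffffffc000093000-0xffffffc000093040\t__copy_to_user\n"
    "0xffffffc000093040-0xffffffc000093080\t__copy_from_user\n"
)


def parse_blacklist(text):
    """Return {symbol: (start, end)} parsed from blacklist-style lines."""
    entries = {}
    for line in text.splitlines():
        m = re.match(r"0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\s+(\S+)", line)
        if m:
            entries[m.group(3)] = (int(m.group(1), 16), int(m.group(2), 16))
    return entries


blacklist = parse_blacklist(sample)
for func in ("__copy_to_user", "__copy_from_user", "some_other_function"):
    state = "blacklisted" if func in blacklist else "not blacklisted"
    print(f"{func}: {state}")
```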

~Pratyush


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-11  4:19   ` Pratyush Anand
@ 2015-12-11  4:43     ` David Long
  0 siblings, 0 replies; 13+ messages in thread
From: David Long @ 2015-12-11  4:43 UTC (permalink / raw)
  To: Pratyush Anand; +Cc: William Cohen, systemtap

On 12/10/2015 11:19 PM, Pratyush Anand wrote:
> On 10/12/2015:04:12:30 PM, David Long wrote:
>> Cool. Wish I could make sense of systemtap error messages.
>>
>>   At Will Deacon's suggestion I tested probing the instruction in
>> __copy_to_user that can cause a captured kernel exception when an
>> application passes in a bad buffer address.  Unfortunately the result was a
>> hang.  So copy_to/from user is going to have to be blacklisted for now,
>> unless there turns out to be a simple fix. I'm worried there might be other
>> places in the kernel where an otherwise probeable instruction might be
>> expected to generate an exception.
>
> There are many arm64 specific functions which need blacklisting. I have them
> here.
>
> https://github.com/pratyushanand/linux/commit/4098b5ad2c67bf4c375981fc68793f44af005eb9
> https://github.com/pratyushanand/linux/commit/df3e76cbf70a8e1af42951d4b30587f022d25938
>
> I think uprobe_pre/post_sstep_notifier() should also be blacklisted.
>
> https://github.com/pratyushanand/linux/commit/99c89512931a46582d2f026b7288c895b8ef320c
>
> Certainly, there could be some more functions which need kprobe blacklisting.
> Because I remember, I had kprobe at every function of kallsyms on a x86
> platform, and it had crashed.
>
> ~Pratyush
>

OK, it sounds like this has been just a "best effort" on x86 then?  I'll 
blacklist the copy functions and move on.

-dl


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-10 21:12 ` David Long
  2015-12-11  4:19   ` Pratyush Anand
@ 2015-12-11 17:02   ` William Cohen
  2015-12-16  5:22     ` Pratyush Anand
  2015-12-11 20:59   ` William Cohen
  2 siblings, 1 reply; 13+ messages in thread
From: William Cohen @ 2015-12-11 17:02 UTC (permalink / raw)
  To: David Long, systemtap; +Cc: Pratyush Anand

On 12/10/2015 04:12 PM, David Long wrote:
> Cool. Wish I could make sense of systemtap error messages.

Hi Dave,

This was a data dump, so I haven't made sense of some of it either. :)  The ".hook=..." and ".owner=..." errors are problems in the systemtap code generation for newer kernels and don't concern the aarch64 kprobes/uprobes work.  The read faults for fslatency-nd.stp and fsslower-nd.stp need to be checked more carefully, but they are likely issues with systemtap (they do work on linux-4.2.0 on x86_64).

The "FAIL: pthread_stacks -Gsize (0 0)" looks like it could be an issue with uprobes affecting the running of the program.  Pratyush, are you able to run this systemtap test locally?

> 
>  At Will Deacon's suggestion I tested probing the instruction in __copy_to_user that can cause a captured kernel exception when an application passes in a bad buffer address.  Unfortunately the result was a hang.  So copy_to/from user is going to have to be blacklisted for now, unless there turns out to be a simple fix. I'm worried there might be other places in the kernel where an otherwise probeable instruction might be expected to generate an exception.
> 
> -dl

So the problem is the issue of nested exceptions in the single-step debug exception?  Are there other places in the kernel where similar exceptions could occur, such as memory management code probing an address to verify it is valid?

-Will


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-10 20:24 Recent aarch64 kprobes and uprobes patch systemtap testing William Cohen
  2015-12-10 21:12 ` David Long
  2015-12-10 21:17 ` David Long
@ 2015-12-11 17:23 ` David Smith
  2 siblings, 0 replies; 13+ messages in thread
From: David Smith @ 2015-12-11 17:23 UTC (permalink / raw)
  To: William Cohen, systemtap; +Cc: Dave Long, Pratyush Anand

On 12/10/2015 02:24 PM, William Cohen wrote:
> Hi All,
> 
> Dave Long and Pratyush Anand have been working on kprobe and uprobe patches for aarch64.  I have built a local version of the uprobe/upstream_arm64_devel branch of https://github.com/pratyushanand/linux, which includes those patches in a linux-4.4.0-rc3 kernel.
> 
> The tests seemed to run fairly well and the results have been uploaded to dejazilla:
> 
> https://web.elastic.org/~dejazilla/viewsummary.php?summary=%3D%27%3C56698DCC.3090207%40redhat.com%3E%27

... stuff deleted ...

> Also a number of network tests failed like the following 
> 
> TEST PWD=/root/systemtap_write/systemtap/testsuite/systemtap.examples/network
> meta taglines 'test_check: stap -g -p4 netfilter_drop.stp TCP 1' tag 'test_check' value 'stap -g -p4 netfilter_drop.stp TCP 1'
> attempting command stap -g -p4 netfilter_drop.stp TCP 1
> OUT /tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2731:1: error: initialization from incompatible pointer type [-Werror]
>  .hook = enter_netfilter_probe_0,
>  ^
> /tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2731:1: error: (near initialization for 'netfilter_opts_0.hook') [-Werror]
> /tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2732:1: error: unknown field 'owner' specified in initializer
>  .owner = THIS_MODULE,
>  ^
> /tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2732:1: error: initialization from incompatible pointer type [-Werror]
> /tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.c:2732:1: error: (near initialization for 'netfilter_opts_0.dev') [-Werror]
> cc1: all warnings being treated as errors
> make[4]: *** [/tmp/stapbIEqFl/stap_0c0db8521e3ea3b2d26dcde208a77baf_21082_src.o] Error 1
> make[3]: *** [_module_/tmp/stapbIEqFl] Error 2
> WARNING: kbuild exited with status: 2
> Pass 4: compilation failed.  [man error::pass4]

This appears to be a problem with 4.4 kernels in general, not just an
aarch64 problem. It also happens on rawhide x86_64. It appears that the
kernel has had some changes in the area of netfilter probes.

I filed PR19358 on this issue.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-10 21:12 ` David Long
  2015-12-11  4:19   ` Pratyush Anand
  2015-12-11 17:02   ` William Cohen
@ 2015-12-11 20:59   ` William Cohen
  2015-12-16 11:55     ` Pratyush Anand
  2 siblings, 1 reply; 13+ messages in thread
From: William Cohen @ 2015-12-11 20:59 UTC (permalink / raw)
  To: David Long, systemtap; +Cc: Pratyush Anand

[-- Attachment #1: Type: text/plain, Size: 4811 bytes --]

On 12/10/2015 04:12 PM, David Long wrote:
> On 12/10/2015 03:24 PM, William Cohen wrote:

>> The fslatency-nd and fsslower-nd tests need further investigation:
>>
>> PASS: ./systemtap.examples/lwtools/fslatency-nd build
>> meta taglines 'test_installcheck: stap fslatency-nd.stp 1 1' tag 'test_installcheck' value 'stap fslatency-nd.stp 1 1'
>> attempting command stap fslatency-nd.stp 1 1
>> OUT ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at fslatency-nd.stp:66:15
>> Tracing FS sync reads and writes... Output every 1 secs.
>> WARNING: Number of errors: 1, skipped probes: 1
>> WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
>> Pass 5: run failed.  [man error::pass5]
>> child process exited abnormally
>> RC 1
>> FAIL: ./systemtap.examples/lwtools/fslatency-nd run
>>
>> PASS: ./systemtap.examples/lwtools/fsslower-nd build
>> meta taglines 'test_installcheck: stap fsslower-nd.stp -c "sleep 1"' tag 'test_installcheck' value 'stap fsslower-nd.stp -c "sleep 1"'
>> attempting command stap fsslower-nd.stp -c "sleep 1"
>> OUT ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at fsslower-nd.stp:68:15
>> Tracing FS sync reads and writes slower than 10 ms... Hit Ctrl-C to end.
>> TIME     PID    COMM             FUNC           SIZE     LAT(ms)
>> WARNING: Number of errors: 1, skipped probes: 1
>> WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
>> Pass 5: run failed.  [man error::pass5]
>> child process exited abnormally
>> RC 1
>> FAIL: ./systemtap.examples/lwtools/fsslower-nd run

> 
> Cool. Wish I could make sense of systemtap error messages.
> 
>  At Will Deacon's suggested I tested probing the instruction in __copy_to_user that can cause a captured kernel exception when an application passes in a bad buffer address.  Unfortunately the result was a hang.  So copy_to/from user is going to have to be blacklisted for now, unless there turns out to be a simple fix. I'm worried there might be other places in the kernel where an otherwise probeable instruction might be expected to generate an exception.
> 
> -dl
> 
> 
> 

Hi Dave and Pratyush,

I did some more experimentation with the fslatency-nd and fsslower-nd tests to see what is going on.  The problem seems to be related to the return probes.  I have a small reproducer attached which runs fine on an x86_64 machine.  However, on aarch64 it performs the bogus read because some of the argument registers have changed value:

# ../install/bin/stap ./aarch64_retkprobe_issue2.stp 
ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at ./aarch64_retkprobe_issue2.stp:13:7
pc : [<fffffe000021e37c>] lr : [<fffffe000021eb64>] pstate: 80000145
sp : fffffe00bad7be30
x29: fffffe00bad7be30 x28: fffffe00bad78000 
x27: fffffe0000912000 x26: 000000000000003f 
x25: 000000000000011d x24: 0000000000000015 
x23: 0000000080000000 x22: 000003fff82b9760 
x21: fffffe00bad7bec8 x20: 0000000000002004 
x19: fffffe01b716e100 x18: 000003fff82b8160 
x17: 000003ff849bf0a0 x16: fffffe000021f4a0 
x15: 0000000000000004 x14: 000003fff82bb910 
x13: 0000000000000001 x12: 000003ff7d75f200 
x11: 00000000003d0f00 x10: 000003ff849b7af4 
x9 : 0000000000000028 x8 : 0000000000000020 
x7 : fffffe00bc5c3600 x6 : 0000000000000000 
x5 : 0000000000000000 x4 : 0000000000000000 
x3 : fffffe00bad7bec8 x2 : 0000000000002004 
x1 : 000003fff82b9760 x0 : fffffe01b716e100 

pc : [<fffffe000021e37c>] lr : [<fffffe000009fbe0>] pstate: 60000145
sp : fffffe00bad7be30
x29: fffffe00bad7be30 x28: fffffe00bad78000 
x27: fffffe0000912000 x26: 000000000000003f 
x25: 000000000000011d x24: 0000000000000015 
x23: 0000000080000000 x22: 000003fff82b9760 
x21: fffffe00bad7bec8 x20: 0000000000002004 
x19: fffffe01b716e100 x18: 000003fff82b8160 
x17: 000003ff849bf0a0 x16: fffffe000021f4a0 
x15: 0000000000000004 x14: 000003fff82bb910 
x13: 0000000000000001 x12: 000003ff7d75f200 
x11: 00000000003d0f00 x10: 000003ff849b7af4 
x9 : 0000000000000028 x8 : 0000000000000020 
x7 : fffffe00bc5c3600 x6 : 000003fff82b976c 
x5 : 000003fff82b976c x4 : 0000000000000000 
x3 : 0000000000000000 x2 : 0000000000000000 
x1 : 0000000000000000 x0 : 000000000000000c 

WARNING: Number of errors: 1, skipped probes: 1
WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

Comment out the return probe with a '#' at the beginning of the line with "kprobe.function("__vfs_read").return," and the script runs fine.  The systemtap pointer_arg() doesn't take into account that the register might be used as a scratch register and its value changed after entry into the function.  This is an issue with the systemtap scripts.  I have patched the systemtap scripts to address this issue.
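
As a rough cross-check of the two dumps above, the sketch below diffs the argument registers (x0 to x7) of the first dump against the second. It is written in Python purely for illustration; the names parse_regs and changed are made up here, and treating the first dump as the entry hit and the second as the return hit is an assumption based on x0 holding a kernel pointer in one and 0xc in the other.

```python
import re

# Argument registers copied from the two register dumps above.
# Assumption: the first dump is the entry probe hit, the second the return hit.
ENTRY = """
x7 : fffffe00bc5c3600 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000
x3 : fffffe00bad7bec8 x2 : 0000000000002004
x1 : 000003fff82b9760 x0 : fffffe01b716e100
"""
RETURN = """
x7 : fffffe00bc5c3600 x6 : 000003fff82b976c
x5 : 000003fff82b976c x4 : 0000000000000000
x3 : 0000000000000000 x2 : 0000000000000000
x1 : 0000000000000000 x0 : 000000000000000c
"""


def parse_regs(dump):
    """Return {register name: value} for every 'xN : hex' pair in a dump."""
    pairs = re.findall(r"(x\d+)\s*:\s*([0-9a-f]{16})", dump)
    return {name: int(value, 16) for name, value in pairs}


def changed(before, after):
    """List registers whose value differs, ordered by register number."""
    names = [reg for reg in before if before[reg] != after[reg]]
    return sorted(names, key=lambda reg: int(reg[1:]))


entry, ret = parse_regs(ENTRY), parse_regs(RETURN)
for reg in changed(entry, ret):
    print(f"{reg}: {entry[reg]:#018x} -> {ret[reg]:#018x}")
```

Run as-is, it reports x0, x1, x2, x3, x5 and x6 as changed, so pointer_arg(1) evaluated at the return probe sees 0xc rather than the struct file pointer.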

-Will

[-- Attachment #2: aarch64_retkprobe_issue2.stp --]
[-- Type: text/plain, Size: 473 bytes --]

# The return probe appears to cause this reproducer to crash
# kprobe.function("__vfs_read").return causes a read fault
# comment out the kprobe.function("__vfs_read").return and the
# script runs without error
probe
      kprobe.function("__vfs_read").return,
      kprobe.function("__vfs_read")
{
	# Skip the call if new_sync_read() wouldn't be called.
	file = pointer_arg(1)
	if (file) {
		print_regs();
 		if(@cast(file, "file")->f_op->read) {
			next
		}
	}
	exit()
}


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-11 17:02   ` William Cohen
@ 2015-12-16  5:22     ` Pratyush Anand
  2015-12-16 13:14       ` William Cohen
  0 siblings, 1 reply; 13+ messages in thread
From: Pratyush Anand @ 2015-12-16  5:22 UTC (permalink / raw)
  To: William Cohen; +Cc: David Long, systemtap

On 11/12/2015:12:02:21 PM, William Cohen wrote:
> 
> The "FAIL: pthread_stacks -Gsize (0 0)" looks like it could be an issue with uprobes affecting the running of the program.  Pratyush are you able to run this systemtap test locally?

Even when I run this test locally it does not work, but it fails very early in
my case. Maybe because of a different libpthread.so.

[root@amd-seattle-01 testsuite]# /root/bin/systemtap/bin/stap -gp4 ./systemtap.examples/process/threadstacks.stp -Gsize=65536 -d /root/systemtap/testsuite/pthread_stacks.x
semantic error: while resolving probe point: identifier 'process' at ./systemtap.examples/process/threadstacks.stp:17:7
        source: probe process("/lib*/libpthread.so.*").function("allocate_stack") {
                      ^

semantic error: no match

Pass 2: analysis failed.  [man error::pass2]
[root@amd-seattle-01 testsuite]# ls /lib*/libpthread.so.*
/lib64/libpthread.so.0
[root@amd-seattle-01 testsuite]# ll /lib64/libpthread.so.0
lrwxrwxrwx. 1 root root 18 Dec 13 23:42 /lib64/libpthread.so.0 -> libpthread-2.17.so
[root@amd-seattle-01 testsuite]# objdump -d /lib64/libpthread.so.0 | grep allocate_stack
0000000000006a50 <__deallocate_stack>:
    6a7c:       54000061        b.ne    6a88 <__deallocate_stack+0x38>
    6a84:       35ffff83        cbnz    w3, 6a74 <__deallocate_stack+0x24>
    6a88:       540005e1        b.ne    6b44 <__deallocate_stack+0xf4>
    6a90:       350005e0        cbnz    w0, 6b4c <__deallocate_stack+0xfc>
    6ac4:       350005e2        cbnz    w2, 6b80 <__deallocate_stack+0x130>
    6b14:       54000328        b.hi    6b78 <__deallocate_stack+0x128>
    6b2c:       35ffffc2        cbnz    w2, 6b24 <__deallocate_stack+0xd4>
    6b34:       5400014c        b.gt    6b5c <__deallocate_stack+0x10c>
    6b48:       17ffffd1        b       6a8c <__deallocate_stack+0x3c>
    6b58:       17ffffcf        b       6a94 <__deallocate_stack+0x44>
    6b74:       17fffff1        b       6b38 <__deallocate_stack+0xe8>
    6b7c:       17ffffe7        b       6b18 <__deallocate_stack+0xc8>
    6b8c:       17ffffe3        b       6b18 <__deallocate_stack+0xc8>
    6c3c:       97ffff85        bl      6a50 <__deallocate_stack>
    7ce4:       97fffb5b        bl      6a50 <__deallocate_stack>
    7f04:       97fffad3        bl      6a50 <__deallocate_stack>
    894c:       97fff841        bl      6a50 <__deallocate_stack>
[root@amd-seattle-01 testsuite]#

~Pratyush


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-11 20:59   ` William Cohen
@ 2015-12-16 11:55     ` Pratyush Anand
  2015-12-16 13:10       ` William Cohen
  0 siblings, 1 reply; 13+ messages in thread
From: Pratyush Anand @ 2015-12-16 11:55 UTC (permalink / raw)
  To: William Cohen; +Cc: David Long, systemtap

On 11/12/2015:03:59:53 PM, William Cohen wrote:
> On 12/10/2015 04:12 PM, David Long wrote:
> > On 12/10/2015 03:24 PM, William Cohen wrote:
> 
> >> The fslatency-nd and fsslower-nd tests need further investigation:
> >>
> >> PASS: ./systemtap.examples/lwtools/fslatency-nd build
> >> meta taglines 'test_installcheck: stap fslatency-nd.stp 1 1' tag 'test_installcheck' value 'stap fslatency-nd.stp 1 1'
> >> attempting command stap fslatency-nd.stp 1 1
> >> OUT ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at fslatency-nd.stp:66:15
> >> Tracing FS sync reads and writes... Output every 1 secs.
> >> WARNING: Number of errors: 1, skipped probes: 1
> >> WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
> >> Pass 5: run failed.  [man error::pass5]
> >> child process exited abnormally
> >> RC 1
> >> FAIL: ./systemtap.examples/lwtools/fslatency-nd run
> >>
> >> PASS: ./systemtap.examples/lwtools/fsslower-nd build
> >> meta taglines 'test_installcheck: stap fsslower-nd.stp -c "sleep 1"' tag 'test_installcheck' value 'stap fsslower-nd.stp -c "sleep 1"'
> >> attempting command stap fsslower-nd.stp -c "sleep 1"
> >> OUT ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at fsslower-nd.stp:68:15
> >> Tracing FS sync reads and writes slower than 10 ms... Hit Ctrl-C to end.
> >> TIME     PID    COMM             FUNC           SIZE     LAT(ms)
> >> WARNING: Number of errors: 1, skipped probes: 1
> >> WARNING: /root/systemtap_write/install/bin/staprun exited with status: 1
> >> Pass 5: run failed.  [man error::pass5]
> >> child process exited abnormally
> >> RC 1
> >> FAIL: ./systemtap.examples/lwtools/fsslower-nd run
> 
> > 
> > Cool. Wish I could make sense of systemtap error messages.
> > 
> >  At Will Deacon's suggested I tested probing the instruction in __copy_to_user that can cause a captured kernel exception when an application passes in a bad buffer address.  Unfortunately the result was a hang.  So copy_to/from user is going to have to be blacklisted for now, unless there turns out to be a simple fix. I'm worried there might be other places in the kernel where an otherwise probeable instruction might be expected to generate an exception.
> > 
> > -dl
> > 
> > 
> > 
> 
> Hi Dave and Pratyush,
> 
> I did some more experimentation with the fslatency-nd and fsslower-nd tests to see what is going on.  The problem seems to be related to the return probes.  I have a small reproducer attached which runs fine on an x86_64 machine.  However, on aarch64 it performs the bogus read because some of the argument registers have changed value:
> 
> # ../install/bin/stap ./aarch64_retkprobe_issue2.stp 
> ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at ./aarch64_retkprobe_issue2.stp:13:7
> pc : [<fffffe000021e37c>] lr : [<fffffe000021eb64>] pstate: 80000145
> sp : fffffe00bad7be30
> x29: fffffe00bad7be30 x28: fffffe00bad78000 
> x27: fffffe0000912000 x26: 000000000000003f 
> x25: 000000000000011d x24: 0000000000000015 
> x23: 0000000080000000 x22: 000003fff82b9760 
> x21: fffffe00bad7bec8 x20: 0000000000002004 
> x19: fffffe01b716e100 x18: 000003fff82b8160 
> x17: 000003ff849bf0a0 x16: fffffe000021f4a0 
> x15: 0000000000000004 x14: 000003fff82bb910 
> x13: 0000000000000001 x12: 000003ff7d75f200 
> x11: 00000000003d0f00 x10: 000003ff849b7af4 
> x9 : 0000000000000028 x8 : 0000000000000020 
> x7 : fffffe00bc5c3600 x6 : 0000000000000000 
> x5 : 0000000000000000 x4 : 0000000000000000 
> x3 : fffffe00bad7bec8 x2 : 0000000000002004 
> x1 : 000003fff82b9760 x0 : fffffe01b716e100 
> 
> pc : [<fffffe000021e37c>] lr : [<fffffe000009fbe0>] pstate: 60000145
> sp : fffffe00bad7be30
> x29: fffffe00bad7be30 x28: fffffe00bad78000 
> x27: fffffe0000912000 x26: 000000000000003f 
> x25: 000000000000011d x24: 0000000000000015 
> x23: 0000000080000000 x22: 000003fff82b9760 
> x21: fffffe00bad7bec8 x20: 0000000000002004 
> x19: fffffe01b716e100 x18: 000003fff82b8160 
> x17: 000003ff849bf0a0 x16: fffffe000021f4a0 
> x15: 0000000000000004 x14: 000003fff82bb910 
> x13: 0000000000000001 x12: 000003ff7d75f200 
> x11: 00000000003d0f00 x10: 000003ff849b7af4 
> x9 : 0000000000000028 x8 : 0000000000000020 
> x7 : fffffe00bc5c3600 x6 : 000003fff82b976c 
> x5 : 000003fff82b976c x4 : 0000000000000000 
> x3 : 0000000000000000 x2 : 0000000000000000 
> x1 : 0000000000000000 x0 : 000000000000000c 

I am not sure, but this is how it looks to me:

The first argument (file) is in x0, which is 0xC in the kretprobe case. But can
x0 really be considered the first argument in a kretprobe?
I think x0 should hold the return value of __vfs_read() in a kretprobe, so
0xC could be the number of bytes read.
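
One quick sanity check on that reading, as a Python sketch: if pointer_arg(1) returned 0xC at the return probe, then the faulting address 0x34 from the error message implies that the @cast dereference read a field 0x28 bytes into the bogus "struct file". That 0x28 is the offset of f_op on this kernel is an assumption, not something confirmed in this thread; it could be checked against the kernel debuginfo.

```python
# Values taken from the error message and the second register dump.
file_ptr = 0x0C      # x0 at the return probe, used as the "file" pointer
fault_addr = 0x34    # address reported in the "read fault" error

# Offset of the field being read, if the fault came from file->f_op.
# Assumption: the first dereference in file->f_op->read is the one that faulted.
implied_offset = fault_addr - file_ptr
print(hex(implied_offset))  # 0x28
```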

With perf I see:

# perf probe -k vmlinux __vfs_read_exit=__vfs_read%return file
Semantic error :You can't specify local variable for kretprobe.

So I am not sure what mechanism systemtap uses to get a local variable in a
kretprobe.

Moreover, on x86 I see that the run exits after only the first print_regs().
That means there was a valid file->f_op->read() for the first file itself. If I
comment out "kprobe.function("__vfs_read")", then nothing is printed at all. So
on x86 we are not hitting a case where the callback was called for the kretprobe
with a nonzero first argument.

~Pratyush


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-16 11:55     ` Pratyush Anand
@ 2015-12-16 13:10       ` William Cohen
  0 siblings, 0 replies; 13+ messages in thread
From: William Cohen @ 2015-12-16 13:10 UTC (permalink / raw)
  To: Pratyush Anand; +Cc: David Long, systemtap

On 12/16/2015 06:55 AM, Pratyush Anand wrote:
> On 11/12/2015:03:59:53 PM, William Cohen wrote:

>> Hi Dave and Pratyush,
>>
>> I did some more experimentation with the fslatency-nd and fsslower-nd tests to see what is going on.  The problem seems to be related to the return probes.  I have a small reproducer attached which runs fine on an x86_64 machine.  However, on aarch64 it performs the bogus read because some of the argument registers have changed value:
>>
>> # ../install/bin/stap ./aarch64_retkprobe_issue2.stp 
>> ERROR: read fault [man error::fault] at 0x0000000000000034 (addr) near operator '@cast' at ./aarch64_retkprobe_issue2.stp:13:7
>> pc : [<fffffe000021e37c>] lr : [<fffffe000021eb64>] pstate: 80000145
>> sp : fffffe00bad7be30
>> x29: fffffe00bad7be30 x28: fffffe00bad78000 
>> x27: fffffe0000912000 x26: 000000000000003f 
>> x25: 000000000000011d x24: 0000000000000015 
>> x23: 0000000080000000 x22: 000003fff82b9760 
>> x21: fffffe00bad7bec8 x20: 0000000000002004 
>> x19: fffffe01b716e100 x18: 000003fff82b8160 
>> x17: 000003ff849bf0a0 x16: fffffe000021f4a0 
>> x15: 0000000000000004 x14: 000003fff82bb910 
>> x13: 0000000000000001 x12: 000003ff7d75f200 
>> x11: 00000000003d0f00 x10: 000003ff849b7af4 
>> x9 : 0000000000000028 x8 : 0000000000000020 
>> x7 : fffffe00bc5c3600 x6 : 0000000000000000 
>> x5 : 0000000000000000 x4 : 0000000000000000 
>> x3 : fffffe00bad7bec8 x2 : 0000000000002004 
>> x1 : 000003fff82b9760 x0 : fffffe01b716e100 
>>
>> pc : [<fffffe000021e37c>] lr : [<fffffe000009fbe0>] pstate: 60000145
>> sp : fffffe00bad7be30
>> x29: fffffe00bad7be30 x28: fffffe00bad78000 
>> x27: fffffe0000912000 x26: 000000000000003f 
>> x25: 000000000000011d x24: 0000000000000015 
>> x23: 0000000080000000 x22: 000003fff82b9760 
>> x21: fffffe00bad7bec8 x20: 0000000000002004 
>> x19: fffffe01b716e100 x18: 000003fff82b8160 
>> x17: 000003ff849bf0a0 x16: fffffe000021f4a0 
>> x15: 0000000000000004 x14: 000003fff82bb910 
>> x13: 0000000000000001 x12: 000003ff7d75f200 
>> x11: 00000000003d0f00 x10: 000003ff849b7af4 
>> x9 : 0000000000000028 x8 : 0000000000000020 
>> x7 : fffffe00bc5c3600 x6 : 000003fff82b976c 
>> x5 : 000003fff82b976c x4 : 0000000000000000 
>> x3 : 0000000000000000 x2 : 0000000000000000 
>> x1 : 0000000000000000 x0 : 000000000000000c 
> 
> I am not sure, but this is how it looks to me:
> 
> The first argument (file) is in x0, which is 0xC in the kretprobe case. But can
> x0 really be considered the first argument in a kretprobe?
> I think x0 should hold the return value of __vfs_read() in a kretprobe, so
> 0xC could be the number of bytes read.
> 
> With perf I see:
> 
> # perf probe -k vmlinux __vfs_read_exit=__vfs_read%return file
> Semantic error :You can't specify local variable for kretprobe.
> 
> So I am not sure what mechanism systemtap uses to get a local variable in a
> kretprobe.
> 
> Moreover, on x86 I see that the run exits after only the first print_regs().
> That means there was a valid file->f_op->read() for the first file itself. If I
> comment out "kprobe.function("__vfs_read")", then nothing is printed at all. So
> on x86 we are not hitting a case where the callback was called for the kretprobe
> with a nonzero first argument.
> 
> ~Pratyush
> 

Hi Pratyush,

I found that the problem was that the systemtap scripts assumed x86_64 behavior where the arguments passed into the function are in memory and are still around on return.  For aarch64 (and other architectures that pass arguments via registers) the registers holding the values get clobbered.  I checked a fix into the examples to address this problem:

https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commit;h=3d0c2f452f09a64b800aabe68508f8f0183f0ea1

I also filed a systemtap bug to look for this issue in other places in systemtap:

https://sourceware.org/bugzilla/show_bug.cgi?id=19360
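
For readers following along, the general pattern for avoiding this kind of clobbering is to capture the argument at function entry and look it up again at return, keyed by thread. The sketch below models that idea in Python with plain dictionaries standing in for the register state; it is only an illustration under that assumption, the handler names on_entry and on_return are hypothetical, and it is not claimed to be what the commit linked above does.

```python
# thread id -> first argument captured when the probed function was entered
saved_file = {}


def on_entry(tid, regs):
    """Entry handler: x0 still holds the first argument (the file pointer)."""
    saved_file[tid] = regs["x0"]


def on_return(tid, regs):
    """Return handler: x0 now holds the return value, so use the saved copy.

    Returns (file pointer saved at entry or None, return value).
    """
    file_ptr = saved_file.pop(tid, None)
    return file_ptr, regs["x0"]


# Replay the two hits from the reproducer for a single hypothetical thread id.
on_entry(3567, {"x0": 0xFFFFFE01B716E100})
file_ptr, retval = on_return(3567, {"x0": 0x0C})
print(hex(file_ptr), retval)
```

Popping the entry on return keeps the table from growing, and a missing entry (None) signals that the return fired without a matching entry hit.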

-Will


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-16  5:22     ` Pratyush Anand
@ 2015-12-16 13:14       ` William Cohen
  2015-12-17  0:53         ` Pratyush Anand
  0 siblings, 1 reply; 13+ messages in thread
From: William Cohen @ 2015-12-16 13:14 UTC (permalink / raw)
  To: Pratyush Anand; +Cc: David Long, systemtap

On 12/16/2015 12:22 AM, Pratyush Anand wrote:
> On 11/12/2015:12:02:21 PM, William Cohen wrote:
>>
>> The "FAIL: pthread_stacks -Gsize (0 0)" looks like it could be an issue with uprobes affecting the running of the program.  Pratyush are you able to run this systemtap test locally?
> 
> Even when I run this test locally it does not work, but it fails very early in
> my case. Maybe because of a different libpthread.so.
> 
> [root@amd-seattle-01 testsuite]# /root/bin/systemtap/bin/stap -gp4 ./systemtap.examples/process/threadstacks.stp -Gsize=65536 -d /root/systemtap/testsuite/pthread_stacks.x
> semantic error: while resolving probe point: identifier 'process' at ./systemtap.examples/process/threadstacks.stp:17:7
>         source: probe process("/lib*/libpthread.so.*").function("allocate_stack") {
>                       ^
> 
> semantic error: no match

You might need to install glibc-debuginfo.  Below is some information from the machine I have set up, showing that the probe point is available and which glibc packages are installed on the machine:

[root@apm-mustang-ev3-01 systemtap]# ../install/bin/stap -L 'process("/lib*/libpthread.so.*").function("allocate_stack")'
process("/usr/lib64/libpthread-2.17.so").function("allocate_stack@/usr/src/debug/glibc-2.17-c758a686/nptl/allocatestack.c:344") $stack:void** $pdp:struct pthread** $attr:struct pthread_attr const*
[root@apm-mustang-ev3-01 systemtap]# rpm -qf /usr/lib64/libpthread-2.17.so 
glibc-2.17-105.el7.aarch64
[root@apm-mustang-ev3-01 systemtap]# rpm -qa|grep glibc
glibc-common-2.17-105.el7.aarch64
glibc-devel-2.17-105.el7.aarch64
glibc-debuginfo-2.17-105.el7.aarch64
glibc-headers-2.17-105.el7.aarch64
glibc-2.17-105.el7.aarch64

-Will

> 
> Pass 2: analysis failed.  [man error::pass2]
> [root@amd-seattle-01 testsuite]# ls /lib*/libpthread.so.*
> /lib64/libpthread.so.0
> [root@amd-seattle-01 testsuite]# ll /lib64/libpthread.so.0
> lrwxrwxrwx. 1 root root 18 Dec 13 23:42 /lib64/libpthread.so.0 -> libpthread-2.17.so
> [root@amd-seattle-01 testsuite]# objdump -d /lib64/libpthread.so.0 | grep allocate_stack
> 0000000000006a50 <__deallocate_stack>:
>     6a7c:       54000061        b.ne    6a88 <__deallocate_stack+0x38>
>     6a84:       35ffff83        cbnz    w3, 6a74 <__deallocate_stack+0x24>
>     6a88:       540005e1        b.ne    6b44 <__deallocate_stack+0xf4>
>     6a90:       350005e0        cbnz    w0, 6b4c <__deallocate_stack+0xfc>
>     6ac4:       350005e2        cbnz    w2, 6b80 <__deallocate_stack+0x130>
>     6b14:       54000328        b.hi    6b78 <__deallocate_stack+0x128>
>     6b2c:       35ffffc2        cbnz    w2, 6b24 <__deallocate_stack+0xd4>
>     6b34:       5400014c        b.gt    6b5c <__deallocate_stack+0x10c>
>     6b48:       17ffffd1        b       6a8c <__deallocate_stack+0x3c>
>     6b58:       17ffffcf        b       6a94 <__deallocate_stack+0x44>
>     6b74:       17fffff1        b       6b38 <__deallocate_stack+0xe8>
>     6b7c:       17ffffe7        b       6b18 <__deallocate_stack+0xc8>
>     6b8c:       17ffffe3        b       6b18 <__deallocate_stack+0xc8>
>     6c3c:       97ffff85        bl      6a50 <__deallocate_stack>
>     7ce4:       97fffb5b        bl      6a50 <__deallocate_stack>
>     7f04:       97fffad3        bl      6a50 <__deallocate_stack>
>     894c:       97fff841        bl      6a50 <__deallocate_stack>
> [root@amd-seattle-01 testsuite]#
> 
> ~Pratyush
> 


* Re: Recent aarch64 kprobes and uprobes patch systemtap testing
  2015-12-16 13:14       ` William Cohen
@ 2015-12-17  0:53         ` Pratyush Anand
  0 siblings, 0 replies; 13+ messages in thread
From: Pratyush Anand @ 2015-12-17  0:53 UTC (permalink / raw)
  To: William Cohen; +Cc: David Long, systemtap

On 16/12/2015:08:14:01 AM, William Cohen wrote:
> On 12/16/2015 12:22 AM, Pratyush Anand wrote:
> > On 11/12/2015:12:02:21 PM, William Cohen wrote:
> >>
> >> The "FAIL: pthread_stacks -Gsize (0 0)" looks like it could be an issue with uprobes affecting the running of the program.  Pratyush are you able to run this systemtap test locally?
> > 
> > Even when I run this test locally it does not work, but it fails very early in
> > my case. Maybe because of a different libpthread.so.
> > 
> > [root@amd-seattle-01 testsuite]# /root/bin/systemtap/bin/stap -gp4 ./systemtap.examples/process/threadstacks.stp -Gsize=65536 -d /root/systemtap/testsuite/pthread_stacks.x
> > semantic error: while resolving probe point: identifier 'process' at ./systemtap.examples/process/threadstacks.stp:17:7
> >         source: probe process("/lib*/libpthread.so.*").function("allocate_stack") {
> >                       ^
> > 
> > semantic error: no match
> 
> You might need to install glibc-debuginfo.  Below is some information from the machine I have set up, showing that the probe point is available and which glibc packages are installed on the machine:

Thanks. After installing glibc-debuginfo I see the test is passing locally.

PASS: ./systemtap.examples/process/thread-business run
meta taglines '' tag 'output' value ''
PRETEST PWD=/root/systemtap/testsuite
meta taglines '' tag 'test_support' value ''
TEST PWD=/root/systemtap/testsuite/systemtap.examples/process
sourcing threadstacks.tcl for ./systemtap.examples/process/threadstacks
meta taglines 'test_check: stap -gp4 threadstacks.stp -Gsize=65536 -d `which stap`' tag 'test_check' value 'stap -gp4 threadstacks.stp -Gsize=65536 -d `which stap`'
attempting command stap -gp4 threadstacks.stp -Gsize=65536 -d `which stap`
OUT /root/systemtap/testsuite/.systemtap-root/cache/71/stap_715823b07593f6d19206a3fa54071a9b_9758.ko
RC 0
PASS: ./systemtap.examples/process/threadstacks build
meta taglines 'test_installcheck: stap -g threadstacks.stp -Gsize=65536 -c "sleep 1" -d `which stap`' tag 'test_installcheck' value 'stap -g threadstacks.stp -Gsize=65536 -c "sleep 1" -d `which stap`'
attempting command stap -g threadstacks.stp -Gsize=65536 -c "sleep 1" -d `which stap`
OUT
RC 0
PASS: ./systemtap.examples/process/threadstacks run

~Pratyush


Thread overview: 13+ messages
2015-12-10 20:24 Recent aarch64 kprobes and uprobes patch systemtap testing William Cohen
2015-12-10 21:12 ` David Long
2015-12-11  4:19   ` Pratyush Anand
2015-12-11  4:43     ` David Long
2015-12-11 17:02   ` William Cohen
2015-12-16  5:22     ` Pratyush Anand
2015-12-16 13:14       ` William Cohen
2015-12-17  0:53         ` Pratyush Anand
2015-12-11 20:59   ` William Cohen
2015-12-16 11:55     ` Pratyush Anand
2015-12-16 13:10       ` William Cohen
2015-12-10 21:17 ` David Long
2015-12-11 17:23 ` David Smith
