public inbox for systemtap@sourceware.org
* exercising current aarch64 kprobe support with systemtap
@ 2016-06-09 16:17 William Cohen
  2016-06-09 19:52 ` William Cohen
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: William Cohen @ 2016-06-09 16:17 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 3537 bytes --]

I have been exercising the current kprobes and uprobe patches for
arm64 that are in the test_upstream_arm64_devel branch of
https://github.com/pratyushanand/linux with systemtap.  There are
two issues that I have seen on this kernel with systemtap.  There are
some cases where kprobes fail to register at places that appear to be
reasonable places for a kprobe.  The other issue is that the kernel starts
having soft lockups when the hw_watch_addr.stp test runs.  To get
systemtap working with the newer kernels, the attached hack is needed
because of changes in the aarch64 macro args.

EINVAL for seemingly valid kprobe registration

The output below shows bz1027459.stp failing because some of the kprobes do not register.

# make installcheck RUNTESTFLAGS="--debug systemtap.base/bz1027459.exp"
...


spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c9578) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4690").call (address 0xfffffc0008104d58) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5013").call (address 0xfffffc0008105250) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:4986").call (address 0xfffffc00081051e8) registration error (rc -22)
hi
FAIL: bz1027459 -p5 (0)

area around SyS_set_tid_address

fffffc00080c956c:	d503201f 	nop
fffffc00080c9570:	08dc4c80 	.word	0x08dc4c80
fffffc00080c9574:	fffffc00 	.word	0xfffffc00

fffffc00080c9578 <SyS_set_tid_address>:
fffffc00080c9578:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
fffffc00080c957c:	910003fd 	mov	x29, sp

area around SyS_sched_setaffinity

fffffc0008104d4c:	17ffff73 	b	fffffc0008104b18 <sched_setaffinity+0x438>
fffffc0008104d50:	08dd9d80 	.word	0x08dd9d80
fffffc0008104d54:	fffffc00 	.word	0xfffffc00

fffffc0008104d58 <SyS_sched_setaffinity>:
fffffc0008104d58:	a9bb7bfd 	stp	x29, x30, [sp,#-80]!
fffffc0008104d5c:	910003fd 	mov	x29, sp

area around SyS_sched_get_priority_min

fffffc0008105244:	f9400bf3 	ldr	x19, [sp,#16]
fffffc0008105248:	a8c27bfd 	ldp	x29, x30, [sp],#32
fffffc000810524c:	d65f03c0 	ret

fffffc0008105250 <SyS_sched_get_priority_min>:
fffffc0008105250:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
fffffc0008105254:	910003fd 	mov	x29, sp


area around SyS_sched_get_priority_max


fffffc00081051dc:	17ffffe8 	b	fffffc000810517c <sys_sched_yield+0x34>
fffffc00081051e0:	08dd9d80 	.word	0x08dd9d80
fffffc00081051e4:	fffffc00 	.word	0xfffffc00

fffffc00081051e8 <SyS_sched_get_priority_max>:
fffffc00081051e8:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
fffffc00081051ec:	910003fd 	mov	x29, sp


The stp (store pair) instructions at the beginning of these functions
should be fine to instrument.  One thing that I could think of causing
a problem is the test to make sure that the instruction is not inside
a load exclusive/store exclusive region.  The test might be mistaking
some of the data before the start of the function as load exclusive
instructions.



Soft lockup for hw_watch_addr.stp

When running the hw_watch_addr.stp tests, a number of processes on the
machine start using a lot of sys time and eventually the kernel reports
a soft lockup:

http://paste.stg.fedoraproject.org/5323/

The systemtap.base/overload.exp tests all pass, but maybe there is
too much work being done to generate the backtraces for
hw_watch_addr.stp and that is triggering the problem.


-Will

[-- Attachment #2: arm64_uaccess.patch --]
[-- Type: text/x-patch, Size: 2671 bytes --]

diff --git a/runtime/linux/loc2c-runtime.h b/runtime/linux/loc2c-runtime.h
index a3bec58..7589026 100644
--- a/runtime/linux/loc2c-runtime.h
+++ b/runtime/linux/loc2c-runtime.h
@@ -589,10 +589,10 @@ extern void __store_deref_bad(void);
     else                                                                      \
       switch (size)                                                           \
         {                                                                     \
-	case 1: __get_user_asm("ldrb", "%w", _v, (unsigned long)addr, _bad); break;\
-	case 2: __get_user_asm("ldrh", "%w",_v, (unsigned long)addr, _bad); break;\
-	case 4: __get_user_asm("ldr", "%w",_v,  (unsigned long)addr, _bad); break;\
-	case 8: __get_user_asm("ldr", "%",_v,  (unsigned long)addr, _bad); break;\
+	case 1: __get_user_asm("ldrb", "ldtrb", "%w", _v, (unsigned long)addr, _bad, ARM64_HAS_UAO); break; \
+	case 2: __get_user_asm("ldrh", "ldtrh", "%w",_v, (unsigned long)addr, _bad, ARM64_HAS_UAO); break; \
+	case 4: __get_user_asm("ldr", "ldtr", "%w",_v,  (unsigned long)addr, _bad, ARM64_HAS_UAO); break; \
+	case 8: __get_user_asm("ldr", "ldtr", "%",_v,  (unsigned long)addr, _bad, ARM64_HAS_UAO); break; \
         default: BUILD_BUG();			                              \
         }                                                                     \
     pagefault_enable();                                                       \
@@ -613,10 +613,10 @@ extern void __store_deref_bad(void);
     else                                                                      \
       switch (size)                                                           \
         {                                                                     \
-	case 1: __put_user_asm("strb", "%w", ((u8)(value)), addr, _bad); break;\
-	case 2: __put_user_asm("strh", "%w", ((u16)(value)), addr, _bad); break;\
-	case 4: __put_user_asm("str", "%w", ((u32)(value)), addr, _bad); break;\
-	case 8: __put_user_asm("str", "%", ((u64)(value)), addr, _bad); break;\
+	case 1: __put_user_asm("strb", "sttrb", "%w", ((u8)(value)), addr, _bad, ARM64_HAS_UAO); break; \
+	case 2: __put_user_asm("strh", "sttrh", "%w", ((u16)(value)), addr, _bad, ARM64_HAS_UAO); break;	\
+	case 4: __put_user_asm("str", "sttr", "%w", ((u32)(value)), addr, _bad, ARM64_HAS_UAO); break; \
+	case 8: __put_user_asm("str", "sttr", "%", ((u64)(value)), addr, _bad, ARM64_HAS_UAO); break; \
         default: BUILD_BUG();                                                 \
         }                                                                     \
     pagefault_enable();                                                       \


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-09 16:17 exercising current aarch64 kprobe support with systemtap William Cohen
@ 2016-06-09 19:52 ` William Cohen
  2016-06-10  3:42   ` David Long
  2016-06-10  5:49   ` David Long
  2016-06-10 21:28 ` William Cohen
  2016-06-13 16:11 ` William Cohen
  2 siblings, 2 replies; 56+ messages in thread
From: William Cohen @ 2016-06-09 19:52 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown

On 06/09/2016 12:17 PM, William Cohen wrote:
> I have been exercising the current kprobes and uprobe patches for
> arm64 that are in the test_upstream_arm64_devel branch of
> https://github.com/pratyushanand/linux with systemtap.  There are
> two issues that I have seen on this kernel with systemtap.  There are
> some cases where kprobes fail to register at places that appear to be
> reasonable places for a kprobe.  The other issue is that the kernel starts
> having soft lockups when the hw_watch_addr.stp test runs.  To get
> systemtap working with the newer kernels, the attached hack is needed
> because of changes in the aarch64 macro args.
> 
> EINVAL for seemingly valid kprobe registration
> 
> The output below shows bz1027459.stp failing because some of the kprobes do not register.
> 
> # make installcheck RUNTESTFLAGS="--debug systemtap.base/bz1027459.exp"
> ...
> 
> 
> spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
> WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c9578) registration error (rc -22)
> WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4690").call (address 0xfffffc0008104d58) registration error (rc -22)
> WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5013").call (address 0xfffffc0008105250) registration error (rc -22)
> WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:4986").call (address 0xfffffc00081051e8) registration error (rc -22)
> hi
> FAIL: bz1027459 -p5 (0)
> 
> area around SyS_set_tid_address
> 
> fffffc00080c956c:	d503201f 	nop
> fffffc00080c9570:	08dc4c80 	.word	0x08dc4c80
> fffffc00080c9574:	fffffc00 	.word	0xfffffc00
> 
> fffffc00080c9578 <SyS_set_tid_address>:
> fffffc00080c9578:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
> fffffc00080c957c:	910003fd 	mov	x29, sp
> 
> area around SyS_sched_setaffinity
> 
> fffffc0008104d4c:	17ffff73 	b	fffffc0008104b18 <sched_setaffinity+0x438>
> fffffc0008104d50:	08dd9d80 	.word	0x08dd9d80
> fffffc0008104d54:	fffffc00 	.word	0xfffffc00
> 
> fffffc0008104d58 <SyS_sched_setaffinity>:
> fffffc0008104d58:	a9bb7bfd 	stp	x29, x30, [sp,#-80]!
> fffffc0008104d5c:	910003fd 	mov	x29, sp
> 
> area around SyS_sched_get_priority_min
> 
> fffffc0008105244:	f9400bf3 	ldr	x19, [sp,#16]
> fffffc0008105248:	a8c27bfd 	ldp	x29, x30, [sp],#32
> fffffc000810524c:	d65f03c0 	ret
> 
> fffffc0008105250 <SyS_sched_get_priority_min>:
> fffffc0008105250:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
> fffffc0008105254:	910003fd 	mov	x29, sp
> 
> 
> area around SyS_sched_get_priority_max
> 
> 
> fffffc00081051dc:	17ffffe8 	b	fffffc000810517c <sys_sched_yield+0x34>
> fffffc00081051e0:	08dd9d80 	.word	0x08dd9d80
> fffffc00081051e4:	fffffc00 	.word	0xfffffc00
> 
> fffffc00081051e8 <SyS_sched_get_priority_max>:
> fffffc00081051e8:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
> fffffc00081051ec:	910003fd 	mov	x29, sp
> 
> 
> The stp (store pair) instructions at the beginning of these functions
> should be fine to instrument.  One thing that I could think of causing
> a problem is the test to make sure that the instruction is not inside
> a load exclusive/store exclusive region.  The test might be mistaking
> some of the data before the start of the function as load exclusive
> instructions.


I verified that the cause of kprobes not being registered is the scan
backward for load exclusive instructions.  For example:

...
fffffc00080c98cc:	d503201f 	nop
fffffc00080c98d0:	08dc4c80 	.word	0x08dc4c80
fffffc00080c98d4:	fffffc00 	.word	0xfffffc00

fffffc00080c98d8 <SyS_set_tid_address>:
fffffc00080c98d8:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
fffffc00080c98dc:	910003fd 	mov	x29, sp

The previous function has 0xfffffc0008dc4c80 as data at the end of the
function. The scan backwards from the beginning of the current
function SyS_set_tid_address stumbles into that data and interprets
the 0x08dc4c80 word as a load exclusive instruction.  This causes the
kprobe registration to fail.
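
The misdecode can be reproduced outside the kernel.  The sketch below is a
user-space illustration only, not the kernel code: the mask/value pairs are
assumed to match the load exclusive and store exclusive encodings used by
the arm64 instruction helpers, and the function names are made up for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed (mask, value) pairs for the AArch64 load/store exclusive class.
 * Bit 22 distinguishes a load (1) from a store (0).
 */
#define EX_MASK      0x3F400000u
#define LOAD_EX_VAL  0x08400000u
#define STORE_EX_VAL 0x08000000u

/* Hypothetical helper: true if the 32-bit word decodes as a load exclusive. */
static bool looks_like_load_ex(uint32_t insn)
{
	return (insn & EX_MASK) == LOAD_EX_VAL;
}

/* Hypothetical helper: true if the 32-bit word decodes as a store exclusive. */
static bool looks_like_store_ex(uint32_t insn)
{
	return (insn & EX_MASK) == STORE_EX_VAL;
}
```

With these masks the literal-pool words 0x08dc4c80 and 0x08dd9d80 from the
disassembly both satisfy looks_like_load_ex(), while the stp at the function
entry (0xa9be7bfd) satisfies neither helper.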

Disabling the is_probed_address_atomic() scan for atomic instructions
allows the test to work:

make installcheck RUNTESTFLAGS="--debug systemtap.base/bz1027459.exp"
...
Running target unix
Running /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.exp ...
PASS: bz1027459 -p5

		=== systemtap Summary ===

# of expected passes		1


Somehow the is_probed_address_atomic and arm_kprobe_decode_insn
functions need to avoid scanning past the beginning of a function.

-Will



* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-09 19:52 ` William Cohen
@ 2016-06-10  3:42   ` David Long
  2016-06-10  5:49   ` David Long
  1 sibling, 0 replies; 56+ messages in thread
From: David Long @ 2016-06-10  3:42 UTC (permalink / raw)
  To: William Cohen, systemtap, Pratyush Anand, Mark Brown

On 06/09/2016 03:52 PM, William Cohen wrote:
> On 06/09/2016 12:17 PM, William Cohen wrote:
>> I have been exercising the current kprobes and uprobe patches for
>> arm64 that are in the test_upstream_arm64_devel branch of
>> https://github.com/pratyushanand/linux with systemtap.  There are
>> two issues that I have seen on this kernel with systemtap.  There are
>> some cases where kprobes fail to register at places that appear to be
>> reasonable places for a kprobe.  The other issue is that the kernel starts
>> having soft lockups when the hw_watch_addr.stp test runs.  To get
>> systemtap working with the newer kernels, the attached hack is needed
>> because of changes in the aarch64 macro args.
>>
>> EINVAL for seemingly valid kprobe registration
>>
>> The output below shows bz1027459.stp failing because some of the kprobes do not register.
>>
>> # make installcheck RUNTESTFLAGS="--debug systemtap.base/bz1027459.exp"
>> ...
>>
>>
>> spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
>> WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c9578) registration error (rc -22)
>> WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4690").call (address 0xfffffc0008104d58) registration error (rc -22)
>> WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5013").call (address 0xfffffc0008105250) registration error (rc -22)
>> WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:4986").call (address 0xfffffc00081051e8) registration error (rc -22)
>> hi
>> FAIL: bz1027459 -p5 (0)
>>
>> area around SyS_set_tid_address
>>
>> fffffc00080c956c:	d503201f 	nop
>> fffffc00080c9570:	08dc4c80 	.word	0x08dc4c80
>> fffffc00080c9574:	fffffc00 	.word	0xfffffc00
>>
>> fffffc00080c9578 <SyS_set_tid_address>:
>> fffffc00080c9578:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
>> fffffc00080c957c:	910003fd 	mov	x29, sp
>>
>> area around SyS_sched_setaffinity
>>
>> fffffc0008104d4c:	17ffff73 	b	fffffc0008104b18 <sched_setaffinity+0x438>
>> fffffc0008104d50:	08dd9d80 	.word	0x08dd9d80
>> fffffc0008104d54:	fffffc00 	.word	0xfffffc00
>>
>> fffffc0008104d58 <SyS_sched_setaffinity>:
>> fffffc0008104d58:	a9bb7bfd 	stp	x29, x30, [sp,#-80]!
>> fffffc0008104d5c:	910003fd 	mov	x29, sp
>>
>> area around SyS_sched_get_priority_min
>>
>> fffffc0008105244:	f9400bf3 	ldr	x19, [sp,#16]
>> fffffc0008105248:	a8c27bfd 	ldp	x29, x30, [sp],#32
>> fffffc000810524c:	d65f03c0 	ret
>>
>> fffffc0008105250 <SyS_sched_get_priority_min>:
>> fffffc0008105250:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
>> fffffc0008105254:	910003fd 	mov	x29, sp
>>
>>
>> area around SyS_sched_get_priority_max
>>
>>
>> fffffc00081051dc:	17ffffe8 	b	fffffc000810517c <sys_sched_yield+0x34>
>> fffffc00081051e0:	08dd9d80 	.word	0x08dd9d80
>> fffffc00081051e4:	fffffc00 	.word	0xfffffc00
>>
>> fffffc00081051e8 <SyS_sched_get_priority_max>:
>> fffffc00081051e8:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
>> fffffc00081051ec:	910003fd 	mov	x29, sp
>>
>>
>> The stp (store pair) instructions at the beginning of these functions
>> should be fine to instrument.  One thing that I could think of causing
>> a problem is the test to make sure that the instruction is not inside
>> a load exclusive/store exclusive region.  The test might be mistaking
>> some of the data before the start of the function as load exclusive
>> instructions.
>
>
> I verified that the cause of kprobes not being registered is the scan
> backward for load exclusive instructions.  For example:
>
> ...
> fffffc00080c98cc:	d503201f 	nop
> fffffc00080c98d0:	08dc4c80 	.word	0x08dc4c80
> fffffc00080c98d4:	fffffc00 	.word	0xfffffc00
>
> fffffc00080c98d8 <SyS_set_tid_address>:
> fffffc00080c98d8:	a9be7bfd 	stp	x29, x30, [sp,#-32]!
> fffffc00080c98dc:	910003fd 	mov	x29, sp
>
> The previous function has 0xfffffc0008dc4c80 as data at the end of the
> function. The scan backwards from the beginning of the current
> function SyS_set_tid_address stumbles into that data and interprets
> the 0x08dc4c80 word as a load exclusive instruction.  This causes the
> kprobe registration to fail.
>
> Disabling the is_probed_address_atomic() scan for atomic instructions
> allows the test to work:
>
> make installcheck RUNTESTFLAGS="--debug systemtap.base/bz1027459.exp"
> ...
> Running target unix
> Running /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.exp ...
> PASS: bz1027459 -p5
>
> 		=== systemtap Summary ===
>
> # of expected passes		1
>

Interesting coincidence.  Thanks for isolating the cause of that problem.

>
> Somehow the is_probed_address_atomic and arm_kprobe_decode_insn
> functions need to avoid scanning past the beginning of a function.
>
> -Will
>
>

I'm not sure we're going to be able to do that.  We could interpret a 
"ret" to end the scan but that's probably going to be on the other side 
of the data. And can we be certain the compiler only puts literals 
between functions?  Maybe the "stp x29,x30,[sp,...]" instruction is 
definitive enough to use as a flag for the beginning of functions.  This 
is the approach I'm now thinking might be the best fix.

-dl


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-09 19:52 ` William Cohen
  2016-06-10  3:42   ` David Long
@ 2016-06-10  5:49   ` David Long
  2016-06-10 13:43     ` Pratyush Anand
  2016-07-12 14:33     ` William Cohen
  1 sibling, 2 replies; 56+ messages in thread
From: David Long @ 2016-06-10  5:49 UTC (permalink / raw)
  To: William Cohen, systemtap, Pratyush Anand, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 229 bytes --]

Attached are incremental diffs I hope will fix the latest systemtap 
failures, without abandoning atomic sequence checking.  I'm trying to 
avoid the hex constants but I don't think the insn.c functions help in 
this case.

-dl


[-- Attachment #2: diffs --]
[-- Type: text/plain, Size: 686 bytes --]

diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
index 28b9c5b..36b4ea5 100644
--- a/arch/arm64/kernel/kprobes-arm64.c
+++ b/arch/arm64/kernel/kprobes-arm64.c
@@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
 		 * atomic region starts from exclusive load and ends with
 		 * exclusive store.
 		 */
-		if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
+		if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
+			return false;
+		else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
 			return false;
 		else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
 			return true;
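
The hex constants in the diff above can be checked against the disassembly
earlier in the thread.  The following is a stand-alone sketch, not part of
the patch: the mask and value are copied from the diff, while the macro and
function names are invented here.  The mask clears bits 21:15, the imm7
offset field of the pre-indexed stp, so any frame size matches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the diff; the names are hypothetical. */
#define STP_FP_LR_MASK 0xffc07fffu /* ignore imm7 (bits 21:15) */
#define STP_FP_LR_VAL  0xa9807bfdu /* stp x29, x30, [sp, #imm]! */

/*
 * True if the word is "stp x29, x30, [sp, #imm]!" for any immediate,
 * which the patch treats as the start of a function.
 */
static bool is_frame_push(uint32_t insn)
{
	return (insn & STP_FP_LR_MASK) == STP_FP_LR_VAL;
}
```

Both prologue encodings seen in the failing probes, 0xa9be7bfd (#-32) and
0xa9bb7bfd (#-80), match, while the literal word 0x08dc4c80 and the
"mov x29, sp" that follows the stp (0x910003fd) do not.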


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10  5:49   ` David Long
@ 2016-06-10 13:43     ` Pratyush Anand
  2016-06-10 14:03       ` William Cohen
  2016-06-10 14:20       ` David Long
  2016-07-12 14:33     ` William Cohen
  1 sibling, 2 replies; 56+ messages in thread
From: Pratyush Anand @ 2016-06-10 13:43 UTC (permalink / raw)
  To: David Long; +Cc: William Cohen, systemtap, Mark Brown

On 10/06/2016:01:49:10 AM, David Long wrote:
> Attached are incremental diffs I hope will fix the latest systemtap
> failures, without abandoning atomic sequence checking.  I'm trying to avoid
> the hex constants but I don't think the insn.c functions help in this case.

It will save us from the current problem by checking for the "stp x29,x30,[sp,...]"
instruction and returning false if it matches. However, we will have to find some
way to recognize .word data.

* An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
 __dma_map_area(), _cpu_resume etc. However, it is very unlikely that .word
 data exists just before the start of an assembly function and also contains
 a value that could be mistaken for a load exclusive instruction.

* But the major issue is: what if someone places a kprobe at an address which
 contains a .word value? The probe instruction will never be executed, so the
 probe function will not be called, but when real code reads that .word value,
 it reads a wrong value.

Can GCC provide some compiler option to place .word values in a
specific area?

~Pratyush

> 
> -dl
> 

> diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
> index 28b9c5b..36b4ea5 100644
> --- a/arch/arm64/kernel/kprobes-arm64.c
> +++ b/arch/arm64/kernel/kprobes-arm64.c
> @@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>  		 * atomic region starts from exclusive load and ends with
>  		 * exclusive store.
>  		 */
> -		if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
> +		if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
> +			return false;
> +		else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>  			return false;
>  		else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>  			return true;


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 13:43     ` Pratyush Anand
@ 2016-06-10 14:03       ` William Cohen
  2016-06-10 14:37         ` David Long
  2016-06-10 14:20       ` David Long
  1 sibling, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-06-10 14:03 UTC (permalink / raw)
  To: Pratyush Anand, David Long; +Cc: systemtap, Mark Brown

On 06/10/2016 09:42 AM, Pratyush Anand wrote:
> On 10/06/2016:01:49:10 AM, David Long wrote:
>> Attached are incremental diffs I hope will fix the latest systemtap
>> failures, without abandoning atomic sequence checking.  I'm trying to avoid
>> the hex constants but I don't think the insn.c functions help in this case.
> 
> It will save us from the current problem by checking for the "stp x29,x30,[sp,...]"
> instruction and returning false if it matches. However, we will have to find some
> way to recognize .word data.
> 
> * An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
>  __dma_map_area(), _cpu_resume etc. However, it is very unlikely that .word
>  data exists just before the start of an assembly function and also contains
>  a value that could be mistaken for a load exclusive instruction.
> 
> * But the major issue is: what if someone places a kprobe at an address which
>  contains a .word value? The probe instruction will never be executed, so the
>  probe function will not be called, but when real code reads that .word value,
>  it reads a wrong value.
> 
> Can GCC provide some compiler option to place .word values in a
> specific area?
> 
> ~Pratyush

Hi Dave and Pratyush,

Expecting the first instruction to be stp x29, x30, [sp,...] would be pretty fragile.  The compiler might not generate that for some very simple functions or with certain types of optimization. If the compiler could generate a sentinel word before the start of each function, that might be a more robust solution.  Maybe something like a breakpoint instruction or something that clearly would not be in an atomic region.

-Will
> 
>>
>> -dl
>>
> 
>> diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
>> index 28b9c5b..36b4ea5 100644
>> --- a/arch/arm64/kernel/kprobes-arm64.c
>> +++ b/arch/arm64/kernel/kprobes-arm64.c
>> @@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>>  		 * atomic region starts from exclusive load and ends with
>>  		 * exclusive store.
>>  		 */
>> -		if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>> +		if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
>> +			return false;
>> +		else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>  			return false;
>>  		else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>>  			return true;
> 


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 13:43     ` Pratyush Anand
  2016-06-10 14:03       ` William Cohen
@ 2016-06-10 14:20       ` David Long
  2016-06-10 15:11         ` William Cohen
  2016-06-10 17:07         ` Pratyush Anand
  1 sibling, 2 replies; 56+ messages in thread
From: David Long @ 2016-06-10 14:20 UTC (permalink / raw)
  To: Pratyush Anand; +Cc: William Cohen, systemtap, Mark Brown

On 06/10/2016 09:42 AM, Pratyush Anand wrote:
> On 10/06/2016:01:49:10 AM, David Long wrote:
>> Attached are incremental diffs I hope will fix the latest systemtap
>> failures, without abandoning atomic sequence checking.  I'm trying to avoid
>> the hex constants but I don't think the insn.c functions help in this case.
>
> It will save us from the current problem by checking for the "stp x29,x30,[sp,...]"
> instruction and returning false if it matches. However, we will have to find some
> way to recognize .word data.
>
> * An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
>   __dma_map_area(), _cpu_resume etc. However, it is very unlikely that .word
>   data exists just before the start of an assembly function and also contains
>   a value that could be mistaken for a load exclusive instruction.
>
> * But the major issue is: what if someone places a kprobe at an address which
>   contains a .word value? The probe instruction will never be executed, so the
>   probe function will not be called, but when real code reads that .word value,
>   it reads a wrong value.
>

I had considered the assembler routine case but my take on it is that 
all of this is just a best-effort heuristic attempt to prevent someone 
from kprobe'ing a kernel to death.  I don't hold out any hope for making 
this bullet-proof.  The mode of failure for the atomic sequence is the 
safer choice (rejecting probe registration) so I'm not that worried 
about the rare case of this happening.  Probing inline data doesn't seem 
like something we can protect against, although we now do blacklist some 
more data sections.

> Can GCC provide some compiler option where .word values are located into a
> specific area?
>

You can't just go moving the effect of .word directives into a new 
location/section.  As likely as not that data (which could be an actual 
instruction) needs to be exactly where it was put in the source.

> ~Pratyush
>
>>
>> -dl
>>
>
>> diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
>> index 28b9c5b..36b4ea5 100644
>> --- a/arch/arm64/kernel/kprobes-arm64.c
>> +++ b/arch/arm64/kernel/kprobes-arm64.c
>> @@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>>   		 * atomic region starts from exclusive load and ends with
>>   		 * exclusive store.
>>   		 */
>> -		if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>> +		if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
>> +			return false;
>> +		else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>   			return false;
>>   		else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>>   			return true;
>


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 14:03       ` William Cohen
@ 2016-06-10 14:37         ` David Long
  2016-06-10 15:27           ` William Cohen
  0 siblings, 1 reply; 56+ messages in thread
From: David Long @ 2016-06-10 14:37 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand; +Cc: systemtap, Mark Brown

On 06/10/2016 10:03 AM, William Cohen wrote:
> On 06/10/2016 09:42 AM, Pratyush Anand wrote:
>> On 10/06/2016:01:49:10 AM, David Long wrote:
>>> Attached are incremental diffs I hope will fix the latest systemtap
>>> failures, without abandoning atomic sequence checking.  I'm trying to avoid
>>> the hex constants but I don't think the insn.c functions help in this case.
>>
>> It will save us from the current problem by checking for the "stp x29,x30,[sp,...]"
>> instruction and returning false if it matches. However, we will have to find some
>> way to recognize .word data.
>>
>> * An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
>>   __dma_map_area(), _cpu_resume etc. However, it is very unlikely that .word
>>   data exists just before the start of an assembly function and also contains
>>   a value that could be mistaken for a load exclusive instruction.
>>
>> * But the major issue is: what if someone places a kprobe at an address which
>>   contains a .word value? The probe instruction will never be executed, so the
>>   probe function will not be called, but when real code reads that .word value,
>>   it reads a wrong value.
>>
>> Can GCC provide some compiler option to place .word values in a
>> specific area?
>>
>> ~Pratyush
>
> Hi Dave and Pratyush,
>
> Expecting the first instruction to be stp x29, x30, [sp,...] would be pretty fragile.  The compiler might not generate that for some very simple functions or with certain types of optimization. If the compiler could generate a sentinel word before the start of each function, that might be a more robust solution.  Maybe something like a breakpoint instruction or something that clearly would not be in an atomic region.
>

I think this is still a reasonable improvement.  The case of both 
heuristics failing together has to be pretty rare and the result is to 
make the safer choice.

I'm looking at what gcc might provide to help.  I may need to talk to a 
compiler expert though; I've always found the gcc option list a bit 
overwhelming.

> -Will
>>
>>>
>>> -dl
>>>
>>
>>> diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
>>> index 28b9c5b..36b4ea5 100644
>>> --- a/arch/arm64/kernel/kprobes-arm64.c
>>> +++ b/arch/arm64/kernel/kprobes-arm64.c
>>> @@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>>>   		 * atomic region starts from exclusive load and ends with
>>>   		 * exclusive store.
>>>   		 */
>>> -		if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>> +		if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
>>> +			return false;
>>> +		else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>>   			return false;
>>>   		else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>>>   			return true;
>>
>


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 14:20       ` David Long
@ 2016-06-10 15:11         ` William Cohen
  2016-06-10 17:07         ` Pratyush Anand
  1 sibling, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-06-10 15:11 UTC (permalink / raw)
  To: David Long, Pratyush Anand; +Cc: systemtap, Mark Brown

On 06/10/2016 10:20 AM, David Long wrote:
> On 06/10/2016 09:42 AM, Pratyush Anand wrote:
>> On 10/06/2016:01:49:10 AM, David Long wrote:
>>> Attached are incremental diffs I hope will fix the latest systemtap
>>> failures, without abandoning atomic sequence checking.  I'm trying to avoid
>>> the hex constants but I don't think the insn.c functions help in this case.
>>
>> It will save us from the current problem by checking for the "stp x29,x30,[sp,...]"
>> instruction and returning false if it matches. However, we will have to find some
>> way to recognize .word data.
>>
>> * An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
>>   __dma_map_area(), _cpu_resume etc. However, it is very unlikely that .word
>>   data exists just before the start of an assembly function and also contains
>>   a value that could be mistaken for a load exclusive instruction.
>>
>> * But the major issue is: what if someone places a kprobe at an address which
>>   contains a .word value? The probe instruction will never be executed, so the
>>   probe function will not be called, but when real code reads that .word value,
>>   it reads a wrong value.
>>
> 
> I had considered the assembler routine case but my take on it is that all of this is just a best-effort heuristic attempt to prevent someone from kprobe'ing a kernel to death.  I don't hold out any hope for making this bullet-proof.  The mode of failure for the atomic sequence is the safer choice (rejecting probe registration) so I'm not that worried about the rare case of this happening.  Probing inline data doesn't seem like something we can protect against, although we now do blacklist some more data sections.
> 

Yes, the way that the kprobe registration is failing is safer than the alternative.  The main concern is that some places where kprobes should be allowed are being rejected, such as the beginning of the schedule function.  This type of failure will make a number of the systemtap scripts fail.
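
To make the trade-off concrete, the backward scan in the quoted diff can be modelled roughly as below.  This is only a userspace sketch in Python, not the kernel code: the function name, the list-of-words input and max_back are made up for illustration, and the load/store-exclusive mask and values are written from memory of the arm64 instruction definitions, so treat them as assumptions.  Only the 0xffc07fff / 0xa9807bfd pair comes from the diff itself.

```python
# Rough model of the atomic-region heuristic (illustrative, not kernel code).
STP_FP_LR_MASK = 0xFFC07FFF   # from the quoted diff
STP_FP_LR_VAL = 0xA9807BFD    # from the quoted diff
EX_MASK = 0x3F400000          # assumption: load/store-exclusive class mask
LOAD_EX_VAL = 0x08400000      # assumption: load-exclusive
STORE_EX_VAL = 0x08000000     # assumption: store-exclusive


def probed_address_is_atomic(insns, probe_index, max_back):
    """Scan backwards from the word before probe_index.

    insns is a list of 32-bit instruction words; max_back (hypothetical)
    limits how many words are examined.
    """
    stop = max(probe_index - max_back, 0)
    for i in range(probe_index - 1, stop - 1, -1):
        word = insns[i]
        if (word & STP_FP_LR_MASK) == STP_FP_LR_VAL:
            return False      # reached a function prologue, stop looking
        if (word & EX_MASK) == STORE_EX_VAL:
            return False      # a previous atomic sequence already ended
        if (word & EX_MASK) == LOAD_EX_VAL:
            return True       # inside a load-exclusive/store-exclusive pair
    return False


if __name__ == "__main__":
    # ldxr x0,[x1]; stp x29,x30,[sp,#-16]!; nop; nop
    words = [0xC85F7C20, 0xA9BF7BFD, 0xD503201F, 0xD503201F]
    print(probed_address_is_atomic(words, 3, 8))
```

In this model the frame-push check stops the scan at the prologue, so a load-exclusive that belongs to whatever precedes the function no longer causes the probe to be rejected.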

Would it be possible to look up internally, using the data that /proc/kallsyms provides, the T/t symbol that precedes this address and take the max of that symbol's address and the currently computed start address?
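
A minimal sketch of that idea, written in Python against kallsyms-style text rather than the kernel's internal symbol tables.  The function names and the sample addresses are hypothetical, and an in-kernel version would use the kernel's own lookup helpers instead of parsing text.

```python
import bisect


def parse_text_symbols(kallsyms_text):
    """Return the sorted start addresses of text symbols (type T or t)."""
    addrs = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] in ("T", "t"):
            addrs.append(int(fields[0], 16))
    return sorted(addrs)


def clamp_scan_start(text_addrs, probe_addr, computed_start):
    """Never let the backward scan start before the enclosing symbol."""
    i = bisect.bisect_right(text_addrs, probe_addr)
    if i == 0:
        return computed_start          # no preceding text symbol known
    return max(text_addrs[i - 1], computed_start)


if __name__ == "__main__":
    # Made-up symbol table for illustration only.
    sample = "0000000000001000 T func_a\n0000000000002000 t func_b\n"
    addrs = parse_text_symbols(sample)
    print(hex(clamp_scan_start(addrs, 0x2008, 0x1F00)))
```

The clamp only narrows the scan window, so it cannot make the check accept a probe that the unclamped scan would have accepted for a reason inside the same function.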

-Will




>> Can GCC provide some compiler option where .word values are located into a
>> specific area?
>>
> 
> You can't just go moving the effect of .word directives into a new location/section.  As likely as not that data (which could be an actual instruction) needs to be exactly where they were put in the source.
> 
>> ~Pratyush
>>
>>>
>>> -dl
>>>
>>
>>> diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
>>> index 28b9c5b..36b4ea5 100644
>>> --- a/arch/arm64/kernel/kprobes-arm64.c
>>> +++ b/arch/arm64/kernel/kprobes-arm64.c
>>> @@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>>>            * atomic region starts from exclusive load and ends with
>>>            * exclusive store.
>>>            */
>>> -        if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>> +        if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
>>> +            return false;
>>> +        else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>>               return false;
>>>           else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>>>               return true;
>>
> 


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 14:37         ` David Long
@ 2016-06-10 15:27           ` William Cohen
  0 siblings, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-06-10 15:27 UTC (permalink / raw)
  To: David Long, Pratyush Anand; +Cc: systemtap, Mark Brown

On 06/10/2016 10:37 AM, David Long wrote:
> On 06/10/2016 10:03 AM, William Cohen wrote:
>> On 06/10/2016 09:42 AM, Pratyush Anand wrote:
>>> On 10/06/2016:01:49:10 AM, David Long wrote:
>>>> Attached are incremental diffs I hope will fix the latest systemtap
>>>> failures, without abandoning atomic sequence checking.  I'm trying to avoid
>>>> the hex constants but I don't think the insn.c functions help in this case.
>>>
>>> It will save us from current problem by checking "stp x29,x30,[sp,...]"
>>> instruction and returning false if matches. However, we will have to find some
>>> way to recognize .word instructions.
>>>
>>> * An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
>>>   __dma_map_area(), _cpu_resume etc. However, it could be least likely that a
>>>   .word instruction exists before start of assembly function and that too
>>>   contains a word value which could be misleading.
>>>
>>> * But major issue is, what if someone instruments a kprobe at an address which
>>>   contains  .word values. Instruction will never hit, so probe function will not
>>>   be called, but when real code reads that .word value, it reads a wrong value.
>>>
>>> Can GCC provide some compiler option where .word values are located into a
>>> specific area?
>>>
>>> ~Pratyush
>>
>> Hi Dave and Pratyush,
>>
>> Expecting the instruction to the stp x29, x30, [sp,...] would be pretty fragile.  The compiler might not generate that for some very simple function or with certain types of optimization. If the compiler could generate a sentinel word before the start of each function that might be a more robust solution.  Maybe something like a breakpoint instruction or something that clearly would not be in an atomic region.
>>
> 
> I think this is still a reasonable improvement.  The case of both heuristics failing together has to be pretty rare and the result is to make the safer choice.
> 
> I'm looking at what gcc might provide to help.  I made need to talk to a compiler expert though, I've always found the gcc option list a bit overwhelming.
> 
>> -Will

Hi Dave,

Yes, I know what you mean about gcc options.  I am asking some of the RH gcc people to see if they know of anything like that.

This additional heuristic should have a comment in there about what exactly that magic set of bits should be matching.  I assume that it is:
"stp x29, x30, [sp, #xxx]!".  Is that correct?
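
For reference, the bit pattern can be checked with a small sketch like the one below.  It is illustrative Python rather than kernel code, the helper names are invented here, and the field layout is my reading of the A64 STP (pre-index, 64-bit) encoding: the mask keeps everything except the 7-bit immediate in bits 21:15, and the value has Rt=x29, Rn=sp, Rt2=x30.

```python
FRAME_PUSH_MASK = 0xFFC07FFF   # from the diff: ignores imm7 (bits 21:15)
FRAME_PUSH_VAL = 0xA9807BFD    # stp x29, x30, [sp, #imm]! with imm7 cleared


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def is_frame_push(word):
    """True for any 'stp x29, x30, [sp, #imm]!' regardless of imm."""
    return (word & FRAME_PUSH_MASK) == FRAME_PUSH_VAL


def frame_push_offset(word):
    """Byte offset encoded in the instruction (imm7 scaled by 8)."""
    return sign_extend((word >> 15) & 0x7F, 7) * 8


if __name__ == "__main__":
    word = 0xA9BF7BFD          # stp x29, x30, [sp, #-16]!
    print(is_frame_push(word), frame_push_offset(word))
```

The signed-offset and post-index forms of stp differ in bits 25:23, so they do not match this mask and value.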

Once the kernel patched with it is built, I will give it a try. Thanks,

-Will

>>>
>>>>
>>>> -dl
>>>>
>>>
>>>> diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
>>>> index 28b9c5b..36b4ea5 100644
>>>> --- a/arch/arm64/kernel/kprobes-arm64.c
>>>> +++ b/arch/arm64/kernel/kprobes-arm64.c
>>>> @@ -127,7 +127,9 @@ is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>>>>            * atomic region starts from exclusive load and ends with
>>>>            * exclusive store.
>>>>            */
>>>> -        if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>>> +        if ((le32_to_cpu(*scan_start) & 0xffc07fff) == 0xa9807bfd)
>>>> +            return false;
>>>> +        else if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>>>>               return false;
>>>>           else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>>>>               return true;
>>>
>>
> 


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 14:20       ` David Long
  2016-06-10 15:11         ` William Cohen
@ 2016-06-10 17:07         ` Pratyush Anand
  1 sibling, 0 replies; 56+ messages in thread
From: Pratyush Anand @ 2016-06-10 17:07 UTC (permalink / raw)
  To: David Long; +Cc: William Cohen, systemtap, Mark Brown

On 10/06/2016:10:20:37 AM, David Long wrote:
> On 06/10/2016 09:42 AM, Pratyush Anand wrote:
> > On 10/06/2016:01:49:10 AM, David Long wrote:
> > > Attached are incremental diffs I hope will fix the latest systemtap
> > > failures, without abandoning atomic sequence checking.  I'm trying to avoid
> > > the hex constants but I don't think the insn.c functions help in this case.
> > 
> > It will save us from current problem by checking "stp x29,x30,[sp,...]"
> > instruction and returning false if matches. However, we will have to find some
> > way to recognize .word instructions.
> > 
> > * An assembly function may not start with "stp x29,x30,[sp,...]", e.g.
> >   __dma_map_area(), _cpu_resume etc. However, it could be least likely that a
> >   .word instruction exists before start of assembly function and that too
> >   contains a word value which could be misleading.
> > 
> > * But major issue is, what if someone instruments a kprobe at an address which
> >   contains  .word values. Instruction will never hit, so probe function will not
> >   be called, but when real code reads that .word value, it reads a wrong value.
> > 
> 
> I had considered the assembler routine case but my take on it is that all of
> this is just a best effort heuristic attempt to prevent someone from
> kprobe'ing a kernel to death.  I don't hold out any hope for making this
> bullet-proof.  The mode of failure for the atomic sequence is the safer
> choice (rejecting probe registration) so I'm not that worried about the rare
> case of this happening.  Probing inline data doesn't seem like something we
> can protect from, although we now do blacklist some more data sections.

Sure, I agree that we go with what you have suggested. I was just wondering
whether we can take it up with the GCC people to improve it further in the future.

> 
> > Can GCC provide some compiler option where .word values are located into a
> > specific area?
> > 
> 
> You can't just go moving the effect of .word directives into a new
> location/section.  As likely as not that data (which could be an actual
> instruction) needs to be exactly where they were put in the source.

Yes, yes, I meant that the compiler would then also have to modify the offset in
the instruction that uses the .word data, and of course the offset has a limited
range, so .word can be placed only in those limited regions. I do not have any
idea about the GCC implementation, so I am not saying that this would be the best
way of identifying .word instructions.

~Pratyush


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-09 16:17 exercising current aarch64 kprobe support with systemtap William Cohen
  2016-06-09 19:52 ` William Cohen
@ 2016-06-10 21:28 ` William Cohen
  2016-06-10 21:37   ` William Cohen
                     ` (2 more replies)
  2016-06-13 16:11 ` William Cohen
  2 siblings, 3 replies; 56+ messages in thread
From: William Cohen @ 2016-06-10 21:28 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown; +Cc: Jeremy Linton

On 06/09/2016 12:17 PM, William Cohen wrote:
> I have been exercising the current kprobes and uprobe patches for
> arm64 that are in the test_upstream_arm64_devel branch of
> https://github.com/pratyushanand/linux with systemtap.  There are a
> two issues that I have seen on this kernel with systemtap.  There are
> some cases where kprobes fail to register at places that appear to be
> reasonable places for a kprobe.  The other issue is that kernel starts
> having soft lockups when the hw_watch_addr.stp tests runs.  To get
> systemtap with the newer kernels need the attached hack because of
> changes in the aarch64 macro args.
...
> Soft Lookup for the hw_watch_addr.stp
> 
> When running the hw_watch_addr.stp tests the machine gets a number of
> processes using a lot of sys time and eventually the kernel reports
> soft lockup:
> 
> http://paste.stg.fedoraproject.org/5323/
> 
> The systemtap.base/overload.exp tests all pass, but maybe there is
> much work being done to generate the backtraces for hw_watch_addr.stp
> and that is triggering the problem.

I can reliably reproduce the soft lockup running a single test with:

/root/systemtap_write/install/bin/stap --all-modules \
/root/systemtap_write/systemtap/testsuite/systemtap.examples/memory/hw_watch_addr.stp \
0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 > /dev/null
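
The backtick substitution above relies on grep matching exactly one line of /proc/kallsyms.  A hedged sketch of an exact-name lookup, in Python with an invented helper name and made-up sample addresses, in case a substring match ever returns more than one symbol:

```python
def symbol_address(kallsyms_text, name):
    """Return the address of the symbol called exactly `name`, or None."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # kallsyms lines look like: <hex address> <type> <name> [module]
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


if __name__ == "__main__":
    # Made-up sample; on a real system read /proc/kallsyms as root,
    # since unprivileged reads may show all-zero addresses.
    sample = (
        "fffffc0008d31a58 D vm_dirty_ratio\n"
        "fffffc0008d31a60 D vm_dirty_ratio_example\n"
    )
    addr = symbol_address(sample, "vm_dirty_ratio")
    print(hex(addr))
```

The hex string printed this way can be passed as the address argument in place of the backtick expression.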

paste of output and soft lockup at: http://paste.stg.fedoraproject.org/5324/

One of the things that Jeremy Linton pointed to was:

https://lkml.org/lkml/2016/3/21/198

Could the aarch64 hardware watchpoint handler have an issue that is causing this problem with the soft lockup?
Or spending too much time doing the stack backtrace?

-Will


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 21:28 ` William Cohen
@ 2016-06-10 21:37   ` William Cohen
  2016-06-13  4:28   ` Pratyush Anand
  2016-06-22 20:24   ` William Cohen
  2 siblings, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-06-10 21:37 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown; +Cc: Jeremy Linton

On 06/10/2016 05:28 PM, William Cohen wrote:
> On 06/09/2016 12:17 PM, William Cohen wrote:
>> I have been exercising the current kprobes and uprobe patches for
>> arm64 that are in the test_upstream_arm64_devel branch of
>> https://github.com/pratyushanand/linux with systemtap.  There are a
>> two issues that I have seen on this kernel with systemtap.  There are
>> some cases where kprobes fail to register at places that appear to be
>> reasonable places for a kprobe.  The other issue is that kernel starts
>> having soft lockups when the hw_watch_addr.stp tests runs.  To get
>> systemtap with the newer kernels need the attached hack because of
>> changes in the aarch64 macro args.
> ...
>> Soft Lookup for the hw_watch_addr.stp
>>
>> When running the hw_watch_addr.stp tests the machine gets a number of
>> processes using a lot of sys time and eventually the kernel reports
>> soft lockup:
>>
>> http://paste.stg.fedoraproject.org/5323/
>>
>> The systemtap.base/overload.exp tests all pass, but maybe there is
>> much work being done to generate the backtraces for hw_watch_addr.stp
>> and that is triggering the problem.
> 
> I can reliably reproduce the soft lockup running a single test with:
> 
> /root/systemtap_write/install/bin/stap --all-modules \
> /root/systemtap_write/systemtap/testsuite/systemtap.examples/memory/hw_watch_addr.stp \
> 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 > /dev/null
> 
> paste of output and soft lockup at: http://paste.stg.fedoraproject.org/5324/
> 
> One of the things that Jeremy Linton pointed to was:
> 
> https://lkml.org/lkml/2016/3/21/198
> 
> Could the aarch64 hardware watchpoint handler have an issue that is causing this problem with the soft lockup?
> Or spending too much time doing the stack backtrace?
> 
> -Will
> 

The soft lockup remains even when the stack backtrace in the systemtap script is disabled: http://paste.stg.fedoraproject.org/5325/

-Will


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 21:28 ` William Cohen
  2016-06-10 21:37   ` William Cohen
@ 2016-06-13  4:28   ` Pratyush Anand
  2016-06-13 13:42     ` William Cohen
  2016-06-22 20:24   ` William Cohen
  2 siblings, 1 reply; 56+ messages in thread
From: Pratyush Anand @ 2016-06-13  4:28 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap, Dave Long, Mark Brown, Jeremy Linton

Hi Will,

On 10/06/2016:05:28:36 PM, William Cohen wrote:
> On 06/09/2016 12:17 PM, William Cohen wrote:
> > I have been exercising the current kprobes and uprobe patches for
> > arm64 that are in the test_upstream_arm64_devel branch of
> > https://github.com/pratyushanand/linux with systemtap.  There are a
> > two issues that I have seen on this kernel with systemtap.  There are
> > some cases where kprobes fail to register at places that appear to be
> > reasonable places for a kprobe.  The other issue is that kernel starts
> > having soft lockups when the hw_watch_addr.stp tests runs.  To get
> > systemtap with the newer kernels need the attached hack because of
> > changes in the aarch64 macro args.
> ...
> > Soft Lookup for the hw_watch_addr.stp
> > 
> > When running the hw_watch_addr.stp tests the machine gets a number of
> > processes using a lot of sys time and eventually the kernel reports
> > soft lockup:
> > 
> > http://paste.stg.fedoraproject.org/5323/
> > 
> > The systemtap.base/overload.exp tests all pass, but maybe there is
> > much work being done to generate the backtraces for hw_watch_addr.stp
> > and that is triggering the problem.
> 
> I can reliably reproduce the soft lockup running a single test with:
> 
> /root/systemtap_write/install/bin/stap --all-modules \
> /root/systemtap_write/systemtap/testsuite/systemtap.examples/memory/hw_watch_addr.stp \
> 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 > /dev/null
> 
> paste of output and soft lockup at: http://paste.stg.fedoraproject.org/5324/
> 
> One of the things that Jeremy Linton pointed to was:
> 
> https://lkml.org/lkml/2016/3/21/198

Now we have the following in arch_within_kprobe_blacklist(), so the above issue
should not bite us.

+           !!search_exception_tables(addr))
+               return true;

> 
> Could the aarch64 hardware watchpoint handler have an issue that is causing this problem with the soft lockup?
> Or spending too much time doing the stack backtrace?

Not sure; it could be that the locked-up CPU is waiting for a lock (spinlock)
which is not being released. I just noticed that the backtrace of all active
CPUs (`echo l > /proc/sysrq-trigger`) is not working for arm64, probably because
we do not have arch_trigger_all_cpu_backtrace() defined for aarch64. Maybe we
can have one, like that of arm. A backtrace of the CPUs in this state might give
us some input.

~Pratyush


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-13  4:28   ` Pratyush Anand
@ 2016-06-13 13:42     ` William Cohen
  0 siblings, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-06-13 13:42 UTC (permalink / raw)
  To: Pratyush Anand; +Cc: systemtap, Dave Long, Mark Brown, Jeremy Linton

[-- Attachment #1: Type: text/plain, Size: 2964 bytes --]

On 06/13/2016 12:27 AM, Pratyush Anand wrote:
> Hi Will,
> 
> On 10/06/2016:05:28:36 PM, William Cohen wrote:
>> On 06/09/2016 12:17 PM, William Cohen wrote:
>>> I have been exercising the current kprobes and uprobe patches for
>>> arm64 that are in the test_upstream_arm64_devel branch of
>>> https://github.com/pratyushanand/linux with systemtap.  There are a
>>> two issues that I have seen on this kernel with systemtap.  There are
>>> some cases where kprobes fail to register at places that appear to be
>>> reasonable places for a kprobe.  The other issue is that kernel starts
>>> having soft lockups when the hw_watch_addr.stp tests runs.  To get
>>> systemtap with the newer kernels need the attached hack because of
>>> changes in the aarch64 macro args.
>> ...
>>> Soft Lookup for the hw_watch_addr.stp
>>>
>>> When running the hw_watch_addr.stp tests the machine gets a number of
>>> processes using a lot of sys time and eventually the kernel reports
>>> soft lockup:
>>>
>>> http://paste.stg.fedoraproject.org/5323/
>>>
>>> The systemtap.base/overload.exp tests all pass, but maybe there is
>>> much work being done to generate the backtraces for hw_watch_addr.stp
>>> and that is triggering the problem.
>>
>> I can reliably reproduce the soft lockup running a single test with:
>>
>> /root/systemtap_write/install/bin/stap --all-modules \
>> /root/systemtap_write/systemtap/testsuite/systemtap.examples/memory/hw_watch_addr.stp \
>> 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 > /dev/null
>>
>> paste of output and soft lockup at: http://paste.stg.fedoraproject.org/5324/
>>
>> One of the things that Jeremy Linton pointed to was:
>>
>> https://lkml.org/lkml/2016/3/21/198
> 
> Now we have following in arch_within_kprobe_blacklist(). So above issue should
> not bite us.
> 
> +           !!search_exception_tables(addr))
> +               return true;
> 
>>
>> Could the aarch64 hardware watchpoint handler have an issue that is causing this problem with the soft lockup?
>> Or spending too much time doing the stack backtrace?
> 
> Not sure, could be the locked up CPU waiting for a lock (spinlock), which is not
> being released. Just noticed that, backtrace of all active CPUs (`echo l >
> /proc/sysrq-trigger`) is not working for arm64. Probably because, we do not have
> arch_trigger_all_cpu_backtrace() defined for aarch64. May be we can have one,
> like that of arm. Backtrace of CPUs in this state might give us some input.
> 
> ~Pratyush
> 

Hi Pratyush,

I did some additional experimentation this weekend.  The version of the systemtap script with an empty body (the attached hw_watch_addr_null2.stp) still caused the system to have a soft lockup.  However, the equivalent perf use of the hardware watchpoint worked fine (it got counts and no soft lockup):

 perf stat -a -e mem:0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'`/1 bash

So it looks like the issue might lie with something in systemtap.

-Will

[-- Attachment #2: hw_watch_addr_null2.stp --]
[-- Type: text/plain, Size: 111 bytes --]

#! /usr/bin/env stap

%( CONFIG_HAVE_HW_BREAKPOINT == "y" %?
probe kernel.data($1).rw
{
}
%:
probe never {}
%)


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-09 16:17 exercising current aarch64 kprobe support with systemtap William Cohen
  2016-06-09 19:52 ` William Cohen
  2016-06-10 21:28 ` William Cohen
@ 2016-06-13 16:11 ` William Cohen
  2016-06-13 16:15   ` William Cohen
  2016-06-14  4:27   ` Pratyush Anand
  2 siblings, 2 replies; 56+ messages in thread
From: William Cohen @ 2016-06-13 16:11 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown

I dummied up the hw_addr_*.stp tests to not run.  The test made it further, but then got stuck spewing out:

[ 1648.037580] Unexpected kernel single-step exception at EL1
[ 1648.043060] Unexpected kernel single-step exception at EL1
[ 1648.048540] Unexpected kernel single-step exception at EL1


This happens during the "systemtap.onthefly/kprobes_onthefly.exp" tests and can be reliably triggered by running that portion of the systemtap tests with:

make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"


Seems like the tests get past the following and then start spewing the error message:

Executing: kill -KILL 22311
kill: kill: sending signal to 22311 failed: No such process
PASS: kprobes_onthefly - otf_stress_hard_iter_2000 (survived)

However the testsuite doesn't seem to make it through to print out the next test:

PASS: hrtimer_onthefly - otf_stress_max_iter_5000 (survived)

Note that this kernel (clone of https://github.com/pratyushanand/linux on test_upstream_arm64_devel branch) does have the patch to avoid having the atomic region search go before the start of a function by looking for the "stp x29, x30, [sp, #-xx]!" instruction.

-Will


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-13 16:11 ` William Cohen
@ 2016-06-13 16:15   ` William Cohen
  2016-06-14  4:27   ` Pratyush Anand
  1 sibling, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-06-13 16:15 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown

On 06/13/2016 12:10 PM, William Cohen wrote:
> I dummied up the hw_addr_*.stp tests to not run.  The test made it further, but then got stuck spewing out:
> 
> [ 1648.037580] Unexpected kernel single-step exception at EL1
> [ 1648.043060] Unexpected kernel single-step exception at EL1
> [ 1648.048540] Unexpected kernel single-step exception at EL1

I was able to capture the point where it started spewing the single-step exception at EL1:

[root@amd-seattle-03 systemtap]# [  793.930801] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enabled or kernel.sched_schedstats=1
[  793.965896] hrtimer: interrupt took 422795 ns
[  887.063206] ------------[ cut here ]------------
[  887.067856] WARNING: CPU: 1 PID: 21315 at mm/page_counter.c:26 page_counter_cancel+0x5c/0x68
[  887.076288] Modules linked in: stap_e307cc4760ef17e67f4e7cdd288c9472_1_21004(OE) vfat fat amd_xgbe i2c_designware_platform ipmi_si ptp i2c_designware_core i2c_core spi_pl022 sbsa_gwdt ccp ipmi_msghandler pps_core crc32_arm64 ghash_ce nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c [last unloaded: stap_d8856f77dba73cfbeacac8b8ae0b9f60__20838]
[  887.107716] 
[  887.109202] CPU: 1 PID: 21315 Comm: echo Tainted: G        W IOE   4.7.0-rc1panand+ #5
[  887.117123] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016
[  887.125738] Unexpected kernel single-step exception at EL1
[  887.131241] Unexpected kernel single-step exception at EL1


-Will
> 
> 
> This happens during the "systemtap.onthefly/kprobes_onthefly.exp" tests  and can be reliably triggered running that portion of the systemtap tests with:
> 
> make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> 
> 
> Seems like the tests get past the following and then start spewing the error message:
> 
> Executing: kill -KILL 22311
> kill: kill: sending signal to 22311 failed: No such process
> PASS: kprobes_onthefly - otf_stress_hard_iter_2000 (survived)
> 
> However the testsuite doesn't seem to make it through to print out the next test:
> 
> PASS: hrtimer_onthefly - otf_stress_max_iter_5000 (survived)
> 
> Note that this kernel (clone of https://github.com/pratyushanand/linux on test_upstream_arm64_devel branch) does have the patch to avoid having the atomic region search go before the start of a function by look for the "stp x29, x30, [sp, -#xx]!"
> 
> -Will
> 


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-13 16:11 ` William Cohen
  2016-06-13 16:15   ` William Cohen
@ 2016-06-14  4:27   ` Pratyush Anand
  1 sibling, 0 replies; 56+ messages in thread
From: Pratyush Anand @ 2016-06-14  4:27 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap, Dave Long, Mark Brown

On 13/06/2016:12:10:58 PM, William Cohen wrote:
> I dummied up the hw_addr_*.stp tests to not run.  The test made it further, but then got stuck spewing out:
> 
> [ 1648.037580] Unexpected kernel single-step exception at EL1
> [ 1648.043060] Unexpected kernel single-step exception at EL1
> [ 1648.048540] Unexpected kernel single-step exception at EL1
> 
> 
> This happens during the "systemtap.onthefly/kprobes_onthefly.exp" tests  and can be reliably triggered running that portion of the systemtap tests with:
> 
> make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> 
> 
> Seems like the tests get past the following and then start spewing the error message:
> 
> Executing: kill -KILL 22311
> kill: kill: sending signal to 22311 failed: No such process
> PASS: kprobes_onthefly - otf_stress_hard_iter_2000 (survived)
> 
> However the testsuite doesn't seem to make it through to print out the next test:
> 
> PASS: hrtimer_onthefly - otf_stress_max_iter_5000 (survived)
> 
> Note that this kernel (clone of https://github.com/pratyushanand/linux on test_upstream_arm64_devel branch) does have the patch to avoid having the atomic region search go before the start of a function by look for the "stp x29, x30, [sp, -#xx]!"

Hi Will,

No, it should not have that modification yet.

~Pratyush
> 
> -Will


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10 21:28 ` William Cohen
  2016-06-10 21:37   ` William Cohen
  2016-06-13  4:28   ` Pratyush Anand
@ 2016-06-22 20:24   ` William Cohen
  2016-06-23  3:19     ` David Long
  2 siblings, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-06-22 20:24 UTC (permalink / raw)
  To: systemtap, Dave Long, Pratyush Anand, Mark Brown
  Cc: Jeremy Linton, David Smith

Hi all,

When running the current systemtap checked out from the git repository
and a locally built kernel with the kprobes64-v13 patches (the
test_upstream_arm64_devel branch of
https://github.com/pratyushanand/linux) on a Fedora 23 machine, one of
the kprobes_onthefly.exp tests is causing the machine to get into a
state that requires rebooting to fix.  This can be triggered by running a
portion of the systemtap tests with:

 make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"

When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
console starts spewing the following and needs to be rebooted:

[23394.036860] Unexpected kernel single-step exception at EL1
[23394.042434] Unexpected kernel single-step exception at EL1
[23394.048008] Unexpected kernel single-step exception at EL1
[23394.053541] Unexpected kernel single-step exception at EL1
[23394.059053] Unexpected kernel single-step exception at EL1
[23394.064545] Unexpected kernel single-step exception at EL1

Sorry, I don't have the start of the failure; it scrolled off the screen very quickly.

-Will



* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-22 20:24   ` William Cohen
@ 2016-06-23  3:19     ` David Long
  2016-06-23 13:42       ` William Cohen
  2016-06-23 15:49       ` William Cohen
  0 siblings, 2 replies; 56+ messages in thread
From: David Long @ 2016-06-23  3:19 UTC (permalink / raw)
  To: William Cohen, systemtap, Pratyush Anand, Mark Brown
  Cc: Jeremy Linton, David Smith

On 06/22/2016 04:24 PM, William Cohen wrote:
> Hi all,
>
> When running the current systemtap checked out from the git repository
> and a locally built kernel with the kprobes64-v13 patches (the
> test_upstream_arm64_devel branch of
> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
> the kprobes_onthefly.exp tests is causing the machine to get in a
> state that requires rebooting to fix.  This can be triggered by running a
> portion of the systemtap tests with:
>
>   make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>
> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
> console starts spewing the following and needs to be rebooted:
>
> [23394.036860] Unexpected kernel single-step exception at EL1
> [23394.042434] Unexpected kernel single-step exception at EL1
> [23394.048008] Unexpected kernel single-step exception at EL1
> [23394.053541] Unexpected kernel single-step exception at EL1
> [23394.059053] Unexpected kernel single-step exception at EL1
> [23394.064545] Unexpected kernel single-step exception at EL1
>
> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>
> -Will
>
>

I'll take a look and see what I can figure out.

In the meantime I did just push a v14 branch.  I'm doubtful that it will 
address the above problem even though it contains a few bug fixes.

-dl


* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23  3:19     ` David Long
@ 2016-06-23 13:42       ` William Cohen
  2016-06-23 13:47         ` David Smith
  2016-06-23 15:49       ` William Cohen
  1 sibling, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-06-23 13:42 UTC (permalink / raw)
  To: David Long, systemtap, Pratyush Anand, Mark Brown
  Cc: Jeremy Linton, David Smith

On 06/22/2016 11:18 PM, David Long wrote:
> On 06/22/2016 04:24 PM, William Cohen wrote:
>> Hi all,
>>
>> When running the current systemtap checked out from the git repository
>> and a locally built kernel with the kprobes64-v13 patches (the
>> test_upstream_arm64_devel branch of
>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>> the kprobes_onthefly.exp tests is causing the machine to get in a
>> state that requires rebooting to fix.  This can be triggered by running a
>> portion of the systemtap tests with:
>>
>>   make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>
>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>> console starts spewing the following and needs to be rebooted:
>>
>> [23394.036860] Unexpected kernel single-step exception at EL1
>> [23394.042434] Unexpected kernel single-step exception at EL1
>> [23394.048008] Unexpected kernel single-step exception at EL1
>> [23394.053541] Unexpected kernel single-step exception at EL1
>> [23394.059053] Unexpected kernel single-step exception at EL1
>> [23394.064545] Unexpected kernel single-step exception at EL1
>>
>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>>
>> -Will
>>
>>
> 
> I'll take a look and see what I can figure out.
> 
> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
> 
> -dl
> 

Hi Dave,

I have pulled the v14 branch, built it, and given it a try.


When running the tests on various kernels I typically see a message about the hrtimer interrupt like the following during the run:

...
Running /root/systemtap_write/systemtap/testsuite/systemtap.onthefly/kprobes_onthefly.exp ...
[  197.217699] stap_ab30d3b6ca1d8e512f40980479fd57e_18018: module verification failed: signature and/or required key missing - tainting kernel
[  534.166223] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enabled or kernel.sched_schedstats=1
[  534.205668] hrtimer: interrupt took 241848 ns
...


Hmmm, it successfully completed the run with the v14 branch four times.

# make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
...
tail systemtap.sum
PASS: kprobes_onthefly - otf_stress_1ms_iter_50 (survived)
PASS: kprobes_onthefly - otf_stress_500us_iter_50 (survived)
PASS: kprobes_onthefly - otf_stress_100us_iter_50 (survived)
PASS: kprobes_onthefly - otf_stress_prof_iter_2000 (survived)
PASS: kprobes_onthefly - otf_stress_hard_iter_2000 (survived)
PASS: kprobes_onthefly - otf_stress_max_iter_5000 (survived)

		=== systemtap Summary ===

# of expected passes		22
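
For comparing repeated runs like the four above, the result lines in systemtap.sum can be tallied with a small shell helper.  This is only a sketch: the function name tally_sum is made up here, and it assumes the DejaGnu-style "PASS:"/"FAIL:" line prefixes seen in the output above.

```shell
# tally_sum FILE: count DejaGnu result lines by status.
# (Hypothetical helper; assumes each result line starts with "PASS:", "FAIL:", etc.)
tally_sum() {
    awk -F: '/^(PASS|FAIL|XPASS|XFAIL|KFAIL|UNTESTED|UNSUPPORTED):/ { n[$1]++ }
             END { for (s in n) print s, n[s] }' "$1" | sort
}
```

Running "tally_sum systemtap.sum" after each run prints one line per status with its count, which makes a run that silently lost tests easier to spot.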



What were the bug fixes and changes in v14?  Alternatively, could there be something else in the kernel I was using previously (e.g. the uprobes patches)?

-Will


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23 13:42       ` William Cohen
@ 2016-06-23 13:47         ` David Smith
  0 siblings, 0 replies; 56+ messages in thread
From: David Smith @ 2016-06-23 13:47 UTC (permalink / raw)
  To: William Cohen, David Long, systemtap, Pratyush Anand, Mark Brown
  Cc: Jeremy Linton

On 06/23/2016 08:42 AM, William Cohen wrote:

... stuff deleted ...

> When running the tests on various kernels I typically see messages about the hrtimer
> interrupt like the following during the run:
> 
> ...
> Running /root/systemtap_write/systemtap/testsuite/systemtap.onthefly/kprobes_onthefly.exp ...
> [  197.217699] stap_ab30d3b6ca1d8e512f40980479fd57e_18018: module verification failed: signature and/or required key missing - tainting kernel
> [  534.166223] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enabled or kernel.sched_schedstats=1
> [  534.205668] hrtimer: interrupt took 241848 ns
> ...

The hrtimer interrupt message is PR20286. This appears on more than just
aarch64. Basically the hrtimer probe handler took too long and the next
one has been skipped. We decided there wasn't really anything we could do.

<https://sourceware.org/bugzilla/show_bug.cgi?id=20286>

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23  3:19     ` David Long
  2016-06-23 13:42       ` William Cohen
@ 2016-06-23 15:49       ` William Cohen
  2016-06-23 18:26         ` David Long
  1 sibling, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-06-23 15:49 UTC (permalink / raw)
  To: David Long, systemtap, Pratyush Anand, Mark Brown
  Cc: Jeremy Linton, David Smith

On 06/22/2016 11:18 PM, David Long wrote:
> On 06/22/2016 04:24 PM, William Cohen wrote:
>> Hi all,
>>
>> When running the current systemtap checked out from the git repository
>> and a locally built kernel with the kprobes64-v13 patches (the
>> test_upstream_arm64_devel branch of
>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>> the kprobes_onthefly.exp tests is causing the machine to get in a
>> state that requires rebooting to fix.  This can be triggered by running a
>> portion of the systemtap tests with:
>>
>>   make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>
>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>> console starts spewing the following and needs to be rebooted:
>>
>> [23394.036860] Unexpected kernel single-step exception at EL1
>> [23394.042434] Unexpected kernel single-step exception at EL1
>> [23394.048008] Unexpected kernel single-step exception at EL1
>> [23394.053541] Unexpected kernel single-step exception at EL1
>> [23394.059053] Unexpected kernel single-step exception at EL1
>> [23394.064545] Unexpected kernel single-step exception at EL1
>>
>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>>
>> -Will
>>
>>
> 
> I'll take a look and see what I can figure out.
> 
> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
> 
> -dl
> 

Hi Dave and Pratyush,

I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23 15:49       ` William Cohen
@ 2016-06-23 18:26         ` David Long
  2016-06-23 19:22           ` William Cohen
  0 siblings, 1 reply; 56+ messages in thread
From: David Long @ 2016-06-23 18:26 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: systemtap, Mark Brown, Jeremy Linton, David Smith

On 06/23/2016 11:49 AM, William Cohen wrote:
> On 06/22/2016 11:18 PM, David Long wrote:
>> On 06/22/2016 04:24 PM, William Cohen wrote:
>>> Hi all,
>>>
>>> When running the current systemtap checked out from the git repository
>>> and a locally built kernel with the kprobes64-v13 patches (the
>>> test_upstream_arm64_devel branch of
>>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>>> the kprobes_onthefly.exp tests is causing the machine to get in a
>>> state that requires rebooting to fix.  This can be triggered by running a
>>> portion of the systemtap tests with:
>>>
>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>
>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>>> console starts spewing the following and needs to be rebooted:
>>>
>>> [23394.036860] Unexpected kernel single-step exception at EL1
>>> [23394.042434] Unexpected kernel single-step exception at EL1
>>> [23394.048008] Unexpected kernel single-step exception at EL1
>>> [23394.053541] Unexpected kernel single-step exception at EL1
>>> [23394.059053] Unexpected kernel single-step exception at EL1
>>> [23394.064545] Unexpected kernel single-step exception at EL1
>>>
>>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>>>
>>> -Will
>>>
>>>
>>
>> I'll take a look and see what I can figure out.
>>
>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
>>
>> -dl
>>
>
> Hi Dave and Pratyush,
>
> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
>
> -Will
>

I'm going to interpret that as meaning you know of no problem in the 
kprobes v14 patch that would give me pause to email it upstream.  Do you 
disagree?

-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23 18:26         ` David Long
@ 2016-06-23 19:22           ` William Cohen
  2016-06-27  2:57             ` David Long
  2016-06-27 14:18             ` Pratyush Anand
  0 siblings, 2 replies; 56+ messages in thread
From: William Cohen @ 2016-06-23 19:22 UTC (permalink / raw)
  To: David Long, Pratyush Anand
  Cc: systemtap, Mark Brown, Jeremy Linton, David Smith

On 06/23/2016 02:26 PM, David Long wrote:
> On 06/23/2016 11:49 AM, William Cohen wrote:
>> On 06/22/2016 11:18 PM, David Long wrote:
>>> On 06/22/2016 04:24 PM, William Cohen wrote:
>>>> Hi all,
>>>>
>>>> When running the current systemtap checked out from the git repository
>>>> and a locally built kernel with the kprobes64-v13 patches (the
>>>> test_upstream_arm64_devel branch of
>>>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>>>> the kprobes_onthefly.exp tests is causing the machine to get in a
>>>> state that requires rebooting to fix.  This can be triggered by running a
>>>> portion of the systemtap tests with:
>>>>
>>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>>
>>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>>>> console starts spewing the following and needs to be rebooted:
>>>>
>>>> [23394.036860] Unexpected kernel single-step exception at EL1
>>>> [23394.042434] Unexpected kernel single-step exception at EL1
>>>> [23394.048008] Unexpected kernel single-step exception at EL1
>>>> [23394.053541] Unexpected kernel single-step exception at EL1
>>>> [23394.059053] Unexpected kernel single-step exception at EL1
>>>> [23394.064545] Unexpected kernel single-step exception at EL1
>>>>
>>>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>>>>
>>>> -Will
>>>>
>>>>
>>>
>>> I'll take a look and see what I can figure out.
>>>
>>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
>>>
>>> -dl
>>>
>>
>> Hi Dave and Pratyush,
>>
>> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
>> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
>>
>> -Will
>>
> 
> I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?
> 
> -dl
> 

Hi Dave,

Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 kprobes patches do not appear to be the problem.  I don't know what is causing it; maybe something went wrong in porting the patches to that kernel, or it comes from the other patches included there (uprobes/kexec).

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23 19:22           ` William Cohen
@ 2016-06-27  2:57             ` David Long
  2016-06-27 14:18             ` Pratyush Anand
  1 sibling, 0 replies; 56+ messages in thread
From: David Long @ 2016-06-27  2:57 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: systemtap, Mark Brown, Jeremy Linton, David Smith

I've updated my kprobes64-v14 branch one last time, although the most 
recent differences are mostly cosmetic.  I'm about to send out the patch.

-dl


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-23 19:22           ` William Cohen
  2016-06-27  2:57             ` David Long
@ 2016-06-27 14:18             ` Pratyush Anand
  2016-06-28  3:20               ` William Cohen
  1 sibling, 1 reply; 56+ messages in thread
From: Pratyush Anand @ 2016-06-27 14:18 UTC (permalink / raw)
  To: William Cohen
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith

Hi Will,

On 23/06/2016:03:22:44 PM, William Cohen wrote:
> On 06/23/2016 02:26 PM, David Long wrote:
> > On 06/23/2016 11:49 AM, William Cohen wrote:
> >> On 06/22/2016 11:18 PM, David Long wrote:
> >>> On 06/22/2016 04:24 PM, William Cohen wrote:
> >>>> Hi all,
> >>>>
> >>>> When running the current systemtap checked out from the git repository
> >>>> and a locally built kernel with the kprobes64-v13 patches (the
> >>>> test_upstream_arm64_devel branch of
> >>>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
> >>>> the kprobes_onthefly.exp tests is causing the machine to get in a
> >>>> state that requires rebooting to fix.  This can be triggered by running a
> >>>> portion of the systemtap tests with:
> >>>>
> >>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> >>>>
> >>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
> >>>> console starts spewing the following and needs to be rebooted:
> >>>>
> >>>> [23394.036860] Unexpected kernel single-step exception at EL1
> >>>> [23394.042434] Unexpected kernel single-step exception at EL1
> >>>> [23394.048008] Unexpected kernel single-step exception at EL1
> >>>> [23394.053541] Unexpected kernel single-step exception at EL1
> >>>> [23394.059053] Unexpected kernel single-step exception at EL1
> >>>> [23394.064545] Unexpected kernel single-step exception at EL1
> >>>>
> >>>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
> >>>>
> >>>> -Will
> >>>>
> >>>>
> >>>
> >>> I'll take a look and see what I can figure out.
> >>>
> >>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
> >>>
> >>> -dl
> >>>
> >>
> >> Hi Dave and Pratyush,
> >>
> >> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
> >> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
> >>
> >> -Will
> >>
> > 
> > I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?
> > 
> > -dl
> > 
> 
> Hi Dave,
> 
> Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 kprobes patches do not appear to be the problem.  I don't know what is causing it; maybe something went wrong in porting the patches to that kernel, or it comes from the other patches included there (uprobes/kexec).

Just to update:

I can confirm that the problem appears only once the uprobe patches are applied,
but I am not yet sure that the uprobe code is the actual culprit.

I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
seems that when the problem happens, there is a kprobe at print_worker_info().

Most likely a re-entrant kprobe is hit while a kprobe is placed at
print_worker_info(). I guessed it could be show_regs() from the arm64 kprobe code,
but commenting out show_regs() did not make any difference. Blacklisting
print_worker_info() did not resolve it either; the problem reproduced in a different
way after blacklisting.
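
A quick way to double-check that a blacklist entry really took effect is to look the symbol up in the kernel's debugfs list.  The helper below is a sketch under assumptions: the name kprobe_blacklisted is made up, and it assumes the kernel exposes /sys/kernel/debug/kprobes/blacklist with one "<start>-<end> <symbol>" entry per line.

```shell
# kprobe_blacklisted SYMBOL [FILE]: succeed if SYMBOL is listed in the blacklist.
# FILE defaults to the debugfs list; assumed line format: "<start>-<end>\t<symbol>".
# (Hypothetical helper, not part of the kernel or of systemtap.)
kprobe_blacklisted() {
    sym=$1
    list=${2:-/sys/kernel/debug/kprobes/blacklist}
    # The symbol name is expected in the second whitespace-separated column.
    awk -v s="$sym" '$2 == s { found = 1 } END { exit !found }' "$list"
}
```

For example, "kprobe_blacklisted print_worker_info && echo blacklisted" (as root, with debugfs mounted).  Note that a blacklist entry only keeps probes out of that one function; functions it calls can still be probed, which would fit the problem showing up in a different way afterwards.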

So it is still vague and debugging continues.
If I could clearly understand the systemtap test code, it would probably be
easier to debug. I mean, if I could get the names of the kernel and user space
symbols where this test places probes, that would help a lot to narrow it
down.
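
One way to get such symbol names is stap's listing mode (-l), which prints the probe points a pattern resolves to without running a script.  A minimal sketch follows; the wrapper name list_probe_points and the STAP override variable are made up for illustration, and the pattern in the usage note is only an example.

```shell
# list_probe_points PATTERN...: print the probe points each pattern resolves to.
# STAP may be set to point at a different stap binary (hypothetical convenience;
# by default the stap found on PATH is used).
list_probe_points() {
    for pattern in "$@"; do
        "${STAP:-stap}" -l "$pattern"
    done
}
```

For example, list_probe_points 'kernel.function("print_worker_info")' should show where that function lives, and a wildcard pattern passed the same way lists every function it matches.  Kernel debuginfo needs to be installed for kernel.function patterns to resolve.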

~Pratyush

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-27 14:18             ` Pratyush Anand
@ 2016-06-28  3:20               ` William Cohen
  2016-07-04 12:46                 ` Pratyush Anand
  0 siblings, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-06-28  3:20 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith

[-- Attachment #1: Type: text/plain, Size: 4847 bytes --]

On 06/27/2016 10:18 AM, Pratyush Anand wrote:
> Hi Will,
> 
> On 23/06/2016:03:22:44 PM, William Cohen wrote:
>> On 06/23/2016 02:26 PM, David Long wrote:
>>> On 06/23/2016 11:49 AM, William Cohen wrote:
>>>> On 06/22/2016 11:18 PM, David Long wrote:
>>>>> On 06/22/2016 04:24 PM, William Cohen wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> When running the current systemtap checked out from the git repository
>>>>>> and a locally built kernel with the kprobes64-v13 patches (the
>>>>>> test_upstream_arm64_devel branch of
>>>>>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>>>>>> the kprobes_onthefly.exp tests is causing the machine to get in a
>>>>>> state that requires rebooting to fix.  This can be triggered by running a
>>>>>> portion of the systemtap tests with:
>>>>>>
>>>>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>>>>
>>>>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>>>>>> console starts spewing the following and needs to be rebooted:
>>>>>>
>>>>>> [23394.036860] Unexpected kernel single-step exception at EL1
>>>>>> [23394.042434] Unexpected kernel single-step exception at EL1
>>>>>> [23394.048008] Unexpected kernel single-step exception at EL1
>>>>>> [23394.053541] Unexpected kernel single-step exception at EL1
>>>>>> [23394.059053] Unexpected kernel single-step exception at EL1
>>>>>> [23394.064545] Unexpected kernel single-step exception at EL1
>>>>>>
>>>>>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>>
>>>>>
>>>>> I'll take a look and see what I can figure out.
>>>>>
>>>>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
>>>>>
>>>>> -dl
>>>>>
>>>>
>>>> Hi Dave and Pratyush,
>>>>
>>>> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
>>>> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
>>>>
>>>> -Will
>>>>
>>>
>>> I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?
>>>
>>> -dl
>>>
>>
>> Hi Dave,
>>
>> Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 kprobes patches do not appear to be the problem.  I don't know what is causing it; maybe something went wrong in porting the patches to that kernel, or it comes from the other patches included there (uprobes/kexec).
> 
> Just to update:
> 
> I confirm that problem arises after uprobe patches only, but not yet sure that
> actual culprit is uprobe code. 
> 
> I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
> seems, when problem happens, there was a kprobe at print_worker_info(). 
> 
> Most likely re-entrant kprobe is called when kprobe is instrumented at
> print_worker_info(). I guessed it could be show_regs() from arm64/kprobe code,
> but commenting show_regs() did not make any difference. Even blacklisting
> print_worker_info() also did not resolve it, problem reproduced in a different
> way after blacklisting.
> 
> So, still its vague and debugging is continued.
> If I can clearly understand the systemtap test code, then probably it will be
> easier to debug. I mean, if I can get the kernel and user space symbols name
> where this test is instrumenting probes then that would help a lot to zero it
> down.
> 
> ~Pratyush
> 

Hi Pratyush,

My understanding is that the systemtap onthefly support enables and disables probes as mentioned in the following systemtap bugzilla entry (and the ones it depends on): https://sourceware.org/bugzilla/show_bug.cgi?id=10995.  It would be handy to get things pared down to the systemtap script that triggers the problem.  After adding some diagnostic puts, it looks like the script that triggers the problem is something like the attached onthefly_trigger.stp (that was gathered on an x86_64 machine, so it might not be exactly what is causing the problem on aarch64).  David Smith, any suggestions on how to debug this, based on your experience with https://sourceware.org/bugzilla/show_bug.cgi?id=17126, where ppc64 had a similar issue with onthefly testing?

The "Unexpected kernel single-step exception at EL1" reminds me of the times when kprobes couldn't find a handler.  Maybe there is some situation where the kprobe is being removed but the breakpoint is still around.  Did you get a backtrace with the insertion of the "BUG()" where that message is printed out?  I wonder if it might be triggered by (thread_flags & _TIF_UPROBE) somehow being true and the aarch64 do_notify_resume starting to run.
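
Since the start of the failure scrolls off the screen so quickly, it may help to save the console output to a file (for example from a serial console) and pull out what was logged just before the first message.  A rough sketch, assuming such a saved log exists and a grep that supports -m and -B (GNU grep does); the function name first_step_context is made up:

```shell
# first_step_context LOG [N]: print up to N lines (default 40) that precede the
# first "single-step exception" message, followed by that message line itself.
# (Hypothetical helper; LOG is assumed to be a saved copy of the console output.)
first_step_context() {
    log=$1
    before=${2:-40}
    # -m 1 stops at the first match; -B prints the lines leading up to it.
    grep -m 1 -B "$before" 'Unexpected kernel single-step exception at EL1' "$log"
}
```

For example, "first_step_context console.log 100" would show the last hundred lines before the flood starts, which is where a backtrace or warning would be expected.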

-Will

[-- Attachment #2: onthefly_trigger.stp --]
[-- Type: text/plain, Size: 2276 bytes --]

      # We want these probes to fire as deterministically as possible so that
      # their outputs can easily be predicted and compared. Unfortunately this
      # is complicated by multiple facts, including
      #     (1) the timer might be so fast that the probes don't have time to
      #         fire and print something,
      #     (2) the kprobe might print something after it is re-enabled, but
      #         before the kretprobe is re-enabled,
      #     (3) disabling a kretprobe won't stop an already running handler.
      #
      # To get around these issues, we use a simple state machine. The states
      # are as follow:
      #
      #     0 = cond disabled
      #     1 = cond enabled, but kprobe && kretprobe not yet enabled
      #     2 = cond enabled, kprobe enabled, kretprobe not yet enabled
      #     3 = cond enabled, kprobe && kretprobe enabled, nothing printed yet
      #     4 = cond enabled, 'hit' printed but not 'rethit'
      #     5 = cond enabled, 'rethit' printed

      global state = 1
      global toggles = 0

      probe kernel.function("vfs_read").call if (state > 0) {
         if (state == 3) {
            println("hit")
            state++;
         } else if (state == 1) {
            state++;
         }
      }

      probe kernel.function("vfs_read").return if (state > 0) {
         # ensure that nothing changed during the vfs_read body
         if (state != @entry(state))
            next
         if (state == 4) {
            println("rethit")
            state++;
         } else if (state == 2) {
            state++;
         }
      }

      probe begin, end, error, kernel.function("*@workqueue.c"),  process.begin, process.end, kernel.trace("*"), process("echo").function("*"),  netfilter.pf("NFPROTO_IPV4").hook("NF_INET_LOCAL_IN")?,  netfilter.pf("NFPROTO_IPV4").hook("NF_INET_LOCAL_OUT")?,  perf.sw.cpu_clock.sample(1000000)?, timer.profile.tick? {
         if (state != 0 && state != 5)
            next # give probes more time to move through the states
         toggles++
         if (toggles > 5000)
            exit()
         else {
            println("toggling")
            state = !state
         }
      }

      probe timer.s(360) {
         println("timed out")
         exit()
      }

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-28  3:20               ` William Cohen
@ 2016-07-04 12:46                 ` Pratyush Anand
  2016-07-07 19:05                   ` David Long
  0 siblings, 1 reply; 56+ messages in thread
From: Pratyush Anand @ 2016-07-04 12:46 UTC (permalink / raw)
  To: William Cohen
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith

Hi Will,

I did some more debugging, and this is my understanding:

- While executing this test, page_counter_cancel() is called. Probably
there is an out-of-memory scenario.
- page_counter_cancel() calls WARN_ON_ONCE(new < 0).
- WARN_ON_ONCE() executes the brk BUG_BRK_IMM (brk 0x800) instruction.
- Executing brk 0x800 invokes bug_handler().
- bug_handler() calls report_bug(), which calls __warn().
- __warn() does a lot of pr_warn() calls, which invoke print_worker_info(),
where we have a kprobe placed.
- Therefore, we are encountering this issue.
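
If the console output of a failing run has been saved, the sequence above can be cross-checked by looking for WARN output logged before the first single-step message.  This is only a sketch: the function name warnings_before_step is made up, and it assumes the kernel's usual "WARNING: " prefix on WARN reports.

```shell
# warnings_before_step LOG: print WARNING lines that were logged before the
# first "single-step exception" message (prints nothing if there are none).
# (Hypothetical helper; LOG is assumed to be a saved copy of the console output.)
warnings_before_step() {
    awk '/Unexpected kernel single-step exception at EL1/ { exit }
         /WARNING: / { print }' "$1"
}
```

A warning that names page_counter_cancel in this output would support the explanation; an empty result would suggest the first exception has some other trigger.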


~Pratyush





^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-07-04 12:46                 ` Pratyush Anand
@ 2016-07-07 19:05                   ` David Long
  2016-07-07 19:58                     ` Frank Ch. Eigler
  0 siblings, 1 reply; 56+ messages in thread
From: David Long @ 2016-07-07 19:05 UTC (permalink / raw)
  To: Pratyush Anand, William Cohen
  Cc: systemtap, Mark Brown, Jeremy Linton, David Smith

On 07/04/2016 08:46 AM, Pratyush Anand wrote:
> Hi Will,
>
> I did some more debugging, and this is my understanding:
>
> - While executing this test, page_counter_cancel() is called. Probably
> there is an out-of-memory scenario.
> - page_counter_cancel() calls WARN_ON_ONCE(new < 0).
> - WARN_ON_ONCE() executes the brk BUG_BRK_IMM (brk 0x800) instruction.
> - Executing brk 0x800 invokes bug_handler().
> - bug_handler() calls report_bug(), which calls __warn().
> - __warn() does a lot of pr_warn() calls, which invoke print_worker_info(),
> where we have a kprobe placed.
> - Therefore, we are encountering this issue.
>
>
> ~Pratyush
>

It sounds like the only fix would be to expand the blacklist to cover any 
function that could be called in a debug exception-handling context?  I 
have to think that by the time this (fluid) list of functions was compiled 
there would be an awful lot of unprobeable code.  Do we think there is 
any reasonable approach to making this less likely to happen when using 
kprobes, without extensive blacklisting?

I pushed a v15 branch to my repo last night and I'd like to email the 
patches out ASAP if we think this issue is either acceptable or best 
addressed after the feature is in place.

>
>
>
> On Tue, Jun 28, 2016 at 8:50 AM, William Cohen <wcohen@redhat.com> wrote:
>> On 06/27/2016 10:18 AM, Pratyush Anand wrote:
>>> Hi Will,
>>>
>>> On 23/06/2016:03:22:44 PM, William Cohen wrote:
>>>> On 06/23/2016 02:26 PM, David Long wrote:
>>>>> On 06/23/2016 11:49 AM, William Cohen wrote:
>>>>>> On 06/22/2016 11:18 PM, David Long wrote:
>>>>>>> On 06/22/2016 04:24 PM, William Cohen wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> When running the current systemtap checked out from the git repository
>>>>>>>> and a locally built kernel with the kprobes64-v13 patches (the
>>>>>>>> test_upstream_arm64_devel branch of
>>>>>>>> https://github.com/pratyushanand/linux) on Fedora 23 machine one of
>>>>>>>> the kprobes_onthefly.exp tests is causing the machine to get in a
>>>>>>>> state that requires rebooting to fix.  This can be triggered by running a
>>>>>>>> portion of the systemtap tests with:
>>>>>>>>
>>>>>>>>     make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>>>>>>
>>>>>>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
>>>>>>>> console starts spewing the following and needs to be rebooted:
>>>>>>>>
>>>>>>>> [23394.036860] Unexpected kernel single-step exception at EL1
>>>>>>>> [23394.042434] Unexpected kernel single-step exception at EL1
>>>>>>>> [23394.048008] Unexpected kernel single-step exception at EL1
>>>>>>>> [23394.053541] Unexpected kernel single-step exception at EL1
>>>>>>>> [23394.059053] Unexpected kernel single-step exception at EL1
>>>>>>>> [23394.064545] Unexpected kernel single-step exception at EL1
>>>>>>>>
>>>>>>>> Sorry I don't have the start of the failure it scrolled off the screen very quickly.
>>>>>>>>
>>>>>>>> -Will
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I'll take a look and see what I can figure out.
>>>>>>>
>>>>>>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
>>>>>>>
>>>>>>> -dl
>>>>>>>
>>>>>>
>>>>>> Hi Dave and Pratyush,
>>>>>>
>>>>>> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
>>>>>> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>
>>>>> I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?
>>>>>
>>>>> -dl
>>>>>
>>>>
>>>> Hi Dave,
>>>>
>>>> Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 patches do not appear to be the problem.  I don't know what is causing the problem; maybe there is something going on with the porting of the patches to that kernel, or with the other patches included in there (uprobes/kexec).
>>>
>>> Just to update:
>>>
>>> I confirm that the problem arises only after the uprobe patches are applied,
>>> but I am not yet sure that the actual culprit is the uprobe code.
>>>
>>> I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
>>> seems that when the problem happens, there was a kprobe at print_worker_info().
>>>
>>> Most likely a re-entrant kprobe is hit when a kprobe is instrumented at
>>> print_worker_info(). I guessed it could be show_regs() from the arm64/kprobe code,
>>> but commenting out show_regs() did not make any difference. Even blacklisting
>>> print_worker_info() did not resolve it; the problem reproduced in a different
>>> way after blacklisting.
>>>
>>> So it is still vague and debugging continues.
>>> If I can clearly understand the systemtap test code, then it will probably be
>>> easier to debug. I mean, if I can get the names of the kernel and user space
>>> symbols where this test is instrumenting probes, that would help a lot to
>>> narrow it down.
>>>
>>> ~Pratyush
>>>
>>
>> Hi Pratyush,
>>
>> My understanding is that the systemtap onthefly support enables/disables the probe as mentioned in the following systemtap bugzilla entry (and the ones that it depends on): https://sourceware.org/bugzilla/show_bug.cgi?id=10995.  It would be handy to get things pared down to the systemtap script that triggers the problem.  After adding some diagnostic puts, it looks like the script that triggers the problem is something like the attached onthefly_trigger.stp (that was gathered on an x86_64 machine, so it might not be exactly what is causing the problem on aarch64).  David Smith, any suggestions on how to debug based on your experiences from https://sourceware.org/bugzilla/show_bug.cgi?id=17126 where the ppc64 had a similar issue with onthefly testing?
>>
>> The "Unexpected kernel single-step exception at EL1" reminds me of the times when kprobes couldn't find a handler.  Maybe there is some situation where the kprobe is being removed but the breakpoint is still around. Did you get a backtrace with the insertion of the "BUG()" where that message is printed out? I wonder if it might be triggered by (thread_flags & _TIF_UPROBE) somehow being true, so that the aarch64 do_notify_resume starts running.
>>
>> -Will

-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-07-07 19:05                   ` David Long
@ 2016-07-07 19:58                     ` Frank Ch. Eigler
  2016-08-03 13:13                       ` Pratyush Anand
  0 siblings, 1 reply; 56+ messages in thread
From: Frank Ch. Eigler @ 2016-07-07 19:58 UTC (permalink / raw)
  To: David Long
  Cc: Pratyush Anand, William Cohen, systemtap, Mark Brown,
	Jeremy Linton, David Smith

David Long <dave.long@linaro.org> writes:

> [...]
>> - bug_handler() calls report_bug() which calls __warn()
>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
>> where we have a kprobe instrumented.
>> - Therefore, we are encountering this issue.
>> [...]
> It sounds like the only fix would be to expand the blacklist to any
> function that could be called in a debug exception-handling context? [...]

The kernel maintains its own blacklist by means of designating some
low-level functions with the "__kprobes" attribute.  That protects
those regions of code from "perf probe"-directed kprobes too. 

- FChE

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-06-10  5:49   ` David Long
  2016-06-10 13:43     ` Pratyush Anand
@ 2016-07-12 14:33     ` William Cohen
  2016-07-13 18:26       ` David Long
  1 sibling, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-07-12 14:33 UTC (permalink / raw)
  To: David Long, systemtap, Pratyush Anand, Mark Brown

On 06/10/2016 01:49 AM, David Long wrote:
> Attached are incremental diffs I hope will fix the latest systemtap failures, without abandoning atomic sequence checking.  I'm trying to avoid the hex constants but I don't think the insn.c functions help in this case.
> 
> -dl
> 

Hi Dave,

Is this heuristic to limit the search to not go past the prologue going to be included in the arm64 kprobe patches?  I ran the current v15 patches and saw that some of the tests were failing because some of the probes were unsuccessful in registering, as seen below in the output of systemtap.log.  -Will

spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp

WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c5f00) registration error (rc -22)

WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4706").call (address 0xfffffc00081010e8) registration error (rc -22)

WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5029").call (address 0xfffffc00081015e8) registration error (rc -22)

WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5002").call (address 0xfffffc0008101580) registration error (rc -22)

hi

FAIL: bz1027459 -p5 (0)

spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/equal.stp

Pass 1: parsed user script and 115 library scripts using 51968virt/38784res/6976shr/32576data kb, in 190usr/0sys/202real ms.

Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91776virt/82176res/7744shr/72384data kb, in 1230usr/10sys/1241real ms.

Pass 3: translated to C into "/tmp/stapK5F1ze/stap_5a080310fa71fdec33340dcee0389f80_1969_src.c" using 91776virt/82432res/8000shr/72384data kb, in 0usr/0sys/11real ms.

Pass 4: compiled C into "stap_5a080310fa71fdec33340dcee0389f80_1969.ko" in 3190usr/640sys/3803real ms.

Pass 5: starting run.

WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)

systemtap starting probe

FAIL: systemtap.base/equal.stp startup (timeout)

spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/finloop2.stp

Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 200usr/0sys/204real ms.

Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91840virt/82240res/7744shr/72448data kb, in 1230usr/10sys/1246real ms.

Pass 3: translated to C into "/tmp/stapfwr7Sg/stap_177624071ced2b52cbbfd6b4ea09dd28_1964_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/10sys/11real ms.

Pass 4: compiled C into "stap_177624071ced2b52cbbfd6b4ea09dd28_1964.ko" in 3230usr/620sys/3812real ms.

Pass 5: starting run.

WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)

systemtap starting probe

FAIL: systemtap.base/finloop2.stp startup (timeout)

spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/kfunct.stp

Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 190usr/10sys/201real ms.

Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 1 global using 91840virt/82240res/7744shr/72448data kb, in 1240usr/10sys/1245real ms.

Pass 3: translated to C into "/tmp/stap8zsRzS/stap_2fa61ae7aa177442cb013709494408e9_1680_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/0sys/11real ms.

Pass 4: compiled C into "stap_2fa61ae7aa177442cb013709494408e9_1680.ko" in 3210usr/620sys/3789real ms.

Pass 5: starting run.

WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)

systemtap starting probe

FAIL: systemtap.base/kfunct.stp startup (timeout)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-07-12 14:33     ` William Cohen
@ 2016-07-13 18:26       ` David Long
  2016-07-13 18:47         ` Pratyush Anand
  0 siblings, 1 reply; 56+ messages in thread
From: David Long @ 2016-07-13 18:26 UTC (permalink / raw)
  To: William Cohen, systemtap, Pratyush Anand, Mark Brown

On 07/12/2016 10:33 AM, William Cohen wrote:
> On 06/10/2016 01:49 AM, David Long wrote:
>> Attached are incremental diffs I hope will fix the latest systemtap failures, without abandoning atomic sequence checking.  I'm trying to avoid the hex constants but I don't think the insn.c functions help in this case.
>>
>> -dl
>>
>
> Hi Dave,
>
> Is this heuristic to limit the search to not go past the prologue going to be included in the arm64 kprobe patches?  I ran the current v15 patches and saw that some of the tests were failing because some of the probes were unsuccessful in registering, as seen below in the output of systemtap.log.  -Will
>
> spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
>
> WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c5f00) registration error (rc -22)
>
> WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4706").call (address 0xfffffc00081010e8) registration error (rc -22)
>
> WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5029").call (address 0xfffffc00081015e8) registration error (rc -22)
>
> WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5002").call (address 0xfffffc0008101580) registration error (rc -22)
>
> hi
>
> FAIL: bz1027459 -p5 (0)
>
> spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/equal.stp
>
> Pass 1: parsed user script and 115 library scripts using 51968virt/38784res/6976shr/32576data kb, in 190usr/0sys/202real ms.
>
> Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91776virt/82176res/7744shr/72384data kb, in 1230usr/10sys/1241real ms.
>
> Pass 3: translated to C into "/tmp/stapK5F1ze/stap_5a080310fa71fdec33340dcee0389f80_1969_src.c" using 91776virt/82432res/8000shr/72384data kb, in 0usr/0sys/11real ms.
>
> Pass 4: compiled C into "stap_5a080310fa71fdec33340dcee0389f80_1969.ko" in 3190usr/640sys/3803real ms.
>
> Pass 5: starting run.
>
> WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
>
> systemtap starting probe
>
> FAIL: systemtap.base/equal.stp startup (timeout)
>
> spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/finloop2.stp
>
> Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 200usr/0sys/204real ms.
>
> Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91840virt/82240res/7744shr/72448data kb, in 1230usr/10sys/1246real ms.
>
> Pass 3: translated to C into "/tmp/stapfwr7Sg/stap_177624071ced2b52cbbfd6b4ea09dd28_1964_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/10sys/11real ms.
>
> Pass 4: compiled C into "stap_177624071ced2b52cbbfd6b4ea09dd28_1964.ko" in 3230usr/620sys/3812real ms.
>
> Pass 5: starting run.
>
> WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
>
> systemtap starting probe
>
> FAIL: systemtap.base/finloop2.stp startup (timeout)
>
> spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/kfunct.stp
>
> Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 190usr/10sys/201real ms.
>
> Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 1 global using 91840virt/82240res/7744shr/72448data kb, in 1240usr/10sys/1245real ms.
>
> Pass 3: translated to C into "/tmp/stap8zsRzS/stap_2fa61ae7aa177442cb013709494408e9_1680_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/0sys/11real ms.
>
> Pass 4: compiled C into "stap_2fa61ae7aa177442cb013709494408e9_1680.ko" in 3210usr/620sys/3789real ms.
>
> Pass 5: starting run.
>
> WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
>
> systemtap starting probe
>
> FAIL: systemtap.base/kfunct.stp startup (timeout)
>
>

I don't think we ever closed the loop on whether this was a good idea or 
not.  This code is not present in v15.

-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-07-13 18:26       ` David Long
@ 2016-07-13 18:47         ` Pratyush Anand
  2016-07-13 19:45           ` William Cohen
  0 siblings, 1 reply; 56+ messages in thread
From: Pratyush Anand @ 2016-07-13 18:47 UTC (permalink / raw)
  To: David Long; +Cc: William Cohen, systemtap, Mark Brown

On 13/07/2016:02:25:57 PM, David Long wrote:
> On 07/12/2016 10:33 AM, William Cohen wrote:
> > On 06/10/2016 01:49 AM, David Long wrote:
> > > Attached are incremental diffs I hope will fix the latest systemtap failures, without abandoning atomic sequence checking.  I'm trying to avoid the hex constants but I don't think the insn.c functions help in this case.
> > > 
> > > -dl
> > > 
> > 
> > Hi Dave,
> > 
> > Is this heuristic to limit the search to not go past the prologue going to be included in the arm64 kprobe patches?  I ran the current v15 patches and saw that some of the tests were failing because some of the probes were unsuccessful in registering, as seen below in the output of systemtap.log.  -Will
> > 
> > spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
> > 
> > WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c5f00) registration error (rc -22)
> > 
> > WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4706").call (address 0xfffffc00081010e8) registration error (rc -22)
> > 
> > WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5029").call (address 0xfffffc00081015e8) registration error (rc -22)
> > 
> > WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5002").call (address 0xfffffc0008101580) registration error (rc -22)
> > 
> > hi
> > 
> > FAIL: bz1027459 -p5 (0)
> > 
> > spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/equal.stp
> > 
> > Pass 1: parsed user script and 115 library scripts using 51968virt/38784res/6976shr/32576data kb, in 190usr/0sys/202real ms.
> > 
> > Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91776virt/82176res/7744shr/72384data kb, in 1230usr/10sys/1241real ms.
> > 
> > Pass 3: translated to C into "/tmp/stapK5F1ze/stap_5a080310fa71fdec33340dcee0389f80_1969_src.c" using 91776virt/82432res/8000shr/72384data kb, in 0usr/0sys/11real ms.
> > 
> > Pass 4: compiled C into "stap_5a080310fa71fdec33340dcee0389f80_1969.ko" in 3190usr/640sys/3803real ms.
> > 
> > Pass 5: starting run.
> > 
> > WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
> > 
> > systemtap starting probe
> > 
> > FAIL: systemtap.base/equal.stp startup (timeout)
> > 
> > spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/finloop2.stp
> > 
> > Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 200usr/0sys/204real ms.
> > 
> > Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91840virt/82240res/7744shr/72448data kb, in 1230usr/10sys/1246real ms.
> > 
> > Pass 3: translated to C into "/tmp/stapfwr7Sg/stap_177624071ced2b52cbbfd6b4ea09dd28_1964_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/10sys/11real ms.
> > 
> > Pass 4: compiled C into "stap_177624071ced2b52cbbfd6b4ea09dd28_1964.ko" in 3230usr/620sys/3812real ms.
> > 
> > Pass 5: starting run.
> > 
> > WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
> > 
> > systemtap starting probe
> > 
> > FAIL: systemtap.base/finloop2.stp startup (timeout)
> > 
> > spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/kfunct.stp
> > 
> > Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 190usr/10sys/201real ms.
> > 
> > Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 1 global using 91840virt/82240res/7744shr/72448data kb, in 1240usr/10sys/1245real ms.
> > 
> > Pass 3: translated to C into "/tmp/stap8zsRzS/stap_2fa61ae7aa177442cb013709494408e9_1680_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/0sys/11real ms.
> > 
> > Pass 4: compiled C into "stap_2fa61ae7aa177442cb013709494408e9_1680.ko" in 3210usr/620sys/3789real ms.
> > 
> > Pass 5: starting run.
> > 
> > WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
> > 
> > systemtap starting probe
> > 
> > FAIL: systemtap.base/kfunct.stp startup (timeout)
> > 
> > 
> 
> I don't think we ever closed the loop on whether this was a good idea or

I think we should go with that improvement of recognising "stp
x29,x30,[sp,...]" until we have something from the compiler to recognize .word
instructions.
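
For anyone following along, here is a minimal sketch of what such a check
could look like. It is a standalone userspace C model, not the actual patch:
it assumes the heuristic accepts the pre-index and signed-offset forms of
"stp x29, x30, [sp, ...]", and the names is_frame_push() and
backward_scan_len() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A64 "STP x29, x30, [sp, #imm]" (signed offset) and
 * "STP x29, x30, [sp, #imm]!" (pre-index), 64-bit variant.
 * The mask clears the 7-bit immediate (bits 21:15) and bit 23,
 * which is the only bit that differs between the two forms.
 */
#define STP_FP_LR_MASK  0xff407fffu
#define STP_FP_LR_VALUE 0xa9007bfdu  /* Rt = x29, Rt2 = x30, Rn = sp */

/* Illustrative helper: does this word look like the frame push? */
static bool is_frame_push(uint32_t insn)
{
    return (insn & STP_FP_LR_MASK) == STP_FP_LR_VALUE;
}

/*
 * Illustrative helper: number of instructions a backward scan from
 * code[probe_idx] would visit if it stops at the frame push, at the
 * start of the buffer, or after max_back instructions.
 */
static size_t backward_scan_len(const uint32_t *code, size_t probe_idx,
                                size_t max_back)
{
    size_t n = 0;

    while (n < max_back && n < probe_idx) {
        n++;
        if (is_frame_push(code[probe_idx - n]))
            break;  /* reached the prologue, do not look further back */
    }
    return n;
}
```

With this, "stp x29, x30, [sp, #-16]!" (0xa9bf7bfd) is treated as the
prologue boundary, while a nop (0xd503201f) or the matching
"ldp x29, x30, [sp], #16" (0xa8c17bfd) is not.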

> not.  This code is not present in v15.

So, please take that in next revision.

~Pratyush

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-07-13 18:47         ` Pratyush Anand
@ 2016-07-13 19:45           ` William Cohen
  0 siblings, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-07-13 19:45 UTC (permalink / raw)
  To: Pratyush Anand, David Long; +Cc: systemtap, Mark Brown

On 07/13/2016 02:44 PM, Pratyush Anand wrote:
> On 13/07/2016:02:25:57 PM, David Long wrote:
>> On 07/12/2016 10:33 AM, William Cohen wrote:
>>> On 06/10/2016 01:49 AM, David Long wrote:
>>>> Attached are incremental diffs I hope will fix the latest systemtap failures, without abandoning atomic sequence checking.  I'm trying to avoid the hex constants but I don't think the insn.c functions help in this case.
>>>>
>>>> -dl
>>>>
>>>
>>> Hi Dave,
>>>
>>> Is this heuristic to limit the search to not go past the prologue going to be included in the arm64 kprobe patches?  I ran the current v15 patches and saw that some of the tests were failing because some of the probes were unsuccessful in registering, as seen below in the output of systemtap.log.  -Will
>>>
>>> spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
>>>
>>> WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1236").call (address 0xfffffc00080c5f00) registration error (rc -22)
>>>
>>> WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4706").call (address 0xfffffc00081010e8) registration error (rc -22)
>>>
>>> WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5029").call (address 0xfffffc00081015e8) registration error (rc -22)
>>>
>>> WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5002").call (address 0xfffffc0008101580) registration error (rc -22)
>>>
>>> hi
>>>
>>> FAIL: bz1027459 -p5 (0)
>>>
>>> spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/equal.stp
>>>
>>> Pass 1: parsed user script and 115 library scripts using 51968virt/38784res/6976shr/32576data kb, in 190usr/0sys/202real ms.
>>>
>>> Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91776virt/82176res/7744shr/72384data kb, in 1230usr/10sys/1241real ms.
>>>
>>> Pass 3: translated to C into "/tmp/stapK5F1ze/stap_5a080310fa71fdec33340dcee0389f80_1969_src.c" using 91776virt/82432res/8000shr/72384data kb, in 0usr/0sys/11real ms.
>>>
>>> Pass 4: compiled C into "stap_5a080310fa71fdec33340dcee0389f80_1969.ko" in 3190usr/640sys/3803real ms.
>>>
>>> Pass 5: starting run.
>>>
>>> WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
>>>
>>> systemtap starting probe
>>>
>>> FAIL: systemtap.base/equal.stp startup (timeout)
>>>
>>> spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/finloop2.stp
>>>
>>> Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 200usr/0sys/204real ms.
>>>
>>> Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 2 globals using 91840virt/82240res/7744shr/72448data kb, in 1230usr/10sys/1246real ms.
>>>
>>> Pass 3: translated to C into "/tmp/stapfwr7Sg/stap_177624071ced2b52cbbfd6b4ea09dd28_1964_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/10sys/11real ms.
>>>
>>> Pass 4: compiled C into "stap_177624071ced2b52cbbfd6b4ea09dd28_1964.ko" in 3230usr/620sys/3812real ms.
>>>
>>> Pass 5: starting run.
>>>
>>> WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
>>>
>>> systemtap starting probe
>>>
>>> FAIL: systemtap.base/finloop2.stp startup (timeout)
>>>
>>> spawn stap -v /root/systemtap_write/systemtap/testsuite/systemtap.base/kfunct.stp
>>>
>>> Pass 1: parsed user script and 115 library scripts using 52032virt/38848res/6976shr/32640data kb, in 190usr/10sys/201real ms.
>>>
>>> Pass 2: analyzed script: 4 probes, 1 function, 0 embeds, 1 global using 91840virt/82240res/7744shr/72448data kb, in 1240usr/10sys/1245real ms.
>>>
>>> Pass 3: translated to C into "/tmp/stap8zsRzS/stap_2fa61ae7aa177442cb013709494408e9_1680_src.c" using 91840virt/82496res/8000shr/72448data kb, in 10usr/0sys/11real ms.
>>>
>>> Pass 4: compiled C into "stap_2fa61ae7aa177442cb013709494408e9_1680.ko" in 3210usr/620sys/3789real ms.
>>>
>>> Pass 5: starting run.
>>>
>>> WARNING: probe kernel.function("schedule@kernel/sched/core.c:3369") (address 0xfffffc00088f65c0) registration error (rc -22)
>>>
>>> systemtap starting probe
>>>
>>> FAIL: systemtap.base/kfunct.stp startup (timeout)
>>>
>>>
>>
>> I don't think we ever closed the loop on whether this was a good idea or
> 
> I think, we should go with that improvement of recognising "stp
> x29,x30,[sp,...]" until we have something from compiler to recognize .word
> instructions.
> 
>> not.  This code is not present in v15.
> 
> So, please take that in next revision.

Yes, I am in favor of including this heuristic also.  -Will

> 
> ~Pratyush
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-07-07 19:58                     ` Frank Ch. Eigler
@ 2016-08-03 13:13                       ` Pratyush Anand
  2016-08-03 14:51                         ` William Cohen
  2016-08-03 17:40                         ` William Cohen
  0 siblings, 2 replies; 56+ messages in thread
From: Pratyush Anand @ 2016-08-03 13:13 UTC (permalink / raw)
  To: William Cohen
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
> David Long <dave.long@linaro.org> writes:
> 
> > [...]
> >> - bug_handler() calls report_bug() which calls __warn()
> >> - __warn() does lot of pr_warn()  which invokes print_worker_info()
> >> where we have a kprobe instrumented.
> >> - Therefore, we are encountering this issue.

Hi Will,

Can you please cross-check if following branch works perfectly with
kprobes_onthefly.exp and other systemtap tests.

https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1

Following patch in above branch should solve this issue.
https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e

However, I am not yet sure that the above modification is the best solution,
so I am discussing it on the arm kernel list.

~Pratyush

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-03 13:13                       ` Pratyush Anand
@ 2016-08-03 14:51                         ` William Cohen
  2016-08-03 15:11                           ` David Long
  2016-08-03 17:40                         ` William Cohen
  1 sibling, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-08-03 14:51 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/03/2016 09:13 AM, Pratyush Anand wrote:
> On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
>> David Long <dave.long@linaro.org> writes:
>>
>>> [...]
>>>> - bug_handler() calls report_bug() which calls __warn()
>>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
>>>> where we have a kprobe instrumented.
>>>> - Therefore, we are encountering this issue.
> 
> Hi Will,
> 
> Can you please cross-check if following branch works perfectly with
> kprobes_onthefly.exp and other systemtap tests.
> 
> https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
> 
> Following patch in above branch should solve this issue.
> https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
> 
> However, I am not yet sure that above modification could be the best solution,
> so discussing at arm kernel list.
> 
> ~Pratyush
> 

Hi Pratyush,

I am setting up a machine with the locally built kernel and systemtap to see if the problem is resolved. I hope to have some results by this evening.

Could there be a better way to handle the "Unexpected kernel single-step exception at EL1"?  Getting stuck in a loop endlessly quickly printing out that message isn't very helpful. Maybe use pr_warn_ratelimited instead of pr_warn.
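
To illustrate the idea, here is a rough userspace model in C of interval/burst
rate limiting. It is only a sketch of the concept, not the kernel's
implementation; struct ratelimit and ratelimit_ok() are hypothetical names,
and the interval and burst values are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate limiter state: at most `burst` messages per window. */
struct ratelimit {
    uint64_t interval;  /* window length, in caller-defined ticks */
    unsigned burst;     /* messages allowed per window */
    uint64_t begin;     /* tick at which the current window started */
    unsigned printed;   /* messages let through in this window */
    unsigned missed;    /* messages suppressed so far */
};

/* Return true if a message at tick `now` should be printed. */
static bool ratelimit_ok(struct ratelimit *rl, uint64_t now)
{
    if (now - rl->begin >= rl->interval) {
        /* Window expired: start a new one and allow output again. */
        rl->begin = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}
```

With a burst of 10, a tight loop hitting the same message would emit 10 lines
per window and count the rest as suppressed, which would leave the start of
the failure visible on the console.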

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-03 14:51                         ` William Cohen
@ 2016-08-03 15:11                           ` David Long
  0 siblings, 0 replies; 56+ messages in thread
From: David Long @ 2016-08-03 15:11 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: systemtap, Mark Brown, Jeremy Linton, David Smith, Frank Ch. Eigler

On 08/03/2016 10:51 AM, William Cohen wrote:
> On 08/03/2016 09:13 AM, Pratyush Anand wrote:
>> On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
>>> David Long <dave.long@linaro.org> writes:
>>>
>>>> [...]
>>>>> - bug_handler() calls report_bug() which calls __warn()
>>>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
>>>>> where we have a kprobe instrumented.
>>>>> - Therefore, we are encountering this issue.
>>
>> Hi Will,
>>
>> Can you please cross-check if following branch works perfectly with
>> kprobes_onthefly.exp and other systemtap tests.
>>
>> https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
>>
>> Following patch in above branch should solve this issue.
>> https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
>>
>> However, I am not yet sure that above modification could be the best solution,
>> so discussing at arm kernel list.
>>
>> ~Pratyush
>>
>
> Hi Pratyush,
>
> I am setting up a machine with the locally built kernel and systemtap to see if the problem is resolved. I hope to have some results by this evening.
>
> Could there be a better way to handle the "Unexpected kernel single-step exception at EL1"?  Getting stuck in a loop endlessly quickly printing out that message isn't very helpful. Maybe use pr_warn_ratelimited instead of pr_warn.
>
> -Will
>

Just a reminder that the latest kprobes patches are on the end of the 
for-next/kprobes branch in the:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git

repo.  There's at least one more tweak to that (removing jprobe stack 
copy) coming up shortly.

Thanks,
-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-03 13:13                       ` Pratyush Anand
  2016-08-03 14:51                         ` William Cohen
@ 2016-08-03 17:40                         ` William Cohen
  2016-08-03 20:00                           ` Lastest kprobes64 patch David Long
  2016-08-04  4:42                           ` exercising current aarch64 kprobe support with systemtap Pratyush Anand
  1 sibling, 2 replies; 56+ messages in thread
From: William Cohen @ 2016-08-03 17:40 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/03/2016 09:13 AM, Pratyush Anand wrote:
> On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
>> David Long <dave.long@linaro.org> writes:
>>
>>> [...]
>>>> - bug_handler() calls report_bug() which calls __warn()
>>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
>>>> where we have a kprobe instrumented.
>>>> - Therefore, we are encountering this issue.
> 
> Hi Will,
> 
> Can you please cross-check if following branch works perfectly with
> kprobes_onthefly.exp and other systemtap tests.
> 
> https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
> 
> Following patch in above branch should solve this issue.
> https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
> 
> However, I am not yet sure that above modification could be the best solution,
> so discussing at arm kernel list.
> 
> ~Pratyush
> 

I have an AMD seattle machine set up with Fedora 24, the locally built upstream_arm64_devel_v1.1 branch kernel, and a locally built checkout of systemtap (the systemtap rpm in fc24 doesn't generate modules for linux 4.6 and newer kernels).  I tried to run the systemtap tests with:

 make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"

However at some point the kernel starts having problems:

http://paste.stg.fedoraproject.org/5375/


-Will




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Lastest kprobes64 patch
  2016-08-03 17:40                         ` William Cohen
@ 2016-08-03 20:00                           ` David Long
  2016-08-03 20:01                             ` Frank Ch. Eigler
  2016-08-04  5:03                             ` Pratyush Anand
  2016-08-04  4:42                           ` exercising current aarch64 kprobe support with systemtap Pratyush Anand
  1 sibling, 2 replies; 56+ messages in thread
From: David Long @ 2016-08-03 20:00 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: systemtap, Mark Brown, Jeremy Linton, David Smith, Frank Ch. Eigler

In response to the prolonged discussion on issues with use of a 
duplicated stack for jprobes on arm64 I have created the requested 
change of removing the stack duplication code altogether.  I don't know 
how this affects systemtap.  It would also be good to have a better idea 
of how much systemtap makes use of jprobes.

The new additional patch, plus Catalin's accumulated patches since 
kprobes64-v15, are now in my (recreated) kprobes64-v16 branch.  If 
someone has time to exercise it I would be grateful.

Thanks,
-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Lastest kprobes64 patch
  2016-08-03 20:00                           ` Lastest kprobes64 patch David Long
@ 2016-08-03 20:01                             ` Frank Ch. Eigler
  2016-08-03 20:08                               ` David Long
  2016-08-04  5:03                             ` Pratyush Anand
  1 sibling, 1 reply; 56+ messages in thread
From: Frank Ch. Eigler @ 2016-08-03 20:01 UTC (permalink / raw)
  To: David Long
  Cc: William Cohen, Pratyush Anand, systemtap, Mark Brown,
	Jeremy Linton, David Smith

Hi -

> In response to the prolonged discussion on issues with use of a 
> duplicated stack for jprobes on arm64 I have created the requested 
> change of removing the stack duplication code altogether.  I don't know 
> how this affects systemtap.  [...]

systemtap does not use jprobes at all.

- FChE

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Lastest kprobes64 patch
  2016-08-03 20:01                             ` Frank Ch. Eigler
@ 2016-08-03 20:08                               ` David Long
  0 siblings, 0 replies; 56+ messages in thread
From: David Long @ 2016-08-03 20:08 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: William Cohen, Pratyush Anand, systemtap, Mark Brown,
	Jeremy Linton, David Smith

On 08/03/2016 04:01 PM, Frank Ch. Eigler wrote:
> Hi -
>
>> In response to the prolonged discussion on issues with use of a
>> duplicated stack for jprobes on arm64 I have created the requested
>> change of removing the stack duplication code altogether.  I don't know
>> how this affects systemtap.  [...]
>
> systemtap does not use jprobes at all.
>
> - FChE
>

Yeah, I was just looking through the sources and was coming to the same 
conclusion.  That makes the need for a systemtap run on this last change 
a bit moot.

-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-03 17:40                         ` William Cohen
  2016-08-03 20:00                           ` Lastest kprobes64 patch David Long
@ 2016-08-04  4:42                           ` Pratyush Anand
  2016-08-04 13:57                             ` William Cohen
  1 sibling, 1 reply; 56+ messages in thread
From: Pratyush Anand @ 2016-08-04  4:42 UTC (permalink / raw)
  To: William Cohen
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

Hi Will,

On 03/08/2016:01:39:47 PM, William Cohen wrote:
> On 08/03/2016 09:13 AM, Pratyush Anand wrote:
> > On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
> >> David Long <dave.long@linaro.org> writes:
> >>
> >>> [...]
> >>>> - bug_handler() calls report_bug() which calls __warn()
> >>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
> >>>> where we have a kprobe instrumented.
> >>>> - Therefore, we are encountering this issue.
> > 
> > Hi Will,
> > 
> > Can you please cross-check if following branch works perfectly with
> > kprobes_onthefly.exp and other systemtap tests.
> > 
> > https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
> > 
> > Following patch in above branch should solve this issue.
> > https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
> > 
> > However, I am not yet sure that above modification could be the best solution,
> > so discussing at arm kernel list.
> > 
> > ~Pratyush
> > 
> 
> I have an AMD seattle machine set up Fedora24, the upstream_arm64_devel_v1.1 branch kernel locally built, and a locally built checkout of systemtap (systemtap rpm in fc24 doesn't generate models for linux 4.6 and newer kernels).  Tried to run the systemtap tests with:
> 
>  make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> 
> However at some point the kernel starts having problems:
> 
> http://paste.stg.fedoraproject.org/5375/

Yes, this is what you were getting with the earlier code as well, but now it is
not going into infinite unexpected EL1, so at least the proposed kprobe
improvement seems fine to me.

In this failing test we are getting an OOM and the oom_killer is called. So I
think why this OOM occurs is a separate point to investigate.

~Pratyush

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Lastest kprobes64 patch
  2016-08-03 20:00                           ` Lastest kprobes64 patch David Long
  2016-08-03 20:01                             ` Frank Ch. Eigler
@ 2016-08-04  5:03                             ` Pratyush Anand
  2016-08-04 13:07                               ` David Long
  1 sibling, 1 reply; 56+ messages in thread
From: Pratyush Anand @ 2016-08-04  5:03 UTC (permalink / raw)
  To: David Long
  Cc: William Cohen, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

Hi Dave,

On 03/08/2016:04:00:33 PM, David Long wrote:
> In response to the prolonged discussion on issues with use of a duplicated
> stack for jprobes on arm64 I have created the requested change of removing
> the stack duplication code altogether.  I don't know how this affects
> systemtap.  It would also be good to have a better idea of how much
> systemtap makes use of jprobes.
> 
> The new additional patch, plus Catalin's accumulated patches since
> kprobes64-v15, are now in my (recreated) kprobes64-v16 branch.  If someone
> has time to exercise this I would be grateful.

All your patches except "arm64: Remove stack duplicating code from jprobes" are
already in torvalds/linux.git:master now.

~Pratyush

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Lastest kprobes64 patch
  2016-08-04  5:03                             ` Pratyush Anand
@ 2016-08-04 13:07                               ` David Long
  0 siblings, 0 replies; 56+ messages in thread
From: David Long @ 2016-08-04 13:07 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: William Cohen, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/04/2016 01:03 AM, Pratyush Anand wrote:
> Hi Dave,
>
> On 03/08/2016:04:00:33 PM, David Long wrote:
>> In response to the prolonged discussion on issues with use of a duplicated
>> stack for jprobes on arm64 I have created the requested change of removing
>> the stack duplication code altogether.  I don't know how this affects
>> systemtap.  It would also be good to have a better idea of how much
>> systemtap makes use of jprobes.
>>
>> The new additional patch, plus Catalin's accumulated patches since
>> kprobes64-v15, are now in my (recreated) kprobes64-v16 branch.  If someone
>> has time to exercise this I would be grateful.
>
> All your patches except "arm64: Remove stack duplicating code from jprobes" are
> already in torvalds/linux.git:master now.
>
> ~Pratyush
>

Thanks for that info, I was expecting it to be held up for removal of 
the jprobes stack duplicating code.

Note that I plan to send out a separate patch for the improvement to the 
code searching for atomic sequences.  I didn't think it was wise or 
necessary to try and bundle that into the initial patch at the point it 
was at.

-dl

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-04  4:42                           ` exercising current aarch64 kprobe support with systemtap Pratyush Anand
@ 2016-08-04 13:57                             ` William Cohen
  2016-08-04 14:36                               ` Pratyush Anand
  0 siblings, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-08-04 13:57 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/04/2016 12:42 AM, Pratyush Anand wrote:
> Hi Will,
> 
> On 03/08/2016:01:39:47 PM, William Cohen wrote:
>> On 08/03/2016 09:13 AM, Pratyush Anand wrote:
>>> On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
>>>> David Long <dave.long@linaro.org> writes:
>>>>
>>>>> [...]
>>>>>> - bug_handler() calls report_bug() which calls __warn()
>>>>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
>>>>>> where we have a kprobe instrumented.
>>>>>> - Therefore, we are encountering this issue.
>>>
>>> Hi Will,
>>>
>>> Can you please cross-check if following branch works perfectly with
>>> kprobes_onthefly.exp and other systemtap tests.
>>>
>>> https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
>>>
>>> Following patch in above branch should solve this issue.
>>> https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
>>>
>>> However, I am not yet sure that above modification could be the best solution,
>>> so discussing at arm kernel list.
>>>
>>> ~Pratyush
>>>
>>
>> I have an AMD seattle machine set up Fedora24, the upstream_arm64_devel_v1.1 branch kernel locally built, and a locally built checkout of systemtap (systemtap rpm in fc24 doesn't generate models for linux 4.6 and newer kernels).  Tried to run the systemtap tests with:
>>
>>  make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>
>> However at some point the kernel starts having problems:
>>
>> http://paste.stg.fedoraproject.org/5375/
> 
> Yes, this is what you were getting with earlier code as well, but now it is not
> going to infinite unexpected EL1, so at least proposed kprobe improvement seems
> fine to me.
> 
> In this failing test we are getting oom and the oom_killer is called. So,
> I think, this is another point of investigation that why this OOM occurs. 
> 
> ~Pratyush
> 

Hi,

The OOM errors came before the otf_stress_hard_iter_5000 test that previously triggered the infinite unexpected EL1, so I can't really say that the proposed patch has fixed the problem.

Any thoughts on how to track down the oom issue?  Are you able to replicate it running the systemtap onthefly/kprobes_onthefly.exp tests?

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-04 13:57                             ` William Cohen
@ 2016-08-04 14:36                               ` Pratyush Anand
  2016-08-04 14:50                                 ` William Cohen
  2016-08-04 20:51                                 ` William Cohen
  0 siblings, 2 replies; 56+ messages in thread
From: Pratyush Anand @ 2016-08-04 14:36 UTC (permalink / raw)
  To: William Cohen
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

Hi Will,

On 04/08/2016:09:56:45 AM, William Cohen wrote:
> On 08/04/2016 12:42 AM, Pratyush Anand wrote:
> > Hi Will,
> > 
> > On 03/08/2016:01:39:47 PM, William Cohen wrote:
> >> On 08/03/2016 09:13 AM, Pratyush Anand wrote:
> >>> On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
> >>>> David Long <dave.long@linaro.org> writes:
> >>>>
> >>>>> [...]
> >>>>>> - bug_handler() calls report_bug() which calls __warn()
> >>>>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
> >>>>>> where we have a kprobe instrumented.
> >>>>>> - Therefore, we are encountering this issue.
> >>>
> >>> Hi Will,
> >>>
> >>> Can you please cross-check if following branch works perfectly with
> >>> kprobes_onthefly.exp and other systemtap tests.
> >>>
> >>> https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
> >>>
> >>> Following patch in above branch should solve this issue.
> >>> https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
> >>>
> >>> However, I am not yet sure that above modification could be the best solution,
> >>> so discussing at arm kernel list.
> >>>
> >>> ~Pratyush
> >>>
> >>
> >> I have an AMD seattle machine set up Fedora24, the upstream_arm64_devel_v1.1 branch kernel locally built, and a locally built checkout of systemtap (systemtap rpm in fc24 doesn't generate models for linux 4.6 and newer kernels).  Tried to run the systemtap tests with:
> >>
> >>  make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> >>
> >> However at some point the kernel starts having problems:
> >>
> >> http://paste.stg.fedoraproject.org/5375/
> > 
> > Yes, this is what you were getting with earlier code as well, but now it is not
> > going to infinite unexpected EL1, so at least proposed kprobe improvement seems
> > fine to me.
> > 
> > In this failing test we are getting oom and the oom_killer is called. So,
> > I think, this is another point of investigation that why this OOM occurs. 
> > 
> > ~Pratyush
> > 
> 
> Hi,
> 
> The OOM errors came before the otf_stress_hard_iter_5000 test that previous triggered the infinite unexpected EL1, so can't really say that the proposed patch has fixed the problem.

Yes, previously we were also getting the OOM, and that OOM was then triggering
the infinite unexpected EL1, because the OOM message uses WARN_ON() to print,
and WARN_ON() uses "BRK BUG_BRK_IMM". While it was printing through BRK, we
were hitting the kprobe at print_worker_info(), which resulted in the
unexpected EL1.

The proposed patch fixes kprobe tracing within a non-kprobe BRK context, such as
a uprobe or WARN_ON() breakpoint handler. So now a kprobe at
print_worker_info() will work while the WARN_ON() message is being printed.
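
To make the chain of events concrete, here is a rough toy model in Python. It is
only a sketch of the sequence described above, not kernel code: the function
names simulate() and handle_brk(), the allow_nested_kprobe flag and the
max_depth cap are all made up for illustration, and the real handlers are of
course far more involved.

```python
def simulate(allow_nested_kprobe, max_depth=4):
    """Toy model of BRK handling; returns a log of what happened.

    allow_nested_kprobe=False mimics the old behaviour (a kprobe BRK taken
    while another BRK is being handled is reported as "unexpected EL1").
    allow_nested_kprobe=True mimics the behaviour with the proposed patch.
    max_depth is a hypothetical cap so the toy model terminates.
    """
    log = []

    def print_worker_info(depth):
        # A kprobe is planted on this function, so entering it raises a
        # kprobe BRK. In this model it is only reached from inside a handler.
        handle_brk("kprobe", depth, in_brk=True)

    def handle_brk(kind, depth, in_brk):
        if depth > max_depth:
            log.append("recursion limit reached")
            return
        if in_brk and not allow_nested_kprobe:
            # Nested BRK is not accepted: report it. The report prints a
            # message, which enters the probed function again.
            log.append("unexpected EL1")
            print_worker_info(depth + 1)
            return
        log.append("handled " + kind)
        if kind == "warn":
            # The WARN_ON() handler prints, which reaches the probed function.
            print_worker_info(depth + 1)

    # The OOM report starts everything with a WARN_ON(), i.e. a BRK.
    handle_brk("warn", 0, in_brk=False)
    return log


if __name__ == "__main__":
    print(simulate(allow_nested_kprobe=False))
    print(simulate(allow_nested_kprobe=True))
```

With allow_nested_kprobe=False the log keeps repeating "unexpected EL1" until
the artificial depth cap stops it; with allow_nested_kprobe=True the WARN_ON()
and the nested kprobe are each handled once and the run ends.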


> 
> Any thoughts on how to track down the oom issue?  Are you able to replicate it running the systemtap onthefly/kprobes_onthefly.exp tests?

Sure, I will look into it. I have reserved a Seattle machine.

~Pratyush

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-04 14:36                               ` Pratyush Anand
@ 2016-08-04 14:50                                 ` William Cohen
  2016-08-04 20:51                                 ` William Cohen
  1 sibling, 0 replies; 56+ messages in thread
From: William Cohen @ 2016-08-04 14:50 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/04/2016 10:35 AM, Pratyush Anand wrote:
> Hi Will,
> 
> On 04/08/2016:09:56:45 AM, William Cohen wrote:
>> On 08/04/2016 12:42 AM, Pratyush Anand wrote:
>>> Hi Will,
>>>
>>> On 03/08/2016:01:39:47 PM, William Cohen wrote:
>>>> On 08/03/2016 09:13 AM, Pratyush Anand wrote:
>>>>> On 07/07/2016:03:58:37 PM, Frank Ch. Eigler wrote:
>>>>>> David Long <dave.long@linaro.org> writes:
>>>>>>
>>>>>>> [...]
>>>>>>>> - bug_handler() calls report_bug() which calls __warn()
>>>>>>>> - __warn() does lot of pr_warn()  which invokes print_worker_info()
>>>>>>>> where we have a kprobe instrumented.
>>>>>>>> - Therefore, we are encountering this issue.
>>>>>
>>>>> Hi Will,
>>>>>
>>>>> Can you please cross-check if following branch works perfectly with
>>>>> kprobes_onthefly.exp and other systemtap tests.
>>>>>
>>>>> https://github.com/pratyushanand/linux/tree/uprobe/upstream_arm64_devel_v1.1
>>>>>
>>>>> Following patch in above branch should solve this issue.
>>>>> https://github.com/pratyushanand/linux/commit/d0dcc6477f1279ab0bd99aefc30efdecb16c586e
>>>>>
>>>>> However, I am not yet sure that above modification could be the best solution,
>>>>> so discussing at arm kernel list.
>>>>>
>>>>> ~Pratyush
>>>>>
>>>>
>>>> I have an AMD seattle machine set up Fedora24, the upstream_arm64_devel_v1.1 branch kernel locally built, and a locally built checkout of systemtap (systemtap rpm in fc24 doesn't generate models for linux 4.6 and newer kernels).  Tried to run the systemtap tests with:
>>>>
>>>>  make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
>>>>
>>>> However at some point the kernel starts having problems:
>>>>
>>>> http://paste.stg.fedoraproject.org/5375/
>>>
>>> Yes, this is what you were getting with earlier code as well, but now it is not
>>> going to infinite unexpected EL1, so at least proposed kprobe improvement seems
>>> fine to me.
>>>
>>> In this failing test we are getting oom and the oom_killer is called. So,
>>> I think, this is another point of investigation that why this OOM occurs. 
>>>
>>> ~Pratyush
>>>
>>
>> Hi,
>>
>> The OOM errors came before the otf_stress_hard_iter_5000 test that previous triggered the infinite unexpected EL1, so can't really say that the proposed patch has fixed the problem.
> 
> Yes, yes, previously also we were getting OOM, and then that OOM was triggering
> infinite unexpected EL1, because OOM message uses WARN_ON() to print, and
> WARN_ON() uses "BRK BUG_BRK_IMM". Now when it is printing though BRK, we were
> hitting kprobe at print_worker_info() which was resulting in unexpected EL1.
> 
> Proposed patch fixes kprobe tracing within none kprobe BRK context such as
> uprobe or WARN_ON() breakpoint handler etc. So, now a kprobe at
> print_worker_info() will work while printing message of WARN_ON().
> 

Okay, I didn't realize that the EL1 issue was hiding the OOM issue. So the patch is helping things.

> 
>>
>> Any thoughts on how to track down the oom issue?  Are you able to replicate it running the systemtap onthefly/kprobes_onthefly.exp tests?
> 
> Sure, will look into. Have reserved a seattle.

Thanks so much.

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-04 14:36                               ` Pratyush Anand
  2016-08-04 14:50                                 ` William Cohen
@ 2016-08-04 20:51                                 ` William Cohen
  2016-08-17 14:36                                   ` William Cohen
  1 sibling, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-08-04 20:51 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/04/2016 10:35 AM, Pratyush Anand wrote:
> Hi Will,
> 
> On 04/08/2016:09:56:45 AM, William Cohen wrote:
...
>> Hi,
>>
>> The OOM errors came before the otf_stress_hard_iter_5000 test that previous triggered the infinite unexpected EL1, so can't really say that the proposed patch has fixed the problem.
> 
> Yes, yes, previously also we were getting OOM, and then that OOM was triggering
> infinite unexpected EL1, because OOM message uses WARN_ON() to print, and
> WARN_ON() uses "BRK BUG_BRK_IMM". Now when it is printing though BRK, we were
> hitting kprobe at print_worker_info() which was resulting in unexpected EL1.
> 
> Proposed patch fixes kprobe tracing within none kprobe BRK context such as
> uprobe or WARN_ON() breakpoint handler etc. So, now a kprobe at
> print_worker_info() will work while printing message of WARN_ON().
> 
> 
>>
>> Any thoughts on how to track down the oom issue?  Are you able to replicate it running the systemtap onthefly/kprobes_onthefly.exp tests?
> 
> Sure, will look into. Have reserved a seattle.
> 
> ~Pratyush
> 

Hi Pratyush,

The stack backtrace of http://paste.stg.fedoraproject.org/5375/ is:


[  668.676682] [<fffffc00082386fc>] page_counter_cancel+0x54/0x60
[  668.682508] [<fffffc000823885c>] page_counter_uncharge+0x2c/0x40
[  668.688509] [<fffffc0008239c68>] cancel_charge+0x40/0xe0
[  668.693815] [<fffffc000823fdfc>] mem_cgroup_cancel_charge+0x2c/0x38
[  668.700088] [<fffffc00081c96a8>] uprobe_write_opcode+0x4e8/0x688
[  668.706089] [<fffffc00081c9878>] set_swbp+0x30/0x40
[  668.710962] [<fffffc00081c98e4>] install_breakpoint.isra.10+0x5c/0x2b8
[  668.717484] [<fffffc00081ca6d8>] uprobe_mmap+0x248/0x2a8
[  668.722791] [<fffffc000820fbac>] mmap_region+0x204/0x558
[  668.728097] [<fffffc0008210164>] do_mmap+0x264/0x320
[  668.733057] [<fffffc00081f2238>] vm_mmap_pgoff+0xb0/0xd8
[  668.738363] [<fffffc00081f22d0>] vm_mmap+0x70/0xa0
[  668.743149] [<fffffc00082a62c8>] elf_map+0x80/0xf8
[  668.747934] [<fffffc00082a7a48>] load_elf_binary+0x480/0xb90
[  668.753588] [<fffffc0008252e7c>] search_binary_handler+0xbc/0x210
[  668.759674] [<fffffc0008253810>] do_execveat_common+0x4b0/0x620
[  668.765587] [<fffffc0008253c74>] SyS_execve+0x44/0x58
[  668.770633] [<fffffc0008082c4c>] __sys_trace_return+0x0/0x4

There is some uprobe code running in the traceback.  It looks like things are going wrong when uprobes are being installed on a newly loaded executable.
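
For anyone who wants to pick the interesting frames out of a longer log, a small
Python sketch along these lines may help. It assumes the "[<address>]
symbol+0xoff/0xsize" frame layout shown above; the names FRAME_RE,
parse_frames() and SAMPLE are just illustrative, and SAMPLE is a few lines
copied from the backtrace.

```python
import re

# Matches frames such as "[<fffffc00081c96a8>] uprobe_write_opcode+0x4e8/0x688".
# The leading "[  668.700088]" timestamp does not match because it lacks "<".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+?)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(text):
    """Return a list of (address, symbol, offset) tuples, one per frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            symbol = match.group(2)
            offset = int(match.group(3), 16)
            frames.append((address, symbol, offset))
    return frames


SAMPLE = """\
[  668.700088] [<fffffc00081c96a8>] uprobe_write_opcode+0x4e8/0x688
[  668.706089] [<fffffc00081c9878>] set_swbp+0x30/0x40
[  668.710962] [<fffffc00081c98e4>] install_breakpoint.isra.10+0x5c/0x2b8
[  668.717484] [<fffffc00081ca6d8>] uprobe_mmap+0x248/0x2a8
[  668.722791] [<fffffc000820fbac>] mmap_region+0x204/0x558
"""

if __name__ == "__main__":
    for address, symbol, offset in parse_frames(SAMPLE):
        marker = "  <-- uprobe" if "uprobe" in symbol else ""
        print(f"{address:#x} {symbol}+{offset:#x}{marker}")
```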

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-04 20:51                                 ` William Cohen
@ 2016-08-17 14:36                                   ` William Cohen
  2016-08-17 18:04                                     ` David Smith
  2016-08-18 14:55                                     ` Pratyush Anand
  0 siblings, 2 replies; 56+ messages in thread
From: William Cohen @ 2016-08-17 14:36 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

On 08/04/2016 04:50 PM, William Cohen wrote:
> On 08/04/2016 10:35 AM, Pratyush Anand wrote:
>> Hi Will,
>>
>> On 04/08/2016:09:56:45 AM, William Cohen wrote:
> ...
>>> Hi,
>>>
>>> The OOM errors came before the otf_stress_hard_iter_5000 test that previous triggered the infinite unexpected EL1, so can't really say that the proposed patch has fixed the problem.
>>
>> Yes, yes, previously also we were getting OOM, and then that OOM was triggering
>> infinite unexpected EL1, because OOM message uses WARN_ON() to print, and
>> WARN_ON() uses "BRK BUG_BRK_IMM". Now when it is printing though BRK, we were
>> hitting kprobe at print_worker_info() which was resulting in unexpected EL1.
>>
>> Proposed patch fixes kprobe tracing within none kprobe BRK context such as
>> uprobe or WARN_ON() breakpoint handler etc. So, now a kprobe at
>> print_worker_info() will work while printing message of WARN_ON().
>>
>>
>>>
>>> Any thoughts on how to track down the oom issue?  Are you able to replicate it running the systemtap onthefly/kprobes_onthefly.exp tests?
>>
>> Sure, will look into. Have reserved a seattle.
>>
>> ~Pratyush
>>
> 
> Hi Pratyush,
> 
> The stack backtrace of http://paste.stg.fedoraproject.org/5375/ is:
> 
> 
> [  668.676682] [<fffffc00082386fc>] page_counter_cancel+0x54/0x60
> [  668.682508] [<fffffc000823885c>] page_counter_uncharge+0x2c/0x40
> [  668.688509] [<fffffc0008239c68>] cancel_charge+0x40/0xe0
> [  668.693815] [<fffffc000823fdfc>] mem_cgroup_cancel_charge+0x2c/0x38
> [  668.700088] [<fffffc00081c96a8>] uprobe_write_opcode+0x4e8/0x688
> [  668.706089] [<fffffc00081c9878>] set_swbp+0x30/0x40
> [  668.710962] [<fffffc00081c98e4>] install_breakpoint.isra.10+0x5c/0x2b8
> [  668.717484] [<fffffc00081ca6d8>] uprobe_mmap+0x248/0x2a8
> [  668.722791] [<fffffc000820fbac>] mmap_region+0x204/0x558
> [  668.728097] [<fffffc0008210164>] do_mmap+0x264/0x320
> [  668.733057] [<fffffc00081f2238>] vm_mmap_pgoff+0xb0/0xd8
> [  668.738363] [<fffffc00081f22d0>] vm_mmap+0x70/0xa0
> [  668.743149] [<fffffc00082a62c8>] elf_map+0x80/0xf8
> [  668.747934] [<fffffc00082a7a48>] load_elf_binary+0x480/0xb90
> [  668.753588] [<fffffc0008252e7c>] search_binary_handler+0xbc/0x210
> [  668.759674] [<fffffc0008253810>] do_execveat_common+0x4b0/0x620
> [  668.765587] [<fffffc0008253c74>] SyS_execve+0x44/0x58
> [  668.770633] [<fffffc0008082c4c>] __sys_trace_return+0x0/0x4
> 
> There is some uprobe code running in the traceback.  It looks like things are going wrong when uprobes are being installed on a newly loaded executable.
> 
> -Will
> 

Hi,

I was able to locally build the uptream_arm64-devel branch of https://github.com/pratyushanand/linux.git with the configure from fedora rawhide and run the systemtap tests. Pratyush, were there changes in the patches between these versions?  The only other difference is that the machine above was a Fedora 24 machine rather than RHELSA, so there would be differences in the compiler and other tools. The results (systemtap.log and systemtap.sum) are at:

http://people.redhat.com/wcohen/aarch64/20160817/

The results have been sent to dejazilla, but dejazilla appears to be having issues displaying them (https://web.elastic.org/~dejazilla/viewsummary.php?summary=%3D%27%3Ccdc0ada3-26b8-295d-2d3b-d8a88da83e17%40redhat.com%3E%27)

The results look pretty respectable:

		=== systemtap Summary ===

# of expected passes		8809
# of unexpected failures	69
# of unexpected successes	1
# of expected failures		339
# of unknown successes		3
# of known failures		95
# of untested testcases		749
# of unsupported tests		33

There are about a dozen failures due to the search for atomic regions going beyond the beginning of the function, which prevents probes on a number of functions, as in the following test:

spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1234").call (address 0xfffffc00080cb528) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4716").call (address 0xfffffc00081051b0) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5040").call (address 0xfffffc0008105688) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5013").call (address 0xfffffc0008105620) registration error (rc -22)

There are also some differences in the syscalls used on aarch64 that cause some of the tests to fail.

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-17 14:36                                   ` William Cohen
@ 2016-08-17 18:04                                     ` David Smith
  2016-08-17 18:28                                       ` William Cohen
  2016-08-18 14:55                                     ` Pratyush Anand
  1 sibling, 1 reply; 56+ messages in thread
From: David Smith @ 2016-08-17 18:04 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, Frank Ch. Eigler

On 08/17/2016 09:36 AM, William Cohen wrote:
> Hi,
> 
> I was able to locally build uptream_arm64-devel branch of  
> https://github.com/pratyushanand/linux.git with the configure
> from fedora rawhide and run the systemtap tests. Pratyush were
> there changes in patches between these versions?  The only other
> difference is that the machine above was a fedora 24 machine rather
> than a RHELSA, so there would be differences in the compiler and
> other tools. The results (systemtap.log and systemtap.sum) are at:
> 
> http://people.redhat.com/wcohen/aarch64/20160817/

... stuff deleted ...

> There are also some differences in the syscalls used on aarch64 that
> cause some of the tests to fail.

I looked a bit at those syscall failures, and I'm not sure what is going
on. Note that the syscall/nd_syscall test cases pass completely on RHEL7
aarch64. I did see one easy fix - you were getting registration errors
for the sched_[gs]etaffinity syscalls. Commit 619425f makes them fully
optional.

I'd need access to that machine to debug the other syscall failures further.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-17 18:04                                     ` David Smith
@ 2016-08-17 18:28                                       ` William Cohen
  2016-08-18 15:07                                         ` David Smith
  0 siblings, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-08-17 18:28 UTC (permalink / raw)
  To: David Smith, Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, Frank Ch. Eigler

On 08/17/2016 02:04 PM, David Smith wrote:
> On 08/17/2016 09:36 AM, William Cohen wrote:
>> Hi,
>>
>> I was able to locally build uptream_arm64-devel branch of  
>> https://github.com/pratyushanand/linux.git with the configure
>> from fedora rawhide and run the systemtap tests. Pratyush were
>> there changes in patches between these versions?  The only other
>> difference is that the machine above was a fedora 24 machine rather
>> than a RHELSA, so there would be differences in the compiler and
>> other tools. The results (systemtap.log and systemtap.sum) are at:
>>
>> http://people.redhat.com/wcohen/aarch64/20160817/
> 
> ... stuff deleted ...
> 
>> There are also some differences in the syscalls used on aarch64 that
>> cause some of the tests to fail.
> 
> I looked a bit at those syscall failures, and I'm not sure what is going
> on. Note that the syscall/nd_syscall test cases pass completely on RHEL7
> aarch64. I did see one easy fix - you were getting registration errors
> for the sched_[gs]etaffinity syscalls. Commit 619425f makes them fully
> optional.

The functions are actually there; otherwise the error would happen during the earlier passes. The kprobe registration errors are due to the heuristic that scans backward through the instructions to make sure that the kprobe is not in an atomic region. The heuristic scans past the beginning of the function and interprets some of the data before the start of the function as the start of an atomic region.  There is a proposed fix for this, but it is not in this particular kernel.
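
As a rough illustration of that failure mode (a simplified sketch, not the kernel's actual code), the Python below scans backward from a probe point looking for a load-exclusive that has not been closed by a store-exclusive. The mask/value pairs are my understanding of how the A64 load/store-exclusive class can be recognised and should be treated as an assumption; the names in_atomic_region(), TEXT, FUNC_START and the MAX_SCAN limit are made up for the example.

```python
# Assumed mask/value pairs for A64 load-exclusive and store-exclusive words.
LOAD_EX_MASK, LOAD_EX_VAL = 0x3F400000, 0x08400000
STORE_EX_MASK, STORE_EX_VAL = 0x3F400000, 0x08000000
MAX_SCAN = 16  # hypothetical limit on how many words to look back


def is_load_ex(word):
    return (word & LOAD_EX_MASK) == LOAD_EX_VAL


def is_store_ex(word):
    return (word & STORE_EX_MASK) == STORE_EX_VAL


def in_atomic_region(words, probe_idx, lower_bound):
    """Scan backward from probe_idx, never going below lower_bound.

    Returns True if an unmatched load-exclusive is found first, meaning the
    probe point looks like it sits inside an exclusive load/store sequence.
    """
    stop = max(lower_bound, probe_idx - MAX_SCAN)
    for i in range(probe_idx - 1, stop - 1, -1):
        if is_store_ex(words[i]):
            return False  # an earlier sequence has already been closed
        if is_load_ex(words[i]):
            return True   # open sequence: a probe here would be rejected
    return False


# Words preceding and starting a function. Index 0 is literal data that
# happens to have the same bit pattern as "ldxr x0, [x1]".
TEXT = [
    0xC85F7C20,  # data before the function (looks like a load-exclusive)
    0xD503201F,  # nop padding
    0xA9BF7BFD,  # function entry: stp x29, x30, [sp, #-16]!
    0x910003FD,  # mov x29, sp
]
FUNC_START = 2

if __name__ == "__main__":
    # Unbounded scan wanders into the data and wrongly rejects the probe.
    print(in_atomic_region(TEXT, FUNC_START, lower_bound=0))
    # Stopping at the function start accepts the probe.
    print(in_atomic_region(TEXT, FUNC_START, lower_bound=FUNC_START))
```

Bounding the scan at the start of the function is the idea behind the fix as I understand it; the details of the proposed kernel patch may differ.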

Some syscalls either differ a bit in implementation or do not fail in the same way as on x86, and that is causing some of the tests to fail.  For example, syscall.fork doesn't seem to be implemented on aarch64, and setgroups returns 0 rather than an error for some argument combinations.

> 
> I'd need access to that machine to debug the other syscall failures further.
> 

Thanks,

-Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-17 14:36                                   ` William Cohen
  2016-08-17 18:04                                     ` David Smith
@ 2016-08-18 14:55                                     ` Pratyush Anand
  1 sibling, 0 replies; 56+ messages in thread
From: Pratyush Anand @ 2016-08-18 14:55 UTC (permalink / raw)
  To: William Cohen
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, David Smith,
	Frank Ch. Eigler

Hi Will,

On 17/08/2016:10:36:23 AM, William Cohen wrote:
> Hi,
> 
> I was able to locally build uptream_arm64-devel branch of  https://github.com/pratyushanand/linux.git with the configure from fedora rawhide and run the systemtap tests. Pratyush were there changes in patches between these versions?  The only other difference is that the machine above was a fedora 24 machine rather than a RHELSA, so there would be differences in the compiler and other tools. The results (systemtap.log and systemtap.sum) are at:

I had rebased it over 4.8-rc2. This would have been the only change.

~Pratyush

> 
> http://people.redhat.com/wcohen/aarch64/20160817/
> 
> The results have been sent to dejazilla, but dejazilla appears to be having issues with displaying results (https://web.elastic.org/~dejazilla/viewsummary.php?summary=%3D%27%3Ccdc0ada3-26b8-295d-2d3b-d8a88da83e17%40redhat.com%3E%27)
> 
> The results look pretty respectable
> 
> 		=== systemtap Summary ===
> 
> # of expected passes		8809
> # of unexpected failures	69
> # of unexpected successes	1
> # of expected failures		339
> # of unknown successes		3
> # of known failures		95
> # of untested testcases		749
> # of unsupported tests		33
> 
> There are about a dozen failures due to the search for atomic regions going beyond the beginning of the function which prevents probes on a number of functions like the following test:
> 
> spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
> WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1234").call (address 0xfffffc00080cb528) registration error (rc -22)
> WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4716").call (address 0xfffffc00081051b0) registration error (rc -22)
> WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5040").call (address 0xfffffc0008105688) registration error (rc -22)
> WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5013").call (address 0xfffffc0008105620) registration error (rc -22)
> 
> There are also some differences in the syscalls used on aarch64 that cause some of the tests to fail.
> 
> -Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-17 18:28                                       ` William Cohen
@ 2016-08-18 15:07                                         ` David Smith
  2016-08-18 15:16                                           ` William Cohen
  0 siblings, 1 reply; 56+ messages in thread
From: David Smith @ 2016-08-18 15:07 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, Frank Ch. Eigler

On 08/17/2016 01:27 PM, William Cohen wrote:
> On 08/17/2016 02:04 PM, David Smith wrote:
>> On 08/17/2016 09:36 AM, William Cohen wrote:
>>> Hi,
>>>
>>> I was able to locally build the uptream_arm64-devel branch of
>>> https://github.com/pratyushanand/linux.git with the config
>>> from fedora rawhide and run the systemtap tests. Pratyush, were
>>> there changes in the patches between these versions?  The only other
>>> difference is that the machine above was a fedora 24 machine rather
>>> than a RHELSA, so there would be differences in the compiler and
>>> other tools. The results (systemtap.log and systemtap.sum) are at:
>>>
>>> http://people.redhat.com/wcohen/aarch64/20160817/
>>
>> ... stuff deleted ...
>>
>>> There are also some differences in the syscalls used on aarch64 that
>>> cause some of the tests to fail.
>>
>> I looked a bit at those syscall failures, and I'm not sure what is going
>> on. Note that the syscall/nd_syscall test cases pass completely on RHEL7
>> aarch64. I did see one easy fix - you were getting registration errors
>> for the sched_[gs]etaffinity syscalls. Commit 619425f makes them fully
>> optional.
> 
> The functions are actually there; otherwise the error would happen during the
> earlier passes. The kprobes registration errors are due to the heuristic
> that scans backward through the instructions to make sure that the kprobe is
> not in an atomic region. The heuristic scanned past the beginning of the
> function and interpreted some of the data before the start of the function as
> the start of an atomic region.  There is a proposed fix for this, but it is
> not in this particular kernel.
> 
> There are some syscalls that either are a bit different in implementation or
> do not fail in the same way as on x86, and these are causing some of the tests
> to fail. For example, syscall.fork doesn't seem to be implemented on aarch64,
> and setgroups returns 0 rather than an error for some argument combinations.
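
The backward scan described in the quoted text can be sketched as follows.
This is only an illustration in Python, not the kernel's code: the function
names, the small memory images, and the simplified bit test for exclusive
loads and stores are all assumptions made for the example. It shows how a
scan that is allowed to go below the function's first instruction can mistake
a data word for an exclusive load, and how bounding the scan at the function
start avoids that.

```python
# Hypothetical model of a backward "atomic region" scan (not kernel code).
NOP = 0xD503201F    # nop
LDXR = 0xC85F7C20   # ldxr x0, [x1]
STXR = 0xC8027C20   # stxr w2, x0, [x1]


def is_exclusive(word):
    # Simplified assumption: bits [29:23] == 0b0010000 marks an
    # exclusive load/store. A real decoder may check more than this.
    return (word & 0x3F800000) == 0x08000000


def is_load_ex(word):
    # Bit 22 set means "load" in this simplified model.
    return is_exclusive(word) and bool(word & 0x00400000)


def is_store_ex(word):
    return is_exclusive(word) and not (word & 0x00400000)


def in_atomic_region(words, probe_idx, lowest_idx):
    """Scan backward from the word just before probe_idx down to lowest_idx.

    Returns True if an exclusive load is met before an exclusive store,
    i.e. the probe would sit inside a load-exclusive/store-exclusive pair.
    """
    for i in range(probe_idx - 1, lowest_idx - 1, -1):
        if is_store_ex(words[i]):
            return False
        if is_load_ex(words[i]):
            return True
    return False


# Index 0 is a data word placed before the function; it happens to look
# like an exclusive load. The function itself starts at index 1.
image = [0x08400000, NOP, NOP]
func_start = 1

# A genuine atomic sequence inside a function that starts at index 0.
real = [NOP, LDXR, NOP, STXR, NOP]

print("unbounded scan rejects entry probe:", in_atomic_region(image, 1, 0))
print("bounded scan rejects entry probe:  ", in_atomic_region(image, 1, func_start))
print("probe inside real region rejected: ", in_atomic_region(real, 2, 0))
```

With the scan bounded at the function entry, the probe on the first
instruction is accepted, while a probe between a real exclusive load and its
matching store is still rejected.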

OK, here's the status of the [nd_]syscall test failures. There were 2
failures caused by a test case being too specific when looking for the
syscall return value. These were fixed by commit 9c004b0:

FAIL: 64-bit getgroups syscall
FAIL: 64-bit setgroups syscall

There are 4 failures caused by the atomic region kprobes registration bug:

FAIL: 64-bit sched syscall
FAIL: 64-bit sched_setaffinity syscall
FAIL: 64-bit sched_setscheduler syscall
FAIL: 64-bit set_tid_address syscall

I verified all these failures by trying to use perf to put a probe on
the same functions:

====
# perf probe --add=sys_set_tid_address
Failed to write event: Invalid argument
  Error: Failed to add events.
====
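
Both tools are reporting the same failure: systemtap prints the raw return
code (rc -22) and perf prints the matching message. A quick check in Python,
assuming a Linux host where EINVAL has the value 22:

```python
import errno
import os

rc = -22  # value taken from the "registration error (rc -22)" warnings

# Map the negated kernel-style return code to its symbolic name and message.
name = errno.errorcode[-rc]
message = os.strerror(-rc)
print(name, "-", message)
```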

I'm not seeing any issue with fork in the testsuite results, but perhaps
I've missed something. What error are you referring to?

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)


* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-18 15:07                                         ` David Smith
@ 2016-08-18 15:16                                           ` William Cohen
  2016-08-18 15:39                                             ` David Smith
  0 siblings, 1 reply; 56+ messages in thread
From: William Cohen @ 2016-08-18 15:16 UTC (permalink / raw)
  To: David Smith, Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, Frank Ch. Eigler

On 08/18/2016 11:06 AM, David Smith wrote:
...
> OK, here's the status of the [nd_]syscall test failures. There were 2
> failures caused by a test case being too specific when looking for the
> syscall return value. These were fixed by commit 9c004b0:
> 
> FAIL: 64-bit getgroups syscall
> FAIL: 64-bit setgroups syscall
> 
> There are 4 failures caused by the atomic region kprobes registration bug:
> 
> FAIL: 64-bit sched syscall
> FAIL: 64-bit sched_setaffinity syscall
> FAIL: 64-bit sched_setscheduler syscall
> FAIL: 64-bit set_tid_address syscall
> 
> I verified all these failures by trying to use perf to put a probe on
> the same functions:
> 
> ====
> # perf probe --add=sys_set_tid_address
> Failed to write event: Invalid argument
>   Error: Failed to add events.
> ====
> 
> I'm not seeing any issue with fork in the testsuite results, but perhaps
> I've missed something. What error are you referring to?
> 

Sorry, I should have mentioned where I saw it.  The fork issues weren't in the syscall tests.  They were in procmod_watcher.stp.  Below is the part of systemtap.log that shows the problem:

meta taglines 'test_check: stap -p4 procmod_watcher.stp' tag 'test_check' value 'stap -p4 procmod_watcher.stp'
attempting command stap -p4 procmod_watcher.stp
OUT semantic error: while resolving probe point: identifier 'nd_syscall' at procmod_watcher.stp:47:7
        source: probe nd_syscall.fork.return {
                      ^

semantic error: no match

Pass 2: analysis failed.  [man error::pass2]
child process exited abnormally
RC 1
FAIL: systemtap.examples/process/procmod_watcher build
meta taglines 'test_installcheck: stap procmod_watcher.stp -T 1' tag 'test_installcheck' value 'stap procmod_watcher.stp -T 1'
UNTESTED: systemtap.examples/process/procmod_watcher run

-Will


* Re: exercising current aarch64 kprobe support with systemtap
  2016-08-18 15:16                                           ` William Cohen
@ 2016-08-18 15:39                                             ` David Smith
  0 siblings, 0 replies; 56+ messages in thread
From: David Smith @ 2016-08-18 15:39 UTC (permalink / raw)
  To: William Cohen, Pratyush Anand
  Cc: David Long, systemtap, Mark Brown, Jeremy Linton, Frank Ch. Eigler

On 08/18/2016 10:16 AM, William Cohen wrote:
> On 08/18/2016 11:06 AM, David Smith wrote:
>> I'm not seeing any issue with fork in the testsuite results, but perhaps
>> I've missed something. What error are you referring to?
>>
> 
> Sorry, I should have mentioned where I saw it.  The fork issues weren't in the syscall tests.  They were in procmod_watcher.stp.  Below is the part of systemtap.log that shows the problem:
> 
> meta taglines 'test_check: stap -p4 procmod_watcher.stp' tag 'test_check' value 'stap -p4 procmod_watcher.stp'
> attempting command stap -p4 procmod_watcher.stp
> OUT semantic error: while resolving probe point: identifier 'nd_syscall' at procmod_watcher.stp:47:7
>         source: probe nd_syscall.fork.return {
>                       ^
> 
> semantic error: no match
> 
> Pass 2: analysis failed.  [man error::pass2]
> child process exited abnormally
> RC 1
> FAIL: systemtap.examples/process/procmod_watcher build
> meta taglines 'test_installcheck: stap procmod_watcher.stp -T 1' tag 'test_installcheck' value 'stap procmod_watcher.stp -T 1'
> UNTESTED: systemtap.examples/process/procmod_watcher run

Ah. That isn't really an aarch64-specific problem; on lots of kernels the
C library fork() call is implemented with clone().

That's a problem with the procmod_watcher.stp example. I've updated it
to handle this problem (and another with exit()) in commit 543563e.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

