* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
2023-11-17 17:13 [Bug runtime/31074] New: On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request" wcohen at redhat dot com
@ 2023-11-22 0:19 ` fche at redhat dot com
2023-11-22 15:02 ` wcohen at redhat dot com
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: fche at redhat dot com @ 2023-11-22 0:19 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
Frank Ch. Eigler <fche at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fche at redhat dot com
--- Comment #1 from Frank Ch. Eigler <fche at redhat dot com> ---
That failing strcmp may come from stp_tracepoint.c, via
stp_tracepoint_going / get_tracepoint:
    head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
    hlist_for_each_entry(e, head, hlist) {
        if (!strcmp(name, e->name))
            return e;
    }
That offset 0x30 looks like it could be a match for e->name with a NULL e.
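The NULL-plus-offset reading can be checked with offsetof. The following is a minimal userspace sketch, assuming a hypothetical entry layout: the field list in entry_sketch and the two stand-in list types are illustrative assumptions, not copied from stp_tracepoint.c, and the 0x30 result only holds on an LP64 target.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel list types (two pointers each). */
struct hlist_node_sketch { void *next; void **pprev; };
struct list_head_sketch  { void *next; void *prev; };

/* Hypothetical entry layout, assumed for illustration only. */
struct entry_sketch {
    struct hlist_node_sketch hlist;  /* hash chain linkage */
    void *tp;                        /* pointer to the tracepoint */
    int refcount;                    /* followed by padding on LP64 */
    struct list_head_sketch probes;  /* registered probes */
    char name[];                     /* flexible array holding the name */
};

/* Offset that an access to e->name adds to e. With e == NULL this
 * offset is the address that strcmp would try to read. */
static size_t name_offset(void)
{
    size_t off = offsetof(struct entry_sketch, name);

    /* The name must come after every fixed-size list field. */
    assert(off >= sizeof(struct hlist_node_sketch)
                  + sizeof(struct list_head_sketch));
    return off;
}
```

With this assumed layout and 8-byte pointers, name_offset() returns 0x30, which is the kind of small faulting address a NULL entry pointer would produce.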
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: wcohen at redhat dot com @ 2023-11-22 15:02 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
William Cohen <wcohen at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
--- Comment #2 from William Cohen <wcohen at redhat dot com> ---
Yes, it looked like the value might be NULL + the offset of the field. Added "-g
-save-temps" to EXTRA_CFLAGS in the makefile of the saved /tmp/stap* directory to get a
better idea of what code the compiler is generating and where things are
located.
For get_buffer:
adrp x3, .LANCHOR0
add x3, x3, :lo12:.LANCHOR0
add x3, x3, 568
mov x2, 512
mov x0, x3
mov w1, 0
.LVL175:
.loc 23 1963 178 discriminator 1 view .LVU775
bl memset
get_tracepoint (static tracepoint_table)
.LVL489:
.loc 27 127 36 view .LVU2068
adrp x1, .LANCHOR0
add x1, x1, :lo12:.LANCHOR0
add x1, x1, 1080
ldr x19, [x1, x0, lsl 3]
One thought that crossed my mind is that the memset code is heavily optimized,
using cacheline zeroing for specific memset(x, 0, size) operations, and might be
overrunning the end of the static buffer into the static tracepoint_table, as the
buffer is not aligned to cache line boundaries. However, reducing the size of the
get_buffer memset clearing didn't eliminate the problem.
I should probably add some diagnostics to stp_tracepoint.c to get a
better understanding of how tracepoint_table entries are getting corrupted.
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: fche at redhat dot com @ 2023-11-22 15:36 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
... and as a temporary measure, could consider
diff --git a/runtime/linux/stp_tracepoint.c b/runtime/linux/stp_tracepoint.c
index 508948dce4fd..9b7409194956 100644
--- a/runtime/linux/stp_tracepoint.c
+++ b/runtime/linux/stp_tracepoint.c
@@ -286,7 +286,10 @@ int stp_tracepoint_coming(struct tp_module *tp_mod)
#else
tp = tp_mod->mod->tracepoints_ptrs[i];
#endif
- e = get_tracepoint(tp->name);
+ if (!tp) {
+ WARN_ON(!tp);
+ continue;
+ }
if (!e) {
e = add_tracepoint(tp->name);
if (IS_ERR(e)) {
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: wcohen at redhat dot com @ 2023-11-27 16:07 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
--- Comment #4 from William Cohen <wcohen at redhat dot com> ---
Looking a bit more at the stp_tracepoint.c code: if e were NULL,
hlist_for_each_entry should exit the for loop rather than doing the
strcmp:
https://elixir.bootlin.com/linux/v6.5.11/source/include/linux/list.h#L1053
The attempted access is occurring when tracepoints are being removed.
It is a bit surprising that the virtually identical code in add_tracepoint
doesn't encounter a similar problem earlier:
https://sourceware.org/git/?p=systemtap.git;a=blob;f=runtime/linux/stp_tracepoint.c;h=508948dce4fd438bde6d9d155035faba2abd0ee1;hb=HEAD#l146
The suggested patch isn't making much sense to me:
e is a local variable that would no longer be initialized before the
"if (!e) {..." check.
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: wcohen at redhat dot com @ 2023-12-01 14:51 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
--- Comment #5 from William Cohen <wcohen at redhat dot com> ---
I have done some additional investigation by adding diagnostic print statements to
runtime/linux/stp_tracepoint.c. The tracepoint_table[0] entry is getting corrupted:
the lower 4 bytes of that entry are being zeroed.
The problem is linked to the call:
    set_kernel_string(addr, "foobar")
The interesting thing is that the set_kernel_string_n() calls later in the
test do not have an issue, even though they use the same underlying code. If the len
passed in is set to a value greater than 508 then set_kernel_string_n()
causes a similar corruption. So it appears that the issue is happening in
_stp_store_deref_string.
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: wcohen at redhat dot com @ 2023-12-01 16:45 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
--- Comment #6 from William Cohen <wcohen at redhat dot com> ---
The code in loc2c-runtime.c _stp_store_deref_string_() looks like it can write
past the end of the buffer. The for loop has a loop test of "i < len-1". When
the for loop exits i == len. The statement after the for loop is:
    err = __stp_put_either('\0', (u8 *)addr + i, seg);
This would effectively be addressing addr + len. The valid range for accessing
the buffer is 0 to len-1.
On aarch64 the buffer for the set_kernel.stp get_buffer function and the
tracepoint_table butt up against each other with no padding between them. Some
tracepoints on aarch64 map to tracepoint_table[0], which is getting
corrupted. On x86_64 either nothing is getting mapped to tracepoint_table[0]
or there is some space between the end of the buffer and the beginning of
tracepoint_table.
On aarch64 sizeof('\0') is returning 4 rather than 1 as expected for a
character. This would explain 4 bytes of tracepoint_table[0] being 0.
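The width effect described above can be reproduced in user space. This is only a sketch under stated assumptions: PUT_SIZED is a hypothetical store macro that takes its write width from the type of its value argument (it is not the real __stp_put_either), it relies on the GCC/Clang __typeof__ extension, and it assumes the terminator lands on the last byte of the buffer. One flat array models a small buffer followed directly by the start of a neighbouring table, in line with the assembly in comment #2 where the buffer at offset 568 with length 512 ends exactly at offset 1080.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN   8   /* models the string buffer */
#define GUARD_LEN 4   /* models the start of the adjacent table */

/* Hypothetical store helper: the write width follows the type of val. */
#define PUT_SIZED(val, dst)                 \
    do {                                    \
        __typeof__(val) v_ = (val);         \
        memcpy((dst), &v_, sizeof(v_));     \
    } while (0)

/* Count guard bytes that no longer hold the 0xFF fill pattern. */
static size_t count_clobbered(const unsigned char *region)
{
    size_t n = 0;

    for (size_t i = BUF_LEN; i < BUF_LEN + GUARD_LEN; i++)
        if (region[i] != 0xFF)
            n++;
    return n;
}

/* Terminate at the last buffer byte using an int-typed constant. */
static size_t terminate_with_int_constant(void)
{
    unsigned char region[BUF_LEN + GUARD_LEN];

    /* Keep the demonstration write inside the modelled region. */
    assert(sizeof('\0') <= GUARD_LEN + 1);
    memset(region, 0xFF, sizeof(region));
    PUT_SIZED('\0', region + BUF_LEN - 1);       /* int-sized store in C */
    return count_clobbered(region);
}

/* Terminate at the last buffer byte using a char-typed value. */
static size_t terminate_with_char(void)
{
    unsigned char region[BUF_LEN + GUARD_LEN];

    memset(region, 0xFF, sizeof(region));
    PUT_SIZED((char)'\0', region + BUF_LEN - 1); /* one-byte store */
    return count_clobbered(region);
}
```

When compiled as C, terminate_with_int_constant() reports sizeof(int) - 1 overwritten guard bytes, while terminate_with_char() reports none.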
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: mark at klomp dot org @ 2023-12-01 17:19 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
Mark Wielaard <mark at klomp dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mark at klomp dot org
--- Comment #7 from Mark Wielaard <mark at klomp dot org> ---
(In reply to William Cohen from comment #6)
> On aarch64 sizeof('\0') is returning 4 rather than 1 as expected for a
> character.
That is actually so on all arches. The confusing thing is that in C a character
constant has type int, which explains why sizeof ('a') is really sizeof (int)
== 4, while for char c = 'a'; sizeof (c) == 1 (sizeof (char)).
Note that C and C++ differ here. In C++ character constants have type char, so
this program prints different results depending on the compiler:
#include <stdio.h>
int
main ()
{
  /* sizeof yields a size_t, so print it with %zu */
  printf ("sizeof literal char: %zu\n", sizeof ('a'));
  return 0;
}
$ gcc -g -o c c.c
$ ./c
sizeof literal char: 4
$ g++ -g -o c c.c
$ ./c
sizeof literal char: 1
Aren't C and C++ fun? :)
* [Bug runtime/31074] On aarch64 the systemtap.base/set_kernel.stp triggers "Unable to handle kernel paging request"
From: wcohen at redhat dot com @ 2023-12-04 16:38 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=31074
William Cohen <wcohen at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #8 from William Cohen <wcohen at redhat dot com> ---
Fixed by
commit b84a5e8c2c5a857c0790a71df7824259a95131cf (HEAD -> master, origin/master,
origin/HEAD)
Author: William Cohen <wcohen@redhat.com>
Date: Mon Dec 4 11:28:10 2023 -0500
PR31074: Ensure that the set_kernel_string* functions limit their writes
Both the set_kernel_string and set_kernel_string_n functions use the
underlying _stp_store_deref_string_ function to write strings. There
were two issues with this function:
1) wrote MAXSTRINGLEN bytes even if string was shorter
2) null write at end could spill past end of buffer
The first issue was addressed by stopping the write once a null
character is encountered. The second issue is a side effect of C
treating character constants as ints and was addressed by
explicitly casting the character constants to char.
The pr31074.exp test was added to verify that the write lengths are
limited to the string length and that the null write does not go beyond the end
of the buffer.
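The two fixes described in the commit message can be sketched in user space as follows. This is a minimal illustration, not the committed code: store_string_bounded is a hypothetical helper, it uses plain array stores instead of the runtime's fault-checked store primitive, and its return value (bytes written) is an assumption made for easier checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the real _stp_store_deref_string_.
 * Copies src into dst, writing at most len bytes including the
 * terminator. Returns the number of bytes written. */
static size_t store_string_bounded(const char *src, char *dst, size_t len)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    if (len == 0)
        return 0;

    /* Fix 1: stop at the source terminator instead of always
     * copying len - 1 bytes. */
    for (i = 0; i < len - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];

    /* Fix 2: store a char-sized terminator; i is at most len - 1,
     * so the write stays inside the buffer. */
    dst[i] = (char)'\0';
    return i + 1;
}
```

For an 8-byte destination, storing "foobar" writes 7 bytes and leaves the eighth byte untouched; with len set to 4 the string is truncated to "foo" plus the terminator.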