* Failure in syscall.open probe @ 2017-08-24 14:26 Arkady 2017-08-24 14:35 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-24 14:26 UTC (permalink / raw) To: systemtap Hi, I am trying the following code https://gist.github.com/larytet/5c361b1a4ee967655f500909026978f9 - the script is mostly syscall.open and syscall.open.return on CentOS release 6.9 (Final) kernel 2.6.32-696.3.2.el6.x86_64 #1 SMP Tue Jun 20 01:26:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) I use a custom build of 3.1 release of Systemtap Systemtap translator/driver (version 3.1/0.164, commit release-2.9-1416-gb8ea350dc13a) Copyright (C) 2005-2017 Red Hat, Inc. and others This is free software; see the source for copying conditions. tested kernel versions: 2.6.18 ... 4.10-rc8 enabled features: PYTHON2 NLS Build command stap -g -p4 -m test_open -g --suppress-time-limits -v -k -D MAXSKIPPED=0 -D MAXTRYLOCK=100000 -D TRYLOCKDELAY=1 test_open.stp I restart the script using the following bash one liner count=0;while [ 1 ];do echo $count;count=$(($count+1));sudo rmmod test_open;sudo staprun test_open.ko -L ;done; 30-50 restarts later I get an exception: Aug 24 10:13:52 localhost kernel: <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 Aug 24 10:13:52 localhost kernel: [<ffffffff8154af4c>] ? schedule+0xa4c/0xb70 Aug 24 10:13:52 localhost kernel: [<ffffffff812a6356>] ? vsnprintf+0x336/0x5e0 Aug 24 10:13:52 localhost kernel: [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 Aug 24 10:13:52 localhost kernel: [<ffffffff8154b340>] ? _cond_resched+0x30/0x40 Aug 24 10:13:52 localhost kernel: [<ffffffff812a8cda>] ? strncpy_from_user+0x4a/0x90 Aug 24 10:13:52 localhost kernel: [<ffffffffa0703fb0>] ? function___global_user_string_quoted__overload_0+0x460/0x890 [test_open] Aug 24 10:13:52 localhost kernel: [<ffffffffa070281e>] ? _new_map_create+0x3e/0xa0 [test_open] Aug 24 10:13:52 localhost kernel: [<ffffffffa07047f1>] ? probe_3606+0x101/0x400 [test_open] Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 Aug 24 10:13:52 localhost kernel: [<ffffffffa0707ce0>] ? enter_kprobe_probe+0x1c0/0x2d0 [test_open] Aug 24 10:13:52 localhost kernel: [<ffffffff815513fb>] ? aggr_pre_handler+0x5b/0xb0 Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 Aug 24 10:13:52 localhost kernel: [<ffffffff81550e15>] ? kprobe_exceptions_notify+0x3d5/0x430 Aug 24 10:13:52 localhost kernel: [<ffffffff81551085>] ? notifier_call_chain+0x55/0x80 Aug 24 10:13:52 localhost kernel: [<ffffffff815510ea>] ? atomic_notifier_call_chain+0x1a/0x20 Aug 24 10:13:52 localhost kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 Aug 24 10:13:52 localhost kernel: [<ffffffff8154e955>] ? do_int3+0x35/0xb0 Aug 24 10:13:52 localhost kernel: [<ffffffff8154e1c3>] ? int3+0x33/0x40 Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 Thank you, Arkady. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-24 14:26 Failure in syscall.open probe Arkady @ 2017-08-24 14:35 ` Arkady 2017-08-24 16:28 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-24 14:35 UTC (permalink / raw) To: systemtap Update The build line can be as short as stap -g -p4 -m test_open -v -k test_open.stp On Thu, Aug 24, 2017 at 5:26 PM, Arkady <arkady.miasnikov@gmail.com> wrote: > Hi, > > I am trying the following code > https://gist.github.com/larytet/5c361b1a4ee967655f500909026978f9 > - the script is mostly syscall.open and syscall.open.return > > on CentOS release 6.9 (Final) > kernel 2.6.32-696.3.2.el6.x86_64 #1 SMP Tue Jun 20 01:26:55 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > > g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) > > I use a custom build of 3.1 release of Systemtap > Systemtap translator/driver (version 3.1/0.164, commit > release-2.9-1416-gb8ea350dc13a) > Copyright (C) 2005-2017 Red Hat, Inc. and others > This is free software; see the source for copying conditions. > tested kernel versions: 2.6.18 ... 4.10-rc8 > enabled features: PYTHON2 NLS > > Build command stap -g -p4 -m test_open -g --suppress-time-limits -v -k > -D MAXSKIPPED=0 -D MAXTRYLOCK=100000 -D TRYLOCKDELAY=1 test_open.stp > > I restart the script using the following bash one liner > count=0;while [ 1 ];do echo $count;count=$(($count+1));sudo rmmod > test_open;sudo staprun test_open.ko -L ;done; > > 30-50 restarts later I get an exception: > > Aug 24 10:13:52 localhost kernel: <#DB> [<ffffffff81068244>] ? > __schedule_bug+0x44/0x50 > Aug 24 10:13:52 localhost kernel: [<ffffffff8154af4c>] ? schedule+0xa4c/0xb70 > Aug 24 10:13:52 localhost kernel: [<ffffffff812a6356>] ? vsnprintf+0x336/0x5e0 > Aug 24 10:13:52 localhost kernel: [<ffffffff810740aa>] ? > __cond_resched+0x2a/0x40 > Aug 24 10:13:52 localhost kernel: [<ffffffff8154b340>] ? _cond_resched+0x30/0x40 > Aug 24 10:13:52 localhost kernel: [<ffffffff812a8cda>] ? > strncpy_from_user+0x4a/0x90 > Aug 24 10:13:52 localhost kernel: [<ffffffffa0703fb0>] ? > function___global_user_string_quoted__overload_0+0x460/0x890 > [test_open] > Aug 24 10:13:52 localhost kernel: [<ffffffffa070281e>] ? > _new_map_create+0x3e/0xa0 [test_open] > Aug 24 10:13:52 localhost kernel: [<ffffffffa07047f1>] ? > probe_3606+0x101/0x400 [test_open] > Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 > Aug 24 10:13:52 localhost kernel: [<ffffffffa0707ce0>] ? > enter_kprobe_probe+0x1c0/0x2d0 [test_open] > Aug 24 10:13:52 localhost kernel: [<ffffffff815513fb>] ? > aggr_pre_handler+0x5b/0xb0 > Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 > Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 > Aug 24 10:13:52 localhost kernel: [<ffffffff81550e15>] ? > kprobe_exceptions_notify+0x3d5/0x430 > Aug 24 10:13:52 localhost kernel: [<ffffffff81551085>] ? > notifier_call_chain+0x55/0x80 > Aug 24 10:13:52 localhost kernel: [<ffffffff815510ea>] ? > atomic_notifier_call_chain+0x1a/0x20 > Aug 24 10:13:52 localhost kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 > Aug 24 10:13:52 localhost kernel: [<ffffffff8154e955>] ? do_int3+0x35/0xb0 > Aug 24 10:13:52 localhost kernel: [<ffffffff8154e1c3>] ? int3+0x33/0x40 > Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 > > Thank you, Arkady. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-24 14:35 ` Arkady @ 2017-08-24 16:28 ` Arkady 2017-08-25 3:56 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-24 16:28 UTC (permalink / raw) To: systemtap Update 2 I made the script shorter - failed after 400 driver restarts global ARRAY_FILENAME_PID% global ARRAY_TIMESTAMP_OPEN% function current_time:long() %{ /* pure */ struct timespec now = current_kernel_time(); STAP_RETURN(timespec_to_ns(&now)); %} probe syscall.open { tid = tid() ARRAY_FILENAME_PID[tid] = filename ARRAY_TIMESTAMP_OPEN[tid] = current_time() } probe syscall.open.return { tid = tid(); delete ARRAY_FILENAME_PID[tid] delete ARRAY_TIMESTAMP_OPEN[tid] } On Thu, Aug 24, 2017 at 5:35 PM, Arkady <arkady.miasnikov@gmail.com> wrote: > Update > The build line can be as short as > > stap -g -p4 -m test_open -v -k test_open.stp > > > On Thu, Aug 24, 2017 at 5:26 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >> Hi, >> >> I am trying the following code >> https://gist.github.com/larytet/5c361b1a4ee967655f500909026978f9 >> - the script is mostly syscall.open and syscall.open.return >> >> on CentOS release 6.9 (Final) >> kernel 2.6.32-696.3.2.el6.x86_64 #1 SMP Tue Jun 20 01:26:55 UTC 2017 >> x86_64 x86_64 x86_64 GNU/Linux >> >> g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) >> >> I use a custom build of 3.1 release of Systemtap >> Systemtap translator/driver (version 3.1/0.164, commit >> release-2.9-1416-gb8ea350dc13a) >> Copyright (C) 2005-2017 Red Hat, Inc. and others >> This is free software; see the source for copying conditions. >> tested kernel versions: 2.6.18 ... 4.10-rc8 >> enabled features: PYTHON2 NLS >> >> Build command stap -g -p4 -m test_open -g --suppress-time-limits -v -k >> -D MAXSKIPPED=0 -D MAXTRYLOCK=100000 -D TRYLOCKDELAY=1 test_open.stp >> >> I restart the script using the following bash one liner >> count=0;while [ 1 ];do echo $count;count=$(($count+1));sudo rmmod >> test_open;sudo staprun test_open.ko -L ;done; >> >> 30-50 restarts later I get an exception: >> >> Aug 24 10:13:52 localhost kernel: <#DB> [<ffffffff81068244>] ? >> __schedule_bug+0x44/0x50 >> Aug 24 10:13:52 localhost kernel: [<ffffffff8154af4c>] ? schedule+0xa4c/0xb70 >> Aug 24 10:13:52 localhost kernel: [<ffffffff812a6356>] ? vsnprintf+0x336/0x5e0 >> Aug 24 10:13:52 localhost kernel: [<ffffffff810740aa>] ? >> __cond_resched+0x2a/0x40 >> Aug 24 10:13:52 localhost kernel: [<ffffffff8154b340>] ? _cond_resched+0x30/0x40 >> Aug 24 10:13:52 localhost kernel: [<ffffffff812a8cda>] ? >> strncpy_from_user+0x4a/0x90 >> Aug 24 10:13:52 localhost kernel: [<ffffffffa0703fb0>] ? >> function___global_user_string_quoted__overload_0+0x460/0x890 >> [test_open] >> Aug 24 10:13:52 localhost kernel: [<ffffffffa070281e>] ? >> _new_map_create+0x3e/0xa0 [test_open] >> Aug 24 10:13:52 localhost kernel: [<ffffffffa07047f1>] ? >> probe_3606+0x101/0x400 [test_open] >> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 >> Aug 24 10:13:52 localhost kernel: [<ffffffffa0707ce0>] ? >> enter_kprobe_probe+0x1c0/0x2d0 [test_open] >> Aug 24 10:13:52 localhost kernel: [<ffffffff815513fb>] ? >> aggr_pre_handler+0x5b/0xb0 >> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 >> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 >> Aug 24 10:13:52 localhost kernel: [<ffffffff81550e15>] ? >> kprobe_exceptions_notify+0x3d5/0x430 >> Aug 24 10:13:52 localhost kernel: [<ffffffff81551085>] ? >> notifier_call_chain+0x55/0x80 >> Aug 24 10:13:52 localhost kernel: [<ffffffff815510ea>] ? >> atomic_notifier_call_chain+0x1a/0x20 >> Aug 24 10:13:52 localhost kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >> Aug 24 10:13:52 localhost kernel: [<ffffffff8154e955>] ? do_int3+0x35/0xb0 >> Aug 24 10:13:52 localhost kernel: [<ffffffff8154e1c3>] ? int3+0x33/0x40 >> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 >> >> Thank you, Arkady. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-24 16:28 ` Arkady @ 2017-08-25 3:56 ` Arkady 2017-08-25 4:36 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-25 3:56 UTC (permalink / raw) To: systemtap Update 3 This code fails as well global ARRAY_FILENAME% probe syscall.open { tid = tid() ARRAY_FILENAME[tid] = filename } probe syscall.open.return { tid = tid(); delete ARRAY_FILENAME[tid] } Load thescript stap -g -p 4 -m test_open -v -k test_open.stp staprun test_open -L Run a tight loop which opens a file touch test.txt;while [ 1 ];do cat test.txt;done; Wait a couple of minutes. The problem appears to be specific for the kernel 2.6.32-696.3.2.el6.x86_64 #1 CentOS 6.9 On Thu, Aug 24, 2017 at 7:28 PM, Arkady <arkady.miasnikov@gmail.com> wrote: > Update 2 > > I made the script shorter - failed after 400 driver restarts > > global ARRAY_FILENAME_PID% > global ARRAY_TIMESTAMP_OPEN% > > function current_time:long() > %{ /* pure */ > struct timespec now = current_kernel_time(); > STAP_RETURN(timespec_to_ns(&now)); > %} > > probe syscall.open > { > tid = tid() > ARRAY_FILENAME_PID[tid] = filename > ARRAY_TIMESTAMP_OPEN[tid] = current_time() > } > > probe syscall.open.return > { > tid = tid(); > delete ARRAY_FILENAME_PID[tid] > delete ARRAY_TIMESTAMP_OPEN[tid] > } > > On Thu, Aug 24, 2017 at 5:35 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >> Update >> The build line can be as short as >> >> stap -g -p4 -m test_open -v -k test_open.stp >> >> >> On Thu, Aug 24, 2017 at 5:26 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >>> Hi, >>> >>> I am trying the following code >>> https://gist.github.com/larytet/5c361b1a4ee967655f500909026978f9 >>> - the script is mostly syscall.open and syscall.open.return >>> >>> on CentOS release 6.9 (Final) >>> kernel 2.6.32-696.3.2.el6.x86_64 #1 SMP Tue Jun 20 01:26:55 UTC 2017 >>> x86_64 x86_64 x86_64 GNU/Linux >>> >>> g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) >>> >>> I use a custom build of 3.1 release of Systemtap >>> Systemtap translator/driver (version 3.1/0.164, commit >>> release-2.9-1416-gb8ea350dc13a) >>> Copyright (C) 2005-2017 Red Hat, Inc. and others >>> This is free software; see the source for copying conditions. >>> tested kernel versions: 2.6.18 ... 4.10-rc8 >>> enabled features: PYTHON2 NLS >>> >>> Build command stap -g -p4 -m test_open -g --suppress-time-limits -v -k >>> -D MAXSKIPPED=0 -D MAXTRYLOCK=100000 -D TRYLOCKDELAY=1 test_open.stp >>> >>> I restart the script using the following bash one liner >>> count=0;while [ 1 ];do echo $count;count=$(($count+1));sudo rmmod >>> test_open;sudo staprun test_open.ko -L ;done; >>> >>> 30-50 restarts later I get an exception: >>> >>> Aug 24 10:13:52 localhost kernel: <#DB> [<ffffffff81068244>] ? >>> __schedule_bug+0x44/0x50 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154af4c>] ? schedule+0xa4c/0xb70 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff812a6356>] ? vsnprintf+0x336/0x5e0 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff810740aa>] ? >>> __cond_resched+0x2a/0x40 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154b340>] ? _cond_resched+0x30/0x40 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff812a8cda>] ? >>> strncpy_from_user+0x4a/0x90 >>> Aug 24 10:13:52 localhost kernel: [<ffffffffa0703fb0>] ? >>> function___global_user_string_quoted__overload_0+0x460/0x890 >>> [test_open] >>> Aug 24 10:13:52 localhost kernel: [<ffffffffa070281e>] ? >>> _new_map_create+0x3e/0xa0 [test_open] >>> Aug 24 10:13:52 localhost kernel: [<ffffffffa07047f1>] ? >>> probe_3606+0x101/0x400 [test_open] >>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 >>> Aug 24 10:13:52 localhost kernel: [<ffffffffa0707ce0>] ? >>> enter_kprobe_probe+0x1c0/0x2d0 [test_open] >>> Aug 24 10:13:52 localhost kernel: [<ffffffff815513fb>] ? >>> aggr_pre_handler+0x5b/0xb0 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff81550e15>] ? >>> kprobe_exceptions_notify+0x3d5/0x430 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff81551085>] ? >>> notifier_call_chain+0x55/0x80 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff815510ea>] ? >>> atomic_notifier_call_chain+0x1a/0x20 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154e955>] ? do_int3+0x35/0xb0 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154e1c3>] ? int3+0x33/0x40 >>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 >>> >>> Thank you, Arkady. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-25 3:56 ` Arkady @ 2017-08-25 4:36 ` Arkady 2017-08-25 17:25 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-25 4:36 UTC (permalink / raw) To: systemtap Update 4 The problem can be reproduced on the kernel 2.6.32-696.10.1.el6.x86_64 #1 SMP as well On Fri, Aug 25, 2017 at 6:55 AM, Arkady <arkady.miasnikov@gmail.com> wrote: > Update 3 > > This code fails as well > > > global ARRAY_FILENAME% > > probe syscall.open > { > tid = tid() > ARRAY_FILENAME[tid] = filename > } > > probe syscall.open.return > { > tid = tid(); > delete ARRAY_FILENAME[tid] > } > > Load thescript > stap -g -p 4 -m test_open -v -k test_open.stp > staprun test_open -L > > Run a tight loop which opens a file > touch test.txt;while [ 1 ];do cat test.txt;done; > > Wait a couple of minutes. > > The problem appears to be specific for the kernel > 2.6.32-696.3.2.el6.x86_64 #1 CentOS 6.9 > > On Thu, Aug 24, 2017 at 7:28 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >> Update 2 >> >> I made the script shorter - failed after 400 driver restarts >> >> global ARRAY_FILENAME_PID% >> global ARRAY_TIMESTAMP_OPEN% >> >> function current_time:long() >> %{ /* pure */ >> struct timespec now = current_kernel_time(); >> STAP_RETURN(timespec_to_ns(&now)); >> %} >> >> probe syscall.open >> { >> tid = tid() >> ARRAY_FILENAME_PID[tid] = filename >> ARRAY_TIMESTAMP_OPEN[tid] = current_time() >> } >> >> probe syscall.open.return >> { >> tid = tid(); >> delete ARRAY_FILENAME_PID[tid] >> delete ARRAY_TIMESTAMP_OPEN[tid] >> } >> >> On Thu, Aug 24, 2017 at 5:35 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >>> Update >>> The build line can be as short as >>> >>> stap -g -p4 -m test_open -v -k test_open.stp >>> >>> >>> On Thu, Aug 24, 2017 at 5:26 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >>>> Hi, >>>> >>>> I am trying the following code >>>> https://gist.github.com/larytet/5c361b1a4ee967655f500909026978f9 >>>> - the script is mostly syscall.open and syscall.open.return >>>> >>>> on CentOS release 6.9 (Final) >>>> kernel 2.6.32-696.3.2.el6.x86_64 #1 SMP Tue Jun 20 01:26:55 UTC 2017 >>>> x86_64 x86_64 x86_64 GNU/Linux >>>> >>>> g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) >>>> >>>> I use a custom build of 3.1 release of Systemtap >>>> Systemtap translator/driver (version 3.1/0.164, commit >>>> release-2.9-1416-gb8ea350dc13a) >>>> Copyright (C) 2005-2017 Red Hat, Inc. and others >>>> This is free software; see the source for copying conditions. >>>> tested kernel versions: 2.6.18 ... 4.10-rc8 >>>> enabled features: PYTHON2 NLS >>>> >>>> Build command stap -g -p4 -m test_open -g --suppress-time-limits -v -k >>>> -D MAXSKIPPED=0 -D MAXTRYLOCK=100000 -D TRYLOCKDELAY=1 test_open.stp >>>> >>>> I restart the script using the following bash one liner >>>> count=0;while [ 1 ];do echo $count;count=$(($count+1));sudo rmmod >>>> test_open;sudo staprun test_open.ko -L ;done; >>>> >>>> 30-50 restarts later I get an exception: >>>> >>>> Aug 24 10:13:52 localhost kernel: <#DB> [<ffffffff81068244>] ? >>>> __schedule_bug+0x44/0x50 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154af4c>] ? schedule+0xa4c/0xb70 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff812a6356>] ? vsnprintf+0x336/0x5e0 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff810740aa>] ? >>>> __cond_resched+0x2a/0x40 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154b340>] ? _cond_resched+0x30/0x40 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff812a8cda>] ? >>>> strncpy_from_user+0x4a/0x90 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffffa0703fb0>] ? >>>> function___global_user_string_quoted__overload_0+0x460/0x890 >>>> [test_open] >>>> Aug 24 10:13:52 localhost kernel: [<ffffffffa070281e>] ? >>>> _new_map_create+0x3e/0xa0 [test_open] >>>> Aug 24 10:13:52 localhost kernel: [<ffffffffa07047f1>] ? >>>> probe_3606+0x101/0x400 [test_open] >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffffa0707ce0>] ? >>>> enter_kprobe_probe+0x1c0/0x2d0 [test_open] >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff815513fb>] ? >>>> aggr_pre_handler+0x5b/0xb0 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da1>] ? sys_open+0x1/0x30 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff81550e15>] ? >>>> kprobe_exceptions_notify+0x3d5/0x430 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff81551085>] ? >>>> notifier_call_chain+0x55/0x80 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff815510ea>] ? >>>> atomic_notifier_call_chain+0x1a/0x20 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154e955>] ? do_int3+0x35/0xb0 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff8154e1c3>] ? int3+0x33/0x40 >>>> Aug 24 10:13:52 localhost kernel: [<ffffffff81196da0>] ? sys_open+0x0/0x30 >>>> >>>> Thank you, Arkady. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-25 4:36 ` Arkady @ 2017-08-25 17:25 ` Arkady 2017-08-25 19:52 ` David Smith 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-25 17:25 UTC (permalink / raw) To: systemtap Reproduce the problem Run the following one liner on CentOS 6.9 stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" Run a bash loop touch test.txt;while [ 1 ];do cat test.txt;done; Monitor the kernel log tail -f /var/log/messages After a couple of minutes you shall see BUG: scheduling while atomic: cat/87764/0x10000001 Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 Call Trace: <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] [<ffffffff81196c31>] ? sys_open+0x1/0x30 [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 [<ffffffff81196c30>] ? sys_open+0x0/0x30 [<ffffffff81196c31>] ? sys_open+0x1/0x30 [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 [<ffffffff8154e815>] ? do_int3+0x35/0xb0 [<ffffffff8154e083>] ? int3+0x33/0x40 [<ffffffff81196c30>] ? sys_open+0x0/0x30 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-25 17:25 ` Arkady @ 2017-08-25 19:52 ` David Smith 2017-08-27 6:13 ` Daniel Doron 2017-08-27 12:22 ` Arkady 0 siblings, 2 replies; 16+ messages in thread From: David Smith @ 2017-08-25 19:52 UTC (permalink / raw) To: Arkady; +Cc: systemtap Arkady, The "good" news is that I've duplicated your problem. I went ahead and filed PR22012 (<https://sourceware.org/bugzilla/show_bug.cgi?id=22012>) on this issue so I won't forget it. I'm looking into it. On Fri, Aug 25, 2017 at 12:25 PM, Arkady <arkady.miasnikov@gmail.com> wrote: > Reproduce the problem > > Run the following one liner on CentOS 6.9 > > stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = > filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" > > Run a bash loop > touch test.txt;while [ 1 ];do cat test.txt;done; > > Monitor the kernel log > tail -f /var/log/messages > > After a couple of minutes you shall see > > BUG: scheduling while atomic: cat/87764/0x10000001 > Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap > autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc > garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 > iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 > xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm > ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core > ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev > v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 > snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus > snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc > sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom > mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic > ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror > dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] > Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 > Call Trace: > <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 > [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 > [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 > [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 > [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 > [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 > [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] > [<ffffffff81196c31>] ? sys_open+0x1/0x30 > [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] > [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 > [<ffffffff81196c30>] ? sys_open+0x0/0x30 > [<ffffffff81196c31>] ? sys_open+0x1/0x30 > [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 > [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 > [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 > [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 > [<ffffffff8154e815>] ? do_int3+0x35/0xb0 > [<ffffffff8154e083>] ? int3+0x33/0x40 > [<ffffffff81196c30>] ? sys_open+0x0/0x30 -- David Smith Principal Software Engineer Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-25 19:52 ` David Smith @ 2017-08-27 6:13 ` Daniel Doron 2017-08-27 12:22 ` Arkady 1 sibling, 0 replies; 16+ messages in thread From: Daniel Doron @ 2017-08-27 6:13 UTC (permalink / raw) To: David Smith; +Cc: Arkady, systemtap I would like to add that I have noticed the same bug on long running module....so it is not just about loading/unloading of module. On Fri, Aug 25, 2017 at 10:52 PM, David Smith <dsmith@redhat.com> wrote: > Arkady, > > The "good" news is that I've duplicated your problem. I went ahead and > filed PR22012 (<https://sourceware.org/bugzilla/show_bug.cgi?id=22012>) > on this issue so I won't forget it. > > I'm looking into it. > > > On Fri, Aug 25, 2017 at 12:25 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >> Reproduce the problem >> >> Run the following one liner on CentOS 6.9 >> >> stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = >> filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" >> >> Run a bash loop >> touch test.txt;while [ 1 ];do cat test.txt;done; >> >> Monitor the kernel log >> tail -f /var/log/messages >> >> After a couple of minutes you shall see >> >> BUG: scheduling while atomic: cat/87764/0x10000001 >> Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap >> autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc >> garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 >> iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 >> xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm >> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core >> ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev >> v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 >> snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus >> snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc >> sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom >> mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic >> ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror >> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] >> Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 >> Call Trace: >> <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 >> [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 >> [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 >> [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 >> [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 >> [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 >> [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] >> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >> [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] >> [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 >> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >> [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 >> [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 >> [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 >> [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >> [<ffffffff8154e815>] ? do_int3+0x35/0xb0 >> [<ffffffff8154e083>] ? int3+0x33/0x40 >> [<ffffffff81196c30>] ? sys_open+0x0/0x30 > > > > -- > David Smith > Principal Software Engineer > Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-25 19:52 ` David Smith 2017-08-27 6:13 ` Daniel Doron @ 2017-08-27 12:22 ` Arkady 2017-08-27 12:38 ` Arkady 1 sibling, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-27 12:22 UTC (permalink / raw) To: David Smith; +Cc: systemtap After the failure the driver can not be removed - remains "stuck" due to the failure of the syscall.open probe to decrement the usage counter on return. On Fri, Aug 25, 2017 at 10:52 PM, David Smith <dsmith@redhat.com> wrote: > Arkady, > > The "good" news is that I've duplicated your problem. I went ahead and > filed PR22012 (<https://sourceware.org/bugzilla/show_bug.cgi?id=22012>) > on this issue so I won't forget it. > > I'm looking into it. > > > On Fri, Aug 25, 2017 at 12:25 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >> Reproduce the problem >> >> Run the following one liner on CentOS 6.9 >> >> stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = >> filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" >> >> Run a bash loop >> touch test.txt;while [ 1 ];do cat test.txt;done; >> >> Monitor the kernel log >> tail -f /var/log/messages >> >> After a couple of minutes you shall see >> >> BUG: scheduling while atomic: cat/87764/0x10000001 >> Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap >> autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc >> garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 >> iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 >> xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm >> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core >> ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev >> v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 >> snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus >> snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc >> sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom >> mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic >> ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror >> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] >> Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 >> Call Trace: >> <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 >> [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 >> [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 >> [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 >> [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 >> [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 >> [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] >> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >> [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] >> [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 >> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >> [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 >> [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 >> [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 >> [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >> [<ffffffff8154e815>] ? do_int3+0x35/0xb0 >> [<ffffffff8154e083>] ? int3+0x33/0x40 >> [<ffffffff81196c30>] ? sys_open+0x0/0x30 > > > > -- > David Smith > Principal Software Engineer > Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-27 12:22 ` Arkady @ 2017-08-27 12:38 ` Arkady 2017-08-28 5:51 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-27 12:38 UTC (permalink / raw) To: David Smith; +Cc: systemtap The following comment is probably relevant: /* * On some kernels (e.g. 2.6.39), even with preemption disabled, the strncpy_from_user, * instead of returning -1 after a page fault, schedules the process, so we drop events * because of the preemption. This function reads the user buffer in atomic chunks, and * returns when there's an error or the terminator is found */ https://github.com/draios/sysdig/blob/dev/driver/ppm_events.c#L108 > On Fri, Aug 25, 2017 at 10:52 PM, David Smith <dsmith@redhat.com> wrote: >> Arkady, >> >> The "good" news is that I've duplicated your problem. I went ahead and >> filed PR22012 (<https://sourceware.org/bugzilla/show_bug.cgi?id=22012>) >> on this issue so I won't forget it. >> >> I'm looking into it. >> >> >> On Fri, Aug 25, 2017 at 12:25 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >>> Reproduce the problem >>> >>> Run the following one liner on CentOS 6.9 >>> >>> stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = >>> filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" >>> >>> Run a bash loop >>> touch test.txt;while [ 1 ];do cat test.txt;done; >>> >>> Monitor the kernel log >>> tail -f /var/log/messages >>> >>> After a couple of minutes you shall see >>> >>> BUG: scheduling while atomic: cat/87764/0x10000001 >>> Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap >>> autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc >>> garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 >>> iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 >>> xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm >>> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core >>> ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev >>> v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 >>> snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus >>> snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc >>> sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom >>> mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic >>> ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror >>> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] >>> Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 >>> Call Trace: >>> <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 >>> [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 >>> [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 >>> [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 >>> [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 >>> [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 >>> [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] >>> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >>> [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] >>> [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 >>> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >>> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >>> [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 >>> [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 >>> [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 >>> [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >>> [<ffffffff8154e815>] ? do_int3+0x35/0xb0 >>> [<ffffffff8154e083>] ? int3+0x33/0x40 >>> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >> >> >> >> -- >> David Smith >> Principal Software Engineer >> Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-27 12:38 ` Arkady @ 2017-08-28 5:51 ` Arkady 2017-08-28 6:28 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-28 5:51 UTC (permalink / raw) To: David Smith; +Cc: systemtap The following patch solves the problem, or, at least, makes it significantly less probable diff --git a/runtime/linux/copy.c b/runtime/linux/copy.c index c5022b5..2da01f1 100644 --- a/runtime/linux/copy.c +++ b/runtime/linux/copy.c @@ -39,6 +39,28 @@ * <i>count</i> bytes and returns <i>count</i>. */ +static long _stp_copy_from_user_inatomic(char *dst, const char __user *src, long count) +{ + int res; + long bytes = 0; + while (count) + { + res = __copy_from_user_inatomic(dst, src, 1); + if (unlikely(res != 1)) + { + return res; + } + if (unlikely((*dst == 0))) + { + break; + } + bytes++; + count--; + } + + return bytes; +} + /* XXX: see also kread/uread in loc2c-runtime.h */ static long _stp_strncpy_from_user(char *dst, const char __user *src, long count) { @@ -46,14 +68,8 @@ static long _stp_strncpy_from_user(char *dst, const char __user *src, long count mm_segment_t _oldfs = get_fs(); set_fs(USER_DS); pagefault_disable(); - /* XXX: The following preempt() manipulations should be - redundant with probe entry/exit code, but for unknown - reasons on RHEL5/6 conversions.exp intermittently fails - without this. */ - preempt_disable(); if (!lookup_bad_addr(VERIFY_READ, (const unsigned long)src, count)) - res = strncpy_from_user(dst, src, count); - preempt_enable_no_resched(); + res = _stp_copy_from_user_inatomic(dst, src, count); pagefault_enable(); set_fs(_oldfs); return res; On Sun, Aug 27, 2017 at 3:38 PM, Arkady <arkady.miasnikov@gmail.com> wrote: > The following comment is probably relevant: > /* > * On some kernels (e.g. 2.6.39), even with preemption disabled, the > strncpy_from_user, > * instead of returning -1 after a page fault, schedules the process, > so we drop events > * because of the preemption. This function reads the user buffer in > atomic chunks, and > * returns when there's an error or the terminator is found > */ > > https://github.com/draios/sysdig/blob/dev/driver/ppm_events.c#L108 > >> On Fri, Aug 25, 2017 at 10:52 PM, David Smith <dsmith@redhat.com> wrote: >>> Arkady, >>> >>> The "good" news is that I've duplicated your problem. I went ahead and >>> filed PR22012 (<https://sourceware.org/bugzilla/show_bug.cgi?id=22012>) >>> on this issue so I won't forget it. >>> >>> I'm looking into it. >>> >>> >>> On Fri, Aug 25, 2017 at 12:25 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >>>> Reproduce the problem >>>> >>>> Run the following one liner on CentOS 6.9 >>>> >>>> stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = >>>> filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" >>>> >>>> Run a bash loop >>>> touch test.txt;while [ 1 ];do cat test.txt;done; >>>> >>>> Monitor the kernel log >>>> tail -f /var/log/messages >>>> >>>> After a couple of minutes you shall see >>>> >>>> BUG: scheduling while atomic: cat/87764/0x10000001 >>>> Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap >>>> autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc >>>> garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 >>>> iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 >>>> xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm >>>> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core >>>> ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev >>>> v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 >>>> snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus >>>> snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc >>>> sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom >>>> mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic >>>> ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror >>>> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] >>>> Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 >>>> Call Trace: >>>> <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 >>>> [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 >>>> [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 >>>> [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 >>>> [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 >>>> [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 >>>> [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] >>>> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >>>> [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] >>>> [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 >>>> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >>>> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >>>> [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 >>>> [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 >>>> [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 >>>> [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >>>> [<ffffffff8154e815>] ? do_int3+0x35/0xb0 >>>> [<ffffffff8154e083>] ? int3+0x33/0x40 >>>> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >>> >>> >>> >>> -- >>> David Smith >>> Principal Software Engineer >>> Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-28 5:51 ` Arkady @ 2017-08-28 6:28 ` Arkady 2017-08-28 17:19 ` David Smith 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-28 6:28 UTC (permalink / raw) To: David Smith; +Cc: systemtap This patch follows the API and fixes a bug - increment src/dst pointers index c5022b5..082a4fd 100644 --- a/runtime/linux/copy.c +++ b/runtime/linux/copy.c @@ -24,6 +24,27 @@ * @{ */ +static long _stp_copy_from_user_inatomic(char *dst, const char __user *src, long count) +{ + int res; + long bytes = 0; + for (;count;dst++, src++, count--, bytes++) + { + res = __copy_from_user_inatomic(dst, src, 1); + if (unlikely(res != 0)) + { + *dst = 0; + break; + } + if (unlikely(*dst == 0)) + { + break; + } + } + + return bytes; +} + /** Copy a NULL-terminated string from userspace. * On success, returns the length of the string (not including the trailing * NULL). @@ -34,8 +55,8 @@ * least <i>count</i> bytes long. * @param src Source address, in user space. * @param count Maximum number of bytes to copy, including the trailing NULL. - * - * If <i>count</i> is smaller than the length of the string, copies + * + * If <i>count</i> is smaller than the length of the string, copies * <i>count</i> bytes and returns <i>count</i>. */ @@ -46,14 +67,8 @@ static long _stp_strncpy_from_user(char *dst, const char __user *src, long count mm_segment_t _oldfs = get_fs(); set_fs(USER_DS); pagefault_disable(); - /* XXX: The following preempt() manipulations should be - redundant with probe entry/exit code, but for unknown - reasons on RHEL5/6 conversions.exp intermittently fails - without this. */ - preempt_disable(); if (!lookup_bad_addr(VERIFY_READ, (const unsigned long)src, count)) - res = strncpy_from_user(dst, src, count); - preempt_enable_no_resched(); + res = _stp_copy_from_user_inatomic(dst, src, count); pagefault_enable(); set_fs(_oldfs); return res; On Mon, Aug 28, 2017 at 8:49 AM, Arkady <arkady.miasnikov@gmail.com> wrote: > The following patch solves the problem, or, at least, makes it > significantly less probable > > diff --git a/runtime/linux/copy.c b/runtime/linux/copy.c > index c5022b5..2da01f1 100644 > --- a/runtime/linux/copy.c > +++ b/runtime/linux/copy.c > @@ -39,6 +39,28 @@ > * <i>count</i> bytes and returns <i>count</i>. > */ > > +static long _stp_copy_from_user_inatomic(char *dst, const char __user > *src, long count) > +{ > + int res; > + long bytes = 0; > + while (count) > + { > + res = __copy_from_user_inatomic(dst, src, 1); > + if (unlikely(res != 1)) > + { > + return res; > + } > + if (unlikely((*dst == 0))) > + { > + break; > + } > + bytes++; > + count--; > + } > + > + return bytes; > +} > + > /* XXX: see also kread/uread in loc2c-runtime.h */ > static long _stp_strncpy_from_user(char *dst, const char __user *src, > long count) > { > @@ -46,14 +68,8 @@ static long _stp_strncpy_from_user(char *dst, const > char __user *src, long count > mm_segment_t _oldfs = get_fs(); > set_fs(USER_DS); > pagefault_disable(); > - /* XXX: The following preempt() manipulations should be > - redundant with probe entry/exit code, but for unknown > - reasons on RHEL5/6 conversions.exp intermittently fails > - without this. */ > - preempt_disable(); > if (!lookup_bad_addr(VERIFY_READ, (const unsigned long)src, count)) > - res = strncpy_from_user(dst, src, count); > - preempt_enable_no_resched(); > + res = _stp_copy_from_user_inatomic(dst, src, count); > pagefault_enable(); > set_fs(_oldfs); > return res; > > On Sun, Aug 27, 2017 at 3:38 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >> The following comment is probably relevant: >> /* >> * On some kernels (e.g. 2.6.39), even with preemption disabled, the >> strncpy_from_user, >> * instead of returning -1 after a page fault, schedules the process, >> so we drop events >> * because of the preemption. This function reads the user buffer in >> atomic chunks, and >> * returns when there's an error or the terminator is found >> */ >> >> https://github.com/draios/sysdig/blob/dev/driver/ppm_events.c#L108 >> >>> On Fri, Aug 25, 2017 at 10:52 PM, David Smith <dsmith@redhat.com> wrote: >>>> Arkady, >>>> >>>> The "good" news is that I've duplicated your problem. I went ahead and >>>> filed PR22012 (<https://sourceware.org/bugzilla/show_bug.cgi?id=22012>) >>>> on this issue so I won't forget it. >>>> >>>> I'm looking into it. >>>> >>>> >>>> On Fri, Aug 25, 2017 at 12:25 PM, Arkady <arkady.miasnikov@gmail.com> wrote: >>>>> Reproduce the problem >>>>> >>>>> Run the following one liner on CentOS 6.9 >>>>> >>>>> stap -e "global AF%;probe syscall.open {tid = tid();AF[tid] = >>>>> filename;}probe syscall.open.return{tid = tid();delete AF[tid];}" >>>>> >>>>> Run a bash loop >>>>> touch test.txt;while [ 1 ];do cat test.txt;done; >>>>> >>>>> Monitor the kernel log >>>>> tail -f /var/log/messages >>>>> >>>>> After a couple of minutes you shall see >>>>> >>>>> BUG: scheduling while atomic: cat/87764/0x10000001 >>>>> Modules linked in: test_open(U) fuse rfcomm sco bridge bnep l2cap >>>>> autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc >>>>> garp stp scsi_tgt llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 >>>>> iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 >>>>> xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm >>>>> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core >>>>> ib_addr ipv6 uinput microcode vmware_balloon uvcvideo videodev >>>>> v4l2_compat_ioctl32 btusb bluetooth rfkill snd_seq_midi e1000 >>>>> snd_seq_midi_event snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus >>>>> snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc >>>>> sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom >>>>> mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic >>>>> ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror >>>>> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] >>>>> Pid: 87764, comm: cat Not tainted 2.6.32-696.10.1.el6.x86_64 #1 >>>>> Call Trace: >>>>> <#DB> [<ffffffff81068244>] ? __schedule_bug+0x44/0x50 >>>>> [<ffffffff8154ae0c>] ? schedule+0xa4c/0xb70 >>>>> [<ffffffff812a6216>] ? vsnprintf+0x336/0x5e0 >>>>> [<ffffffff810740aa>] ? __cond_resched+0x2a/0x40 >>>>> [<ffffffff8154b200>] ? _cond_resched+0x30/0x40 >>>>> [<ffffffff812a8b9a>] ? strncpy_from_user+0x4a/0x90 >>>>> [<ffffffffa0705ff9>] ? probe_3602+0x5f9/0x1220 [test_open] >>>>> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >>>>> [<ffffffffa0707fed>] ? enter_kprobe_probe+0x1ed/0x3a0 [test_open] >>>>> [<ffffffff815512bb>] ? aggr_pre_handler+0x5b/0xb0 >>>>> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >>>>> [<ffffffff81196c31>] ? sys_open+0x1/0x30 >>>>> [<ffffffff81550cd5>] ? kprobe_exceptions_notify+0x3d5/0x430 >>>>> [<ffffffff81550f45>] ? notifier_call_chain+0x55/0x80 >>>>> [<ffffffff81550faa>] ? atomic_notifier_call_chain+0x1a/0x20 >>>>> [<ffffffff810acd0e>] ? notify_die+0x2e/0x30 >>>>> [<ffffffff8154e815>] ? do_int3+0x35/0xb0 >>>>> [<ffffffff8154e083>] ? int3+0x33/0x40 >>>>> [<ffffffff81196c30>] ? sys_open+0x0/0x30 >>>> >>>> >>>> >>>> -- >>>> David Smith >>>> Principal Software Engineer >>>> Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-28 6:28 ` Arkady @ 2017-08-28 17:19 ` David Smith 2017-08-29 16:06 ` David Smith 0 siblings, 1 reply; 16+ messages in thread From: David Smith @ 2017-08-28 17:19 UTC (permalink / raw) To: Arkady; +Cc: systemtap On Mon, Aug 28, 2017 at 1:28 AM, Arkady <arkady.miasnikov@gmail.com> wrote: > This patch follows the API and fixes a bug - increment src/dst pointers ... patch deleted ... Your patch looked fine, but I decided to go a slightly different route. We had a function, called kderef_string_(), over in runtime/linux/loc2c-runtime.h, which did everything we needed here, but it was hardcoded to only work for kernel strings. I abstracted it slightly and renamed it '_stp_deref_string_nofault()'. Now kderef_string() and _stp_strncpy_from_user() both call _stp_deref_string_nofault(). The advantage here is that we're using the same function to read kernel and user strings. Assuming after a testsuite run I don't see any issues, I'll check this in tomorrow. ==== diff --git a/runtime/linux/copy.c b/runtime/linux/copy.c index c5022b5e2..892700865 100644 --- a/runtime/linux/copy.c +++ b/runtime/linux/copy.c @@ -39,24 +39,10 @@ * <i>count</i> bytes and returns <i>count</i>. */ -/* XXX: see also kread/uread in loc2c-runtime.h */ -static long _stp_strncpy_from_user(char *dst, const char __user *src, long count) +static long _stp_strncpy_from_user(char *dst, const char __user *src, + long count) { - long res = -EFAULT; - mm_segment_t _oldfs = get_fs(); - set_fs(USER_DS); - pagefault_disable(); - /* XXX: The following preempt() manipulations should be - redundant with probe entry/exit code, but for unknown - reasons on RHEL5/6 conversions.exp intermittently fails - without this. */ - preempt_disable(); - if (!lookup_bad_addr(VERIFY_READ, (const unsigned long)src, count)) - res = strncpy_from_user(dst, src, count); - preempt_enable_no_resched(); - pagefault_enable(); - set_fs(_oldfs); - return res; + return _stp_deref_string_nofault(dst, src, count, USER_DS); } /** Copy a block of data from user space. diff --git a/runtime/linux/loc2c-runtime.h b/runtime/linux/loc2c-runtime.h index 2920abed7..39dece07d 100644 --- a/runtime/linux/loc2c-runtime.h +++ b/runtime/linux/loc2c-runtime.h @@ -716,16 +716,30 @@ static inline char *kderef_buffer_(char *dst, void *addr, size_t len) _r; \ }) -/* The following is for kernel strings, see the uconversions.stp - tapset for user_string functions. */ +/* + * _stp_deref_string_nofault(): safely read a string from memory + * without a DEREF_FAULT on error + * + * dst: read the string into this address + * addr: address to read from + * len: maximum number of bytes to read + * seg: memory segment to use, either kernel (KERNEL_DS) or user + * (USER_DS) + * + * If this function gets an error while trying to read memory, -EFAULT + * is returned. On success, the number of bytes copied is returned + * (not including the trailing NULL). Note that the kernel will not + * pagefault when trying to read the string. + */ -static inline char *kderef_string_(char *dst, void *addr, size_t len) +static inline long _stp_deref_string_nofault(char *dst, const char *addr, + size_t len, mm_segment_t seg) { int err = 0; - size_t i; + size_t i = 0; mm_segment_t oldfs = get_fs(); - set_fs(KERNEL_DS); + set_fs(seg); pagefault_disable(); if (lookup_bad_addr(VERIFY_READ, (uintptr_t)addr, len)) err = 1; @@ -746,13 +760,13 @@ static inline char *kderef_string_(char *dst, void *addr, size_t len) pagefault_enable(); set_fs(oldfs); - return err ? (char *)-1 : dst; + return err ? -EFAULT : i; } #define kderef_string(dst, addr, maxbytes) \ ({ \ - char *_r = kderef_string_((dst), (void *)(uintptr_t)(addr), (maxbytes)); \ - if (_r == (char *)-1) \ + long _r = _stp_deref_string_nofault((dst), (void *)(uintptr_t)(addr), (maxbytes), KERNEL_DS); \ + if (_r < 0) \ DEREF_FAULT(addr); \ _r; \ }) ==== -- David Smith Principal Software Engineer Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-28 17:19 ` David Smith @ 2017-08-29 16:06 ` David Smith 2017-08-29 16:44 ` Arkady 0 siblings, 1 reply; 16+ messages in thread From: David Smith @ 2017-08-29 16:06 UTC (permalink / raw) To: Arkady; +Cc: systemtap I've checked that patch in as commit cd8837d56. Let me know if you still see that "scheduling while atomic" bug. On Mon, Aug 28, 2017 at 12:19 PM, David Smith <dsmith@redhat.com> wrote: > On Mon, Aug 28, 2017 at 1:28 AM, Arkady <arkady.miasnikov@gmail.com> wrote: >> This patch follows the API and fixes a bug - increment src/dst pointers > > ... patch deleted ... > > Your patch looked fine, but I decided to go a slightly different > route. We had a function, called kderef_string_(), over in > runtime/linux/loc2c-runtime.h, which did everything we needed here, > but it was hardcoded to only work for kernel strings. I abstracted it > slightly and renamed it '_stp_deref_string_nofault()'. Now > kderef_string() and _stp_strncpy_from_user() both call > _stp_deref_string_nofault(). The advantage here is that we're using > the same function to read kernel and user strings. > > Assuming after a testsuite run I don't see any issues, I'll check this > in tomorrow. > > ==== > diff --git a/runtime/linux/copy.c b/runtime/linux/copy.c > index c5022b5e2..892700865 100644 > --- a/runtime/linux/copy.c > +++ b/runtime/linux/copy.c > @@ -39,24 +39,10 @@ > * <i>count</i> bytes and returns <i>count</i>. > */ > > -/* XXX: see also kread/uread in loc2c-runtime.h */ > -static long _stp_strncpy_from_user(char *dst, const char __user *src, > long count) > +static long _stp_strncpy_from_user(char *dst, const char __user *src, > + long count) > { > - long res = -EFAULT; > - mm_segment_t _oldfs = get_fs(); > - set_fs(USER_DS); > - pagefault_disable(); > - /* XXX: The following preempt() manipulations should be > - redundant with probe entry/exit code, but for unknown > - reasons on RHEL5/6 conversions.exp intermittently fails > - without this. */ > - preempt_disable(); > - if (!lookup_bad_addr(VERIFY_READ, (const unsigned long)src, count)) > - res = strncpy_from_user(dst, src, count); > - preempt_enable_no_resched(); > - pagefault_enable(); > - set_fs(_oldfs); > - return res; > + return _stp_deref_string_nofault(dst, src, count, USER_DS); > } > > /** Copy a block of data from user space. > diff --git a/runtime/linux/loc2c-runtime.h b/runtime/linux/loc2c-runtime.h > index 2920abed7..39dece07d 100644 > --- a/runtime/linux/loc2c-runtime.h > +++ b/runtime/linux/loc2c-runtime.h > @@ -716,16 +716,30 @@ static inline char *kderef_buffer_(char *dst, > void *addr, size_t len) > _r; \ > }) > > -/* The following is for kernel strings, see the uconversions.stp > - tapset for user_string functions. */ > +/* > + * _stp_deref_string_nofault(): safely read a string from memory > + * without a DEREF_FAULT on error > + * > + * dst: read the string into this address > + * addr: address to read from > + * len: maximum number of bytes to read > + * seg: memory segment to use, either kernel (KERNEL_DS) or user > + * (USER_DS) > + * > + * If this function gets an error while trying to read memory, -EFAULT > + * is returned. On success, the number of bytes copied is returned > + * (not including the trailing NULL). Note that the kernel will not > + * pagefault when trying to read the string. > + */ > > -static inline char *kderef_string_(char *dst, void *addr, size_t len) > +static inline long _stp_deref_string_nofault(char *dst, const char *addr, > + size_t len, mm_segment_t seg) > { > int err = 0; > - size_t i; > + size_t i = 0; > mm_segment_t oldfs = get_fs(); > > - set_fs(KERNEL_DS); > + set_fs(seg); > pagefault_disable(); > if (lookup_bad_addr(VERIFY_READ, (uintptr_t)addr, len)) > err = 1; > @@ -746,13 +760,13 @@ static inline char *kderef_string_(char *dst, > void *addr, size_t len) > pagefault_enable(); > set_fs(oldfs); > > - return err ? (char *)-1 : dst; > + return err ? -EFAULT : i; > } > > #define kderef_string(dst, addr, maxbytes) \ > ({ \ > - char *_r = kderef_string_((dst), (void *)(uintptr_t)(addr), (maxbytes)); \ > - if (_r == (char *)-1) \ > + long _r = _stp_deref_string_nofault((dst), (void > *)(uintptr_t)(addr), (maxbytes), KERNEL_DS); \ > + if (_r < 0) \ > DEREF_FAULT(addr); \ > _r; \ > }) > ==== > -- > David Smith > Principal Software Engineer > Red Hat -- David Smith Principal Software Engineer Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-29 16:06 ` David Smith @ 2017-08-29 16:44 ` Arkady 2017-08-30 14:18 ` David Smith 0 siblings, 1 reply; 16+ messages in thread From: Arkady @ 2017-08-29 16:44 UTC (permalink / raw) To: David Smith; +Cc: systemtap On Tue, Aug 29, 2017 at 7:05 PM, David Smith <dsmith@redhat.com> wrote: > I've checked that patch in as commit cd8837d56. Let me know if you > still see that "scheduling while atomic" bug. > What is the policy for back port of patches in the SystemTap project? For example, will release 3.1 be updated? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Failure in syscall.open probe 2017-08-29 16:44 ` Arkady @ 2017-08-30 14:18 ` David Smith 0 siblings, 0 replies; 16+ messages in thread From: David Smith @ 2017-08-30 14:18 UTC (permalink / raw) To: Arkady; +Cc: systemtap On Tue, Aug 29, 2017 at 11:44 AM, Arkady <arkady.miasnikov@gmail.com> wrote: > On Tue, Aug 29, 2017 at 7:05 PM, David Smith <dsmith@redhat.com> wrote: >> I've checked that patch in as commit cd8837d56. Let me know if you >> still see that "scheduling while atomic" bug. >> > > What is the policy for back port of patches in the SystemTap project? > For example, will release 3.1 be updated? I'm not sure we really have a true policy for Fedora. We've back ported things in the past when the patch was important enough, like support for a newer kernel. The biggest problem will be finding someone with the time to do the release. -- David Smith Principal Software Engineer Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-08-30 14:18 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-24 14:26 Failure in syscall.open probe Arkady 2017-08-24 14:35 ` Arkady 2017-08-24 16:28 ` Arkady 2017-08-25 3:56 ` Arkady 2017-08-25 4:36 ` Arkady 2017-08-25 17:25 ` Arkady 2017-08-25 19:52 ` David Smith 2017-08-27 6:13 ` Daniel Doron 2017-08-27 12:22 ` Arkady 2017-08-27 12:38 ` Arkady 2017-08-28 5:51 ` Arkady 2017-08-28 6:28 ` Arkady 2017-08-28 17:19 ` David Smith 2017-08-29 16:06 ` David Smith 2017-08-29 16:44 ` Arkady 2017-08-30 14:18 ` David Smith
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).