public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/30831] New: The do_each_thread macro has been removed from newest  linux 6.6 kernels
@ 2023-09-07 17:46 wcohen at redhat dot com
  2023-09-07 19:23 ` [Bug runtime/30831] Systemtap scripts fail to compile on " wcohen at redhat dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-07 17:46 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

            Bug ID: 30831
           Summary: The do_each_thread macro has been removed from newest
                    linux 6.6 kernels
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: wcohen at redhat dot com
  Target Milestone: ---

When working with the newest linux 6.6 kernels in Fedora Rawhide scripts do not
compile because do_each_thread no longer exists in the kernel:

$ uname -a
Linux rawhide 6.6.0-0.rc0.20230905git3f86ed6ec0b3.7.fc40.x86_64 #1 SMP
PREEMPT_DYNAMIC Tue Sep  5 14:23:00 UTC 2023 x86_64 GNU/Linux
$ ../install/bin/stap -p4 -e 'probe begin {printf("hi!\n");exit()}'
In file included from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/task_finder.c:15,
                 from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:328,
                 from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/runtime.h:26,
                 from
/tmp/stapFRPbtc/stap_c1555c4c3e68d9141933830206b88dd6_1339_src.c:21:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/task_finder2.c:
In function ‘stap_start_task_finder’:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/task_finder2.c:1721:9:
error: implicit declaration of function ‘do_each_thread’; did you mean
‘for_each_thread’? [-Werror=implicit-function-declaration]
 1721 |         do_each_thread(grp, tsk) {
      |         ^~~~~~~~~~~~~~
      |         for_each_thread
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/task_finder2.c:1721:33:
error: expected ‘;’ before ‘{’ token
 1721 |         do_each_thread(grp, tsk) {
      |                                 ^~
      |                                 ;
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/task_finder2.c:
In function ‘stap_task_finder_post_init’:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/task_finder2.c:1868:33:
error: expected ‘;’ before ‘{’ token
 1868 |         do_each_thread(grp, tsk) {
      |                                 ^~
      |                                 ;
In file included from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/namespaces.h:17,
                 from
/tmp/stapFRPbtc/stap_c1555c4c3e68d9141933830206b88dd6_1339_src.c:73:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/namespaces.h:
In function ‘_stp_task_struct_valid’:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/namespaces.h:107:27:
error: expected ‘;’ before ‘{’ token
  107 |   do_each_thread(grp, tsk) {
      |                           ^~
      |                           ;
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:243:
/tmp/stapFRPbtc/stap_c1555c4c3e68d9141933830206b88dd6_1339_src.o] Error 1
make: *** [Makefile:2047: /tmp/stapFRPbtc] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed.  [man error::pass4]


This is caused by kernel git commit 5ffd2c37cb7a53d52099e5ed1fd7ccbc9e358791

Author: Oleg Nesterov <oleg@redhat.com>  2023-08-17 12:37:08
Committer: Andrew Morton <akpm@linux-foundation.org>  2023-08-21 16:46:25
Parent: cdaac8e7e5a059f9b5e816cda257f08d0abffacd (nilfs2: fix WARNING in
mark_buffer_dirty due to discarded buffer reuse)
Child:  3d0b71398490fbf68f9c5766b6dce71eb89c4da0 (kstrtox: consistently use
_tolower())
Branches: master, remotes/origin/master
Follows: v6.5-rc4
Precedes: 

    kill do_each_thread()

    Eric has pointed out that we still have 3 users of do_each_thread().
    Change them to use for_each_process_thread() and kill this helper.

    There is a subtle change, after do_each_thread/while_each_thread g == t ==
    &init_task, while after for_each_process_thread() they both point to
    nowhere, but this doesn't matter.

    > Why is for_each_process_thread() better than do_each_thread()?

    Say, for_each_process_thread() is rcu safe, do_each_thread() is not.

    And certainly

        for_each_process_thread(p, t) {
                do_something(p, t);
        }

    looks better than

        do_each_thread(p, t) {
                do_something(p, t);
        } while_each_thread(p, t);

    And again, there are only 3 users of this awkward helper left.  It should
    have been killed years ago and in fact I thought it had already been
    killed.  It uses while_each_thread() which needs some changes.

    Link: https://lkml.kernel.org/r/20230817163708.GA8248@redhat.com
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Jiri Slaby <jirislaby@kernel.org> # tty/serial
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug runtime/30831] Systemtap scripts fail to compile on newest  linux 6.6 kernels
  2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
@ 2023-09-07 19:23 ` wcohen at redhat dot com
  2023-09-08  2:49 ` wcohen at redhat dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-07 19:23 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

William Cohen <wcohen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|The do_each_thread macro    |Systemtap scripts fail to
                   |has been removed from       |compile on newest  linux
                   |newest  linux 6.6 kernels   |6.6 kernels

--- Comment #1 from William Cohen <wcohen at redhat dot com> ---
With minor tweaks in the runtime to use the for_each_process_thread macro. Get
a new set of errors when the generated C code is compiled:

$ sudo  ../install/bin/stap  -e 'probe begin {printf("hi!\n");exit()}'
In file included from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/runtime.h:26,
                 from
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.c:21:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/stp_task_work.c:
In function ‘stp_task_work_add’:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/stp_task_work.c:9:102:
error: implicit conversion from ‘enum <anonymous>’ to ‘enum
task_work_notify_mode’ [-Werror=enum-conversion]
    9 | apper(int, (* (task_work_add_fn)kallsyms_task_work_add)((a), (b), (c)))
      |                                                                   ^~~

/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:298:21:
note: in definition of macro ‘ibt_wrapper’
  298 |   rettype retval = (function);                  \
      |                     ^~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/stp_task_work.c:92:14:
note: in expansion of macro ‘task_work_add’
   92 |         rc = task_work_add(task, twork, true);
      |              ^~~~~~~~~~~~~
In file included from ./include/linux/srcu.h:21,
                 from ./include/linux/notifier.h:16,
                 from ./arch/x86/include/asm/uprobes.h:13,
                 from ./include/linux/uprobes.h:49,
                 from ./include/linux/mm_types.h:16,
                 from ./include/linux/mmzone.h:22,
                 from ./include/linux/gfp.h:7,
                 from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime_defines.h:20,
                 from
/home/wcohen/systemtap_write/install/share/systemtap/runtime/runtime_defines.h:8,
                 from
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.c:12:
In function ‘stp_synchronize_sched’,
    inlined from ‘_stp_cleanup_and_exit.part.0’ at
/home/wcohen/systemtap_write/install/share/systemtap/runtime/transport/transport.c:412:25:
./include/linux/workqueue.h:631:9: error: call to
‘__warn_flushing_systemwide_wq’ declared with attribute warning: Please avoid
flushing system-wide workqueues. [-Werror=attribute-warning]
  631 |         __warn_flushing_systemwide_wq();                               
\
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:407:3:
note: in expansion of macro ‘flush_scheduled_work’
  407 |   flush_scheduled_work();
      |   ^~~~~~~~~~~~~~~~~~~~
In function ‘stp_synchronize_sched’,
    inlined from ‘systemtap_module_exit’ at
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.c:472:3,
    inlined from ‘_stp_cleanup_and_exit.part.0’ at
/home/wcohen/systemtap_write/install/share/systemtap/runtime/transport/transport.c:420:4:
./include/linux/workqueue.h:631:9: error: call to
‘__warn_flushing_systemwide_wq’ declared with attribute warning: Please avoid
flushing system-wide workqueues. [-Werror=attribute-warning]
  631 |         __warn_flushing_systemwide_wq();                               
\
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:407:3:
note: in expansion of macro ‘flush_scheduled_work’
  407 |   flush_scheduled_work();
      |   ^~~~~~~~~~~~~~~~~~~~
In function ‘stp_synchronize_sched’,
    inlined from ‘systemtap_module_exit’ at
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.c:485:3,
    inlined from ‘_stp_cleanup_and_exit.part.0’ at
/home/wcohen/systemtap_write/install/share/systemtap/runtime/transport/transport.c:420:4:
./include/linux/workqueue.h:631:9: error: call to
‘__warn_flushing_systemwide_wq’ declared with attribute warning: Please avoid
flushing system-wide workqueues. [-Werror=attribute-warning]
  631 |         __warn_flushing_systemwide_wq();                               
\
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:407:3:
note: in expansion of macro ‘flush_scheduled_work’
  407 |   flush_scheduled_work();
      |   ^~~~~~~~~~~~~~~~~~~~
In function ‘stp_synchronize_sched’,
    inlined from ‘systemtap_module_exit’ at
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.c:488:3,
    inlined from ‘_stp_cleanup_and_exit.part.0’ at
/home/wcohen/systemtap_write/install/share/systemtap/runtime/transport/transport.c:420:4:
./include/linux/workqueue.h:631:9: error: call to
‘__warn_flushing_systemwide_wq’ declared with attribute warning: Please avoid
flushing system-wide workqueues. [-Werror=attribute-warning]
  631 |         __warn_flushing_systemwide_wq();                               
\
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:407:3:
note: in expansion of macro ‘flush_scheduled_work’
  407 |   flush_scheduled_work();
      |   ^~~~~~~~~~~~~~~~~~~~
In function ‘stp_synchronize_sched’,
    inlined from ‘systemtap_module_init’ at
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.c:420:3,
    inlined from ‘_stp_handle_start’ at
/home/wcohen/systemtap_write/install/share/systemtap/runtime/transport/transport.c:264:13,
    inlined from ‘_stp_ctl_write_cmd’ at
/home/wcohen/systemtap_write/install/share/systemtap/runtime/transport/control.c:82:17:
./include/linux/workqueue.h:631:9: error: call to
‘__warn_flushing_systemwide_wq’ declared with attribute warning: Please avoid
flushing system-wide workqueues. [-Werror=attribute-warning]
  631 |         __warn_flushing_systemwide_wq();                               
\
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:407:3:
note: in expansion of macro ‘flush_scheduled_work’
  407 |   flush_scheduled_work();
      |   ^~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:243:
/tmp/stapGSFULg/stap_9dfacd1f722045d5e3bceb982809b15d_1345_src.o] Error 1
make: *** [Makefile:1931: /tmp/stapGSFULg] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed.  [man error::pass4]

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug runtime/30831] Systemtap scripts fail to compile on newest  linux 6.6 kernels
  2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
  2023-09-07 19:23 ` [Bug runtime/30831] Systemtap scripts fail to compile on " wcohen at redhat dot com
@ 2023-09-08  2:49 ` wcohen at redhat dot com
  2023-09-18 13:46 ` wcohen at redhat dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-08  2:49 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

--- Comment #2 from William Cohen <wcohen at redhat dot com> ---
A bit more data.  The reproducer works with Fedora Rawhide
kernel-6.5.0-57.fc40.x86_64.  The failure happens with the later 6.6 kernels.

There are two issues going on:
-problem with the type used for the ibt_wrapper for task_work_add.
-use of flush_scheduled_work() is frowned upon because of possible deadlock
issues.
 git commit c4f135d643823a869becfa87539f7820ef9d5bfa has details on the issues:


Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>  2022-06-01 03:32:47
Committer: Tejun Heo <tj@kernel.org>  2022-06-07 13:07:14
Parent: e71e60cd74df9386c3f684c54888f2367050b831 (Merge tag
'dma-mapping-5.19-2022-06-06' of git://git.infradead.org/users/hch/dma-mapping)
Child:  873a400938b31a1e443c4d94b560b78300787540 (workqueue: Fix type of cpu in
trace event)
Branches: master, remotes/origin/master, remotes/origin/mmu_gather-race-fix,
remotes/origin/x86-rep-insns, remotes/origin/x86-uaccess-cleanup
Follows: v5.19-rc1
Precedes: v5.19-rc2

    workqueue: Wrap flush_workqueue() using a macro

    Since flush operation synchronously waits for completion, flushing
    system-wide WQs (e.g. system_wq) might introduce possibility of deadlock
    due to unexpected locking dependency. Tejun Heo commented at [1] that it
    makes no sense at all to call flush_workqueue() on the shared WQs as the
    caller has no idea what it's gonna end up waiting for.

    Although there is flush_scheduled_work() which flushes system_wq WQ with
    "Think twice before calling this function! It's very easy to get into
    trouble if you don't take great care." warning message, syzbot found a
    circular locking dependency caused by flushing system_wq WQ [2].

    Therefore, let's change the direction to that developers had better use
    their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is
    inevitable.

    Steps for converting system-wide WQs into local WQs are explained at [3],
    and a conversion to stop flushing system-wide WQs is in progress. Now we
    want some mechanism for preventing developers who are not aware of this
    conversion from again start flushing system-wide WQs.

    Since I found that WARN_ON() is complete but awkward approach for teaching
    developers about this problem, let's use __compiletime_warning() for
    incomplete but handy approach. For completeness, we will also insert
    WARN_ON() into __flush_workqueue() after all in-tree users stopped calling
    flush_scheduled_work().

    Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1]
    Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2]
    Link:
https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
[3]
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Signed-off-by: Tejun Heo <tj@kernel.org>

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug runtime/30831] Systemtap scripts fail to compile on newest  linux 6.6 kernels
  2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
  2023-09-07 19:23 ` [Bug runtime/30831] Systemtap scripts fail to compile on " wcohen at redhat dot com
  2023-09-08  2:49 ` wcohen at redhat dot com
@ 2023-09-18 13:46 ` wcohen at redhat dot com
  2023-09-18 13:51 ` wcohen at redhat dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-18 13:46 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

--- Comment #3 from William Cohen <wcohen at redhat dot com> ---
Created attachment 15116
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15116&action=edit
Move to using the current for_each_proecss_thread macro in the kernel

This is one of the fixes needed for compiling systemtap instrumentation on
Linux 6.6 kernels.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug runtime/30831] Systemtap scripts fail to compile on newest  linux 6.6 kernels
  2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
                   ` (2 preceding siblings ...)
  2023-09-18 13:46 ` wcohen at redhat dot com
@ 2023-09-18 13:51 ` wcohen at redhat dot com
  2023-09-27 14:24 ` wcohen at redhat dot com
  2023-09-27 20:03 ` wcohen at redhat dot com
  5 siblings, 0 replies; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-18 13:51 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

--- Comment #4 from William Cohen <wcohen at redhat dot com> ---
Created attachment 15117
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15117&action=edit
Avoid implicit type conversions of argument for task_work_add

The linux 6.6 kernels are now flagging implicit type conversions with warnings
that will stop the compilations as they are treated as errors.  Now using
TWA_RESUME where available or setting a sane define value for it if it doesn't.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug runtime/30831] Systemtap scripts fail to compile on newest  linux 6.6 kernels
  2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
                   ` (3 preceding siblings ...)
  2023-09-18 13:51 ` wcohen at redhat dot com
@ 2023-09-27 14:24 ` wcohen at redhat dot com
  2023-09-27 20:03 ` wcohen at redhat dot com
  5 siblings, 0 replies; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-27 14:24 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

--- Comment #5 from William Cohen <wcohen at redhat dot com> ---
Created attachment 15140
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15140&action=edit
eliminate problem use of flush_scheduled_work()

This patch allows systemtap to build modules for linux-6.6 kernels.  It has
been tested out on (running the examples tests):
  x86_64 fedora rawhide kernel-6.6.0-0.rc3.26.fc40.x86_64
  x86_64 f38 kernel-6.4.15-200.fc38.x86_64
  x86_64 rhel9 kernel-5.14.0-368.el9.x86_64
  x86_64 rhel8 kernel-4.18.0-514.el8.x86_64

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug runtime/30831] Systemtap scripts fail to compile on newest  linux 6.6 kernels
  2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
                   ` (4 preceding siblings ...)
  2023-09-27 14:24 ` wcohen at redhat dot com
@ 2023-09-27 20:03 ` wcohen at redhat dot com
  2023-11-23 15:16   ` Tetsuo Handa
  5 siblings, 1 reply; 10+ messages in thread
From: wcohen at redhat dot com @ 2023-09-27 20:03 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30831

William Cohen <wcohen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from William Cohen <wcohen at redhat dot com> ---
The following commit have been added to the systemtap master git repo to
address the issues with building systemtap instrumentation with the linux-6.6
kernels:

commit ca71442b93af61cbd9a1386e24e967373f928eae (HEAD -> master, origin/master,
origin/HEAD)
Author: William Cohen <wcohen@redhat.com>
Date:   Wed Sep 27 10:09:11 2023 -0400

    Eliminate use of kernel's flush_scheduled_work() in systemtap modules

    Kernel git commit 20bdedafd2f63e0ba70991127f9b5c0826ebdb32 turns use
    of flush_scheduled_work() into a warning which causes builds of
    anything using it to fail because warnings are treated as errors.
    Previous users of flush_scheduled_work() in the kernel have been
    converted over to use individual workqueues.  Systemtap runtime now
    does the same.  It creates and uses its own workqueue to eliminate the
    use of flush_scheduled_work().

commit 1ed860193216de3b233da440b2f9d52d94770b43
Author: William Cohen <wcohen@redhat.com>
Date:   Fri Sep 15 15:12:11 2023 -0400

    Use TWA_RESUME in the runtime calls to task_work_add

    Kernel git commit c40e60f00caf18bc382215c79651777eb40f5f9d in the
    linux 6.6 kernels will cause the implicit conversion of boolean true
    to an enum task_work_notify_mode to be flagged resulting in the
    systemtap instrumentation compile to fail.  Adding a config check to
    use TWA_RESUME if it is available or pass in the equivalent true
    argument for older kernels.

commit 21bb398cb78b4274577c8ef2483ea2d03beaf7f4
Author: William Cohen <wcohen@redhat.com>
Date:   Tue Sep 12 11:34:24 2023 -0400

    Eliminate use of do_each_thread() macro

    The Linux kernel commit 5ffd2c37cb7a5 removed the do_each_thread()
    macro and suggests using the for_each_process_thread() in its place.
    The Systemtap runtime has been migrated to using the more concise
    macro.  The for_each_process_thread() macro was added by commit
    0c740d0afc3bff to Linux 3.14, January 2014, almost a decade ago, so it
    should be available on modern systems.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug runtime/30831] Systemtap scripts fail to compile on newest linux 6.6 kernels
  2023-09-27 20:03 ` wcohen at redhat dot com
@ 2023-11-23 15:16   ` Tetsuo Handa
  2023-12-13 14:47     ` William Cohen
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2023-11-23 15:16 UTC (permalink / raw)
  To: wcohen; +Cc: systemtap

On 2023/09/28 5:03, wcohen at redhat dot com via Systemtap wrote:
> commit ca71442b93af61cbd9a1386e24e967373f928eae (HEAD -> master, origin/master,
> origin/HEAD)
> Author: William Cohen <wcohen@redhat.com>
> Date:   Wed Sep 27 10:09:11 2023 -0400
> 
>     Eliminate use of kernel's flush_scheduled_work() in systemtap modules
> 
>     Kernel git commit 20bdedafd2f63e0ba70991127f9b5c0826ebdb32 turns use
>     of flush_scheduled_work() into a warning which causes builds of
>     anything using it to fail because warnings are treated as errors.
>     Previous users of flush_scheduled_work() in the kernel have been
>     converted over to use individual workqueues.  Systemtap runtime now
>     does the same.  It creates and uses its own workqueue to eliminate the
>     use of flush_scheduled_work().

That commit does not look correct.

  systemtap_wq = alloc_workqueue("systemtap-wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
  if (!systemtap_wq)
    return !systemtap_wq;

causes stap_init_module() to return 1 (i.e. a positive value) when alloc_workqueue()
failed. The caller ( https://elixir.bootlin.com/linux/v6.6/source/kernel/module/main.c#L2528 )
emits a warning message but thinks that stap_init_module() succeeded.
stap_init_module() should return -ENOMEM when alloc_workqueue() failed.

Also, when alloc_workqueue() with WQ_MEM_RECLAIM flag succeeded, alloc_workqueue()
allocated a "struct task_struct". Not calling destroy_workqueue(systemtap_wq) leaks
that "struct task_struct" and related kernel resources.

 rc = systemtap_kernel_module_init();
 if (rc)
   return rc;

I don't know the content of systemtap_kernel_module_init() emitted by translate.cxx ,
but it seems to me that destroy_workqueue(systemtap_wq) will not be called.
Like https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
says, please be sure to call destroy_workqueue(systemtap_wq) when stap_init_module()
returns an error after alloc_workqueue() succeeded. 

Also, system_wq is created without WQ_MEM_RECLAIM flag
( https://elixir.bootlin.com/linux/v6.6/source/kernel/workqueue.c#L6596 ).
If you don't have a reason to create systemtap_wq with WQ_MEM_RECLAIM flag,
use of WQ_MEM_RECLAIM flag is a waste of kernel resource.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug runtime/30831] Systemtap scripts fail to compile on newest linux 6.6 kernels
  2023-11-23 15:16   ` Tetsuo Handa
@ 2023-12-13 14:47     ` William Cohen
  2023-12-17 10:41       ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: William Cohen @ 2023-12-13 14:47 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: wcohen, systemtap

[-- Attachment #1: Type: text/plain, Size: 3074 bytes --]

On 11/23/23 10:16, Tetsuo Handa wrote:
> On 2023/09/28 5:03, wcohen at redhat dot com via Systemtap wrote:
>> commit ca71442b93af61cbd9a1386e24e967373f928eae (HEAD -> master, origin/master,
>> origin/HEAD)
>> Author: William Cohen <wcohen@redhat.com>
>> Date:   Wed Sep 27 10:09:11 2023 -0400
>>
>>     Eliminate use of kernel's flush_scheduled_work() in systemtap modules
>>
>>     Kernel git commit 20bdedafd2f63e0ba70991127f9b5c0826ebdb32 turns use
>>     of flush_scheduled_work() into a warning which causes builds of
>>     anything using it to fail because warnings are treated as errors.
>>     Previous users of flush_scheduled_work() in the kernel have been
>>     converted over to use individual workqueues.  Systemtap runtime now
>>     does the same.  It creates and uses its own workqueue to eliminate the
>>     use of flush_scheduled_work().
> 

Hi,
Sorry for not getting to this earlier.  Attached is a patch that attempts to address the problems mentioned in this email.  Could you verify that the patch addresses all the concerns.


> That commit does not look correct.
> 
>   systemtap_wq = alloc_workqueue("systemtap-wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
>   if (!systemtap_wq)
>     return !systemtap_wq;
> 
> causes stap_init_module() to return 1 (i.e. a positive value) when alloc_workqueue()
> failed. The caller ( https://elixir.bootlin.com/linux/v6.6/source/kernel/module/main.c#L2528 )
> emits a warning message but thinks that stap_init_module() succeeded.
> stap_init_module() should return -ENOMEM when alloc_workqueue() failed.

This area of code now return -ENOMEM if alloc_workqueue fails.  It also eliminates the WQ_MEM_RECLAIM flag.

> 
> Also, when alloc_workqueue() with WQ_MEM_RECLAIM flag succeeded, alloc_workqueue()
> allocated a "struct task_struct". Not calling destroy_workqueue(systemtap_wq) leaks
> that "struct task_struct" and related kernel resources.
> 
>  rc = systemtap_kernel_module_init();
>  if (rc)
>    return rc;
> 
> I don't know the content of systemtap_kernel_module_init() emitted by translate.cxx ,
> but it seems to me that destroy_workqueue(systemtap_wq) will not be called.
> Like https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
> says, please be sure to call destroy_workqueue(systemtap_wq) when stap_init_module()
> returns an error after alloc_workqueue() succeeded. 

The stap_init_module is generated by the stap translator, but it doesn't call destroy_workqueue(). The code now calls destroy_workqueue before each of the returns in stap_init_module().  There is a use of the workqueue in systemtap_kernel_module_exit(), so the destroy_workqueue() follows it.

The stap_cleanup_module() was simplified as systemtap_wq must be available after the stap_init_module() is successful.

-Will

> 
> Also, system_wq is created without WQ_MEM_RECLAIM flag
> ( https://elixir.bootlin.com/linux/v6.6/source/kernel/workqueue.c#L6596 ).
> If you don't have a reason to create systemtap_wq with WQ_MEM_RECLAIM flag,
> use of WQ_MEM_RECLAIM flag is a waste of kernel resource.
> 

[-- Attachment #2: pr30831_revision.patch --]
[-- Type: text/x-patch, Size: 1069 bytes --]

diff --git a/runtime/linux/runtime.h b/runtime/linux/runtime.h
index a5840794a..d501f7990 100644
--- a/runtime/linux/runtime.h
+++ b/runtime/linux/runtime.h
@@ -434,15 +434,19 @@ static int stap_init_module (void)
      map hash. */
   get_random_bytes(&stap_hash_seed, sizeof (stap_hash_seed));
   systemtap_wq = alloc_workqueue("systemtap-wq",
-		WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+		WQ_UNBOUND, 0);
   if (!systemtap_wq)
-    return !systemtap_wq;
+    return -ENOMEM;
   rc = systemtap_kernel_module_init();
-  if (rc)
+  if (rc){
+    destroy_workqueue(systemtap_wq);
     return rc;
+  }
   rc = _stp_transport_init();
-  if (rc)
+  if (rc){
     systemtap_kernel_module_exit();
+    destroy_workqueue(systemtap_wq);
+  }
   return rc;
 }
 
@@ -455,8 +459,7 @@ void stap_cleanup_module(void)
      due to tapset-procfs.cxx cleaning up after procfs probes (such
      as in --monitor mode).  */
   systemtap_kernel_module_exit();
-  if (systemtap_wq)
-    destroy_workqueue(systemtap_wq);
+  destroy_workqueue(systemtap_wq);
 }
 
 module_exit(stap_cleanup_module);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug runtime/30831] Systemtap scripts fail to compile on newest linux 6.6 kernels
  2023-12-13 14:47     ` William Cohen
@ 2023-12-17 10:41       ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2023-12-17 10:41 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On 2023/12/13 23:47, William Cohen wrote:
> Attached is a patch that attempts to address the problems mentioned in this email.
> Could you verify that the patch addresses all the concerns.

Looks good to me. Please apply. Thank you.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-12-17 10:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-07 17:46 [Bug runtime/30831] New: The do_each_thread macro has been removed from newest linux 6.6 kernels wcohen at redhat dot com
2023-09-07 19:23 ` [Bug runtime/30831] Systemtap scripts fail to compile on " wcohen at redhat dot com
2023-09-08  2:49 ` wcohen at redhat dot com
2023-09-18 13:46 ` wcohen at redhat dot com
2023-09-18 13:51 ` wcohen at redhat dot com
2023-09-27 14:24 ` wcohen at redhat dot com
2023-09-27 20:03 ` wcohen at redhat dot com
2023-11-23 15:16   ` Tetsuo Handa
2023-12-13 14:47     ` William Cohen
2023-12-17 10:41       ` Tetsuo Handa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).