From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Sender: systemtap-owner@sourceware.org
From: "agentzh at gmail dot com"
To: systemtap@sourceware.org
Subject: [Bug tapsets/25290] New: process(EXE).begin may occasionally miss already-running target processes from EXE
Date: Wed, 18 Dec 2019 07:32:00 -0000
X-Bugzilla-URL: http://sourceware.org/bugzilla/
X-SW-Source: 2019-q4/txt/msg00077.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=25290

            Bug ID: 25290
           Summary: process(EXE).begin may occasionally miss already-running target processes from EXE
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tapsets
          Assignee: systemtap at sourceware dot org
          Reporter: agentzh at gmail dot com
  Target Milestone: ---

I've noticed that process(EXE).begin probes may sometimes miss target processes which are already running when staprun is started. To reproduce this:

1.
Prepare a minimal C program file, a.c:

```
#include <unistd.h>

int main (int argc, char** argv) {
    while (1) {
        usleep(1);
    }
    return 0;
}
```

Compile it like this:

    gcc -g a.c

which generates ./a.out. Run this program in a separate terminal window like this:

    ./a.out

Keep this process running (it has an infinite loop, so it will never exit by itself).

2. Prepare a minimal a.stp script file:

```
global max_cnt = 5, cnt = 0;
global pids

probe process("/home/agentzh/a.out").begin {
    pids[pid()] = 1
}

probe process("/home/agentzh/a.out").end {
    delete pids[pid()];
}

probe process("/lib64/libc.so.6").function("usleep") {
    if (!pids[pid()]) next;
    if (++cnt > max_cnt) exit();
    println("usleep(", @var("useconds"), ")");
}

probe timer.s(5) {
    warn("timer expired");
    exit();
}

probe begin {
    warn("Start tracing...")
}
```

And then compile this file to a kernel module file, usleep.ko:

    stap -p4 -m usleep a.stp

3. Run this usleep.ko module in a shell loop:

    ( while true; do echo ======; sudo staprun usleep.ko; sleep 0.1; done ) |& tee a.txt

After a few minutes (be patient!), we should see "timer expired" messages in the output file a.txt:

    $ grep -a timer a.txt
    WARNING: timer expired
    WARNING: timer expired
    WARNING: timer expired
    WARNING: timer expired
    WARNING: timer expired
    WARNING: timer expired
    WARNING: timer expired
    WARNING: timer expired

You may get fewer lines here though.

The "timer expired" message should never be printed when the process(EXE).begin probe is fired properly.

I tried the latest master branch of stap on kernel 5.0.16-100.fc28.x86_64 from Fedora 28, kernel 4.15.0-72-generic x86_64 from Ubuntu 18.04, and kernels 3.10.0-957.27.2.el7.x86_64 and 3.10.0-1062.9.1.el7.x86_64 from CentOS 7; all show the same problem. I also noticed that this is much harder to reproduce on kernel 5.0.16 than on the earlier kernel versions.
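To put a rough number on how often the probe is missed, the log from step 3 can be summarized with a small helper. This is only a sketch, assuming a.txt was produced by the loop in step 3, so that each "======" line marks one staprun iteration and each "timer expired" warning is taken to mean an iteration in which the .begin probe did not fire. The function name summarize_runs is made up for illustration.

```shell
# Hypothetical helper: summarize a log written by the step-3 loop.
# Assumes one "======" separator line per staprun iteration and one
# "timer expired" warning per iteration that hit the 5-second timer.
summarize_runs() {
    log="$1"
    # grep -c exits non-zero when the count is 0, so ignore its status.
    runs=$(grep -a -c '^======' "$log" || true)
    misses=$(grep -a -c 'timer expired' "$log" || true)
    echo "runs=$runs misses=$misses"
}
```

Usage: `summarize_runs a.txt`. Comparing the two counts across kernels should make the "much harder to reproduce on 5.0.16" observation easier to quantify.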
The stap version:

```
$ /opt/stap/bin/stap -V
Systemtap translator/driver (version 4.3/0.174/0.177, commit release-4.2-10-g1427836ac118)
Copyright (C) 2005-2019 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.32 ... 5.4-rc6
enabled features: BPF LIBSQLITE3 NLS
```

Any hints on debugging this further?

Thanks!