From: "dsmith at redhat dot com"
To: systemtap@sourceware.org
Subject: [Bug runtime/22847] ARM OABI syscall tracing issues
Date: Tue, 01 May 2018 15:11:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=22847

--- Comment #21 from David Smith ---
(In reply to Gustavo Moreira from comment #20)
> (In reply to David Smith from comment #19)
> > (sorry for the delay in responding)
> >
> > (In reply to Gustavo Moreira from comment #18)
> > > I ended up modifying the kernel to update thread_info struct with the
> > > syscall number. Then I just call the original kernel syscall_get_nr()
> > > function from SystemTap, which is working like a charm.
> >
> > Good deal. Have you tried getting the kernel patch upstream?
>
> Not yet. Do you think they could be interested?

Yes. Without getting your patch into the upstream kernel, your work here will
only be useful for you.

> > ...
stuff deleted ...
> >
> > > However, for instance, when it's used with your strace.stp which uses probe
> > > alias, it doesn't work ... it doesn't report the syscalls. Even using an
> > > EABI binary it doesn't report the syscalls. (See staprun_output_eabi.log and
> > > staprun_output_oabi.log)
> > >
> > > I also noticed that, for instance from tapset/linux/sysc_connect.stp,
> > > __syscall_gate() is called to filter the syscalls, so I've crafted some code
> > > (see syscalls_stpm.patch) to avoid to be filtered in case the syscall number
> > > doesn't match with the constants.
> > >
> > > I'm not getting what is happening from the SystemTap side, it seems the
> > > syscalls are being filtered somewhere ... could you please help me out?
> >
> > You'll need to break down the @__syscall_gate macro into smaller pieces and
> > see where it is calling "next". Another idea, perhaps simpler, would be to
> > stick printf calls in that macro (and all that it calls) to let you know
> > which macro is calling "next". My guess would be that the
> > @__syscall_gate_compat_simple macro is doing the filtering, but you'll need
> > to test that theory.
>
> Actually, the patches are fully working. The probes wasn't being called due
> to the MAXSKIPPED limit:
> So, I've suppressed the time limits checks (--suppress-time-limits). I could
> also increase the limit to a specific value but anyway I wonder why it's
> happening now after these changes.
>
> What do you think about the changes in syscalls.stpm? Do they look good?

I've got some problems with the changes to syscalls.stpm. Besides leaving debug
printf's in, your changes bypass the filtering if you've got an OABI
executable. You'll end up with syscall nesting that way, something we
definitely try to avoid. Also, you'd need similar changes in the other macros
(__syscall_gate2, __syscall_compat_gate, etc.).

Earlier, you said: In OABI the syscall convention is svc 0x900000 + SYSCALL_NR.
If that is true, couldn't your changes be simplified to:

    %( CONFIG_OABI_COMPAT == "y" %?
        # If _stp_syscall_nr() fails, that means we aren't in user
        # context. So, skip this call.
        try { __nr = _stp_syscall_nr() } catch { next }
        # On ARM, if it is an OABI call, the syscall number is
        # >= __NR_OABI_SYSCALL_BASE (syscall 0 equals the base itself).
        if (__nr >= @const("__NR_OABI_SYSCALL_BASE")) {
            __nr = __nr - @const("__NR_OABI_SYSCALL_BASE")
        }
        if (__nr != @syscall_nr) next
    %:
    ...

The next thing I wonder about is that there has got to be more difference than
just syscall numbers between the two ABIs. I assume structures are laid out
differently, along with perhaps other changes. You'll have to account for that.
Poking around the arch/arm directory, I'd guess you might need to probe the
sys_oabi_* functions and implement a way of knowing if we're in an OABI
executable (like setting a thread flag).

> It also shows two warnings in the output:
>
> WARNING: Skipped due to missed kretprobe/2 on
> 'kprobe.function("sys_readlink").return?': 1
> WARNING: Skipped due to missed kprobe on 'kprobe.function("sys_readlink")?':
> 1
>
> I don't think it would be important but anyway it would be nice if we could
> fix it as well. Any clue?

Actually, it is important (and probably why MAXSKIPPED is being hit). Let's
start with the definition of MAXSKIPPED:

  "Maximum number of skipped probes before an exit is triggered, default 100."

So, the first question to answer is "why are you getting so many skipped
probes?". You might start by seeing if the kernel outputs any messages when
this happens.
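
To make the numbering arithmetic discussed above concrete, here is a minimal
sketch in plain C of the normalization step: strip the OABI base from a raw
syscall number so it can be compared against the ordinary constants. It is
only an illustration under stated assumptions. The base value 0x900000 is
taken from the "svc 0x900000 + SYSCALL_NR" convention quoted in this thread,
and the names OABI_SYSCALL_BASE and normalize_syscall_nr are hypothetical;
neither is part of SystemTap or the kernel.

```c
/* Assumed base, from the "svc 0x900000 + SYSCALL_NR" convention quoted
 * in this thread. Hypothetical macro name, for illustration only. */
#define OABI_SYSCALL_BASE 0x900000L

/* Hypothetical helper: return the plain syscall number, removing the
 * OABI base when the raw number carries it. Numbers below the base are
 * assumed to be plain (EABI-style) already and are returned unchanged. */
static long normalize_syscall_nr(long raw_nr)
{
    /* ">=" rather than ">" so that syscall 0 (raw value equal to the
     * base) is normalized as well. */
    if (raw_nr >= OABI_SYSCALL_BASE)
        return raw_nr - OABI_SYSCALL_BASE;
    return raw_nr;
}
```

With this, normalize_syscall_nr(OABI_SYSCALL_BASE + 85) and
normalize_syscall_nr(85) yield the same value, so a single comparison against
the expected syscall number would cover both ABIs. It only addresses the
numbering; any differences in argument or structure layout between the ABIs
would still need separate handling.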