User Memory Read Failure Question

* User Memory Read Failure Question
From: Daniel Heller @ 2015-08-24 12:07 UTC
  To: systemtap

Hello,

I've built a userland tracing mechanism on top of SystemTap which I
and my colleagues have used fruitfully for quite a while.  Recently, I
got a report that some of my users on Linux 3.18.16/x86_64/SystemTap 2.7
were seeing missing data in some of my probe logging; on the systems
in question, I can reliably reproduce the problem, but on other
similarly configured systems, I cannot reproduce it at all.  I have
found that reads for certain user addresses are reliably failing even
though examinations of /proc/${pid}/maps show that the regions are
mapped with read access; reads of the same addresses through
/proc/${pid}/mem and examination of core files both find the expected
values at the locations in question (SystemTap continues to fail to
read after that manual examination, so I do not believe that my manual
reads changed the state which is causing the problem).  Since I am
tracing an interpreter (Node.js 0.10) whose behavior I don't fully
understand, it's possible that the process itself is changing
permissions on the pages dynamically, causing the reads to fail.  I
haven't been able to disprove this possibility.  As I've been trying
to investigate, I've begun to wonder:

(a) Whether Linux may be unmapping the pages (but leaving them
resident) for access detection, and whether if that happened,
SystemTap would fail user reads to avoid potential recursive faulting
behavior.

(b) Whether there may be reasons for systemtap read failures other
than invalid mappings that I haven't anticipated but would be able to
check for.

(c) Whether there is a good recipe for getting at the page-level
permissions in the VM, from SystemTap context or otherwise (this would
of course be platform-specific; I can dig in and embed C if need be,
but I'm not experienced with the Linux VM).

Has anyone else debugged a problem like this?  Do you have any
insights or tooling you might recommend?  Thanks for your insight!

Dan

* Re: User Memory Read Failure Question
From: Josh Stone @ 2015-08-24 17:59 UTC
  To: Daniel Heller, systemtap

On 08/24/2015 05:07 AM, Daniel Heller wrote:
> Since I am tracing an interpreter (Node.js 0.10) whose behavior I
> don't fully understand, it's possible that the process itself is
> changing permissions on the pages dynamically, causing the reads to
> fail.  I haven't been able to disprove this possibility.

Try to probe syscall.mprotect, or even just strace -e mprotect.
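
A minimal sketch of such a probe, assuming the script is started with
stap -x PID so that target() returns the traced process ID; name and
argstr are the convenience variables the syscall tapset sets up:

  # Log every mprotect() call made by the target process.
  probe syscall.mprotect {
    if (pid() == target())
      printf("%s[%d] %s(%s)\n", execname(), tid(), name, argstr)
  }

If the failing addresses fall inside a range reported here with a
protection that lacks PROT_READ, the process itself is the likely cause.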

> (a) Whether Linux may be unmapping the pages (but leaving them
> resident) for access detection, and whether if that happened,
> SystemTap would fail user reads to avoid potential recursive faulting
> behavior.

I'm not sure how this unmapping would happen.  If it's really out of the
process' memory map, then there's no access anymore.  Now, getting paged
out, or never having paged in, is totally possible...

> (b) Whether there may be reasons for systemtap read failures other
> than invalid mappings that I haven't anticipated but would be able to
> check for.

I think it's most likely that the memory simply isn't paged in at that
time.  SystemTap's kernel handlers run in atomic context, so they can't
wait for a page fault to be serviced.

But if you poked the same process memory from outside, that ought to
have paged it in, so if it still fails I'm not sure...
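
To see which reads fail and when, one option is to wrap the read in
try/catch.  This is only a sketch: try_user_long is a hypothetical
helper name, and it assumes the installed tapset provides
user_long_error(), the variant that raises an error on an inaccessible
address instead of returning a default value:

  # Hypothetical helper: read a long from user space, log any failure.
  function try_user_long:long(addr:long) {
    val = 0
    try {
      val = user_long_error(addr)
    } catch (msg) {
      printf("read of %p failed: %s\n", addr, msg)
    }
    return val
  }

Logging the failing address makes it easier to compare against
/proc/${pid}/maps afterwards.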

You might have better luck with stap --runtime=dyninst, where all the
probe handlers will run directly in-process, so all memory access is
exactly as capable as the process itself.  (e.g. paging is fine.)  But
dyninst mode only works for scripts without any kernel probes, and it
only works on targeted processes (-x / -c) and their children, not
systemwide.  Plus some features aren't developed there yet, especially
backtracing.  If that's acceptable, please try it out!

> (c) Whether there is a good recipe for getting at the page-level
> permissions in the VM, from SystemTap context or otherwise (this would
> of course be platform-specific; I can dig in and embed C if need be,
> but I'm not experienced with the Linux VM).

I don't think we have anything canned like that, but it sounds like a
good idea for a tapset function!  Something like:

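  %{ #include <linux/mm.h> %}
  function addr_vm_flags:long(addr:long) %{
    /* Embedded C, so the script must be run in guru mode (stap -g). */
    struct mm_struct *mm = current->mm;  /* NULL for kernel threads */
    struct vm_area_struct *vma = NULL;
    unsigned long addr = STAP_ARG_addr;
    /* NOTE: this lookup does not take mm->mmap_sem. */
    if (mm)
      vma = find_vma_intersection(mm, addr, addr+1);
    STAP_RETURN(vma ? vma->vm_flags : VM_NONE);
  %}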

(only lightly tested here)

Or maybe pgprot_val(vma->vm_page_prot) is more useful, I'm not sure.
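
The returned flags could then be decoded in script code.  A sketch,
where vm_flags_str is a hypothetical helper and the constants are the
low VM_* bits from linux/mm.h (VM_READ, VM_WRITE, VM_EXEC, VM_SHARED):

  # Render vm_flags in the same rwxp/rwxs style as /proc/PID/maps.
  function vm_flags_str:string(flags:long) {
    return sprintf("%s%s%s%s",
                   (flags & 0x1) ? "r" : "-",   # VM_READ
                   (flags & 0x2) ? "w" : "-",   # VM_WRITE
                   (flags & 0x4) ? "x" : "-",   # VM_EXEC
                   (flags & 0x8) ? "s" : "p")   # VM_SHARED
  }

Used as vm_flags_str(addr_vm_flags(addr)) from a probe handler.  Note
that an address with no VMA yields VM_NONE and so prints as "---p".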
