public inbox for systemtap@sourceware.org
* Read contents of (userland) physical pages not yet mapped?
@ 2015-01-23  0:31 Yichun Zhang (agentzh)
  2015-01-23  3:15 ` Frank Ch. Eigler
  0 siblings, 1 reply; 2+ messages in thread
From: Yichun Zhang (agentzh) @ 2015-01-23  0:31 UTC (permalink / raw)
  To: systemtap; +Cc: Shuxin Yang, Jovi Zhangwei, Dane Knecht

Hi folks!

I have run into an interesting problem while writing the systemtap
tool [1] to inspect the shared memory regions of nginx.

Basically, the nginx master process creates a shared memory zone via
something like

    mmap(NULL, shm->size, PROT_READ|PROT_WRITE,
               MAP_ANON|MAP_SHARED, -1, 0);

And then the master process forks off several worker processes which
can both read from and write to such shm zones.
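
For concreteness, here is a minimal C sketch of that setup. It is not nginx's code: the helper name shm_roundtrip and the message string are made up for illustration. A forked child writes into a MAP_ANON|MAP_SHARED zone, and the parent then reads the same bytes from a page it has never touched itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of nginx): returns 0 if a write made by a
 * forked child is visible to the parent through a shared anonymous zone,
 * 1 if it is not, and -1 on a system call failure. */
int shm_roundtrip(size_t size)
{
    static const char msg[] = "written by child";
    assert(size >= sizeof(msg));

    /* Same flags as the mmap() call quoted above. */
    char *zone = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_SHARED, -1, 0);
    if (zone == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(zone, size);
        return -1;
    }
    if (pid == 0) {
        /* Child ("worker"): the first write to the zone happens here. */
        memcpy(zone, msg, sizeof(msg));
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(zone, size);
        return -1;
    }

    /* Parent: it has not accessed this page before, so this read is the
     * parent's own first touch of data that the child initialized. */
    int same = memcmp(zone, msg, sizeof(msg)) == 0;
    munmap(zone, size);
    return same ? 0 : 1;
}
```

Calling shm_roundtrip(4096) from a small test program should return 0 on a typical Linux system.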

My little tool inspects the internal contents of such a shm zone by
probing an arbitrary userland C function in a particular nginx worker
process, which the tool user specifies by pid [2].

But sometimes, due to "demand paging", some pages initialized by other
workers are not yet mapped in the current worker's page tables. Reading
the corresponding userspace addresses then fails with systemtap runtime
errors, even though a (minor) page fault would safely bring those (cold)
pages in if the read were initiated by the userland code itself. I
cannot trigger page faults in the user process from my stap script, and
doing so would not be safe anyway (due to the possibility of SEGV).
This issue has been haunting me for a long time :P

Given my very limited knowledge of the Linux kernel, I know each user
process has its own page tables and address space, so it's impossible
to access other worker processes' address spaces from within the
current process context of the probe handler (even though it runs in
kernel space). Also, accessing data in RAM always requires logical
addresses rather than physical addresses, and the CPU's virtual memory
translation cannot be (temporarily) turned off (unless DMA is
involved).

But anyway I've decided to ask here for any potential workarounds to
this problem because there are so many systemtap and kernel experts
here :)

Thanks for reading this!

Best regards,
-agentzh

[1] https://github.com/openresty/nginx-systemtap-toolkit#ngx-shm
[2] Swapped-out pages are not of interest here (they are usually rare) :)


* Re: Read contents of (userland) physical pages not yet mapped?
  2015-01-23  0:31 Read contents of (userland) physical pages not yet mapped? Yichun Zhang (agentzh)
@ 2015-01-23  3:15 ` Frank Ch. Eigler
  0 siblings, 0 replies; 2+ messages in thread
From: Frank Ch. Eigler @ 2015-01-23  3:15 UTC (permalink / raw)
  To: Yichun Zhang (agentzh); +Cc: systemtap, Shuxin Yang, Jovi Zhangwei, Dane Knecht


agentzh wrote:

> [...]  I have run into an interesting problem while writing the
> systemtap tool [1] to inspect the shared memory regions of nginx.
> [...]  But sometimes, due to "demand paging", some pages initialized
> by some workers are not loaded yet to the current worker's page
> tables, leading to systemtap runtime errors while reading the
> corresponding userspace addresses even though a (minor) page fault
> should [do the job]

Yeah, this is similar to the way stap tries to preclude causing page
faults in other contexts, like not-yet-paged-in strings out of user
syscall parameters.


> [...]  But anyway I've decided to ask here for any potential
> workarounds to this problem because there are so many systemtap and
> kernel experts here :)

Any chance that mlockall() (or an explicit page-touching loop right
after mmap) could work for you?
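
A rough C sketch of both ideas follows. The helper names touch_pages, create_prefaulted_zone, and lock_all_memory are made up for illustration and are not nginx or systemtap APIs. Since page tables are per process, the sketch assumes the touching loop runs in whichever process the probe will later inspect.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: read one byte from every page of [addr, addr + len)
 * so that each page is faulted into the calling process's page tables.
 * Returns the number of pages touched. */
size_t touch_pages(const void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    const volatile unsigned char *p = addr;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        (void)p[off];   /* volatile read, so the access is not optimized away */
        touched++;
    }
    return touched;
}

/* Hypothetical helper: create a shared anonymous zone and touch every page
 * right after mmap(). Returns NULL if the mapping fails. */
void *create_prefaulted_zone(size_t size)
{
    void *zone = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_SHARED, -1, 0);
    if (zone == MAP_FAILED)
        return NULL;
    touch_pages(zone, size);
    return zone;
}

/* Hypothetical helper: the mlockall() alternative. Returns 0 on success and
 * -1 on failure, for example when RLIMIT_MEMLOCK is too low for an
 * unprivileged process. */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

A worker could call touch_pages(zone, size) once at startup, before any probe fires; lock_all_memory() is the heavier option and may need a raised memlock limit.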


- FChE
