public inbox for systemtap@sourceware.org
From: Daire Byrne <daire@dneg.com>
To: William Cohen <wcohen@redhat.com>
Cc: "systemtap@sourceware.org" <systemtap@sourceware.org>
Subject: Re: vfs.add_to_page_cache not working anymore?
Date: Mon, 6 Nov 2023 11:20:09 +0000	[thread overview]
Message-ID: <CAPt2mGM00YL+r0Mf4Je3AfBbGoFdq=LaBq4Z_sfT+PN+6tVv1w@mail.gmail.com> (raw)
In-Reply-To: <CAPt2mGPQzCjUR2P+TKW7DrZAhutFU3e7wy_SH5PqcEkrhCeu_A@mail.gmail.com>

Just to follow up, I eventually switched from vfs.add_to_page_cache
(which never triggers on folio kernels) to
kernel.trace("mm_filemap_add_to_page_cache"), and I can still get the
inode like I did before.

However, I'm still a bit stumped as to how I can get the folio size
rather than assuming it is the page size (4096 bytes), as it was prior
to folios. If any gurus could point me in the right direction, I'd be
eternally grateful.

Here's my "working" code but with the (wrong) assumed folio (page) size:

probe kernel.trace("mm_filemap_add_to_page_cache") {
  pid = pid()
  # inode of the file whose folio is being added to the page cache
  ino = $folio->mapping->host->i_ino
  if ([pid, ino] in files) {
    # assumes one 4096-byte page per folio, which undercounts large folios
    readpage[pid, ino] += 4096
    files_store[pid, ino] = sprintf("%s", files[pid, ino])
  }
}
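
One direction I'm considering for the folio size (untested; the
PG_head flag test and the _folio_order field below are assumptions
on my part that vary between kernel versions, so they'd need checking
against the running kernel's struct folio definition) would be
something like:

probe kernel.trace("mm_filemap_add_to_page_cache") {
  pid = pid()
  ino = $folio->mapping->host->i_ino
  if ([pid, ino] in files) {
    # Default to a single page; a large (compound) folio has PG_head
    # set and, on recent kernels, stores its order in _folio_order.
    # Both details are kernel-version dependent.
    nr_pages = 1
    if ($folio->page->flags & (1 << @const("PG_head"))) {
      nr_pages = 1 << $folio->_folio_order
    }
    readpage[pid, ino] += nr_pages * 4096
    files_store[pid, ino] = sprintf("%s", files[pid, ino])
  }
}

If that's roughly the right idea, I'd still appreciate confirmation
that those fields are the supported way to get the folio size from a
tracepoint probe.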

Cheers.

Daire

On Wed, 14 Jun 2023 at 16:58, Daire Byrne <daire@dneg.com> wrote:
>
> Thinking about this a little more, even if the vfs.stp was updated so
> that add_to_page_cache was folio aware (filemap_add_folio?), I presume
> my simple code that assumes we are adding a page's worth of data to the
> page cache would no longer be valid.
>
> probe vfs.add_to_page_cache {
>   pid = pid()
>   if ([pid, ino] in files ) {
>     readpage[pid, ino] += 4096
>     files_store[pid, ino] = sprintf("%s", files[pid, ino])
>   }
> }
>
> I would think something like filemap_add_folio can now be called once
> for many pages read and I would need to track the number of pages in
> each folio call too?
>
> And I remember exactly why I was inferring (NFS) file reads via
> vfs.add_to_page_cache now - I wanted to record only the file reads
> that resulted in data being asked of the NFS server. In other words,
> only the IO resulting in network IO from each NFS server in a time
> series.
>
> I couldn't find any other way of doing that on a per file inode basis
> while taking the page cache data into account too.
>
> If anyone knows of an easier way to achieve the same thing, then I'll
> happily do that instead.
>
> Cheers,
>
> Daire
>
> On Wed, 14 Jun 2023 at 13:11, Daire Byrne <daire@dneg.com> wrote:
> >
> > On Tue, 13 Jun 2023 at 18:34, William Cohen <wcohen@redhat.com> wrote:
> > >
> > > On 6/13/23 12:39, Daire Byrne wrote:
> > > > On Tue, 13 Jun 2023 at 16:22, William Cohen <wcohen@redhat.com> wrote:
> > > >> Switching to systemtap-4.9 is probably not going to change the results
> > > >> in this case as there are no changes in tapset/linux/vfs.stp between
> > > >> 4.8 and 4.9.
> > > >
> > > > Good to know, I can skip trying to compile that then...
> > >
> > > Yes, running a newer version of software is often the first approach to see if the problem has been fixed upstream.  However, in this case the newer version of systemtap is going to give the same results, as the tapset in that area is the same.  So the focus is finding what is different between the working older kernels and the current non-working kernel.
> > >
> > > >
> > > >> Unfortunately, the kernel changes over time: some functions probed
> > > >> by the tapset change, or the way they are used by other parts
> > > >> of the kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has
> > > >> three possible functions it probes: add_to_page_cache_locked,
> > > >> add_to_page_cache_lru, and add_to_page_cache.  The first two functions
> > > >> were added due to kernel commit f00654007fe1c15.  I did some git commit
> > > >> archeology, and only add_to_page_cache_lru remains in the kernel, due to
> > > >> kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> > > >>
> > > >> The following URL show where add_to_page_cache_lru is used in 6.2.16
> > > >> kernels nfs and can provide some method of seeing how the nfs related
> > > >> functions get called:
> > > >>
> > > >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> > > >
> > > > Thanks for the feedback and pointers, that helps me understand where
> > > > the changes came from at least. It was still working on my last
> > > > production kernel - v5.16.
> > >
> > > There are times where that is not possible, such as when some function has been inlined and the return probe point isn't available, or some argument is not available at the probe point, but we do try to adapt the tapsets and examples to work on newer kernels.
> > >
> > > >
> > > > So if I recall, I used vfs.add_to_page_cache because at the time it
> > > > was the only (or easiest) way to work out total reads for mmap files
> > > > from an NFS filesystem.
> > > >
> > > > I also would have thought it should work for any filesystem not just
> > > > NFS - but I don't get any hits at all for an entire busy system.
> > > >
> > > >> As for what specifically has changed to cause vfs.add_to_page_cache
> > > >> not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
> > > >> it might be good to get a backtrace when it triggers and then
> > > >> use that information to see what has changed in the functions on the
> > > >> backtrace.
> > > >>
> > > >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > > >
> > > > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > > > switches" using systemtap v4.8.
> > >
> > > Sorry, I was missing a second '-' before ldd.  The command below should work:
> > >
> > > stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > >
> > > It would be useful to know what the backtraces are.  That would provide some information on how to adapt the script for newer kernels.
> >
> > Right, so I set it up on the last known "working" kernel I had,
> > v5.16, and this is a typical trace for a read:
> >
> > [root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
> > print_backtrace(); printf("Works.\n"); exit() }'
> > WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
> >  0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
> >  0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
> >  0xffffffffc0bbaccf
> >  0xffffffffc0bbaccf
> >  0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
> >  0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
> >  0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
> >  0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
> >  0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
> >  0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
> >  0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
> >  0xffffffffc0baea64
> >  0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
> >  0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
> >  0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
> >  0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
> >  0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
> >  0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
> > Works.
> >
> > As you said earlier, it's hitting "add_to_page_cache_lru".
> >
> > I also tested with the v5.19 kernel and it no longer triggers anything
> > with that.
> >
> > I'm going to stick my neck out and say this stopped working due to all
> > the folio conversion patches that were added between v5.17 & v6.0?
> >
> > Looking at the changelogs between v5.16 and v5.19 that's what jumps
> > out to me anyway.
> >
> > Cheers,
> >
> > Daire
> >
> >
> > > -Will
> > >
> > > >
> > > > Thanks for the response. It sounds like I need to find a different way
> > > > to work out total NFS reads for each filename path in modern kernels.
> > > >
> > > > Daire
> > > >
> > > > BTW, this is the code I had for tracking per-process and per-file-path read IO:
> > > >
> > > > probe nfs.fop.open {
> > > >   pid = pid()
> > > >   filename = sprintf("%s", d_path(&$filp->f_path))
> > > >   if (filename =~ "/hosts/.*/user_data") {
> > > >     files[pid, ino] = filename
> > > >     if ( !([pid, ino] in procinfo))
> > > >       procinfo[pid, ino] = sprintf("%s", proc())
> > > >   }
> > > > }
> > > >
> > > > probe vfs.add_to_page_cache {
> > > >   pid = pid()
> > > >   if ([pid, ino] in files ) {
> > > >     readpage[pid, ino] += 4096
> > > >     files_store[pid, ino] = sprintf("%s", files[pid, ino])
> > > >   }
> > > > }
> > > >
> > >


Thread overview: 7+ messages
2023-06-12 15:39 Daire Byrne
2023-06-13 15:22 ` William Cohen
2023-06-13 16:39   ` Daire Byrne
2023-06-13 17:34     ` William Cohen
2023-06-14 12:11       ` Daire Byrne
2023-06-14 15:58         ` Daire Byrne
2023-11-06 11:20           ` Daire Byrne [this message]
