public inbox for systemtap@sourceware.org
* vfs.add_to_page_cache not working anymore?
@ 2023-06-12 15:39 Daire Byrne
  2023-06-13 15:22 ` William Cohen
  0 siblings, 1 reply; 7+ messages in thread
From: Daire Byrne @ 2023-06-12 15:39 UTC (permalink / raw)
  To: systemtap

I have recently updated my kernel (v6.3) and, with systemtap (v4.8),
an old systemtap script I was using has stopped working.

It looks like I was counting pages added via vfs.add_to_page_cache as a
means to work out file reads (over NFS) per pid.

But this probe no longer seems to hit at all for me:

stap -e 'probe vfs.add_to_page_cache { printf("Works.\n"); exit() }'

I am aware that v4.9 is the latest systemtap, but I have been struggling to
get it to compile with the new jupyter stuff (my python version is too old).

Anyway, any help or pointers for vfs.add_to_page_cache greatly appreciated.

Daire


* Re: vfs.add_to_page_cache not working anymore?
  2023-06-12 15:39 vfs.add_to_page_cache not working anymore? Daire Byrne
@ 2023-06-13 15:22 ` William Cohen
  2023-06-13 16:39   ` Daire Byrne
  0 siblings, 1 reply; 7+ messages in thread
From: William Cohen @ 2023-06-13 15:22 UTC (permalink / raw)
  To: Daire Byrne; +Cc: wcohen, systemtap

On 6/12/23 11:39, Daire Byrne via Systemtap wrote:
> I have recently updated my kernel (v6.3) and, with systemtap (v4.8),
> an old systemtap script I was using has stopped working.
> 
> It looks like I was counting pages added via vfs.add_to_page_cache as a
> means to work out file reads (over NFS) per pid.
> 
> But this probe no longer seems to hit at all for me:
> 
> stap -e 'probe vfs.add_to_page_cache { printf("Works.\n"); exit() }'
> 
> I am aware that v4.9 is the latest systemtap, but I have been struggling to
> get it to compile with the new jupyter stuff (my python version is too old).
> 
> Anyway, any help or pointers for vfs.add_to_page_cache greatly appreciated.
> 
> Daire
> 

Hi,

Switching to systemtap-4.9 is probably not going to change the results
in this case as there are no changes in tapset/linux/vfs.stp between
4.8 and 4.9.

Unfortunately, the kernel changes over time: some functions probed by
the tapset change, or the way they are used by other parts of the
kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has three
possible functions it probes: add_to_page_cache_locked,
add_to_page_cache_lru, and add_to_page_cache.  The first two functions
were added due to kernel commit f00654007fe1c15.  Some git commit
archaeology shows that only add_to_page_cache_lru remains in the
kernel, due to kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
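
For reference, the alias is structured roughly like this (paraphrased
rather than quoted verbatim from tapset/linux/vfs.stp; the '!' suffix
tells stap to try each probe point in turn and use the first one that
resolves on the running kernel):

probe vfs.add_to_page_cache =
      kernel.function("add_to_page_cache_locked") !,
      kernel.function("add_to_page_cache_lru") !,
      kernel.function("add_to_page_cache")
{
  # the tapset variables are derived from the probed function's
  # arguments, e.g. the inode number comes from the address_space:
  ino = $mapping->host->i_ino
}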

The following URL shows where add_to_page_cache_lru is used in the
6.2.16 kernel's NFS code and can provide some method of seeing how the
NFS-related functions get called:

https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru

As far as specifically what has changed to cause vfs.add_to_page_cache
not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
it might be good to get a backtrace when it triggers and then use that
information to see what has changed in the functions on the backtrace.

stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'

-Will



* Re: vfs.add_to_page_cache not working anymore?
  2023-06-13 15:22 ` William Cohen
@ 2023-06-13 16:39   ` Daire Byrne
  2023-06-13 17:34     ` William Cohen
  0 siblings, 1 reply; 7+ messages in thread
From: Daire Byrne @ 2023-06-13 16:39 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On Tue, 13 Jun 2023 at 16:22, William Cohen <wcohen@redhat.com> wrote:
> Switching to systemtap-4.9 is probably not going to change the results
> in this case as there are no changes in tapset/linux/vfs.stp between
> 4.8 and 4.9.

Good to know, I can skip trying to compile that then...

> Unfortunately, the kernel changes over time: some functions probed by
> the tapset change, or the way they are used by other parts of the
> kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has three
> possible functions it probes: add_to_page_cache_locked,
> add_to_page_cache_lru, and add_to_page_cache.  The first two functions
> were added due to kernel commit f00654007fe1c15.  Some git commit
> archaeology shows that only add_to_page_cache_lru remains in the
> kernel, due to kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
>
> The following URL shows where add_to_page_cache_lru is used in the
> 6.2.16 kernel's NFS code and can provide some method of seeing how the
> NFS-related functions get called:
>
> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru

Thanks for the feedback and pointers, that helps me understand where
the changes came from at least. It was still working on my last
production kernel - v5.16.

So if I recall, I used vfs.add_to_page_cache because at the time it
was the only (or easiest) way to work out total reads for mmap'd files
from an NFS filesystem.

I also would have thought it should work for any filesystem not just
NFS - but I don't get any hits at all for an entire busy system.

> As far as specifically what has changed to cause vfs.add_to_page_cache
> not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
> it might be good to get a backtrace when it triggers and then use that
> information to see what has changed in the functions on the backtrace.
>
> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'

I just get the error "Cannot specify a script with -l/-L/--dump-*
switches" using systemtap v4.8.

Thanks for the response. It sounds like I need to find a different way
to work out total NFS reads for each filename path in modern kernels.

Daire

BTW, this is the code I had for tracking per process and file path read IO:

# the arrays (files, procinfo, readpage, files_store) and proc() are
# declared elsewhere in the full script; ino comes from the tapset probes
probe nfs.fop.open {
  pid = pid()
  filename = sprintf("%s", d_path(&$filp->f_path))
  if (filename =~ "/hosts/.*/user_data") {
    # remember this pid/inode pair so page-cache insertions can be
    # attributed back to the file path
    files[pid, ino] = filename
    if (!([pid, ino] in procinfo))
      procinfo[pid, ino] = sprintf("%s", proc())
  }
}

probe vfs.add_to_page_cache {
  pid = pid()
  if ([pid, ino] in files) {
    # assumes each hit adds exactly one 4096-byte page to the page cache
    readpage[pid, ino] += 4096
    files_store[pid, ino] = sprintf("%s", files[pid, ino])
  }
}


* Re: vfs.add_to_page_cache not working anymore?
  2023-06-13 16:39   ` Daire Byrne
@ 2023-06-13 17:34     ` William Cohen
  2023-06-14 12:11       ` Daire Byrne
  0 siblings, 1 reply; 7+ messages in thread
From: William Cohen @ 2023-06-13 17:34 UTC (permalink / raw)
  To: Daire Byrne; +Cc: wcohen, systemtap

On 6/13/23 12:39, Daire Byrne wrote:
> On Tue, 13 Jun 2023 at 16:22, William Cohen <wcohen@redhat.com> wrote:
>> Switching to systemtap-4.9 is probably not going to change the results
>> in this case as there are no changes in tapset/linux/vfs.stp between
>> 4.8 and 4.9.
> 
> Good to know, I can skip trying to compile that then...

Yes, running a newer version of the software is often the first approach, to see if the problem has been fixed upstream.  However, in this case the newer version of systemtap is going to give the same results, as the tapsets in that area are the same.  So the focus is on finding what is different between the working older kernels and the current non-working kernel.

> 
>> Unfortunately, the kernel changes over time: some functions probed by
>> the tapset change, or the way they are used by other parts of the
>> kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has three
>> possible functions it probes: add_to_page_cache_locked,
>> add_to_page_cache_lru, and add_to_page_cache.  The first two functions
>> were added due to kernel commit f00654007fe1c15.  Some git commit
>> archaeology shows that only add_to_page_cache_lru remains in the
>> kernel, due to kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
>>
>> The following URL shows where add_to_page_cache_lru is used in the
>> 6.2.16 kernel's NFS code and can provide some method of seeing how the
>> NFS-related functions get called:
>>
>> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> 
> Thanks for the feedback and pointers, that helps me understand where
> the changes came from at least. It was still working on my last
> production kernel - v5.16.

There are times where that is not possible, when some function has been inlined and the return probe point isn't available, or some argument is not available at the probe point, but we do try to adapt the tapsets and examples to work on newer kernels.

> 
> So if I recall, I used vfs.add_to_page_cache because at the time it
> was the only (or easiest) way to work out total reads for mmap'd files
> from an NFS filesystem.
> 
> I also would have thought it should work for any filesystem not just
> NFS - but I don't get any hits at all for an entire busy system.
> 
>> As far as specifically what has changed to cause vfs.add_to_page_cache
>> not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
>> it might be good to get a backtrace when it triggers and then use that
>> information to see what has changed in the functions on the backtrace.
>>
>> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> 
> I just get the error "Cannot specify a script with -l/-L/--dump-*
> switches" using systemtap v4.8.

Sorry, the command was missing a second '-' before ldd.  The command below should work:

stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'

It would be useful to know what the backtraces are.  That would provide some information on how to adapt the script for newer kernels.

-Will

> 
> Thanks for the response. It sounds like I need to find a different way
> to work out total NFS reads for each filename path in modern kernels.
> 
> Daire
> 
> BTW, this is the code I had for tracking per process and file path read IO:
> 
> probe nfs.fop.open {
>   pid = pid()
>   filename = sprintf("%s", d_path(&$filp->f_path))
>   if (filename =~ "/hosts/.*/user_data") {
>     files[pid, ino] = filename
>     if ( !([pid, ino] in procinfo))
>       procinfo[pid, ino] = sprintf("%s", proc())
>   }
> }
> 
> probe vfs.add_to_page_cache {
>   pid = pid()
>   if ([pid, ino] in files ) {
>     readpage[pid, ino] += 4096
>     files_store[pid, ino] = sprintf("%s", files[pid, ino])
>   }
> }
> 



* Re: vfs.add_to_page_cache not working anymore?
  2023-06-13 17:34     ` William Cohen
@ 2023-06-14 12:11       ` Daire Byrne
  2023-06-14 15:58         ` Daire Byrne
  0 siblings, 1 reply; 7+ messages in thread
From: Daire Byrne @ 2023-06-14 12:11 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On Tue, 13 Jun 2023 at 18:34, William Cohen <wcohen@redhat.com> wrote:
>
> On 6/13/23 12:39, Daire Byrne wrote:
> > On Tue, 13 Jun 2023 at 16:22, William Cohen <wcohen@redhat.com> wrote:
> >> Switching to systemtap-4.9 is probably not going to change the results
> >> in this case as there are no changes in tapset/linux/vfs.stp between
> >> 4.8 and 4.9.
> >
> > Good to know, I can skip trying to compile that then...
>
> Yes, running a newer version of the software is often the first approach, to see if the problem has been fixed upstream.  However, in this case the newer version of systemtap is going to give the same results, as the tapsets in that area are the same.  So the focus is on finding what is different between the working older kernels and the current non-working kernel.
>
> >
> >> Unfortunately, the kernel changes over time: some functions probed by
> >> the tapset change, or the way they are used by other parts of the
> >> kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has three
> >> possible functions it probes: add_to_page_cache_locked,
> >> add_to_page_cache_lru, and add_to_page_cache.  The first two functions
> >> were added due to kernel commit f00654007fe1c15.  Some git commit
> >> archaeology shows that only add_to_page_cache_lru remains in the
> >> kernel, due to kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> >>
> >> The following URL shows where add_to_page_cache_lru is used in the
> >> 6.2.16 kernel's NFS code and can provide some method of seeing how the
> >> NFS-related functions get called:
> >>
> >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> >
> > Thanks for the feedback and pointers, that helps me understand where
> > the changes came from at least. It was still working on my last
> > production kernel - v5.16.
>
> There are times where that is not possible, when some function has been inlined and the return probe point isn't available, or some argument is not available at the probe point, but we do try to adapt the tapsets and examples to work on newer kernels.
>
> >
> > So if I recall, I used vfs.add_to_page_cache because at the time it
> > was the only (or easiest) way to work out total reads for mmap'd files
> > from an NFS filesystem.
> >
> > I also would have thought it should work for any filesystem not just
> > NFS - but I don't get any hits at all for an entire busy system.
> >
> >> As far as specifically what has changed to cause vfs.add_to_page_cache
> >> not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
> >> it might be good to get a backtrace when it triggers and then use that
> >> information to see what has changed in the functions on the backtrace.
> >>
> >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> >
> > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > switches" using systemtap v4.8.
>
> Sorry, the command was missing a second '-' before ldd.  The command below should work:
>
> stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
>
> It would be useful to know what the backtraces are.  That would provide some information on how to adapt the script for newer kernels.

Right, so I set it up on the last known "working" kernel I had,
v5.16, and this is a typical trace for a read:

[root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
print_backtrace(); printf("Works.\n"); exit() }'
WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
 0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
 0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
 0xffffffffc0bbaccf
 0xffffffffc0bbaccf
 0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
 0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
 0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
 0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
 0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
 0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
 0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
 0xffffffffc0baea64
 0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
 0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
 0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
 0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
 0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
 0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
Works.

As you said earlier, it's hitting "add_to_page_cache_lru".

I also tested with the v5.19 kernel and it no longer triggers at all
there.

I'm going to stick my neck out and say this stopped working due to all
the folio conversion patches that were added between v5.17 & v6.0?

Looking at the changelogs between v5.16 and v5.19 that's what jumps
out to me anyway.

Cheers,

Daire


> -Will
>
> >
> > Thanks for the response. It sounds like I need to find a different way
> > to work out total NFS reads for each filename path in modern kernels.
> >
> > Daire
> >
> > BTW, this is the code I had for tracking per process and file path read IO:
> >
> > probe nfs.fop.open {
> >   pid = pid()
> >   filename = sprintf("%s", d_path(&$filp->f_path))
> >   if (filename =~ "/hosts/.*/user_data") {
> >     files[pid, ino] = filename
> >     if ( !([pid, ino] in procinfo))
> >       procinfo[pid, ino] = sprintf("%s", proc())
> >   }
> > }
> >
> > probe vfs.add_to_page_cache {
> >   pid = pid()
> >   if ([pid, ino] in files ) {
> >     readpage[pid, ino] += 4096
> >     files_store[pid, ino] = sprintf("%s", files[pid, ino])
> >   }
> > }
> >
>


* Re: vfs.add_to_page_cache not working anymore?
  2023-06-14 12:11       ` Daire Byrne
@ 2023-06-14 15:58         ` Daire Byrne
  2023-11-06 11:20           ` Daire Byrne
  0 siblings, 1 reply; 7+ messages in thread
From: Daire Byrne @ 2023-06-14 15:58 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

Thinking about this a little more, even if vfs.stp was updated so
that add_to_page_cache was folio-aware (filemap_add_folio?), I presume
my simple code that assumes we are adding a page's worth of data to
the page cache would no longer be valid.

probe vfs.add_to_page_cache {
  pid = pid()
  if ([pid, ino] in files ) {
    readpage[pid, ino] += 4096
    files_store[pid, ino] = sprintf("%s", files[pid, ino])
  }
}

I would think something like filemap_add_folio can now be called once
for many pages read, so I would need to track the number of pages in
each folio call too?
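
Something like the sketch below is what I have in mind (completely
untested, and assuming filemap_add_folio isn't inlined and that its
arguments are visible in the debuginfo):

probe kernel.function("filemap_add_folio") ? {
  # filemap_add_folio(mapping, folio, index, gfp) is the folio-era
  # replacement for the add_to_page_cache*() entry points; the inode
  # is still reachable through the mapping argument
  printf("ino=%d index=%d\n", $mapping->host->i_ino, $index)
}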

And I remember exactly why I was inferring (NFS) file reads via
vfs.add_to_page_cache now - I wanted to record only the file reads
that resulted in data being asked of the NFS server. In other words,
only the IO resulting in network IO from each NFS server in a time
series.

I couldn't find any other way of doing that on a per file inode basis
while taking the page cache data into account too.

If anyone knows of an easier way to achieve the same thing, then I'll
happily do that instead.

Cheers,

Daire

On Wed, 14 Jun 2023 at 13:11, Daire Byrne <daire@dneg.com> wrote:
>
> On Tue, 13 Jun 2023 at 18:34, William Cohen <wcohen@redhat.com> wrote:
> >
> > On 6/13/23 12:39, Daire Byrne wrote:
> > > On Tue, 13 Jun 2023 at 16:22, William Cohen <wcohen@redhat.com> wrote:
> > >> Switching to systemtap-4.9 is probably not going to change the results
> > >> in this case as there are no changes in tapset/linux/vfs.stp between
> > >> 4.8 and 4.9.
> > >
> > > Good to know, I can skip trying to compile that then...
> >
> > Yes, running a newer version of the software is often the first approach, to see if the problem has been fixed upstream.  However, in this case the newer version of systemtap is going to give the same results, as the tapsets in that area are the same.  So the focus is on finding what is different between the working older kernels and the current non-working kernel.
> >
> > >
> > >> Unfortunately, the kernel changes over time: some functions probed by
> > >> the tapset change, or the way they are used by other parts of the
> > >> kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has three
> > >> possible functions it probes: add_to_page_cache_locked,
> > >> add_to_page_cache_lru, and add_to_page_cache.  The first two functions
> > >> were added due to kernel commit f00654007fe1c15.  Some git commit
> > >> archaeology shows that only add_to_page_cache_lru remains in the
> > >> kernel, due to kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> > >>
> > >> The following URL shows where add_to_page_cache_lru is used in the
> > >> 6.2.16 kernel's NFS code and can provide some method of seeing how the
> > >> NFS-related functions get called:
> > >>
> > >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> > >
> > > Thanks for the feedback and pointers, that helps me understand where
> > > the changes came from at least. It was still working on my last
> > > production kernel - v5.16.
> >
> > There are times where that is not possible, when some function has been inlined and the return probe point isn't available, or some argument is not available at the probe point, but we do try to adapt the tapsets and examples to work on newer kernels.
> >
> > >
> > > So if I recall, I used vfs.add_to_page_cache because at the time it
> > > was the only (or easiest) way to work out total reads for mmap'd files
> > > from an NFS filesystem.
> > >
> > > I also would have thought it should work for any filesystem not just
> > > NFS - but I don't get any hits at all for an entire busy system.
> > >
> > >> As far as specifically what has changed to cause vfs.add_to_page_cache
> > >> not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
> > >> it might be good to get a backtrace when it triggers and then use that
> > >> information to see what has changed in the functions on the backtrace.
> > >>
> > >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > >
> > > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > > switches" using systemtap v4.8.
> >
> > Sorry, the command was missing a second '-' before ldd.  The command below should work:
> >
> > stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> >
> > It would be useful to know what the backtraces are.  That would provide some information on how to adapt the script for newer kernels.
>
> Right, so I set it up on the last known "working" kernel I had,
> v5.16, and this is a typical trace for a read:
>
> [root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
> print_backtrace(); printf("Works.\n"); exit() }'
> WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
>  0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
>  0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
>  0xffffffffc0bbaccf
>  0xffffffffc0bbaccf
>  0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
>  0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
>  0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
>  0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
>  0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
>  0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
>  0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
>  0xffffffffc0baea64
>  0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
>  0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
>  0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
>  0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
>  0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
>  0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
> Works.
>
> As you said earlier, it's hitting "add_to_page_cache_lru".
>
> I also tested with the v5.19 kernel and it no longer triggers at all
> there.
>
> I'm going to stick my neck out and say this stopped working due to all
> the folio conversion patches that were added between v5.17 & v6.0?
>
> Looking at the changelogs between v5.16 and v5.19 that's what jumps
> out to me anyway.
>
> Cheers,
>
> Daire
>
>
> > -Will
> >
> > >
> > > Thanks for the response. It sounds like I need to find a different way
> > > to work out total NFS reads for each filename path in modern kernels.
> > >
> > > Daire
> > >
> > > BTW, this is the code I had for tracking per process and file path read IO:
> > >
> > > probe nfs.fop.open {
> > >   pid = pid()
> > >   filename = sprintf("%s", d_path(&$filp->f_path))
> > >   if (filename =~ "/hosts/.*/user_data") {
> > >     files[pid, ino] = filename
> > >     if ( !([pid, ino] in procinfo))
> > >       procinfo[pid, ino] = sprintf("%s", proc())
> > >   }
> > > }
> > >
> > > probe vfs.add_to_page_cache {
> > >   pid = pid()
> > >   if ([pid, ino] in files ) {
> > >     readpage[pid, ino] += 4096
> > >     files_store[pid, ino] = sprintf("%s", files[pid, ino])
> > >   }
> > > }
> > >
> >


* Re: vfs.add_to_page_cache not working anymore?
  2023-06-14 15:58         ` Daire Byrne
@ 2023-11-06 11:20           ` Daire Byrne
  0 siblings, 0 replies; 7+ messages in thread
From: Daire Byrne @ 2023-11-06 11:20 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

Just to follow up, I eventually switched from vfs.add_to_page_cache
(which never triggers on folio kernels) to
kernel.trace("mm_filemap_add_to_page_cache") and I can still get the
inode like I did before.

However, I'm still a bit stumped as to how I can get the folio size
rather than assume it is the page size (4096) as it was prior to
folios. If any gurus could point me in the right direction I'd be
eternally grateful.

Here's my "working" code but with the (wrong) assumed folio (page) size:

probe kernel.trace("mm_filemap_add_to_page_cache") {
  pid = pid()
  ino = $folio->mapping->host->i_ino
  if ([pid, ino] in files ) {
    readpage[pid, ino] += 4096
    files_store[pid, ino] = sprintf("%s", files[pid, ino])
  }
}
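
The closest I've come up with is calling the kernel's own folio_size()
helper from a small embedded-C function, along the lines of the sketch
below - untested, it needs guru mode (stap -g), and a robust version
would want to guard the pointer dereference:

%{
#include <linux/mm.h>
%}

function folio_bytes:long(folio:long)
%{ /* pure */
  /* folio_size() returns PAGE_SIZE << folio_order(); there are no
     safety checks here, so treat this as a sketch, not production code */
  STAP_RETVALUE = folio_size((struct folio *)(unsigned long)STAP_ARG_folio);
%}

probe kernel.trace("mm_filemap_add_to_page_cache") {
  pid = pid()
  ino = $folio->mapping->host->i_ino
  if ([pid, ino] in files) {
    readpage[pid, ino] += folio_bytes($folio)  # bytes, not a fixed 4096
    files_store[pid, ino] = sprintf("%s", files[pid, ino])
  }
}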

Cheers.

Daire

On Wed, 14 Jun 2023 at 16:58, Daire Byrne <daire@dneg.com> wrote:
>
> Thinking about this a little more, even if vfs.stp was updated so
> that add_to_page_cache was folio-aware (filemap_add_folio?), I presume
> my simple code that assumes we are adding a page's worth of data to
> the page cache would no longer be valid.
>
> probe vfs.add_to_page_cache {
>   pid = pid()
>   if ([pid, ino] in files ) {
>     readpage[pid, ino] += 4096
>     files_store[pid, ino] = sprintf("%s", files[pid, ino])
>   }
> }
>
> I would think something like filemap_add_folio can now be called once
> for many pages read, so I would need to track the number of pages in
> each folio call too?
>
> And I remember exactly why I was inferring (NFS) file reads via
> vfs.add_to_page_cache now - I wanted to record only the file reads
> that resulted in data being asked of the NFS server. In other words,
> only the IO resulting in network IO from each NFS server in a time
> series.
>
> I couldn't find any other way of doing that on a per file inode basis
> while taking the page cache data into account too.
>
> If anyone knows of an easier way to achieve the same thing, then I'll
> happily do that instead.
>
> Cheers,
>
> Daire
>
> On Wed, 14 Jun 2023 at 13:11, Daire Byrne <daire@dneg.com> wrote:
> >
> > On Tue, 13 Jun 2023 at 18:34, William Cohen <wcohen@redhat.com> wrote:
> > >
> > > On 6/13/23 12:39, Daire Byrne wrote:
> > > > On Tue, 13 Jun 2023 at 16:22, William Cohen <wcohen@redhat.com> wrote:
> > > >> Switching to systemtap-4.9 is probably not going to change the results
> > > >> in this case as there are no changes in tapset/linux/vfs.stp between
> > > >> 4.8 and 4.9.
> > > >
> > > > Good to know, I can skip trying to compile that then...
> > >
> > > Yes, running a newer version of the software is often the first approach, to see if the problem has been fixed upstream.  However, in this case the newer version of systemtap is going to give the same results, as the tapsets in that area are the same.  So the focus is on finding what is different between the working older kernels and the current non-working kernel.
> > >
> > > >
> > > >> Unfortunately, the kernel changes over time: some functions probed by
> > > >> the tapset change, or the way they are used by other parts of the
> > > >> kernel changes.  The vfs.add_to_page_cache probe in vfs.stp has three
> > > >> possible functions it probes: add_to_page_cache_locked,
> > > >> add_to_page_cache_lru, and add_to_page_cache.  The first two functions
> > > >> were added due to kernel commit f00654007fe1c15.  Some git commit
> > > >> archaeology shows that only add_to_page_cache_lru remains in the
> > > >> kernel, due to kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> > > >>
> > > >> The following URL shows where add_to_page_cache_lru is used in the
> > > >> 6.2.16 kernel's NFS code and can provide some method of seeing how the
> > > >> NFS-related functions get called:
> > > >>
> > > >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> > > >
> > > > Thanks for the feedback and pointers, that helps me understand where
> > > > the changes came from at least. It was still working on my last
> > > > production kernel - v5.16.
> > >
> > > There are times where that is not possible, when some function has been inlined and the return probe point isn't available, or some argument is not available at the probe point, but we do try to adapt the tapsets and examples to work on newer kernels.
> > >
> > > >
> > > > So if I recall, I used vfs.add_to_page_cache because at the time it
> > > > was the only (or easiest) way to work out total reads for mmap'd files
> > > > from an NFS filesystem.
> > > >
> > > > I also would have thought it should work for any filesystem not just
> > > > NFS - but I don't get any hits at all for an entire busy system.
> > > >
> > > >> As far as specifically what has changed to cause vfs.add_to_page_cache
> > > >> not to trigger for NFS operations, I am not sure.  For the 6.2 kernel
> > > >> it might be good to get a backtrace when it triggers and then use that
> > > >> information to see what has changed in the functions on the backtrace.
> > > >>
> > > >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > > >
> > > > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > > > switches" using systemtap v4.8.
> > >
> > > Sorry, the command was missing a second '-' before ldd.  The command below should work:
> > >
> > > stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > >
> > > It would be useful to know what the backtraces are.  That would provide some information on how to adapt the script for newer kernels.
> >
> > Right, so I set it up on the last known "working" kernel I had,
> > v5.16, and this is a typical trace for a read:
> >
> > [root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
> > print_backtrace(); printf("Works.\n"); exit() }'
> > WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
> >  0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
> >  0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
> >  0xffffffffc0bbaccf
> >  0xffffffffc0bbaccf
> >  0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
> >  0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
> >  0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
> >  0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
> >  0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
> >  0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
> >  0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
> >  0xffffffffc0baea64
> >  0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
> >  0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
> >  0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
> >  0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
> >  0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
> >  0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
> > Works.
> >
> > As you said earlier, it's hitting "add_to_page_cache_lru".
> >
> > I also tested with the v5.19 kernel and it no longer triggers at all
> > there.
> >
> > I'm going to stick my neck out and say this stopped working due to all
> > the folio conversion patches that were added between v5.17 & v6.0?
> >
> > Looking at the changelogs between v5.16 and v5.19 that's what jumps
> > out to me anyway.
> >
> > Cheers,
> >
> > Daire
> >
> >
> > > -Will
> > >
> > > >
> > > > Thanks for the response. It sounds like I need to find a different way
> > > > to work out total NFS reads for each filename path in modern kernels.
> > > >
> > > > Daire
> > > >
> > > > BTW, this is the code I had for tracking per process and file path read IO:
> > > >
> > > > probe nfs.fop.open {
> > > >   pid = pid()
> > > >   filename = sprintf("%s", d_path(&$filp->f_path))
> > > >   if (filename =~ "/hosts/.*/user_data") {
> > > >     files[pid, ino] = filename
> > > >     if ( !([pid, ino] in procinfo))
> > > >       procinfo[pid, ino] = sprintf("%s", proc())
> > > >   }
> > > > }
> > > >
> > > > probe vfs.add_to_page_cache {
> > > >   pid = pid()
> > > >   if ([pid, ino] in files ) {
> > > >     readpage[pid, ino] += 4096
> > > >     files_store[pid, ino] = sprintf("%s", files[pid, ino])
> > > >   }
> > > > }
> > > >
> > >
