From: Daire Byrne
Date: Mon, 6 Nov 2023 11:20:09 +0000
Subject: Re: vfs.add_to_page_cache not working anymore?
To: William Cohen
Cc: systemtap@sourceware.org

Just to follow up, I eventually switched from vfs.add_to_page_cache
(which never triggers on folio kernels) to
kernel.trace("mm_filemap_add_to_page_cache"), and I can still get the
inode like I did before.

However, I'm still a bit stumped as to how I can get the folio size
rather than assume it is the page size (4096), as it was prior to
folios. If any gurus could point me in the right direction I'd be
eternally grateful. I've put a rough, untested sketch of one idea at
the very end of this mail, below the quoted thread.

Here's my "working" code, but with the (wrong) assumed folio (page) size:

probe kernel.trace("mm_filemap_add_to_page_cache") {
    pid = pid()
    ino = $folio->mapping->host->i_ino
    if ([pid, ino] in files) {
        readpage[pid, ino] += 4096
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

Cheers.

Daire

On Wed, 14 Jun 2023 at 16:58, Daire Byrne wrote:
>
> Thinking about this a little more, even if the vfs.stp tapset was
> updated so that add_to_page_cache was folio aware (filemap_add_folio?),
> I presume my simple code that assumes we are adding a page worth of
> data to the page cache would no longer be valid.
>
> probe vfs.add_to_page_cache {
>     pid = pid()
>     if ([pid, ino] in files) {
>         readpage[pid, ino] += 4096
>         files_store[pid, ino] = sprintf("%s", files[pid, ino])
>     }
> }
>
> I would think something like filemap_add_folio can now be called once
> for many pages read, and I would need to track the number of pages in
> each folio call too?
>
> And I remember exactly why I was inferring (NFS) file reads via
> vfs.add_to_page_cache now - I wanted to record only the file reads
> that resulted in data being asked of the NFS server. In other words,
> only the IO resulting in network IO from each NFS server in a time
> series.
>
> I couldn't find any other way of doing that on a per file inode basis
> while taking the page cache data into account too.
>
> If anyone knows of an easier way to achieve the same thing, then I'll
> happily do that instead.
>
> Cheers,
>
> Daire
>
> On Wed, 14 Jun 2023 at 13:11, Daire Byrne wrote:
> >
> > On Tue, 13 Jun 2023 at 18:34, William Cohen wrote:
> > >
> > > On 6/13/23 12:39, Daire Byrne wrote:
> > > > On Tue, 13 Jun 2023 at 16:22, William Cohen wrote:
> > > >> Switching to systemtap-4.9 is probably not going to change the
> > > >> results in this case, as there are no changes in
> > > >> tapset/linux/vfs.stp between 4.8 and 4.9.
> > > >
> > > > Good to know, I can skip trying to compile that then...
> > >
> > > Yes, running a newer version of the software is often the first
> > > thing to try, to see if the problem has been fixed upstream.
> > > However, in this case the newer version of systemtap is going to
> > > give the same results, as the tapsets in that area are the same.
> > > So the focus is to find what is different between the working older
> > > kernels and the current non-working kernel.
> > >
> > > >> Unfortunately, the kernel changes over time, and some functions
> > > >> probed by the tapset change, or the way they are used by other
> > > >> parts of the kernel changes. The vfs.add_to_page_cache probe in
> > > >> vfs.stp has three possible functions it probes:
> > > >> add_to_page_cache_locked, add_to_page_cache_lru, and
> > > >> add_to_page_cache. The first two functions were added due to
> > > >> kernel commit f00654007fe1c15. I did some git commit archeology,
> > > >> and only add_to_page_cache_lru is in the kernel now, due to
> > > >> kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> > > >>
> > > >> The following URL shows where add_to_page_cache_lru is used in
> > > >> the 6.2.16 kernel's nfs code and can provide some way of seeing
> > > >> how the nfs-related functions get called:
> > > >>
> > > >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> > > >
> > > > Thanks for the feedback and pointers, that helps me understand
> > > > where the changes came from at least. It was still working on my
> > > > last production kernel - v5.16.
> > >
> > > There are times where that is not possible, when some function has
> > > been inlined and the return probe point isn't available, or some
> > > argument is not available at the probe point, but we do try to
> > > adapt the tapsets and examples to work on newer kernels.
> > >
> > > > So if I recall, I used vfs.add_to_page_cache because at the time
> > > > it was the only (or easiest) way to work out total reads for mmap
> > > > files from an NFS filesystem.
> > > >
> > > > I also would have thought it should work for any filesystem, not
> > > > just NFS - but I don't get any hits at all for an entire busy
> > > > system.
> > > >
> > > >> As far as specifically what has changed to cause
> > > >> vfs.add_to_page_cache not to trigger for NFS operations, I am
> > > >> not sure. For the 6.2 kernel it might be good to get a backtrace
> > > >> when it triggers and then use that information to see what has
> > > >> changed in the functions on the backtrace.
> > > >>
> > > >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > > >
> > > > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > > > switches" using systemtap v4.8.
> > >
> > > Sorry, missing a second '-' before ldd. The command below should work:
> > >
> > > stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > >
> > > It would be useful to know what the backtraces are. That would
> > > provide some information on how to adapt the script for newer kernels.
> >
> > Right, so I got it set up on the last known "working" kernel I had,
> > v5.16, and this is a typical trace for a read:
> >
> > [root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
> > print_backtrace(); printf("Works.\n"); exit() }'
> > WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
> > 0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
> > 0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
> > 0xffffffffc0bbaccf
> > 0xffffffffc0bbaccf
> > 0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
> > 0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
> > 0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
> > 0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
> > 0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
> > 0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
> > 0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
> > 0xffffffffc0baea64
> > 0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
> > 0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
> > 0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
> > 0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
> > 0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
> > 0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
> > Works.
> >
> > As you said earlier, it's hitting "add_to_page_cache_lru".
> >
> > I also tested with the v5.19 kernel and it no longer triggers
> > anything with that.
> >
> > I'm going to stick my head out and say this stopped working due to
> > all the folio conversion patches that were added between v5.17 & v6.0?
> >
> > Looking at the changelogs between v5.16 and v5.19, that's what jumps
> > out to me anyway.
> >
> > Cheers,
> >
> > Daire
> >
> > >
> > > -Will
> > >
> > > > Thanks for the response. It sounds like I need to find a
> > > > different way to work out total NFS reads for each filename path
> > > > in modern kernels.
> > > >
> > > > Daire
> > > >
> > > > BTW, this is the code I had for tracking per process and file
> > > > path read IO:
> > > >
> > > > probe nfs.fop.open {
> > > >     pid = pid()
> > > >     filename = sprintf("%s", d_path(&$filp->f_path))
> > > >     if (filename =~ "/hosts/.*/user_data") {
> > > >         files[pid, ino] = filename
> > > >         if (!([pid, ino] in procinfo))
> > > >             procinfo[pid, ino] = sprintf("%s", proc())
> > > >     }
> > > > }
> > > >
> > > > probe vfs.add_to_page_cache {
> > > >     pid = pid()
> > > >     if ([pid, ino] in files) {
> > > >         readpage[pid, ino] += 4096
> > > >         files_store[pid, ino] = sprintf("%s", files[pid, ino])
> > > >     }
> > > > }
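P.S. Here is the rough, untested sketch I mentioned above for picking up
the real folio size instead of hard-coding 4096. It assumes a kernel
whose struct folio carries a _folio_order field (recent 6.x kernels;
@defined() should make it fall back to single-page accounting elsewhere)
and that multi-page folios are flagged with PG_head - both of those are
assumptions to verify against your kernel headers, not something I have
confirmed:

probe kernel.trace("mm_filemap_add_to_page_cache") {
    pid = pid()
    ino = $folio->mapping->host->i_ino
    # Default to a single page worth of data per tracepoint hit.
    order = 0
    # If the kernel's struct folio has _folio_order (checked at compile
    # time via @defined) and this folio is a large/compound one marked
    # with PG_head, use its real order instead of assuming order 0.
    if (@defined($folio->_folio_order)) {
        if ($folio->flags & (1 << @const("PG_head")))
            order = $folio->_folio_order
    }
    if ([pid, ino] in files) {
        readpage[pid, ino] += 4096 << order
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

If there is a cleaner way to get at the kernel's folio_size() from a
tracepoint probe, I'd happily use that instead.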