From: Daire Byrne
Date: Mon, 6 Nov 2023 11:20:09 +0000
Subject: Re: vfs.add_to_page_cache not working anymore?
To: William Cohen
Cc: systemtap@sourceware.org

Just to follow up, I eventually switched from vfs.add_to_page_cache
(which never triggers on folio kernels) to
kernel.trace("mm_filemap_add_to_page_cache"), and I can still get the
inode like I did before.

However, I'm still a bit stumped as to how I can get the folio size
rather than assume it is the page size (4096), as it was prior to
folios. If any gurus could point me in the right direction I'd be
eternally grateful. I've put a rough, untested sketch of one idea at
the very end of this mail, below the quoted thread.

Here's my "working" code, but with the (wrong) assumed folio (page) size:

probe kernel.trace("mm_filemap_add_to_page_cache") {
    pid = pid()
    ino = $folio->mapping->host->i_ino
    if ([pid, ino] in files) {
        readpage[pid, ino] += 4096
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

Cheers.

Daire

On Wed, 14 Jun 2023 at 16:58, Daire Byrne wrote:
>
> Thinking about this a little more, even if the vfs.stp tapset was
> updated so that add_to_page_cache was folio aware (filemap_add_folio?),
> I presume my simple code that assumes we are adding a page worth of
> data to the page cache would no longer be valid.
>
> probe vfs.add_to_page_cache {
>     pid = pid()
>     if ([pid, ino] in files) {
>         readpage[pid, ino] += 4096
>         files_store[pid, ino] = sprintf("%s", files[pid, ino])
>     }
> }
>
> I would think something like filemap_add_folio can now be called once
> for many pages read, and I would need to track the number of pages in
> each folio call too?
>
> And I remember exactly why I was inferring (NFS) file reads via
> vfs.add_to_page_cache now - I wanted to record only the file reads
> that resulted in data being asked of the NFS server. In other words,
> only the IO resulting in network IO from each NFS server in a time
> series.
>
> I couldn't find any other way of doing that on a per file inode basis
> while taking the page cache data into account too.
>
> If anyone knows of an easier way to achieve the same thing, then I'll
> happily do that instead.
>
> Cheers,
>
> Daire
>
> On Wed, 14 Jun 2023 at 13:11, Daire Byrne wrote:
> >
> > On Tue, 13 Jun 2023 at 18:34, William Cohen wrote:
> > >
> > > On 6/13/23 12:39, Daire Byrne wrote:
> > > > On Tue, 13 Jun 2023 at 16:22, William Cohen wrote:
> > > >> Switching to systemtap-4.9 is probably not going to change the
> > > >> results in this case, as there are no changes in
> > > >> tapset/linux/vfs.stp between 4.8 and 4.9.
> > > >
> > > > Good to know, I can skip trying to compile that then...
> > >
> > > Yes, running a newer version of the software is often the first
> > > thing to try, to see if the problem has been fixed upstream.
> > > However, in this case the newer version of systemtap is going to
> > > give the same results, as the tapsets in that area are the same.
> > > So the focus is to find what is different between the working older
> > > kernels and the current non-working kernel.
> > >
> > > >> Unfortunately, the kernel changes over time, and some functions
> > > >> probed by the tapset change, or the way they are used by other
> > > >> parts of the kernel changes. The vfs.add_to_page_cache probe in
> > > >> vfs.stp has three possible functions it probes:
> > > >> add_to_page_cache_locked, add_to_page_cache_lru, and
> > > >> add_to_page_cache. The first two functions were added due to
> > > >> kernel commit f00654007fe1c15. I did some git commit archeology,
> > > >> and only add_to_page_cache_lru is in the kernel now, due to
> > > >> kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> > > >>
> > > >> The following URL shows where add_to_page_cache_lru is used in
> > > >> the 6.2.16 kernel's nfs code and can provide some way of seeing
> > > >> how the nfs-related functions get called:
> > > >>
> > > >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> > > >
> > > > Thanks for the feedback and pointers, that helps me understand
> > > > where the changes came from at least. It was still working on my
> > > > last production kernel - v5.16.
> > >
> > > There are times where that is not possible, when some function has
> > > been inlined and the return probe point isn't available, or some
> > > argument is not available at the probe point, but we do try to
> > > adapt the tapsets and examples to work on newer kernels.
> > >
> > > > So if I recall, I used vfs.add_to_page_cache because at the time
> > > > it was the only (or easiest) way to work out total reads for mmap
> > > > files from an NFS filesystem.
> > > >
> > > > I also would have thought it should work for any filesystem, not
> > > > just NFS - but I don't get any hits at all for an entire busy
> > > > system.
> > > >
> > > >> As far as specifically what has changed to cause
> > > >> vfs.add_to_page_cache not to trigger for NFS operations, I am
> > > >> not sure. For the 6.2 kernel it might be good to get a backtrace
> > > >> when it triggers and then use that information to see what has
> > > >> changed in the functions on the backtrace.
> > > >>
> > > >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > > >
> > > > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > > > switches" using systemtap v4.8.
> > >
> > > Sorry, missing a second '-' before ldd. The command below should work:
> > >
> > > stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > >
> > > It would be useful to know what the backtraces are. That would
> > > provide some information on how to adapt the script for newer kernels.
> >
> > Right, so I got it set up on the last known "working" kernel I had,
> > v5.16, and this is a typical trace for a read:
> >
> > [root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
> > print_backtrace(); printf("Works.\n"); exit() }'
> > WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
> > 0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
> > 0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
> > 0xffffffffc0bbaccf
> > 0xffffffffc0bbaccf
> > 0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
> > 0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
> > 0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
> > 0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
> > 0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
> > 0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
> > 0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
> > 0xffffffffc0baea64
> > 0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
> > 0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
> > 0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
> > 0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
> > 0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
> > 0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
> > Works.
> >
> > As you said earlier, it's hitting "add_to_page_cache_lru".
> >
> > I also tested with the v5.19 kernel and it no longer triggers
> > anything with that.
> >
> > I'm going to stick my head out and say this stopped working due to
> > all the folio conversion patches that were added between v5.17 & v6.0?
> >
> > Looking at the changelogs between v5.16 and v5.19, that's what jumps
> > out to me anyway.
> >
> > Cheers,
> >
> > Daire
> >
> > >
> > > -Will
> > >
> > > > Thanks for the response. It sounds like I need to find a
> > > > different way to work out total NFS reads for each filename path
> > > > in modern kernels.
> > > >
> > > > Daire
> > > >
> > > > BTW, this is the code I had for tracking per process and file
> > > > path read IO:
> > > >
> > > > probe nfs.fop.open {
> > > >     pid = pid()
> > > >     filename = sprintf("%s", d_path(&$filp->f_path))
> > > >     if (filename =~ "/hosts/.*/user_data") {
> > > >         files[pid, ino] = filename
> > > >         if (!([pid, ino] in procinfo))
> > > >             procinfo[pid, ino] = sprintf("%s", proc())
> > > >     }
> > > > }
> > > >
> > > > probe vfs.add_to_page_cache {
> > > >     pid = pid()
> > > >     if ([pid, ino] in files) {
> > > >         readpage[pid, ino] += 4096
> > > >         files_store[pid, ino] = sprintf("%s", files[pid, ino])
> > > >     }
> > > > }
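P.S. Here is the rough, untested sketch I mentioned above for picking up
the real folio size instead of hard-coding 4096. It assumes a kernel
whose struct folio carries a _folio_order field (recent 6.x kernels;
@defined() should make it fall back to single-page accounting elsewhere)
and that multi-page folios are flagged with PG_head - both of those are
assumptions to verify against your kernel headers, not something I have
confirmed:

probe kernel.trace("mm_filemap_add_to_page_cache") {
    pid = pid()
    ino = $folio->mapping->host->i_ino
    # Default to a single page worth of data per tracepoint hit.
    order = 0
    # If the kernel's struct folio has _folio_order (checked at compile
    # time via @defined) and this folio is a large/compound one marked
    # with PG_head, use its real order instead of assuming order 0.
    if (@defined($folio->_folio_order)) {
        if ($folio->flags & (1 << @const("PG_head")))
            order = $folio->_folio_order
    }
    if ([pid, ino] in files) {
        readpage[pid, ino] += 4096 << order
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

If there is a cleaner way to get at the kernel's folio_size() from a
tracepoint probe, I'd happily use that instead.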