From: Daire Byrne
Date: Wed, 14 Jun 2023 16:58:06 +0100
Subject: Re: vfs.add_to_page_cache not working anymore?
To: William Cohen
Cc: systemtap@sourceware.org

Thinking about this a little more, even if the vfs.stp tapset was updated
so that vfs.add_to_page_cache was folio aware (filemap_add_folio?), I
presume my simple code, which assumes we are adding a page worth of data
to the page cache, would no longer be valid:

probe vfs.add_to_page_cache {
    pid = pid()
    if ([pid, ino] in files) {
        readpage[pid, ino] += 4096
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

I would think something like filemap_add_folio can now be called once for
many pages read, so I would also need to track the number of pages in each
folio call?
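Something like the following is roughly what I imagine that probe would
have to turn into - a completely untested sketch, assuming
filemap_add_folio() is not inlined on the target kernel, that pages are
4KiB, and that guru mode (stap -g) is acceptable for the small embedded-C
helper (the folio_pages() name is just mine); it reuses the files[] array
that my nfs.fop.open probe already fills in:

%{
#include <linux/mm.h>
%}

# number of PAGE_SIZE pages covered by a folio (needs stap -g / guru mode)
function folio_pages:long (folio:long) %{ /* pure */
    STAP_RETVALUE = folio_nr_pages((struct folio *)(uintptr_t)STAP_ARG_folio);
%}

probe kernel.function("filemap_add_folio") {
    pid = pid()
    ino = $mapping->host->i_ino
    if ([pid, ino] in files) {
        # a folio can now cover many pages, so scale by its page count
        readpage[pid, ino] += folio_pages($folio) * 4096
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

Whether that actually catches all the NFS read paths on 6.x kernels I
don't know - that's the bit I'd have to verify with the backtrace trick
discussed below.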
And I remember exactly why I was inferring (NFS) file reads via
vfs.add_to_page_cache now - I wanted to record only the file reads that
resulted in data being asked of the NFS server. In other words, only the
IO resulting in network IO from each NFS server, in a time series. I
couldn't find any other way of doing that on a per-file inode basis while
also taking the page cache data into account.

If anyone knows of an easier way to achieve the same thing, then I'll
happily do that instead.

Cheers,

Daire

On Wed, 14 Jun 2023 at 13:11, Daire Byrne wrote:
>
> On Tue, 13 Jun 2023 at 18:34, William Cohen wrote:
> >
> > On 6/13/23 12:39, Daire Byrne wrote:
> > > On Tue, 13 Jun 2023 at 16:22, William Cohen wrote:
> > >>
> > >> Switching to systemtap-4.9 is probably not going to change the results
> > >> in this case as there are no changes in tapset/linux/vfs.stp between
> > >> 4.8 and 4.9.
> > >
> > > Good to know, I can skip trying to compile that then...
> >
> > Yes, running a newer version of the software is often the first approach
> > to see if the problem has been fixed upstream. However, in this case the
> > newer version of systemtap is going to give the same results, as the
> > tapsets in that area are the same. So the focus is on finding what is
> > different between the working older kernels and the current non-working
> > kernel.
> >
> > >
> > >> Unfortunately, the kernel changes over time, and some functions probed
> > >> by the tapset change, or the way they are used by other parts of the
> > >> kernel changes. The vfs.add_to_page_cache probe in vfs.stp has three
> > >> possible functions it probes: add_to_page_cache_locked,
> > >> add_to_page_cache_lru, and add_to_page_cache. The first two functions
> > >> were added due to kernel commit f00654007fe1c15. Did some git commit
> > >> archeology and only add_to_page_cache_lru is in the kernel due to
> > >> kernel git commit 2bb876b58d593d7f2522ec0f41f20a74fde76822.
> > >>
> > >> The following URL shows where add_to_page_cache_lru is used in the
> > >> 6.2.16 kernel's nfs code and can provide some method of seeing how the
> > >> nfs related functions get called:
> > >>
> > >> https://elixir.bootlin.com/linux/v6.2.16/A/ident/add_to_page_cache_lru
> > >
> > > Thanks for the feedback and pointers, that helps me understand where
> > > the changes came from at least. It was still working on my last
> > > production kernel - v5.16.
> >
> > There are times where that is not possible, when some function has been
> > inlined and the return probe point isn't available, or some argument is
> > not available at the probe point, but we do try to adapt the tapsets and
> > examples to work on newer kernels.
> >
> > >
> > > So if I recall, I used vfs.add_to_page_cache because at the time it
> > > was the only (or easiest) way to work out total reads for mmap files
> > > from an NFS filesystem.
> > >
> > > I also would have thought it should work for any filesystem, not just
> > > NFS - but I don't get any hits at all for an entire busy system.
> > >
> > >> As far as specifically what has changed to cause vfs.add_to_page_cache
> > >> not to trigger for NFS operations, I am not sure.
> > >> For the 6.2 kernel it might be good to get a backtrace of the
> > >> triggering of it and then use that information to see what has changed
> > >> in the functions on the backtrace.
> > >>
> > >> stap -ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> > >
> > > I just get the error "Cannot specify a script with -l/-L/--dump-*
> > > switches" using systemtap v4.8.
> >
> > Sorry, missing a second '-' before ldd. The command below should work:
> >
> > stap --ldd -e 'probe vfs.add_to_page_cache { print_backtrace(); printf("Works.\n"); exit() }'
> >
> > It would be useful to know what the backtraces are. That would provide
> > some information on how to adapt the script for newer kernels.
>
> Right, so I set it up on the last known "working" kernel I had, v5.16,
> and this is a typical trace for a read:
>
> [root@lonc400b1 daire]# stap --ldd -e 'probe vfs.add_to_page_cache {
> print_backtrace(); printf("Works.\n"); exit() }'
> WARNING: Missing unwind data for a module, rerun with 'stap -d kernel'
>  0xffffffff91258300 : add_to_page_cache_lru+0x0/0x30 [kernel]
>  0xffffffff912585b8 : read_cache_pages+0xd8/0x1a0 [kernel]
>  0xffffffffc0bbaccf
>  0xffffffffc0bbaccf
>  0xffffffff912589e5 : read_pages+0x155/0x250 [kernel]
>  0xffffffff91258cae : page_cache_ra_unbounded+0x1ce/0x250 [kernel]
>  0xffffffff91258ed0 : ondemand_readahead+0x1a0/0x300 [kernel]
>  0xffffffff912592ed : page_cache_sync_ra+0xbd/0xd0 [kernel]
>  0xffffffff9124cf13 : filemap_get_pages+0xe3/0x420 [kernel]
>  0xffffffff9124d31e : filemap_read+0xce/0x3c0 [kernel]
>  0xffffffff9124d700 : generic_file_read_iter+0xf0/0x160 [kernel]
>  0xffffffffc0baea64
>  0xffffffff91312c70 : new_sync_read+0x110/0x190 [kernel]
>  0xffffffff9131546f : vfs_read+0xff/0x1a0 [kernel]
>  0xffffffff91315b07 : ksys_read+0x67/0xe0 [kernel]
>  0xffffffff91315b99 : __x64_sys_read+0x19/0x20 [kernel]
>  0xffffffff91a6312b : do_syscall_64+0x3b/0x90 [kernel]
>  0xffffffff91c0007c : entry_SYSCALL_64_after_hwframe+0x44/0xae [kernel]
> Works.
>
> As you said earlier, it's hitting "add_to_page_cache_lru".
>
> I also tested with the v5.19 kernel and it no longer triggers anything
> with that.
>
> I'm going to stick my neck out and say this stopped working due to all
> the folio conversion patches that were added between v5.17 & v6.0?
>
> Looking at the changelogs between v5.16 and v5.19, that's what jumps
> out to me anyway.
>
> Cheers,
>
> Daire
>
> > -Will
> >
> > >
> > > Thanks for the response. It sounds like I need to find a different way
> > > to work out total NFS reads for each filename path on modern kernels.
> > >
> > > Daire
> > >
> > > BTW, this is the code I had for tracking per-process and file path read IO:
> > >
> > > probe nfs.fop.open {
> > >     pid = pid()
> > >     filename = sprintf("%s", d_path(&$filp->f_path))
> > >     if (filename =~ "/hosts/.*/user_data") {
> > >         files[pid, ino] = filename
> > >         if (!([pid, ino] in procinfo))
> > >             procinfo[pid, ino] = sprintf("%s", proc())
> > >     }
> > > }
> > >
> > > probe vfs.add_to_page_cache {
> > >     pid = pid()
> > >     if ([pid, ino] in files) {
> > >         readpage[pid, ino] += 4096
> > >         files_store[pid, ino] = sprintf("%s", files[pid, ino])
> > >     }
> > > }
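P.S. In case it helps anyone hitting the same thing: a quick way to check
which of these page cache insertion functions a given kernel actually
exposes to systemtap (and so whether vfs.add_to_page_cache can ever fire)
should be to list the matching probe points, e.g.

stap -l 'kernel.function("add_to_page_cache*")'
stap -l 'kernel.function("filemap_add_folio")'

and using "stap -L" instead of "-l" should also show which context
variables ($mapping, $folio, ...) are available at each probe point.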