* Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-06 4:08 UTC
To: cygwin

Hello!

I am unhappy to report a severe performance issue with find -ls, ls -R
and grep -r, with Cygwin 3.4.9 and Cygwin 3.5.0, when samba shares are
involved.

Imagine a directory with 256 subdirs, each holding 256 files, all on a
samba share; the samba server is on Linux with tmpfs.

mkdir dir1
for ((i=0;i<256;i++)) ; do
    mkdir "dir1/subdir$i"
    for ((j=0; j < 256;j++));do
        echo "j=$j" >"dir1/subdir$i/j$j.txt"
    done
done

Timing comparisons then show a dramatic difference between Debian Linux
accessing the samba share, WSL accessing the samba share, and Cygwin
accessing the samba share:

1. time find . >/dev/null
   Cygwin 86 seconds
   WSL 23 seconds
   Debian 19 seconds

2. time find . -ls >/dev/null
   Cygwin 129 seconds
   WSL 38 seconds
   Debian 32 seconds

3. time grep -r -E NOMATCH 2>/dev/null
   Cygwin 390 seconds
   WSL 144 seconds
   Debian 141 seconds

So where does the bad Cygwin performance come from? Virus checker,
memory compression and other Windows services known to interfere with
benchmarking are OFF.

But the network trace shows a dramatic difference: While Debian and
WSL open files only once, the Cygwin run spends lots of network
traffic checking whether the txt files are txt.lnk, txt,bat.lnk and so
on, all non-existent files.

Why does that happen?

--
Dan Shelton - Cluster Specialist Win/Lin/Bsd

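Before building the full 256 x 256 tree (65536 files), it can be handy to check the reproduction on a smaller tree first. The following is only a sketch and not part of the original report: `make_tree` and its three parameters are names introduced here for illustration. It produces the same layout as the loop in the report, and is written with plain POSIX sh loops so it does not depend on the bash `for ((...))` form.

```shell
#!/bin/sh
# make_tree ROOT NDIRS NFILES
# Hypothetical helper (not from the report): creates ROOT/subdir<i>/j<j>.txt,
# the same layout as the reporter's loop, with configurable sizes.
make_tree() {
    mt_root=$1
    mt_ndirs=$2
    mt_nfiles=$3
    mkdir -p "$mt_root" || return 1
    mt_i=0
    while [ "$mt_i" -lt "$mt_ndirs" ]; do
        mkdir -p "$mt_root/subdir$mt_i" || return 1
        mt_j=0
        while [ "$mt_j" -lt "$mt_nfiles" ]; do
            # Same file content as in the report: "j=<index>"
            echo "j=$mt_j" > "$mt_root/subdir$mt_i/j$mt_j.txt"
            mt_j=$((mt_j + 1))
        done
        mt_i=$((mt_i + 1))
    done
}

# Example calls (run inside the samba-mounted directory):
#   make_tree dir1 16 16      # quick smoke test, 256 files
#   make_tree dir1 256 256    # full-size tree as described in the report
```

Timings taken on a small tree will not be comparable to the numbers in the report; the small run is only useful to confirm that the share, the mount and the commands work before the long run.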
* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-18 6:22 UTC
To: cygwin

On Wed, 6 Dec 2023 at 05:08, Dan Shelton <dan.f.shelton@gmail.com> wrote:
> [snip]
> But the network trace shows a dramatic difference: While Debian and
> WSL open files only once, the Cygwin run spends lots of network
> traffic checking whether the txt files are txt.lnk, txt,bat.lnk and so
> on, all non existent files.
>
> Why does that happen?

It would be nice if someone from the Cygwin authors could assist me in
figuring out why this happens.

My working theory is that the extra file and dir lookup calls are for
soft- and hardlink emulation for file systems which do not have soft-
or hardlinks?

If this is correct, then a fix might be to
1) determine the filesystem type (cached per process lifetime, in the
   absence of /etc/mnttab) and its boundaries (mount point, and whether
   other mount points are below it)
2) only use the emulation for FAT filesystems; for NTFS, REFS and
   SMBFS the native filesystem link is used.

Help!

Dan
--
Dan Shelton - Cluster Specialist Win/Lin/Bsd

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Marco Atzeri @ 2023-12-18 6:49 UTC
To: cygwin

On 18/12/2023 07:22, Dan Shelton via Cygwin wrote:
> [snip]
> It would be nice if someone from the Cygwin authors could assist me in
> figuring out why this happens.
>
> My working theory is that the extra file and dir lookup calls are for
> soft- and hardlink emulation for file systems which do not have soft-
> or hardlinks?
> [snip]
>
> Help!
>
> Dan

Is your cygserver running?

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-18 6:53 UTC
To: cygwin

On Mon, 18 Dec 2023 at 07:49, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
> [snip]
> Is your cygserver running ?

Yes, Cygserver is running

Dan
--
Dan Shelton - Cluster Specialist Win/Lin/Bsd

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Marco Atzeri @ 2023-12-18 7:05 UTC
To: cygwin

On 18/12/2023 07:53, Dan Shelton via Cygwin wrote:
> On Mon, 18 Dec 2023 at 07:49, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
>>
>> Is your cygserver running ?
>
> Yes, Cygserver is running
>
> Dan

Hi Dan,

the fact that you see only a factor of 2 to 4 compared to WSL and Debian
tells me that Cygwin is very effective as a user-space environment.

1. time find . >/dev/null
   Cygwin 86 seconds
   WSL 23 seconds
   Debian 19 seconds

2. time find . -ls >/dev/null
   Cygwin 129 seconds
   WSL 38 seconds
   Debian 32 seconds

3. time grep -r -E NOMATCH 2>/dev/null
   Cygwin 390 seconds
   WSL 144 seconds
   Debian 141 seconds

Cygwin cannot go faster than the engine below it, and there are
several cumbersome tricks needed to handle POSIX compliance.

I have seen worse timings when trying to emulate a Unix layer on top of
a Microsoft environment that does not support it.

Regards
Marco

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-18 7:16 UTC
To: cygwin

On Mon, 18 Dec 2023 at 08:05, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
> [snip]
> Cygwin can not go faster than the engine below and there are
> several cumbersome tricks to handle the POSIX compliance
>
> I have seen worst timing trying to emulate Unix Layer on top of
> a not supporting Microsoft environment.

Sorry, but I disagree. I think that Cygwin could compete with WSL in
terms of performance.
I think the issue is just bad symlink emulation for filesystems which
do not need symlink emulation.

Dan
--
Dan Shelton - Cluster Specialist Win/Lin/Bsd

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Marco Atzeri @ 2023-12-18 8:23 UTC
To: cygwin

On 18/12/2023 08:16, Dan Shelton via Cygwin wrote:
> [snip]
> Sorry, but I disagree. I think that Cygwin could compete with WSL in
> terms of performance.
> I think the issue is just bad symlink emulation for filesystems which
> do not need symlink emulation.
>

https://cygwin.com/acronyms/#PTC

> Dan

Regards
Marco

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Kaz Kylheku @ 2023-12-20 17:20 UTC
To: Dan Shelton; +Cc: cygwin

On 2023-12-17 22:22, Dan Shelton via Cygwin wrote:
> It would be nice if someone from the Cygwin authors could assist me in
> figuring out why this happens.

Cygwin is famously slow; this is nothing new. We are grateful
for Cygwin because it makes stuff work at all; if it were blazing
fast that would be a bonus.

E.g. git operations (clone, rebase, ...); ./configure scripts; ...: all
run like molasses.

The following is just my fast and loose opinion, shot from the hip,
and possibly off or wrong, but it likely has to do with the layering.
Cygwin's core API is based on a C library called Newlib. Cygwin bolts
Newlib to Windows by means of an additional shim below Newlib that
is based on C++ objects, where there is path munging going on and such,
and that's where the Win32 calls get made. It's an additional abstraction.

I worked with the internals a bit when producing the Cygnal project.

* rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Martin Wege @ 2023-12-21 12:16 UTC
To: cygwin

On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin
<cygwin@cygwin.com> wrote:
> [snip]
> The following is just my fast and loose opinion, shot from the hip,
> and possibly off or wrong, but it likely has to do with the layering.
> Cygwin's core API is based on a C library called Newlib. Cygwin bolts
> Newlib to Windows by means of an additional shim below Newlib that
> is based on C++ objects, where there is path munging going on and such,
> and that's where the Win32 calls get made. It's an additional abstraction.

I disagree with that. OK, part of it is that the layering causes
more memory allocations and copies, but this is not the root cause.

The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
compared to 1 on Linux) to look up the *.lnk and *.exe.lnk files on
filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
On SMBFS and NFS it hurts the most, because access latency is the
highest for networked filesystems.

So my proposal would be to add an option ('fslinktypes') to the CYGWIN
environment variable to define which types of links are supported:
default 'all', which is a shortcut for 'native,lnk,lnkexe'.
So in case people do not want 'lnk' link support they just add
CYGWIN+=' fslinktypes:native' to env, to turn off support for
lnk/lnk.exe style links, and be happy.

@Corinna Vinschen Would that be acceptable?

Thanks,
Martin

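To make the proposal above concrete: the sketch below shows one way a user could append an option to the CYGWIN environment variable without duplicating it. Note that `fslinktypes` is only the option proposed in this message; nothing in this thread says that any Cygwin release recognises it, and `add_cygwin_opt` is a hypothetical helper name introduced here. `winsymlinks:native` in the example is an existing CYGWIN setting that is also mentioned later in the thread.

```shell
#!/bin/sh
# add_cygwin_opt OPT
# Hypothetical helper: append OPT to $CYGWIN (space separated) unless the
# exact same word is already present, then export the variable.
add_cygwin_opt() {
    case " ${CYGWIN:-} " in
        *" $1 "*)
            ;;                                  # already present, keep as is
        *)
            CYGWIN="${CYGWIN:+$CYGWIN }$1"      # append, no leading space
            ;;
    esac
    export CYGWIN
}

# Example: keep an existing setting and add the PROPOSED option.
# 'fslinktypes:native' is the option suggested in this thread, not a
# setting known to exist in Cygwin.
#   CYGWIN="winsymlinks:native"
#   add_cygwin_opt fslinktypes:native
#   echo "$CYGWIN"
```

The CYGWIN variable has to be set before the first Cygwin process of a session starts, so in practice a line like this would live in the Windows environment or in whatever launches the shell.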
* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Cedric Blancher @ 2023-12-21 16:10 UTC
To: cygwin

On Thu, 21 Dec 2023 at 13:17, Martin Wege via Cygwin <cygwin@cygwin.com> wrote:
> [snip]
> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
> On SMBFS and NFS it hurts the most, because access latency is the
> highest for networked filesystems.
>
> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
> environment variable to define which types of links are supported:
> default 'all'. which is an shortcut for 'native,lnk,lnkexe'.
> So in case people do not want 'lnk' link support they just add
> CYGWIN+=' fslinktypes:native' to env, to turn off support for
> lnk/lnk.exe style links, and be happy.
>
> @Corinna Vinschen Would that be acceptable?

+1 for this proposal, which is almost the same idea as I proposed in
https://www.mail-archive.com/cygwin@cygwin.com/msg174612.html

Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Brian Inglis @ 2023-12-21 17:43 UTC
To: cygwin

On 2023-12-21 09:10, Cedric Blancher via Cygwin wrote:
> On Thu, 21 Dec 2023 at 13:17, Martin Wege via Cygwin wrote:
>> On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin wrote:
>>> [snip]
>>> The following is just my fast and loose opinion, shot from the hip,
>>> and possibly off or wrong, but it likely has to do with the layering.
>>> Cygwin's core API is based on a C library called Newlib. Cygwin bolts
>>> Newlib to Windows by means of an additional shim below Newlib that
>>> is based on C++ objects, where there is path munging going on and such,
>>> and that's where the Win32 calls get made. It's an additional abstraction.

Cygwin is a newlib libc implementation providing some POSIX functionality
using C++ functions calling x86_64 Windows functions, often entirely
replacing a group of newlib functions, to support OS features or POSIX
equivalents, including locales, UTF-8 and other multi-byte character sets,
time zones, files, directories, processes, to provide as complete as
possible hosted OS features, rather than newlib's usual base embedded RT
(OS or not) targets.

>> I disagree with that. Ok, part of that is that the layering causes
>> more memory allocations and copies, but this is not the root cause.
>>
>> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
>> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
>> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
>> On SMBFS and NFS it hurts the most, because access latency is the
>> highest for networked filesystems.

Run some commands under strace to produce logs with timing info and tell
us how much that is a time factor, relative to the Windows emulation time,
and the application functions?

>> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
>> environment variable to define which types of links are supported:
>> default 'all'. which is an shortcut for 'native,lnk,lnkexe'.
>> So in case people do not want 'lnk' link support they just add
>> CYGWIN+=' fslinktypes:native' to env, to turn off support for
>> lnk/lnk.exe style links, and be happy.
>>
>> @Corinna Vinschen Would that be acceptable?
>
> +1 for this proposal, which is almost the same idea as I proposed in
> https://www.mail-archive.com/cygwin@cygwin.com/msg174612.html

We are all volunteers here, so you can clone the repo, install the cygwin
package build deps, follow the build instructions, make the required
changes, rebuild, install and test the dll, then git
format-patch/send-email to cygwin-patches list for consideration.

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry

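One rough way to start on the measurement requested above is to capture a trace to a file and count how many of its lines mention a `.lnk` name. The sketch below is an assumption-laden starting point rather than an established procedure: `count_lnk_lines` is a name introduced here, and whether each suffix probe shows up as its own line depends on the tracing tool, its trace mask and the Cygwin version, none of which are specified in this thread. Cygwin's `strace` accepts `-o FILE` to write its output to a file.

```shell
#!/bin/sh
# count_lnk_lines LOGFILE
# Hypothetical helper: prints "<lines mentioning .lnk> <total lines>".
# It only counts text matches; it does not interpret timestamps.
count_lnk_lines() {
    cl_log=$1
    # wc may pad its output with spaces; arithmetic expansion normalises it.
    cl_total=$(($(wc -l < "$cl_log")))
    # grep -c prints 0 but exits non-zero when nothing matches.
    cl_lnk=$(grep -c '\.lnk' "$cl_log" || true)
    printf '%s %s\n' "$cl_lnk" "$cl_total"
}

# Possible use under Cygwin (inside the samba-mounted test directory):
#   strace -o find.trace find . -ls >/dev/null
#   count_lnk_lines find.trace
```

A line count says nothing about where the wall-clock time goes, so it would have to be read together with the timing columns in the trace itself and with the network capture mentioned in the original report.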
* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Kaz Kylheku @ 2023-12-21 20:32 UTC
To: Martin Wege; +Cc: cygwin

On 2023-12-21 04:16, Martin Wege via Cygwin wrote:
> On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin
> <cygwin@cygwin.com> wrote:
>> [snip]
>
> I disagree with that. Ok, part of that is that the layering causes
> more memory allocations and copies, but this is not the root cause.

I seem to recall that most operations that take a path argument
have to convert the path from Cygwin to Win32, and I think that
also involves going from 8 bit to UTF-16. That's gotta hurt a bit.

> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
> On SMBFS and NFS it hurts the most, because access latency is the
> highest for networked filesystems.

Could some intelligent caching be added there? (Discussion of
associated invalidation problem in 3... 2.... 1... )

Can you discuss more details, so people don't have to dive into code
to understand it? If we are accessing some file "foo", the application
or user may actually be referring to a "foo.lnk" link. But in the
happy case that "foo" exists, why would we bother looking for "foo.lnk"?

If "foo" does not exist, but "foo.lnk" does, that could probably be
cached, so that next time "foo" is accessed, we go straight for "foo.lnk",
and keep using that while it exists.

If someone has both "foo" and "foo.lnk" in the same directory,
that's a bit of a degenerate case; how important is it to be "correct",
anyway?

> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
> environment variable to define which types of links are supported:
> default 'all'. which is an shortcut for 'native,lnk,lnkexe'.
> So in case people do not want 'lnk' link support they just add
> CYGWIN+=' fslinktypes:native' to env, to turn off support for
> lnk/lnk.exe style links, and be happy.

So this complements the winsymlinks option? winsymlinks has to do with
how the Cygwin DLL creates symbolic links, whereas this has to do with
what objects are recognized as links.

The implementation would probably want to compare fslinktypes and
winsymlinks to make sure they are harmonized together; if winsymlinks
tells Cygwin to make .lnk files, but then fslinktypes banishes them,
that's something diagnosable somehow.

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Roland Mainz @ 2023-12-24 0:47 UTC
To: cygwin

On Thu, Dec 21, 2023 at 9:32 PM Kaz Kylheku via Cygwin
<cygwin@cygwin.com> wrote:
> On 2023-12-21 04:16, Martin Wege via Cygwin wrote:
> > On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin
> > <cygwin@cygwin.com> wrote:
[snip]
> > The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> > compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> > filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
> > On SMBFS and NFS it hurts the most, because access latency is the
> > highest for networked filesystems.
>
> Could some intelligent caching be added there? (Discussion of
> associated invalidation problem in 3... 2.... 1... )

See below, basically a short-lived cache which is only valid for the
lifetime of the one POSIX function call would be OK...

> Can you discuss more details, so people don't have to dive into code
> to understand it? If we are accessing some file "foo", the application
> or user may actually be referring to a "foo.lnk" link. But in the
> happy case that "foo" exists, why would we bother looking for "foo.lnk"?
>
> If "foo" does not exist, but "foo.lnk" does, that could probably be
> cached, so that next time "foo" is accessed, we go straight for "foo.lnk",
> and keep using that while it exists.
>
> If someone has both "foo" and "foo.lnk" in the same directory,
> that's a bit of a degenerate case; how important is it to be "correct",
> anyway.

Question, mainly for Corinna:
Could the code be modified to use one |NtQueryDirectoryFile()| call
with a SINGLE pattern testing for { "foo", "foo.lnk", "foo.lnk.exe",
... } (instead of calling the kernel for each suffix independently)
and cache that information for the lifetime of the matching POSIX
function call?

The idea is to reduce the number of userland<--->kernel round trips
from <n> to <1>, and filesystem drivers could be optimized even
further (for example if the network filesystem protocol supports file
name globbing...)

----

Bye,
Roland
--
  __ .  . __
 (o.\ \/ /.o) roland.mainz@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Corinna Vinschen @ 2024-01-08 14:53 UTC
To: cygwin

On Dec 24 01:47, Roland Mainz via Cygwin wrote:
> [snip]
> Question, mainly for Corinna:
> Could the code be modified to use one |NtQueryDirectoryFile()| call
> with a SINGLE pattern testing for { "foo", "foo.lnk", "foo.lnk.exe",
> ... } (instead of calling the kernel for each suffix independently)
> and cache that information for the lifetime of the matching POSIX
> function call ?

Yes and no. This could certainly be made to work, but it has a couple
of caveats which are not trivial, and there's *no* guarantee that you
will be able to get faster code by doing that. At all.

First of all, in contrast to calling NtOpenFile on the file,
NtQueryDirectoryFile always needs two calls, because you have to open
the directory first. If you then found the file, you have to open the
file to fetch information. So you always have one more call than by
opening the file immediately and having immediate success. It's more
or less equivalent if the file is a *.exe file, and it's one less hit
if it's a *.lnk file.

Which pattern would you like to use? Let's assume we carefully try to
get rid of .exe.lnk, we still have to check for "foo", "foo.exe" and
"foo.lnk". Even if we get rid of .lnk, we have two patterns which can
*not* be expressed in a single call to NtQueryDirectoryFile. We only
have Windows' most simple globbing, i.e., we have '*' and '?'.

The only pattern matching "foo" and "foo.exe" is "foo*". "foo.*" does
not hit on "foo". So "foo*".

As you know, the NtQueryDirectoryFile call can return a buffer with
multiple hits. But the buffer has a finite size, so if somebody is
looking for the file "a", we'd have to look for "a*", which may have
more hits than fit into the buffer. So the code has to be prepared not
only to scan a 64K buffer for (potentially) hundreds of entries, but
also to repeat the call to NtQueryDirectoryFile to load more matching
file entries.

Next problem, NFS. The current call just opening the file checks with
the necessary flags to access symlinks. Without these flags, NFS
symlinks are invisible or not handled as symlinks. So, right now, we
have a single call on NFS to open a file, if it exists without suffix.

If you use NtQueryDirectoryFile, you have another subtle problem. If it
happens to be an NFS dir, you have to use another
FILE_INFORMATION_CLASS, otherwise symlinks don't show up at all. This
information class isn't even sufficient for the most basic of
information we need in the symlink_info::check method. So you need to
open the file here, too, and extract the information.

There's probably more to it, but that's just what came to mind for a
start.

> The idea is to reduce the number of userland<--->kernel roundstrips
> from <n> to <1>, and filesystem drivers could be optimized even
> further (for example if the network filesystem protocol supports file
> name globbing...)

I have a hard time seeing that you can really avoid a lot of calls.
You may find that you won't save a lot of them, and another lot of
them don't matter because the OS already cached the information.

Also, as exciting as it might be to do extensive caching (and, as I
wrote in a former reply today, we do some caching), keep in mind that
we are only a user-space DLL. The only caching of file information you
can rely upon is that of the kernel.

Corinna

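The pattern argument in the reply above ("foo*" is the only simple pattern that hits both "foo" and "foo.exe", while "foo.*" misses "foo") can be tried out with ordinary shell globbing. This is only an illustration under an assumption: shell `case` patterns stand in for the '*' and '?' matching described for NtQueryDirectoryFile, and the real Windows matching differs in details such as case-insensitivity. `matches` is a helper name introduced here.

```shell
#!/bin/sh
# matches PATTERN NAME
# Succeeds if NAME matches the shell glob PATTERN ('*' and '?' wildcards).
matches() {
    case $2 in
        $1) return 0 ;;   # unquoted $1 is interpreted as a pattern here
        *)  return 1 ;;
    esac
}

# Print, for each candidate name, whether the two patterns discussed in
# the reply match it. "foobar" is an unrelated file that merely shares
# the prefix.
for name in foo foo.exe foo.lnk foobar; do
    for pat in 'foo*' 'foo.*'; do
        if matches "$pat" "$name"; then
            result=match
        else
            result=no-match
        fi
        printf '%-8s %-6s %s\n' "$name" "$pat" "$result"
    done
done
```

The run shows both halves of the problem: 'foo.*' does not match the bare name "foo", and 'foo*' matches it but also matches unrelated names such as "foobar". That second effect is why a lookup for a short name like "a" could return more directory entries than fit into one buffer.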
* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Andrey Repin @ 2023-12-22 18:53 UTC
To: Martin Wege, cygwin

Greetings, Martin Wege!

> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).

Except you require elevation to actually create symlinks.
Or some special system configuration.

> On SMBFS and NFS it hurts the most, because access latency is the
> highest for networked filesystems.

> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
> environment variable to define which types of links are supported:
> default 'all'. which is an shortcut for 'native,lnk,lnkexe'.
> So in case people do not want 'lnk' link support they just add
> CYGWIN+=' fslinktypes:native' to env, to turn off support for
> lnk/lnk.exe style links, and be happy.

> @Corinna Vinschen Would that be acceptable?

Make a patch to begin discussion.
Also, not all mangling is meaningful to disable. F.e. disabling .exe
magic on Windows would be surprising to the end user.

--
With best regards,
Andrey Repin
Friday, December 22, 2023 21:50:58

Sorry for my terrible english...
