public inbox for cygwin@cygwin.com
* Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-06  4:08 UTC
  To: cygwin

Hello!
I am unhappy to report a severe performance issue with find -ls, ls -R
and grep -r in Cygwin 3.4.9 and Cygwin 3.5.0 when Samba shares are
involved.

Consider a directory with 256 subdirectories, each containing 256
files, all on a Samba share; the Samba server runs on Linux with tmpfs.

# bash: 256 subdirectories x 256 one-line files = 65536 files
mkdir dir1
for ((i = 0; i < 256; i++)); do
    mkdir "dir1/subdir$i"
    for ((j = 0; j < 256; j++)); do
        echo "j=$j" > "dir1/subdir$i/j$j.txt"
    done
done

Timing comparisons then show a dramatic difference between Debian Linux,
WSL and Cygwin, each accessing the same Samba share:
1. time find . >/dev/null
Cygwin 86 seconds
WSL 23 seconds
Debian 19 seconds

2. time find . -ls >/dev/null
Cygwin 129 seconds
WSL 38 seconds
Debian 32 seconds

3. time grep -r -E NOMATCH 2>/dev/null
Cygwin 390 seconds
WSL 144 seconds
Debian 141 seconds
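To put the gap into numbers, the following POSIX sh snippet (an illustration added here, not part of the original report) divides each Cygwin time by the corresponding WSL and Debian times. On these figures Cygwin is roughly 2.7 to 4.5 times slower, with the smallest gap for the grep -r run.

```shell
#!/bin/sh
# Slowdown factor: Cygwin seconds divided by a reference system's seconds.
slowdown() {
    awk -v c="$1" -v r="$2" 'BEGIN { printf "%.2f\n", c / r }'
}

# test name, Cygwin, WSL, Debian (wall-clock seconds from the report above)
while read -r name cyg wsl deb; do
    printf '%-8s vs WSL: %sx  vs Debian: %sx\n' \
        "$name" "$(slowdown "$cyg" "$wsl")" "$(slowdown "$cyg" "$deb")"
done <<'EOF'
find 86 23 19
find-ls 129 38 32
grep-r 390 144 141
EOF
```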

So where does the poor Cygwin performance come from? The virus
scanner, memory compression and other Windows services known to
interfere with benchmarking are all turned off.

But the network trace shows a dramatic difference: while Debian and
WSL open each file only once, the Cygwin run generates a lot of network
traffic checking whether the txt files exist as txt.lnk, txt.bat.lnk
and so on, none of which exist.

Why does that happen?
-- 
Dan Shelton - Cluster Specialist Win/Lin/Bsd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-18  6:22 UTC
  To: cygwin

On Wed, 6 Dec 2023 at 05:08, Dan Shelton <dan.f.shelton@gmail.com> wrote:
[snip]
> Why does that happen?

It would be nice if one of the Cygwin developers could help me figure
out why this happens.

My working theory is that the extra file and directory lookups are
for the soft- and hardlink emulation used on filesystems which have no
native soft- or hardlinks. If this is correct, a fix might be to:
1) determine the filesystem type (cached for the process lifetime, in
   the absence of /etc/mnttab) and its boundaries (the mount point, and
   whether other mount points lie below it), and
2) use the emulation only for FAT filesystems, and use the native
   filesystem links for NTFS, ReFS and SMBFS.
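A minimal sketch of step 2 of this proposal, written as a POSIX sh function for illustration only: the filesystem type names are hypothetical labels rather than the strings Cygwin reports, and treating unknown types like FAT (keep emulating) is a conservative assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical sketch of the proposal above, not Cygwin code: decide per
# filesystem type whether the *.lnk symlink emulation lookups are needed.
# The type names are illustrative labels chosen for this example.
lnk_emulation_needed() {
    case $1 in
        fat|fat32|exfat) echo yes ;;   # no native links: keep emulating
        ntfs|refs|smbfs) echo no ;;    # native links: skip *.lnk probes
        *)               echo yes ;;   # unknown type: stay conservative
    esac
}

# In the proposal the answer would be computed once per mount point and
# cached for the process lifetime; here we just print a few cases.
for fs in ntfs fat32 smbfs unknownfs; do
    printf '%s: %s\n' "$fs" "$(lnk_emulation_needed "$fs")"
done
```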

Help!

Dan
-- 
Dan Shelton - Cluster Specialist Win/Lin/Bsd

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Marco Atzeri @ 2023-12-18  6:49 UTC
  To: cygwin

On 18/12/2023 07:22, Dan Shelton via Cygwin wrote:
> On Wed, 6 Dec 2023 at 05:08, Dan Shelton <dan.f.shelton@gmail.com> wrote:
[snip]
> Help!
> 
> Dan

Is your cygserver running ?




* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-18  6:53 UTC
  To: cygwin

On Mon, 18 Dec 2023 at 07:49, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
[snip]
> Is your cygserver running ?

Yes, cygserver is running.

Dan
-- 
Dan Shelton - Cluster Specialist Win/Lin/Bsd

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Marco Atzeri @ 2023-12-18  7:05 UTC
  To: cygwin

On 18/12/2023 07:53, Dan Shelton via Cygwin wrote:
> On Mon, 18 Dec 2023 at 07:49, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
>>

>>
>> Is your cygserver running ?
> 
> Yes, Cygserver is running
> 
> Dan

Hi Dan,

The fact that Cygwin is only about 3 to 4.5 times slower than WSL and
Debian tells me that Cygwin is a very effective user-space environment.


1. time find . >/dev/null
Cygwin 86 seconds
WSL 23 seconds
Debian 19 seconds

2. time find . -ls >/dev/null
Cygwin 129 seconds
WSL 38 seconds
Debian 32 seconds

3. time grep -r -E NOMATCH 2>/dev/null
Cygwin 390 seconds
WSL 144 seconds
Debian 141 seconds

Cygwin cannot go faster than the engine underneath, and several
cumbersome tricks are needed to achieve POSIX compliance.

I have seen worse timings from other attempts to emulate a Unix layer
on top of an unsupportive Microsoft environment.

Regards
Marco



* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Dan Shelton @ 2023-12-18  7:16 UTC
  To: cygwin

On Mon, 18 Dec 2023 at 08:05, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
[snip]
> Cygwin cannot go faster than the engine underneath, and several
> cumbersome tricks are needed to achieve POSIX compliance.
>
> I have seen worse timings from other attempts to emulate a Unix layer
> on top of an unsupportive Microsoft environment.

Sorry, but I disagree. I think that Cygwin could compete with WSL in
terms of performance.
I think the issue is just bad symlink emulation for filesystems which
do not need symlink emulation.

Dan
-- 
Dan Shelton - Cluster Specialist Win/Lin/Bsd

* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Marco Atzeri @ 2023-12-18  8:23 UTC
  To: cygwin

On 18/12/2023 08:16, Dan Shelton via Cygwin wrote:
> On Mon, 18 Dec 2023 at 08:05, Marco Atzeri via Cygwin <cygwin@cygwin.com> wrote:
>>

>>
>> Cygwin cannot go faster than the engine underneath, and several
>> cumbersome tricks are needed to achieve POSIX compliance.
>>
>> I have seen worse timings from other attempts to emulate a Unix layer
>> on top of an unsupportive Microsoft environment.
> 
> Sorry, but I disagree. I think that Cygwin could compete with WSL in
> terms of performance.
> I think the issue is just bad symlink emulation for filesystems which
> do not need symlink emulation.
> 

https://cygwin.com/acronyms/#PTC (Patches Thoughtfully Considered)

> Dan

Regards
Marco


* Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Kaz Kylheku @ 2023-12-20 17:20 UTC
  To: Dan Shelton; +Cc: cygwin

On 2023-12-17 22:22, Dan Shelton via Cygwin wrote:
> It would be nice if someone from the Cygwin authors could assist me in
> figuring out why this happens.

Cygwin is famously slow; this is nothing new. We are grateful
for Cygwin because it makes stuff work at all; if it were blazing
fast that would be a bonus.

For example, git operations (clone, rebase, ...) and ./configure
scripts all run like molasses.

The following is just my fast and loose opinion, shot from the hip,
and possibly off or wrong, but it likely has to do with the layering.
Cygwin's core API is based on a C library called Newlib. Cygwin bolts
Newlib to Windows by means of an additional shim below Newlib that
is based on C++ objects, where there is path munging going on and such,
and that's where the Win32 calls get made. It's an additional abstraction.

I worked with the internals a bit when producing the Cygnal
project.

* rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Martin Wege @ 2023-12-21 12:16 UTC
  To: cygwin

On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin
<cygwin@cygwin.com> wrote:
[snip]
> Cygwin's core API is based on a C library called Newlib. Cygwin bolts
> Newlib to Windows by means of an additional shim below Newlib that
> is based on C++ objects, where there is path munging going on and such,
> and that's where the Win32 calls get made. It's an additional abstraction.

I disagree with that. Ok, part of that is that the layering causes
more memory allocations and copies, but this is not the root cause.

The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
compared to 1 on Linux) to look up the *.lnk and *.exe.lnk files on
filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
On SMBFS and NFS it hurts the most, because access latency is the
highest for networked filesystems.
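Applied to the test tree from the original report (256 x 256 = 65,536 files), this estimate works out as follows. It is plain arithmetic on the numbers in this thread, counting file lookups only and taking 3 calls as the lower bound given above; it is not a measurement.

```shell
#!/bin/sh
# Lookups needed to stat every file in the 256 x 256 test tree, using
# the estimate above: >= 3 calls per lookup on Cygwin vs 1 on Linux.
# Directories are ignored; only the 65536 files are counted.
files=$((256 * 256))
linux_calls=$((files * 1))
cygwin_calls=$((files * 3))          # lower bound from the estimate
extra_calls=$((cygwin_calls - linux_calls))
printf 'files=%d linux=%d cygwin>=%d extra>=%d\n' \
    "$files" "$linux_calls" "$cygwin_calls" "$extra_calls"
```

That is at least 131,072 extra requests per full traversal, and on SMBFS or NFS each of them presumably has to involve the server, which is why latency matters so much there.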

So my proposal would be to add an option ('fslinktypes') to the CYGWIN
environment variable to define which types of links are supported:
default 'all', which is a shortcut for 'native,lnk,lnkexe'.
So in case people do not want 'lnk' link support they just add
CYGWIN+=' fslinktypes:native' to env, to turn off support for
lnk/lnk.exe style links, and be happy.
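A sketch of how the proposed option might be parsed, assuming the syntax given above. Note that fslinktypes does not exist in Cygwin; the function names are hypothetical, and 'all' is expanded to 'native,lnk,lnkexe' as proposed. The winsymlinks:native setting in the example is an existing CYGWIN option and is simply ignored by this parser.

```shell
#!/bin/sh
# Hypothetical parser for the proposed (not existing) CYGWIN option
# "fslinktypes:<list>". Prints the link types that would be recognized.
fslinktypes_from_env() {
    types=all                         # proposed default
    for word in $1; do                # CYGWIN options are space-separated
        case $word in
            fslinktypes:*) types=${word#fslinktypes:} ;;
        esac
    done
    [ "$types" = all ] && types=native,lnk,lnkexe
    echo "$types"
}

# Would the *.lnk probe run under this setting? (",lnk," so that
# "lnkexe" alone does not count as "lnk")
probes_lnk() {
    case ",$1," in
        *,lnk,*) echo yes ;;
        *)       echo no ;;
    esac
}

CYGWIN="winsymlinks:native fslinktypes:native"
t=$(fslinktypes_from_env "$CYGWIN")
echo "link types: $t, probe .lnk: $(probes_lnk "$t")"
```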

@Corinna Vinschen Would that be acceptable?

Thanks,
Martin

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Cedric Blancher @ 2023-12-21 16:10 UTC
  To: cygwin

On Thu, 21 Dec 2023 at 13:17, Martin Wege via Cygwin <cygwin@cygwin.com> wrote:
[snip]
> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
> environment variable to define which types of links are supported:
> default 'all', which is a shortcut for 'native,lnk,lnkexe'.
> So in case people do not want 'lnk' link support they just add
> CYGWIN+=' fslinktypes:native' to env, to turn off support for
> lnk/lnk.exe style links, and be happy.
>
> @Corinna Vinschen Would that be acceptable?

+1 for this proposal, which is almost the same idea as I proposed in
https://www.mail-archive.com/cygwin@cygwin.com/msg174612.html

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Brian Inglis @ 2023-12-21 17:43 UTC
  To: cygwin

On 2023-12-21 09:10, Cedric Blancher via Cygwin wrote:
> On Thu, 21 Dec 2023 at 13:17, Martin Wege via Cygwin wrote:
>> On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin wrote:
>>> On 2023-12-17 22:22, Dan Shelton via Cygwin wrote:
>>>> It would be nice if someone from the Cygwin authors could assist me in
>>>> figuring out why this happens.

[snip]
>>> Cygwin's core API is based on a C library called Newlib. Cygwin bolts
>>> Newlib to Windows by means of an additional shim below Newlib that
>>> is based on C++ objects, where there is path munging going on and such,
>>> and that's where the Win32 calls get made. It's an additional abstraction.

Cygwin is a newlib-based libc implementation that provides POSIX
functionality through C++ functions calling x86_64 Windows functions. It
often replaces whole groups of newlib functions entirely, to support OS
features or their POSIX equivalents, including locales, UTF-8 and other
multi-byte character sets, time zones, files, directories and processes.
The aim is to provide hosted OS features as completely as possible,
rather than serving newlib's usual targets: bare embedded real-time
systems, with or without an OS.

>> I disagree with that. Ok, part of that is that the layering causes
>> more memory allocations and copies, but this is not the root cause.
>>
>> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
>> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
>> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
>> On SMBFS and NFS it hurts the most, because access latency is the
>> highest for networked filesystems.

Could you run some commands under strace to produce logs with timing
information, and tell us how large a factor that is, relative to the time
spent in the Windows emulation and in the application functions?

>> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
>> environment variable to define which types of links are supported:
>> default 'all', which is a shortcut for 'native,lnk,lnkexe'.
>> So in case people do not want 'lnk' link support they just add
>> CYGWIN+=' fslinktypes:native' to env, to turn off support for
>> lnk/lnk.exe style links, and be happy.
>>
>> @Corinna Vinschen Would that be acceptable?
> 
> +1 for this proposal, which is almost the same idea as I proposed in
> https://www.mail-archive.com/cygwin@cygwin.com/msg174612.html

We are all volunteers here, so you can clone the repo, install the cygwin
package build dependencies, follow the build instructions, make the
required changes, rebuild, install and test the DLL, then use git
format-patch/send-email to submit it to the cygwin-patches list for
consideration.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Kaz Kylheku @ 2023-12-21 20:32 UTC
  To: Martin Wege; +Cc: cygwin

On 2023-12-21 04:16, Martin Wege via Cygwin wrote:
[snip]
> I disagree with that. Ok, part of that is that the layering causes
> more memory allocations and copies, but this is not the root cause.

I seem to recall that most operations that take a path argument have
to convert the path from Cygwin to Win32 form, and I think that also
involves converting from 8-bit characters to UTF-16. That's gotta hurt
a bit.
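To illustrate the kind of per-call path translation described above, here is a deliberately simplified POSIX sh sketch. It only rewrites a /cygdrive/<letter>/ prefix and the separators; the real conversion (exposed on the command line by "cygpath -w") consults the mount table and handles many more cases, and the DLL additionally converts the result to UTF-16.

```shell
#!/bin/sh
# Toy version of the POSIX -> Win32 path translation that every
# path-taking call goes through. It only rewrites a /cygdrive/<letter>/
# prefix and the separators; the real code uses the mount table and
# then converts the result to UTF-16.
to_win32_path() {
    printf '%s\n' "$1" |
        sed -e 's|^/cygdrive/\([a-zA-Z]\)/|\1:/|' -e 's|/|\\|g'
}

to_win32_path /cygdrive/c/Users/dan/dir1/subdir0/j0.txt
```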
 
> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
> On SMBFS and NFS it hurts the most, because access latency is the
> highest for networked filesystems.

Could some intelligent caching be added there? (Discussion of
associated invalidation problem in 3... 2.... 1... )

Can you discuss more details, so people don't have to dive into code
to understand it? If we are accessing some file "foo", the application
or user may actually be referring to a "foo.lnk" link. But in the
happy case that "foo" exists, why would we bother looking for "foo.lnk"?

If "foo" does not exist, but "foo.lnk" does, that could probably be
cached, so that next time "foo" is accessed, we go straight for "foo.lnk",
and keep using that while it exists.

If someone has both "foo" and "foo.lnk" in the same directory,
that's a bit of a degenerate case; how important is it to be "correct",
anyway.
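The caching idea above can be sketched in POSIX sh as follows. This is a hypothetical illustration, not Cygwin code: the helper names are invented, only the plain name and a single .lnk suffix are considered, and the cache lives in a shell variable rather than in the DLL. It counts existence checks so the saving on the second lookup is visible, and it drops a cached entry once its target disappears.

```shell
#!/bin/sh
# Hypothetical sketch of the caching idea above (not Cygwin code):
# if "foo" is missing but "foo.lnk" exists, remember that, and next
# time go straight to "foo.lnk" for as long as it still exists.
# The cache is a newline-separated list of "name=target" pairs.
LNK_CACHE=""
PROBES=0                            # existence checks performed so far

probe() { PROBES=$((PROBES + 1)); [ -e "$1" ]; }

cache_get() {
    printf '%s\n' "$LNK_CACHE" | awk -F= -v n="$1" '$1 == n { print $2; exit }'
}

cache_drop() {
    LNK_CACHE=$(printf '%s\n' "$LNK_CACHE" | awk -F= -v n="$1" '$1 != n')
}

# Sets RESOLVED to the path NAME refers to, or to "" if nothing matched.
resolve() {
    RESOLVED=""
    cached=$(cache_get "$1")
    if [ -n "$cached" ]; then
        if probe "$cached"; then
            RESOLVED=$cached        # cached link still exists: one probe
            return 0
        fi
        cache_drop "$1"             # stale entry: forget it
    fi
    if probe "$1"; then
        RESOLVED=$1                 # happy case: the plain name exists
    elif probe "$1.lnk"; then
        RESOLVED=$1.lnk
        LNK_CACHE="$LNK_CACHE
$1=$1.lnk"
    fi
}

dir=$(mktemp -d)
touch "$dir/foo.lnk"
resolve "$dir/foo"; echo "first lookup:  $RESOLVED (probes=$PROBES)"
PROBES=0
resolve "$dir/foo"; echo "second lookup: $RESOLVED (probes=$PROBES)"
rm -rf "$dir"
```

The first lookup needs two existence checks (the missing "foo", then "foo.lnk"); the second needs only one.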

> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
> environment variable to define which types of links are supported:
> default 'all', which is a shortcut for 'native,lnk,lnkexe'.
> So in case people do not want 'lnk' link support they just add
> CYGWIN+=' fslinktypes:native' to env, to turn off support for
> lnk/lnk.exe style links, and be happy.

So this complements the winsymlinks option? winsymlinks has to do
with how the Cygwin DLL creates symbolic links, whereas this has to do
with what objects are recognized as links.

The implementation would probably want to compare fslinktypes
and winsymlinks to make sure they are harmonized together;
if winsymlinks tells Cygwin to make .lnk files, but then fslinktypes
banishes them, that's something diagnosable somehow.

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Andrey Repin @ 2023-12-22 18:53 UTC
  To: Martin Wege, cygwin

Greetings, Martin Wege!

> The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).

Except that you need elevation to actually create native symlinks,
or some special system configuration.

> On SMBFS and NFS it hurts the most, because access latency is the
> highest for networked filesystems.

> So my proposal would be to add an option ('fslinktypes') to the CYGWIN
> environment variable to define which types of links are supported:
> default 'all', which is a shortcut for 'native,lnk,lnkexe'.
> So in case people do not want 'lnk' link support they just add
> CYGWIN+=' fslinktypes:native' to env, to turn off support for
> lnk/lnk.exe style links, and be happy.

> @Corinna Vinschen Would that be acceptable?

Make a patch to start the discussion.
Also, not all of the name mangling makes sense to disable. For example,
disabling the .exe magic on Windows would surprise the end user.


-- 
With best regards,
Andrey Repin
Friday, December 22, 2023 21:50:58

Sorry for my terrible english...


* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Roland Mainz @ 2023-12-24  0:47 UTC
  To: cygwin

On Thu, Dec 21, 2023 at 9:32 PM Kaz Kylheku via Cygwin
<cygwin@cygwin.com> wrote:
> On 2023-12-21 04:16, Martin Wege via Cygwin wrote:
> > On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin
> > <cygwin@cygwin.com> wrote:
[snip]
> > The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> > compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> > filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
> > On SMBFS and NFS it hurts the most, because access latency is the
> > highest for networked filesystems.
>
> Could some intelligent caching be added there? (Discussion of
> associated invalidation problem in 3... 2.... 1... )

See below, basically a short-lived cache which is only valid for the
lifetime of the one POSIX function call would be OK...

> Can you discuss more details, so people don't have to dive into code
> to understand it? If we are accessing some file "foo", the application
> or user may actually be referring to a "foo.lnk" link. But in the
> happy case that "foo" exists, why would we bother looking for "foo.lnk"?
>
> If "foo" does not exist, but "foo.lnk" does, that could probably be
> cached, so that next time "foo" is accessed, we go straight for "foo.lnk",
> and keep using that while it exists.
>
> If someone has both "foo" and "foo.lnk" in the same directory,
> that's a bit of a degenerate case; how important is it to be "correct",
> anyway.

Question, mainly for Corinna:
Could the code be modified to use one |NtQueryDirectoryFile()| call
with a SINGLE pattern testing for { "foo", "foo.lnk", "foo.lnk.exe",
... } (instead of calling the kernel for each suffix independently)
and cache that information for the lifetime of the matching POSIX
function call ?
The idea is to reduce the number of userland<--->kernel round trips
from <n> to <1>, and filesystem drivers could be optimized even
further (for example if the network filesystem protocol supports file
name globbing...)

----

Bye,
Roland
-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)

* Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
From: Corinna Vinschen @ 2024-01-08 14:53 UTC
  To: cygwin

On Dec 24 01:47, Roland Mainz via Cygwin wrote:
[snip]
> Question, mainly for Corinna:
> Could the code be modified to use one |NtQueryDirectoryFile()| call
> with a SINGLE pattern testing for { "foo", "foo.lnk", "foo.lnk.exe",
> ... } (instead of calling the kernel for each suffix independently)
> and cache that information for the lifetime of the matching POSIX
> function call ?

Yes and no.  This could certainly be made to work, but it has a couple
of caveats which are not trivial, and there's *no* guarantee that
you will be able to get faster code by doing that.  At all.

First of all, in contrast to calling NtOpenFile on the file,
NtQueryDirectoryFile always needs two calls, because you have to open
the directory first. If you then find the file, you have to open the
file to fetch information.  So you always have one more call than by
opening the file immediately and having immediate success.  It's more or
less equivalent if the file is a *.exe file, and it's one less hit if
it's a *.lnk file.

Which pattern would you like to use? Let's assume we carefully try to
get rid of .exe.lnk, we still have to check for "foo", "foo.exe" and
"foo.lnk".  Even if we get rid of .lnk, we have two patterns which
can *not* be expressed in a single call to NtQueryDirectoryFile.
We only have Windows' simplest globbing, i.e., we have '*' and '?'.
The only pattern matching "foo" and "foo.exe" is "foo*".  "foo.*"
does not match "foo", so it has to be "foo*".  As you know, the
NtQueryDirectoryFile call can return a buffer with multiple hits.  But
the buffer has a finite size, so if somebody is looking for the file
"a", we'd have to look for "a*", which may have more hits than fit into
the buffer.  So the code has to be prepared not only to scan a 64K
buffer for (potentially) hundreds of entries, but also to repeat the
call to NtQueryDirectoryFile to load more matching file entries.

Next problem, NFS.  The current call, which just opens the file, uses
the flags necessary to access symlinks.  Without these flags, NFS
symlinks are invisible or not handled as symlinks.  So, right now, we
have a single call on NFS to open a file, if it exists without suffix.
If you use NtQueryDirectoryFile, you have another subtle problem.  If it
happens to be an NFS dir, you have to use another FILE_INFORMATION_CLASS,
otherwise symlinks don't show up at all.  This information class isn't
even sufficient for the most basic information we need in the
symlink_info::check method.  So you need to open the file here, too,
and extract the information.

There's probably more to it, but that's just what came to mind for
a start.

> The idea is to reduce the number of userland<--->kernel roundtrips
> from <n> to <1>, and filesystem drivers could be optimized even
> further (for example if the network filesystem protocol supports file
> name globbing...)

I have a hard time seeing how you can really avoid a lot of calls.
You may find that you won't save many of them, and many others
don't matter because the OS has already cached the information.

Also, as exciting as it might be to do extensive caching (and, as I
wrote in an earlier reply today, we do some caching), keep in mind that
we are only a user-space DLL.  The only caching of file information you
can rely upon is that of the kernel.


Corinna


Thread overview: 15+ messages
2023-12-06  4:08 Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux Dan Shelton
2023-12-18  6:22 ` Dan Shelton
2023-12-18  6:49   ` Marco Atzeri
2023-12-18  6:53     ` Dan Shelton
2023-12-18  7:05       ` Marco Atzeri
2023-12-18  7:16         ` Dan Shelton
2023-12-18  8:23           ` Marco Atzeri
2023-12-20 17:20   ` Kaz Kylheku
2023-12-21 12:16     ` rfe: CYGWIN fslinktypes option? " Martin Wege
2023-12-21 16:10       ` Cedric Blancher
2023-12-21 17:43         ` Brian Inglis
2023-12-21 20:32       ` Kaz Kylheku
2023-12-24  0:47         ` Roland Mainz
2024-01-08 14:53           ` Corinna Vinschen
2023-12-22 18:53       ` Andrey Repin
