public inbox for cygwin@cygwin.com
From: Dan Shelton <dan.f.shelton@gmail.com>
To: cygwin@cygwin.com
Subject: Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux
Date: Mon, 18 Dec 2023 07:22:24 +0100
Message-ID: <CAAvCNcB0_0ZeujP23QZFZaDvVTh5rxbXJw4FP6uXNPErCgdZ2w@mail.gmail.com>
In-Reply-To: <CAAvCNcBZGepZMP9Q0D5ua+6ACftDOQEriqnuCbwg6umBPUA72Q@mail.gmail.com>

On Wed, 6 Dec 2023 at 05:08, Dan Shelton <dan.f.shelton@gmail.com> wrote:
>
> Hello!
> I am unhappy to report a severe performance issue with find -ls, ls -R
> and grep -r, with Cygwin 3.4.9 and Cygwin 3.5.0 when samba shares are
> involved.
>
> Imagine a directory with 256 subdirectories, each containing 256
> files, all on a Samba share; the Samba server runs on Linux and
> serves the files from tmpfs.
>
> mkdir dir1
> for ((i=0;i<256;i++)) ; do
>     mkdir "dir1/subdir$i"
>     for ((j=0; j < 256;j++));do
>         echo  "j=$j" >"dir1/subdir$i/j$j.txt"
>     done
> done
>
> Timing comparisons then show a dramatic difference between Debian
> Linux, WSL and Cygwin, each accessing the same Samba share:
> 1. time find . >/dev/null
> Cygwin 86 seconds
> WSL 23 seconds
> Debian 19 seconds
>
> 2. time find . -ls >/dev/null
> Cygwin 129 seconds
> WSL 38 seconds
> Debian 32 seconds
>
> 3. time grep -r -E NOMATCH 2>/dev/null
> Cygwin 390 seconds
> WSL 144 seconds
> Debian 141 seconds
>
> So where does the poor Cygwin performance come from? The virus
> checker, memory compression and other Windows services known to
> interfere with benchmarking are all turned off.
>
> But the network trace shows a dramatic difference: while Debian and
> WSL open each file only once, the Cygwin run spends a lot of network
> traffic checking for variants of each .txt file name, such as
> txt.lnk, txt,bat.lnk and so on, none of which exist.
>
> Why does that happen?
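For anyone who wants to reproduce the numbers above, the setup loop and the three timed commands can be wrapped in a small bash sketch. The helper names make_tree, check_tree and run_benchmarks are hypothetical, and grep is given an explicit `.` operand instead of relying on its default. From the reported timings, Cygwin is roughly 3.4 to 4.5 times slower than WSL and Debian for the two find runs, and about 2.7 to 2.8 times slower for grep -r.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the benchmark from the report. All function names are
# hypothetical helpers, not part of Cygwin or any package. Requires bash.

# make_tree ROOT N: build N subdirectories with N small text files each
# (N=256 matches the report: 65536 files in 256 subdirectories).
make_tree() {
    local root=$1 n=$2 i j
    mkdir -p "$root"
    for ((i = 0; i < n; i++)); do
        mkdir -p "$root/subdir$i"
        for ((j = 0; j < n; j++)); do
            echo "j=$j" >"$root/subdir$i/j$j.txt"
        done
    done
}

# check_tree ROOT N: confirm the tree has N*N files and N+1 directories
# (the N subdirectories plus ROOT itself). $(( )) strips any wc padding.
check_tree() {
    local root=$1 n=$2 files dirs
    files=$(( $(find "$root" -type f | wc -l) ))
    dirs=$(( $(find "$root" -type d | wc -l) ))
    if (( files == n * n && dirs == n + 1 )); then
        echo "ok: $files files, $dirs directories"
    else
        echo "unexpected: $files files, $dirs directories" >&2
        return 1
    fi
}

# run_benchmarks: time the three commands from the report in the current
# directory and print the elapsed wall-clock seconds for each one.
run_benchmarks() {
    local TIMEFORMAT='%R'    # bash `time` then reports only real time
    local secs
    secs=$( { time find . >/dev/null 2>&1; } 2>&1 )
    printf '%-9s %s s\n' 'find' "$secs"
    secs=$( { time find . -ls >/dev/null 2>&1; } 2>&1 )
    printf '%-9s %s s\n' 'find -ls' "$secs"
    secs=$( { time grep -r -E NOMATCH . >/dev/null 2>&1; } 2>&1 )
    printf '%-9s %s s\n' 'grep -r' "$secs"
}
```

For example, `make_tree dir1 256 && check_tree dir1 256` should print `ok: 65536 files, 257 directories`; running `cd dir1 && run_benchmarks` on the share then prints one timing line per command, which can be compared across Cygwin, WSL and Linux.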

It would be nice if someone from the Cygwin authors could assist me in
figuring out why this happens.
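One way to put a number on the extra lookups is to count the file names the client asks the server to open. The sketch below assumes the trace was captured with Wireshark and that its tshark tool can export the SMB2 CREATE request file names; the capture file name and the summarise_opens helper are hypothetical, and the display-filter field names may differ between Wireshark versions.

```bash
#!/usr/bin/env bash
# Sketch: tally the file names a client asked the server to open.
# Input: one name per line, e.g. exported from a capture with (assumed):
#   tshark -r capture.pcapng -Y 'smb2.cmd == 5' -T fields -e smb2.filename
# SMB2 command 5 is CREATE; responses carry no file name and show up as
# empty lines, which are skipped. summarise_opens is a hypothetical helper.
summarise_opens() {
    awk '
        NF == 0   { next }            # empty field: response or no name
        /\.lnk$/  { lnk++;   next }   # probes for shortcut-style links
        /\.txt$/  { txt++;   next }   # the real data files
                  { other++ }         # directories and everything else
        END { printf "txt=%d lnk=%d other=%d\n", txt, lnk, other }
    '
}

# Example: summarise_opens < names.txt
```

Comparing the lnk count from a Cygwin run with one from a WSL or Debian run should show how many of the requests are link probes rather than real file opens.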

My working theory is that the extra file and directory lookups come
from the soft- and hardlink emulation for file systems that do not
support soft- or hardlinks natively. Is that correct?
If so, a fix might be to 1) determine the filesystem type (cached for
the lifetime of the process, in the absence of /etc/mnttab) and its
boundaries (the mount point, and whether other mount points lie below
it), and 2) use the emulation only for FAT filesystems, while using
the native filesystem links for NTFS, ReFS and SMBFS.
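The proposed policy can be sketched in bash to make the idea concrete. This is only an illustration under stated assumptions: a real change would have to live in the Cygwin DLL, the filesystem type names that `df --output=fstype` reports under Cygwin (ntfs, smbfs and so on) are assumed rather than confirmed, and link_policy, policy_for_path and LINK_POLICY_CACHE are hypothetical names. Resolving the mount point for each path addresses the boundary question, since a mount nested below a share gets its own cache entry.

```bash
#!/usr/bin/env bash
# Sketch of the proposed per-filesystem link policy (hypothetical; this is
# not how the Cygwin DLL works today). Requires bash 4+ (associative arrays,
# ${var,,}) and GNU df with --output support.

# link_policy FSTYPE: print "native" for filesystems the proposal says have
# usable native links, "emulate" for the FAT family and anything unknown.
link_policy() {
    local t=${1,,}              # compare case-insensitively
    t=${t//[[:space:]]/}        # df may pad the column; drop whitespace
    case $t in
        ntfs|refs|smbfs|smb|smb2|smb3|cifs) echo native ;;
        fat|fat12|fat16|fat32|vfat|exfat|msdos) echo emulate ;;
        *) echo emulate ;;      # unknown type: keep the current emulation
    esac
}

# Per-process cache: mount point -> policy.
declare -A LINK_POLICY_CACHE=()

# policy_for_path PATH: set REPLY to the policy for PATH's filesystem.
# The result goes in REPLY rather than stdout because $(...) runs in a
# subshell and would throw away any update to the cache.
policy_for_path() {
    local out mnt
    if ! out=$(df --output=target -- "$1" 2>/dev/null); then
        REPLY=emulate           # cannot tell: fall back to emulation
        return 1
    fi
    mnt=$(tail -n 1 <<<"$out")
    if [[ -z ${LINK_POLICY_CACHE[$mnt]+set} ]]; then
        LINK_POLICY_CACHE[$mnt]=$(link_policy "$(df --output=fstype -- "$1" | tail -n 1)")
    fi
    REPLY=${LINK_POLICY_CACHE[$mnt]}
}
```

In this sketch the filesystem type is looked up only once per mount point, but resolving the mount point still costs one df call per path; an implementation inside Cygwin would need to cache that step as well to actually save round trips.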

Help!

Dan
-- 
Dan Shelton - Cluster Specialist Win/Lin/Bsd


Thread overview: 15+ messages
2023-12-06  4:08 Dan Shelton
2023-12-18  6:22 ` Dan Shelton [this message]
2023-12-18  6:49   ` Marco Atzeri
2023-12-18  6:53     ` Dan Shelton
2023-12-18  7:05       ` Marco Atzeri
2023-12-18  7:16         ` Dan Shelton
2023-12-18  8:23           ` Marco Atzeri
2023-12-20 17:20   ` Kaz Kylheku
2023-12-21 12:16     ` rfe: CYGWIN fslinktypes option? " Martin Wege
2023-12-21 16:10       ` Cedric Blancher
2023-12-21 17:43         ` Brian Inglis
2023-12-21 20:32       ` Kaz Kylheku
2023-12-24  0:47         ` Roland Mainz
2024-01-08 14:53           ` Corinna Vinschen
2023-12-22 18:53       ` Andrey Repin