public inbox for cygwin@cygwin.com
From: Stephen John Smoogen <smooge@gmail.com>
To: cygwin@cygwin.com
Subject: Re: rsync and ls -lR slow for directories with many files
Date: Sun, 05 Jan 2020 21:22:00 -0000
Message-ID: <CANnLRdjZbTrfjf5roqwZBcBUfLzb6MJfsyJRyfmWCFivWrJzaA@mail.gmail.com>
In-Reply-To: <8582CD6F-C872-41FB-9425-2CBD1126AE33@plutonium24.de>

On Sat, 4 Jan 2020 at 17:16, <muell@plutonium24.de> wrote:
>
> I am running rsync on a small Linux server to synchronize the files in one directory and its subdirectories from Windows (using sshd from Cygwin) to this server for backup purposes. The directory contains almost 1 TB of images and videos in about 160k files on a slow disk (Seagate Archive 8TB with SMR) with NTFS.

It is not clear to me whether the slow disk is in the Linux box or in
the Windows box.

> Even if there are no changes, and with whole-file transfers, rsync takes about 45 minutes to come to this conclusion.
> I am using the following command line on the Linux server:
>
> rsync -avx --stats --whole-file --no-perms --no-owner --no-group <user>@<server>:<source directory> <local destination directory>
>
> Since rsync transferred only a small number of bytes and gave no clue as to why it was so slow, and since rsync should only need filenames, dates and sizes, I ran "ls -lR|wc" on both systems. On the Linux server this took about 1 minute (only slightly faster magnetic disk, empty read cache at start), while on Cygwin it took almost as long as rsync (over 40 minutes). Using Windows Explorer (after a reboot to guarantee that the cache is empty) to get the total number of files and the total size took only a few seconds. Reading all file sizes with Treesize also took less than one minute. As ls -lR needs the same information, I would have expected it to take the same time.


I would raise rsync's verbosity (extra -v flags) to see what it is
doing. (I don't recommend sending the full output to the list as it
will be a lot of data, but maybe an excerpt.) I expect it is spending
a lot of time reading the metadata off one of the disks, mapping it to
Unix permissions, and then comparing those items with the other side.
Each of those steps is a separate action, which on a slow drive may
mean a spinup/get-data/spindown cycle that makes it even slower.
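
A rough sketch of what I mean, not a tested recipe: rsync_probe is a
made-up helper name and the log file name is arbitrary. It reuses the
flags from your command line and adds a second -v, -n (dry run, so
nothing is transferred) and -i (itemize changes). The full output goes
to a local log and only the first and last lines are shown, which is
about the size of excerpt that would be useful on the list.

```shell
#!/bin/sh
# Hypothetical helper: dry-run rsync with extra verbosity and keep the
# full output in a local log file instead of on the terminal.
# $1 = source (e.g. user@server:/path/), $2 = destination directory,
# $3 = optional log file name (defaults to rsync-probe.log).
rsync_probe() {
    src=$1
    dest=$2
    log=${3:-rsync-probe.log}

    # -a -x --stats --whole-file --no-perms --no-owner --no-group come
    # from the original command; -vv raises verbosity, -n makes it a
    # dry run, -i itemizes what rsync thinks differs for each file.
    rsync -avvx -n -i --stats --whole-file --no-perms --no-owner --no-group \
        "$src" "$dest" >"$log" 2>&1
    status=$?

    # Print only an excerpt: the start of the run and the final stats.
    head -n 20 "$log"
    echo "[...]"
    tail -n 20 "$log"
    return "$status"
}
```

Run it as rsync_probe '<user>@<server>:<source directory>' '<local
destination directory>' and prefix it with "time" if your shell has
that keyword. If the dry run is just as slow, the time is going into
building the file list rather than into any transfer.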

I would then check whether the permissions and metadata on that
directory 'look sane' (this is highly dependent on your environment;
if you have an AD server handing out permissions it will look
different from other setups). If the lookups that map metadata to
permissions have to contact an AD server or do some other kind of
network lookup, that is also going to slow things down.
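
One way to see whether the name lookups matter, as a sketch under the
assumption that they are the expensive part: compare ls -lnR (numeric
user and group IDs, no name lookup) with ls -lR on the same tree. The
function names here are made up. A warm-up pass runs first so that
both timed runs see a warm cache; otherwise the first run would pay
the cold-cache cost you already measured and the comparison would say
nothing about the lookups.

```shell
#!/bin/sh
# Print the wall-clock seconds a command takes; its output is discarded.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical helper: time a long listing with and without the
# uid/gid-to-name mapping. $1 = directory to list recursively.
compare_listings() {
    dir=$1
    # Warm-up pass so both timed runs start from a warm cache.
    ls -lnR "$dir" >/dev/null 2>&1
    echo "numeric ids, no name lookup (ls -lnR): $(elapsed ls -lnR "$dir")s"
    echo "with user/group names (ls -lR): $(elapsed ls -lR "$dir")s"
}
```

Call it as compare_listings /path/to/your/directory inside Cygwin. If
the second number is much larger than the first, the name mapping is
what to look at; if both are small, the cost is in reading the
metadata from a cold disk.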

Sorry, I don't have any 'fixes'. I have always found large rsync runs
between Windows and Unix to be slow.

> Running "ls -lR" a second time on Cygwin is very fast: it takes less than 30s.
>
> Is there any way to make ls -lR or, better, rsync as fast as listing the directory with Windows tools?
>
> Frank
>
> --
> This message was sent from my Android mobile phone with K-9 Mail.
> --
> Problem reports:       http://cygwin.com/problems.html
> FAQ:                   http://cygwin.com/faq/
> Documentation:         http://cygwin.com/docs.html
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
>


-- 
Stephen J Smoogen.



Thread overview: 4+ messages
2020-01-04 22:15 muell
2020-01-05 21:22 ` Stephen John Smoogen [this message]
2020-01-08 16:43   ` Frank-Ulrich Sommer
2020-01-30  3:52     ` L A Walsh
