* Question about slow access to file information
From: Eliot Moss @ 2023-01-14  0:42 UTC
  To: cygwin

Dear Cygwin'ers -

I have a separate drive mounted this way:

d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0

One thing I use it for is to store backup files.  These tend to be 2 Gb
chunks, and there can be hundreds of them in the backup directory.  (The drive
is 5Tb.)  The Windows Disk Management tool describes it as NTFS, Basic Data
Partition.

Doing ls (for example) takes a very perceptible number of seconds (though
whatever takes a long time seems to be cached, at least for a while, since a
second ls soon after is fast).

Windows Explorer (for example) and CMD do not seem to suffer this delay.

Any notion as to what is happening and what I might do to ameliorate it?

If it matters, the drive is removable (an external WD MyPassport hard drive).

Regards - Eliot Moss


* Re: Question about slow access to file information
From: Adam Dinwoodie @ 2023-01-14 13:45 UTC
  To: cygwin

On Sat, Jan 14, 2023 at 11:42:58AM +1100, Eliot Moss via Cygwin wrote:
> Dear Cygwin'ers -
> 
> I have a separate drive mounted this way:
> 
> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> 
> One thing I use it for is to store backup files.  These tend to be 2 Gb
> chunks, and there can be hundreds of them in the backup directory.  (The drive
> is 5Tb.)  The Windows Disk Management tool describes it as NTFS, Basic Data
> Partition.
> 
> Doing ls (for example) takes a very perceptible number of seconds (though
> whatever takes a long time seems to be cached, at least for a while, since a
> second ls soon after is fast).
> 
> Windows Explorer (for example) and CMD do not seem to suffer this delay.
> 
> Any notion as to what is happening and what I might do to ameliorate it?
> 
> If it matters, the drive is removable (an external WD MyPassport hard drive).

I *suspect* this is `ls` querying file metadata that is relatively slow
to get out of an NTFS filesystem, in order to provide an interface similar
to native *nix systems, whereas Windows' own tools unsurprisingly care more
about the sorts of file properties that Windows filesystems are better
optimised for.

Based on experience, you might find using `ls --color=never` to be
quicker: querying some of the properties that `ls` likes to use for
colouring the output seems to require a bunch of extra queries to the
filesystem.  Failing that, if you have control over the directory
layout, making the structure deeper with fewer objects in each directory
will probably help.
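
To illustrate the deeper layout, here is a minimal shell sketch that moves
each regular file in a directory into a subdirectory named after the first
two characters of its name.  The function name `shard_by_prefix` and the
two-character key are assumptions made for this example, not anything from
this thread.  If the backup chunks all share a common prefix, a different
key (date, sequence number) would be needed, and the backup tool must still
be able to find its files afterwards.

```shell
# Hypothetical helper: spread the files in DIR over subdirectories keyed by
# the first two characters of each file name.
shard_by_prefix() {
    dir=$1
    for path in "$dir"/*; do
        # Skip subdirectories and anything else that is not a regular file.
        [ -f "$path" ] || continue
        name=${path##*/}
        prefix=$(printf '%s' "$name" | cut -c1-2)
        # If mkdir fails (e.g. a file is already named like the prefix),
        # the file is simply left where it is.
        mkdir -p "$dir/$prefix" && mv -- "$path" "$dir/$prefix/$name"
    done
}
```

Called as `shard_by_prefix /cygdrive/d/backup` (the path is only an
example), this leaves each subdirectory with a fraction of the original
entries, so a listing of any one of them has fewer files to query.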


* Re: Question about slow access to file information
From: Christian Franke @ 2023-01-14 16:38 UTC
  To: cygwin

Eliot Moss via Cygwin wrote:
> I have a separate drive mounted this way:
>
> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
>
> One thing I use it for is to store backup files.  These tend to be 2 Gb
> chunks, and there can be hundreds of them in the backup directory.  (The drive
> is 5Tb.)  The Windows Disk Management tool describes it as NTFS, Basic Data
> Partition.
>
> Doing ls (for example) takes a very perceptible number of seconds (though
> whatever takes a long time seems to be cached, at least for a while, since a
> second ls soon after is fast).

The problem is the 'noacl' mount option and the fact that POSIX only 
offers the *stat*() functions to retrieve file information. These 
functions always need to provide the full file information, even if only 
a small subset is needed.

To determine the 'x'-permission bits in the 'stat.st_mode' field on a 
'noacl'-mount, Cygwin reads the first bytes of most files (all except 
*.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!" 
(script), ":\n" (?) or "MZ" (Windows executable).
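
The check described above can be approximated in shell.  This is only a
sketch of the rule as stated in this message, not Cygwin's actual code; the
function name `looks_executable` and its yes/no/skipped output are made up
for illustration.

```shell
# Hypothetical helper: report whether a file would get the 'x' bits under
# the content check described above ("#!", ":\n" or "MZ" at the start).
looks_executable() {
    # *.exe, *.lnk and *.com are excluded from the content check, so the
    # file is not read at all for those suffixes.
    case "$1" in
        *.[eE][xX][eE]|*.[lL][nN][kK]|*.[cC][oO][mM])
            echo "skipped"; return 0 ;;
    esac
    # Read the first two bytes and render them as lowercase hex digits.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        2321|3a0a|4d5a) echo "yes" ;;   # "#!", ":\n", "MZ"
        *)              echo "no"  ;;
    esac
}
```

The point of the sketch is the cost: every file that is not excluded by
suffix has to be opened and read, once per directory entry.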

On 'noacl' mounts, this behavior could be suppressed by 'exec' or 
'noexec' mount options.

-- 
Regards,
Christian



* Re: Question about slow access to file information
From: Eliot Moss @ 2023-01-15  1:05 UTC
  To: cygwin

On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> Eliot Moss via Cygwin wrote:
>> I have a separate drive mounted this way:
>>
>> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
>>
>> One thing I use it for is to store backup files.  These tend to be 2 Gb
>> chunks, and there can be hundreds of them in the backup directory. (The drive
>> is 5Tb.)  The Windows Disk Management tool describes it as NTFS, Basic Data
>> Partition.
>>
>> Doing ls (for example) takes a very perceptible number of seconds (though
>> whatever takes a long time seems to be cached, at least for a while, since a
>> second ls soon after is fast).
> 
> The problem is the 'noacl' mount option and the fact that POSIX only offers the *stat*() functions 
> to retrieve file information. These functions always need to provide the full file information, even 
> if only a small subset is needed.
> 
> To determine the 'x'-permission bits in the 'stat.st_mode' field on a 'noacl'-mount, Cygwin reads 
> the first bytes of most files (all except *.exe, *.lnk, *.com). The 'x' bits are set if the file 
> starts with "#!" (script), ":\n" (?) or "MZ" (Windows executable).
> 
> On 'noacl' mounts, this behavior could be suppressed by 'exec' or 'noexec' mount options.

Interesting.  I removed the noacl from /etc/fstab and restarted all Cygwin processes.
The mount program now shows that drive without noacl.  It still takes surprisingly
long to ls if I have not done so recently.  The directory contains ~1200 files.

Further thoughts?

EM


* Re: Question about slow access to file information
From: gs-cygwin.com @ 2023-01-15  3:24 UTC
  To: moss; +Cc: cygwin

On Sun, Jan 15, 2023 at 12:05:10PM +1100, Eliot Moss via Cygwin wrote:
> On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> > Eliot Moss via Cygwin wrote:
> > > I have a separate drive mounted this way:
> > > 
> > > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> > > 
> > > One thing I use it for is to store backup files.  These tend to be 2 Gb
> > > chunks, and there can be hundreds of them in the backup directory. (The drive
> > > is 5Tb.)  The Windows Disk Management tool describes it as NTFS, Basic Data
> > > Partition.
> > > 
> > > Doing ls (for example) takes a very perceptible number of seconds (though
> > > whatever takes a long time seems to be cached, at least for a while, since a
> > > second ls soon after is fast).
> > 
> > The problem is the 'noacl' mount option and the fact that POSIX only
> > offers the *stat*() functions to retrieve file information. These
> > functions always need to provide the full file information, even if only
> > a small subset is needed.
> > 
> > To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> > 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> > *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> > (script), ":\n" (?) or "MZ" (Windows executable).
> > 
> > On 'noacl' mounts, this behavior could be suppressed by 'exec' or 'noexec' mount options.
> 
> Interesting.  I removed the noacl from /etc/fstab and restarted all Cygwin processes.
> The mount program now shows that drive without noacl.  It still takes surprisingly
> long to ls if I have not done so recently.  The directory contains ~1200 files.
> 
> Further thoughts?

Does this make any difference?
$ env - LANG=C ls -f /cygdrive/d/

Also, I seem to recall prior mailing list postings on how Cygwin may open()
each file to determine some info, and that can be expensive.  Is that what is
happening if you trace the 'ls'?

Cheers, Glenn


* Re: Question about slow access to file information
From: Christian Franke @ 2023-01-17 15:21 UTC
  To: cygwin

Eliot Moss via Cygwin wrote:
> On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
>> Eliot Moss via Cygwin wrote:
>>> I have a separate drive mounted this way:
>>>
>>> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
>>>
>>> One thing I use it for is to store backup files.  These tend to be 2 Gb
>>> chunks, and there can be hundreds of them in the backup directory. (The drive
>>> is 5Tb.)  The Windows Disk Management tool describes it as NTFS, Basic Data
>>> Partition.
>>>
>>> Doing ls (for example) takes a very perceptible number of seconds (though
>>> whatever takes a long time seems to be cached, at least for a while, since a
>>> second ls soon after is fast).
>>
>> The problem is the 'noacl' mount option and the fact that POSIX only 
>> offers the *stat*() functions to retrieve file information. These 
>> functions always need to provide the full file information, even if 
>> only a small subset is needed.
>>
>> To determine the 'x'-permission bits in the 'stat.st_mode' field on a 
>> 'noacl'-mount, Cygwin reads the first bytes of most files (all except 
>> *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with 
>> "#!" (script), ":\n" (?) or "MZ" (Windows executable).
>>
>> On 'noacl' mounts, this behavior could be suppressed by 'exec' or 
>> 'noexec' mount options.
>
> Interesting.  I removed the noacl from /etc/fstab and restarted all Cygwin processes.
> The mount program now shows that drive without noacl.  It still takes surprisingly
> long to ls if I have not done so recently.  The directory contains ~1200 files.

This depends on the storage device, sometimes (on HDDs) on filesystem 
fragmentation, and always on the 'ls' options. Plain '/bin/ls' without any 
arguments does not call stat(). 'ls -s' or 'ls --color=yes' call stat() 
for each file. 'ls -l' additionally calls getfacl() for each file if on 
an 'acl' mount. The latter is apparently slower than expected, see below.

Here is a quick test on a directory with 10000 ~3KB files on an NTFS USB 
drive connected via USB-2 (~28MB/s raw read speed). The first test of 
each mount variant was done immediately after connecting the drive. Times 
are elapsed (real) seconds:

$ TIMEFORMAT='%R'

1. mount [-o acl]

$ time ls -l > /dev/null
4.282
$ time ls -l > /dev/null
1.322
$ time ls -s > /dev/null
0.404
$ time ls > /dev/null
0.032


2. mount -o noacl

$ time ls -l > /dev/null
13.452
$ time ls -l > /dev/null
0.789
$ time ls -s > /dev/null
0.764
$ time ls > /dev/null
0.033


3. mount -o noacl,noexec

$ time ls -l > /dev/null
3.215
$ time ls -l > /dev/null
0.368
$ time ls -s > /dev/null
0.355
$ time ls > /dev/null
0.032
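
To reproduce a comparable test directory, a sketch like the following could 
be used. The helper name `make_test_files`, the file naming and the 
zero-filled contents are assumptions for illustration; the files used for 
the timings above are not described in more detail.

```shell
# Hypothetical helper: create COUNT files of SIZE bytes each in DIR.
make_test_files() {
    dir=$1 count=$2 size=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-filled content, so no file starts with "#!", ":\n" or "MZ".
        head -c "$size" /dev/zero > "$dir/file$i.dat"
        i=$((i + 1))
    done
}
```

For example, `make_test_files /cygdrive/d/lstest 10000 3072` (the path is 
only an example) followed by the `time ls ...` commands above. Freshly 
written files are probably still cached, so the drive should be reconnected 
before taking the first measurement.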

-- 
Regards,
Christian

