public inbox for cygwin@cygwin.com
* Curiosity about file access performance
@ 2021-10-29  9:35 Eliot Moss
  2021-10-29 10:14 ` Takashi Yano
  2021-10-29 10:44 ` Adam Dinwoodie
  0 siblings, 2 replies; 7+ messages in thread
From: Eliot Moss @ 2021-10-29  9:35 UTC
  To: cygwin

Dear Cygwiners -

I think a lot of us know that fork() under Cygwin is slower than on Linux and
have some grasp of why.  But I have noticed that file access is rather slower
under Cygwin as well.  My "poster child" for this is running latex.  I am
working on writing a book, which includes a huge number of LaTeX style files
and such.  Under WSL1 (which has the same fork cost issues as Cygwin for
similar reasons), reading the style files goes by in little more than the
blink of an eye (about 1 sec), while on Cygwin it takes a little over 17 seconds.

The time to process the body of the book is 23 seconds under WSL1 and 35 under
Cygwin.  So the total times are 53 seconds under Cygwin and 24 under WSL1.  I
believe the LaTeX installations are the same versions, and I get the same
outputs.  Both LaTeX builds are 64-bit programs.  There is not much forking here
(at least I don't believe there is, but maybe there is under the covers for
doing things with pdf figures or something), but a fair amount of file I/O.

For many / most things, the Cygwin overhead is tolerable; for running this
book, since I will be doing it over and over, it was worth investing in
getting everything set up on WSL1.

But it got me wondering as to why?

Best wishes - Eliot

* Re: Curiosity about file access performance
  2021-10-29  9:35 Curiosity about file access performance Eliot Moss
@ 2021-10-29 10:14 ` Takashi Yano
  2021-10-29 10:23   ` Eliot Moss
  2021-10-29 10:44 ` Adam Dinwoodie
  1 sibling, 1 reply; 7+ messages in thread
From: Takashi Yano @ 2021-10-29 10:14 UTC
  To: cygwin

On Fri, 29 Oct 2021 10:35:08 +0100
Eliot Moss wrote:
> I think a lot of us know that fork() under Cygwin is slower than on Linux and
> have some grasp of why.  But I have noticed that file access is rather slower
> under Cygwin as well.  My "poster child" for this is running latex.  I am
> working on writing a book, which includes a huge number of LaTeX style files
> and such.  Under WSL1 (which has the same fork cost issues as Cygwin for
> similar reasons), reading the style files goes by in little more than the
> blink of an eye (about 1 sec), while on Cygwin it takes a little over 17 seconds.
> 
> The time to process the body of the book is 23 seconds under WSL1 and 35 under
> Cygwin.  So the total times are 53 seconds under Cygwin and 24 under WSL1.  I
> believe the LaTeX installations are the same versions, and I get the same
> outputs.  Both LaTeX builds are 64-bit programs.  There is not much forking here
> (at least I don't believe there is, but maybe there is under the covers for
> doing things with pdf figures or something), but a fair amount of file I/O.
> 
> For many / most things, the Cygwin overhead is tolerable; for running this
> book, since I will be doing it over and over, it was worth investing in
> getting everything set up on WSL1.
> 
> But it got me wondering as to why?

Why do you think the cause is the file access performance?
I tested the file access speed using dd as follows.

In cygwin:
[yano@Express5800-S70 ~]$ dd if=/dev/zero of=test.dat bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.186714 s, 2.8 GB/s
[yano@Express5800-S70 ~]$ dd if=test.dat of=/dev/null bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.125709 s, 4.2 GB/s

In WSL1:
Express5800-S70:~> dd if=/dev/zero of=test.dat bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.301657 s, 1.7 GB/s
Express5800-S70:~> dd if=test.dat of=/dev/null bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.229617 s, 2.3 GB/s

The result shows the file access performance of cygwin is
better than WSL1.

I think the cause of your problem is something other than
file access performance.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

* Re: Curiosity about file access performance
  2021-10-29 10:14 ` Takashi Yano
@ 2021-10-29 10:23   ` Eliot Moss
  2021-10-29 10:47     ` Noel Grandin
  2021-10-29 18:33     ` bzs
  0 siblings, 2 replies; 7+ messages in thread
From: Eliot Moss @ 2021-10-29 10:23 UTC
  To: Takashi Yano, cygwin


Sorry, it could depend on what we mean by "file access", so allow me to try to
clarify.  I am grateful for your data, since they show that raw data handling
speed is good.  But to read a file you have to open it.  I suspect that file
lookup and opening may be an issue.  Which reminds me, I should check and see
if any of the TeX lookup paths are significantly different between the two
cases!

Best wishes - Eliot
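
One way to check the distinction drawn above (opening a file versus moving its
data) is to time the two phases separately.  The following is only a minimal
sketch, assuming Python 3 is available in both environments; the helper names
make_small_files and time_open_vs_read and the ".sty" file names are made up
for illustration and do not come from this thread.  The files are freshly
written, so the reads are probably served from cache, which is roughly the
situation on a repeated LaTeX run.

```python
import os
import tempfile
import time


def make_small_files(directory, count, size=1024):
    """Create `count` files of `size` bytes each and return their paths."""
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"style{i:04d}.sty")  # hypothetical names
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def time_open_vs_read(paths):
    """Return (open_seconds, read_seconds, total_bytes) summed over all paths."""
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)  # O_BINARY only exists on Windows
    open_s = 0.0
    read_s = 0.0
    total = 0
    for path in paths:
        t0 = time.perf_counter()
        fd = os.open(path, flags)          # path lookup and open happen here
        t1 = time.perf_counter()
        try:
            while True:
                chunk = os.read(fd, 1 << 16)
                if not chunk:
                    break
                total += len(chunk)
            t2 = time.perf_counter()       # stop the clock before close()
        finally:
            os.close(fd)
        open_s += t1 - t0
        read_s += t2 - t1
    return open_s, read_s, total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = make_small_files(d, 500)
        open_s, read_s, total = time_open_vs_read(paths)
        print(f"{len(paths)} files, {total} bytes")
        print(f"open: {open_s:.4f} s   read: {read_s:.4f} s")
```

Running the same script under Cygwin and under WSL1, against the same
directory, would show whether the gap sits in the open column or the read
column.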

* Re: Curiosity about file access performance
  2021-10-29  9:35 Curiosity about file access performance Eliot Moss
  2021-10-29 10:14 ` Takashi Yano
@ 2021-10-29 10:44 ` Adam Dinwoodie
  2021-10-29 10:58   ` Eliot Moss
  1 sibling, 1 reply; 7+ messages in thread
From: Adam Dinwoodie @ 2021-10-29 10:44 UTC
  To: cygwin

On Fri, 29 Oct 2021 at 10:36, Eliot Moss <moss@cs.umass.edu> wrote:
> I think a lot of us know that fork() under Cygwin is slower than on Linux and
> have some grasp of why.  But I have noticed that file access is rather slower
> under Cygwin as well.  My "poster child" for this is running latex.  I am
> working on writing a book, which includes a huge number of LaTeX style files
> and such.  Under WSL1 (which has the same fork cost issues as Cygwin for
> similar reasons), reading the style files goes by in little more than the
> blink of an eye (about 1 sec), while on Cygwin it takes a little over 17 seconds.
>
> The time to process the body of the book is 23 seconds under WSL1 and 35 under
> Cygwin.  So the total times are 53 seconds under Cygwin and 24 under WSL1.  I
> believe the LaTeX installations are the same versions, and I get the same
> outputs.  Both LaTeX builds are 64-bit programs.  There is not much forking here
> (at least I don't believe there is, but maybe there is under the covers for
> doing things with pdf figures or something), but a fair amount of file I/O.
>
> For many / most things, the Cygwin overhead is tolerable; for running this
> book, since I will be doing it over and over, it was worth investing in
> getting everything set up on WSL1.
>
> But it got me wondering as to why?

AIUI it's a fundamental part of the trade-offs that NTFS makes:
compared to common Linux file systems like ext4, NTFS is much slower
at things like parsing directory structures (which is a necessary part
of opening any given file). In the same way that native Windows
programs tend to use threading implementations that work differently
to fork(), native Windows applications will also often much prefer
large monolithic data files, where native *nix applications are much
more likely to have lots of small files. As a result, for things that
require opening lots of files, WSL (at least if you're using the
native WSL disk, which will be a *nix disk image stored in a file,
rather than files under /mnt/c or similar) will likely be quicker than
a similar operation through Cygwin, as Cygwin will always be affected
by those NTFS overheads.
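
The "many small files versus one monolithic file" trade-off can be measured
directly.  This is a rough sketch, assuming Python 3; the function names
write_layouts and read_all and the file names are hypothetical.  It writes the
same bytes twice, once as many small files and once as a single file, then
times a full read of each layout.  No numbers are claimed here, since the
result depends on the file system, on caching, and on any antivirus hooks.

```python
import os
import tempfile
import time


def write_layouts(root, n_files=200, size=4096):
    """Write the same data as n_files small files and as one big file."""
    payload = os.urandom(size)
    small_dir = os.path.join(root, "small")
    os.mkdir(small_dir)
    small_paths = []
    for i in range(n_files):
        path = os.path.join(small_dir, f"part{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
        small_paths.append(path)
    big_path = os.path.join(root, "big.bin")
    with open(big_path, "wb") as f:
        for _ in range(n_files):
            f.write(payload)
    return small_paths, big_path


def read_all(paths):
    """Read every path in full; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as f:   # one open per file: this is the cost in question
            total += len(f.read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        small_paths, big_path = write_layouts(root)
        small_bytes, small_s = read_all(small_paths)
        big_bytes, big_s = read_all([big_path])
        print(f"{len(small_paths)} small files: {small_bytes} bytes in {small_s:.4f} s")
        print(f"1 big file:       {big_bytes} bytes in {big_s:.4f} s")
```

Both layouts hold identical byte counts, so any difference in elapsed time
comes from per-file overhead rather than from data volume.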

* Re: Curiosity about file access performance
  2021-10-29 10:23   ` Eliot Moss
@ 2021-10-29 10:47     ` Noel Grandin
  2021-10-29 18:33     ` bzs
  1 sibling, 0 replies; 7+ messages in thread
From: Noel Grandin @ 2021-10-29 10:47 UTC
  To: moss, Takashi Yano, cygwin


There are a bunch of different possibilities:

(*) temporary files - there was an improvement here in recent Cygwin versions,
which means that if your machine has lots of memory and your program creates
lots of temporary files, it will now be significantly faster
(*) file name lookup - Linux has a path name cache, which makes it quite a bit
faster than Windows for heavy use (git is the poster child here)
(*) file information lookup - some of the "default" Unix APIs look up a bunch
of information that is cheap on Unix but expensive on Windows. Normally there
are alternative APIs that load only the minimal set of information, which is
then cheaper on Windows.
(*) spawning - it is quite possible that LaTeX makes heavy use of spawning
child processes to do various things, which is unfortunately more expensive on
Windows.
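
As an illustration of the "file information lookup" item, Python happens to
expose both styles of API.  This is a sketch only, and the function names
sizes_listdir_stat and sizes_scandir are made up here.  os.listdir() followed
by os.stat() fetches full metadata with a separate call for every name, while
os.scandir() lets the platform hand back whatever it already learned while
reading the directory.  The Python documentation says that on native Windows
DirEntry.stat() needs an extra system call only for symbolic links; whether a
Cygwin build of Python benefits in the same way is an assumption that would
need measuring.

```python
import os
import tempfile
import time


def sizes_listdir_stat(directory):
    """One os.stat() call per entry: full metadata is fetched for every name."""
    return {name: os.stat(os.path.join(directory, name)).st_size
            for name in os.listdir(directory)}


def sizes_scandir(directory):
    """os.scandir() may reuse information returned by the directory scan itself."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.stat().st_size for entry in entries}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(1000):
            with open(os.path.join(d, f"f{i:04d}.txt"), "wb") as f:
                f.write(b"x" * i)
        for fn in (sizes_listdir_stat, sizes_scandir):
            start = time.perf_counter()
            result = fn(d)
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__}: {len(result)} entries in {elapsed:.4f} s")
```

Both functions return the same mapping of file name to size; only the number
of metadata calls behind the scenes differs.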

* Re: Curiosity about file access performance
  2021-10-29 10:44 ` Adam Dinwoodie
@ 2021-10-29 10:58   ` Eliot Moss
  0 siblings, 0 replies; 7+ messages in thread
From: Eliot Moss @ 2021-10-29 10:58 UTC
  To: Adam Dinwoodie, cygwin

On 10/29/2021 11:44 AM, Adam Dinwoodie wrote:

> AIUI it's a fundamental part of the trade-offs that NTFS makes:
> compared to common Linux file systems like ext4, NTFS is much slower
> at things like parsing directory structures (which is a necessary part
> of opening any given file). In the same way that native Windows
> programs tend to use threading implementations that work differently
> to fork(), native Windows applications will also often much prefer
> large monolithic data files, where native *nix applications are much
> more likely to have lots of small files. As a result, for things that
> require opening lots of files, WSL (at least if you're using the
> native WSL disk, which will be a *nix disk image stored in a file,
> rather than files under /mnt/c or similar) will likely be quicker than
> a similar operation through Cygwin, as Cygwin will always be affected
> by those NTFS overheads.

Ah, that's interesting.  The files in question, the ones that seem to be
opened (and *maybe* read) faster, are in the *nix hierarchy, while my book
files are all on the Windows side (/mnt/c on WSL1).  So the huge speedup reading those
makes sense.  The speedup processing the rest still doesn't quite make
sense, unless maybe WSL1's parsed-directory caching is more effective
than Cygwin's or something.  (I assume something like that is going on,
to reduce conversions of directories to *nix format.)

Regards - Eliot

* Re: Curiosity about file access performance
  2021-10-29 10:23   ` Eliot Moss
  2021-10-29 10:47     ` Noel Grandin
@ 2021-10-29 18:33     ` bzs
  1 sibling, 0 replies; 7+ messages in thread
From: bzs @ 2021-10-29 18:33 UTC
  To: moss; +Cc: Takashi Yano, cygwin


I/O to/from /dev/zero or /dev/null could be special-cased.

Benchmarking file system performance can be fraught.
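
A sketch of a test that sidesteps the /dev/zero and /dev/null question, assuming
Python 3; the function name write_then_read is made up for illustration.  It
writes random bytes to a real file, calls fsync before stopping the write
clock, then reads the file back and counts the bytes.  It does not drop the
operating system's cache, so the read-back is still likely to come from
memory; that caveat is exactly the sort of thing that makes such numbers hard
to interpret.

```python
import os
import tempfile
import time


def write_then_read(path, total_bytes=8 * 1024 * 1024, block=1024 * 1024):
    """Write random data with fsync, read it back.

    Returns (write_seconds, read_seconds, bytes_read).
    """
    data = os.urandom(block)           # generated before timing starts
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            n = min(block, remaining)
            f.write(data[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())           # ask the OS to push the data to the device
    write_s = time.perf_counter() - t0

    t1 = time.perf_counter()
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            bytes_read += len(chunk)
    read_s = time.perf_counter() - t1
    return write_s, read_s, bytes_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w, r, n = write_then_read(os.path.join(d, "test.dat"))
        print(f"wrote {n} bytes in {w:.4f} s, read back in {r:.4f} s")
```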

-- 
        -Barry Shein, co-author of nfsstones benchmark

Software Tool & Die    | bzs@TheWorld.com             | http://www.TheWorld.com
Purveyors to the Trade | Voice: +1 617-STD-WRLD       | 800-THE-WRLD
The World: Since 1989  | A Public Information Utility | *oo*
