public inbox for cygwin@cygwin.com
* Filesystem enumeration performance improvement
@ 2018-09-30 18:41 Marco Mason
  2018-09-30 19:50 ` Jürgen Wagner
  0 siblings, 1 reply; 2+ messages in thread
From: Marco Mason @ 2018-09-30 18:41 UTC
  To: cygwin

I recently upgraded from Cygwin v2.10 to v2.11.1 and noticed that one of my
programs got a tremendous speed boost.  It's a custom filesystem
enumeration program whose output I feed to frcode to update the
/var/locatedb database.  It used to take quite a bit of time (15-20
minutes?), and now runs in about a minute.  Since the program seems to work
well, just many times faster, I'm rather happy with the changes.

The reason I'm writing is that I don't see *why* I should have any timing
changes at all!  The reason I have my own file enumerator for locatedb is
that the original went through the POSIX layer and was pretty slow,
especially for remote mounts.  As I only needed enough for locate, I wrote
my own enumerator against the Windows API for speed.  Since my loop is
essentially just using FindFirstFile/FindNextFile and printf(), I don't
know why file gathering would be any faster.

So either printf() has gotten remarkably faster, or there are some
interactions between Cygwin and Windows in the file enumeration area that
are surprising me.  Can someone please clue me in to what might be causing
the speed increase?

Looking at the git log and mailing list history, my best guess is
that it's related to the email threads "Why does readdir() open files ?"
(Ben Rubson 2018-03-28) and "Why does (stat() ?) open files ?" (Ben Rubson
2018-04-09).  However, I can't seem to pin down which git commits are
relevant to those threads.  If anyone can provide a little insight, I'd
really appreciate it.

--marco

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Filesystem enumeration performance improvement
  2018-09-30 18:41 Filesystem enumeration performance improvement Marco Mason
@ 2018-09-30 19:50 ` Jürgen Wagner
  0 siblings, 0 replies; 2+ messages in thread
From: Jürgen Wagner @ 2018-09-30 19:50 UTC
  To: cygwin, marco.mason

Hi Marco,
  as you don't use the Cygwin APIs but go to the Windows APIs directly,
any changes to the way stat()/readdir() or related functions in Cygwin
operate do not seem to be a plausible reason why your code is running
faster. I doubt printf() can be improved to provide such a dramatic
speed-up.

In my experience, such effects usually have one of two causes:

- There is some caching involved, either in Windows or at the disk
level. Rerun the benchmark with empty caches or with caching disabled.

- Your virus scanner has improved, and querying the status of files no
longer triggers excessive scans. This is a bit harder to verify or
test.
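
One rough way to check the caching explanation is to enumerate the same
tree twice in a row and compare the timings. The sketch below is only
an illustration: it uses Python's os.scandir rather than the Win32
enumerator discussed in this thread, and the names count_entries and
time_passes are made up for this example. A first pass that is much
slower than the second suggests caching, but only if the caches were
actually cold beforehand (for instance right after a reboot).

```python
import os
import time


def count_entries(root):
    """Walk root iteratively with os.scandir and return how many entries were seen."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    count += 1
                    # Do not descend into symlinked directories, to avoid loops.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable directory: skip it and carry on with the rest.
            continue
    return count


def time_passes(root, passes=2):
    """Enumerate root several times; return a list of (entry_count, seconds)."""
    results = []
    for _ in range(passes):
        start = time.perf_counter()
        n = count_entries(root)
        results.append((n, time.perf_counter() - start))
    return results
```

Calling time_passes on the tree in question and printing the two
timings side by side is enough for a first impression. It says nothing
about the virus scanner, which would have to be ruled out separately.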

Did you compare your program's performance with that of Cygwin's "find"?
Did that also show such a dramatic increase in throughput?
There is a free and quite fast disk space analyzer called RidNacs
(ScanDir backwards). If the magic you observe is an optimized way of
caching, this program should also be affected.
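
A comparison like the one suggested here can be done with a small
timing harness. This is a minimal sketch, not a careful benchmark: the
helper names time_command and compare are invented for this example,
and the command lines in the usage note are placeholders.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discard its output, and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit status is not treated as an error here
    )
    return time.perf_counter() - start


def compare(commands):
    """Time each named command once, in order; return {name: seconds}."""
    return {name: time_command(argv) for name, argv in commands.items()}
```

For example, compare({"find": ["find", "/cygdrive/c"], "custom":
["./myenum"]}) would time Cygwin's find against a custom enumerator,
where ./myenum is a hypothetical program name. Whichever command runs
second may benefit from caches warmed by the first, so it is worth
repeating the run with the order swapped.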



Cheers,
--J.


