public inbox for cygwin@cygwin.com
From: Marco Mason <marco.mason@gmail.com>
To: cygwin@cygwin.com
Subject: Filesystem enumeration performance improvement
Date: Sun, 30 Sep 2018 18:41:00 -0000
Message-ID: <CANNqMjAGEm64Z4ULhbe4KtcmT1Y7njOYoJCG6V_KbrFuj1dj=A@mail.gmail.com>

I recently upgraded from Cygwin v2.10 to v2.11.1 and noticed that one of my
programs got a tremendous speed boost.  It's a custom filesystem
enumeration program whose output I feed to frcode to update the
/var/locatedb database.  It used to take quite a bit of time (15-20
minutes?), and now runs in about a minute.  Since the program seems to work
well, just many times faster, I'm rather happy with the changes.

The reason I'm writing is that I don't understand *why* there should be any
timing changes at all!  I wrote my own file enumerator for locatedb because
the original went through the POSIX layer and was pretty slow, especially
for remote mounts.  As I only needed enough information for locate, I wrote
my enumerator directly against the Windows API for speed.  Since my loop is
essentially just FindFirstFile/FindNextFile and printf(), I don't see why
file gathering would be any faster.
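For readers unfamiliar with this approach, a loop of that kind might look roughly like the sketch below.  It is a hypothetical reconstruction, not the actual program: the function name enumerate_tree, the recursion scheme, skipping reparse points, and the one-path-per-line output are all assumptions.  It uses the ANSI variants FindFirstFileA/FindNextFileA with MAX_PATH-sized buffers for brevity and prints native Windows paths as returned.  The opendir()/readdir() branch exists only so the sketch also compiles on non-Windows hosts.

```c
/* Hypothetical locate-style enumerator: print one full path per line. */
#if defined(_WIN32) || defined(__CYGWIN__)
#define USE_WIN32_FIND 1
#else
#define USE_WIN32_FIND 0
#define _POSIX_C_SOURCE 200809L /* lstat() and PATH_MAX on POSIX hosts */
#endif

#include <stdio.h>
#include <string.h>

#if USE_WIN32_FIND
#include <windows.h>

/* dir is a Win32 path without a trailing backslash, e.g. "C:". */
void enumerate_tree(const char *dir, FILE *out)
{
    char pattern[MAX_PATH];
    WIN32_FIND_DATAA fd;
    HANDLE h;
    int n = snprintf(pattern, sizeof pattern, "%s\\*", dir);

    if (n < 0 || (size_t)n >= sizeof pattern)
        return;                         /* path too long for this sketch */
    h = FindFirstFileA(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return;                         /* missing or unreadable directory */
    do {
        char path[MAX_PATH];
        if (strcmp(fd.cFileName, ".") == 0 || strcmp(fd.cFileName, "..") == 0)
            continue;                   /* jumps to FindNextFileA() below */
        n = snprintf(path, sizeof path, "%s\\%s", dir, fd.cFileName);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;
        fprintf(out, "%s\n", path);
        /* The attributes come back with the directory listing, so no
           per-file open or stat is needed to decide whether to recurse.
           Reparse points (junctions, symlinks) are not followed. */
        if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) &&
            !(fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT))
            enumerate_tree(path, out);
    } while (FindNextFileA(h, &fd));
    FindClose(h);
}

#else
#include <dirent.h>
#include <limits.h>
#include <sys/stat.h>

/* Portable fallback so the sketch compiles off Windows. */
void enumerate_tree(const char *dir, FILE *out)
{
    DIR *d = opendir(dir);
    struct dirent *e;

    if (d == NULL)
        return;
    while ((e = readdir(d)) != NULL) {
        char path[PATH_MAX];
        struct stat st;
        int n;
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;
        fprintf(out, "%s\n", path);
        /* lstat() so that symlinks to directories are not followed */
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
            enumerate_tree(path, out);
    }
    closedir(d);
}
#endif
```

Calling enumerate_tree() on each root from a small main() that writes to stdout would produce the newline-separated list that is then piped to frcode.  Note that Cygwin's gcc does not predefine _WIN32, which is why the sketch also checks __CYGWIN__ to select the Windows branch there.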

So either printf() has gotten remarkably faster, or there are some
interactions between Cygwin and Windows in the file enumeration area that
are surprising me.  Can someone please clue me in to what might be causing
the speed increases?

Looking at the git log and mailing list history, my best guess would be
that it's related to the email threads "Why does readdir() open files ?"
(Ben Rubson 2018-03-28) and "Why does (stat() ?) open files ?" (Ben Rubson
2018-04-09).  However, I can't seem to pin down which git commits are
relevant to those threads.  If anyone can provide a little insight, I'd
really appreciate it.

--marco

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


Thread overview: 2+ messages
2018-09-30 18:41 Marco Mason [this message]
2018-09-30 19:50 ` Jürgen Wagner
