public inbox for cygwin@cygwin.com
* Optimizing away "ReadFile" calls when Make calls stat()
From: Jonathan Kamens @ 2001-02-13 10:36 UTC
  To: cygwin

We use Cygwin to develop a large product (running a build and the test
suites takes about two hours on a very fast machine); our builds are
driven by GNU Make.  We compile and test the same product under Linux.
We've found that builds under Cygwin run several times slower than
builds under Linux, even on machines of comparable speed, RAM, etc.
The slowness is seriously impacting the productivity of our developers
who work on Windows, so we're searching for any way we can to speed up
Cygwin builds.

We've found that one of the biggest culprits in slowing down the
Cygwin builds is Make.  The problem is that every time Make does
stat() to find out the modification time on a dependency to determine
whether or not its dependents need to be rebuilt, Cygwin calls
ReadFile on the file twice -- once to determine whether it's a
symbolic link, and a second time to determine whether it should appear
to be executable according to stat().  We have thousands of
dependencies in our Makefiles, and many of those dependencies
frequently live on network drives, so these calls to ReadFile
seriously slow things down.

We don't use symbolic links anywhere in our source tree or build
tree, and Make doesn't really care whether a file is executable when
deciding whether it is newer than one of its dependents, so both of
these calls to ReadFile are totally unnecessary to us.  As an
experiment, I added code to the Cygwin DLL to allow these ReadFile
calls to be temporarily disabled, and then I compiled a modified
version of Make which disables the ReadFile calls before calling
stat() and then turns them back on.
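As a rough illustration of why the file contents are irrelevant to this check, the decision Make takes here amounts to comparing modification times. The sketch below is Python, not Make's actual C source, and `needs_rebuild` is a hypothetical helper name used only for illustration.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency.

    Only the modification time reported by stat() is consulted; the
    executable bit and symlink status play no part in the decision.
    A missing dependency raises FileNotFoundError (real Make would
    try to build it instead).
    """
    try:
        target_mtime = os.stat(target).st_mtime_ns
    except FileNotFoundError:
        return True
    return any(os.stat(dep).st_mtime_ns > target_mtime for dep in dependencies)
```

Nothing in this check opens a file, which is why the two ReadFile calls per stat() are pure overhead for this use.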

To measure the effect of these changes, I ran "make all" in a build
tree that was already completely built, so that I would be timing only
the work Make does to check dependencies, rather than timing actual
build work.  With the unmodified Make, "make all" takes around six
minutes; with the modified Make, it takes around three.  We consider
this a significant improvement.  (However, note that on Linux, "make
all" when nothing needs to be done takes only 17 seconds, so clearly
there's still a lot of room for improvement under Cygwin.)

I'm wondering if the maintainers of Cygwin would be willing to
consider incorporating these changes, if I submit them, into the
Cygwin DLL and the Cygwin version of Make.  I'm thinking that the DLL
changes would actually need to be split into two flags -- one to say,
"Don't call ReadFile to find out whether a file is executable, because
I don't care about that," and the other to say, "Don't call ReadFile
to find out if a file is a symbolic link, because I know I'm not using
any symbolic links."  Then, GNU Make on Cygwin could always set the
first flag, and it could set the second flag if the user specified
"--nosymlinks" or something like that.
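The two-flag split could be modelled roughly as follows. This is a Python sketch of the idea only: `StatHint`, `reads_per_stat` and `make_hints` are hypothetical names, not part of the Cygwin DLL or of GNU Make, and the read counts simply restate the two ReadFile calls described above.

```python
from enum import IntFlag


class StatHint(IntFlag):
    """Hypothetical per-process hints; these are not real Cygwin flags."""
    NONE = 0
    SKIP_EXEC_CHECK = 1      # caller does not care about the executable bit
    SKIP_SYMLINK_CHECK = 2   # caller promises that no symlinks are in use


def reads_per_stat(hints):
    """Count the ReadFile calls one stat() would make under these hints."""
    reads = 0
    if not hints & StatHint.SKIP_SYMLINK_CHECK:
        reads += 1  # read the file to see whether it holds a symlink
    if not hints & StatHint.SKIP_EXEC_CHECK:
        reads += 1  # read the file to decide whether it looks executable
    return reads


def make_hints(nosymlinks):
    """Hints Make would set: always skip the exec check, symlinks on request."""
    hints = StatHint.SKIP_EXEC_CHECK
    if nosymlinks:
        hints |= StatHint.SKIP_SYMLINK_CHECK
    return hints
```

With no hints a stat() costs two reads; Make alone brings that to one, and Make with "--nosymlinks" brings it to zero.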

I realize that this is a bit gross.  However, (a) surely it isn't much
more gross than storing symbolic links inside files and reading files
to determine whether they should look executable :-), and (b) it
really does give a drastic performance improvement for the small price
of not using symbolic links in your source or build tree.

Please comment.

Thanks,

  Jonathan Kamens

--
Want to unsubscribe from this list?
Check out: http://cygwin.com/ml/#unsubscribe-simple

* RE: Optimizing away "ReadFile" calls when Make calls stat()
From: Puttkammer, Roman @ 2001-02-13 14:15 UTC
  To: cygwin

> -----Original Message-----
> From: jfaith@lineo.com [ mailto:jfaith@lineo.com ]
> ...
> script just did "make --version > /dev/null" one thousand times
> ...
> Linux: 3 sec.
> VMWare running Linux: 9 sec.
> DOS (batch file) 18 sec.
> Cygwin: 30 sec.

AFAIK, fork() tends to be much slower on Windows than on most Unixes
such as Solaris or Linux. Hence you'll always get bad performance
on Windows when running this kind of test. I doubt, however, that you
can generalize these results; it's kind of like comparing pineapples
with carrots.

The reason for this is probably that Unixes are optimized for server
applications using many heavyweight and lightweight processes/threads.
Windows, however, seems to be optimized for running 10meg VB script
functions inside an Excel spreadsheet - and it actually does a pretty
good job running those :-)

putt

* RE: Optimizing away "ReadFile" calls when Make calls stat()
From: Bernard Dautrevaux @ 2001-02-14  2:41 UTC
  To: 'DJ Delorie', jik-cygwin; +Cc: cygwin

> -----Original Message-----
> From: DJ Delorie [ mailto:dj@delorie.com ]
> Sent: Tuesday, February 13, 2001 8:54 PM
> To: jik-cygwin@curl.com
> Cc: cygwin@cygwin.com
> Subject: Re: Optimizing away "ReadFile" calls when Make calls stat()
> 
> 
> 
> > As I've noted separately, reading tens of thousands of files even
> > once incurs a significant performance penalty.
> 
> True, but reading them all once is better than reading them all twice.
> I'm trying to break the problem down into small enough changes that we
> actually have a chance of implementing them.
> 
> > The change I've proposed can eliminate reading them at all.
> 
> But not in a way that we can make it the default.  Perhaps you could
> propose a set of mount flags to optimize common situations?  We
> already have one to avoid the read-for-execute test, perhaps you could
> work on an assume-no-symlinks flag?  Then we wouldn't need a custom
> make.exe (or any other program).
> 
> > But it does nothing at all for the "usual case" I'm trying to
> > optimize, which is Make stat()ing a file but never reading it.
> 
> It does, because stat() reads the file twice, once to see if it's a
> symlink, and once to see if the executable bit needs to be set.
> 
> > >  These should be easier wins (thus, more doable) than a global
> > >  cache, which NT should be providing itself as part of the disk
> > >  cache subsystem (for local drives, at least).  I don't think it's
> > >  appropriate for cygwin to go beyond this anyway - too many race
> > >  conditions arise.
> > 
> > As far as I know, there are no race conditions in the change I
> > suggested.  In fact, it *removes* race conditions, since it reduces
> > the number of distinct OS operations that must be performed on a
> > file during stat().
> 
> Right, but others were suggesting a global cache of file bytes.
> *That* would introduce race conditions.
> 

Perhaps a solution would be to maintain what could be called a "partial"
stat() cache: maintain a global cache of ALL the results of the ReadFile()s
(which I think can easily be reduced to one) together with the
last-time-modified value.

stat() will then ALWAYS check the last-time-modified of the ACTUAL file,
then check the cache and, if the cache is up-to-date, return the
execute/symlink flags found in the cache. If the cache entry is obsolete or
absent, it just re-reads the file's content and saves the LMT/exec/symlink
values in the cache.

The only race condition will be when UPDATING the cache (no problem on
reading if we first change exec/symlink then update LMT); this should be
simple to handle.
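A minimal sketch of such a "partial" stat() cache is given below, in Python rather than in the Cygwin DLL's own code. Several things here are assumptions made for illustration: the class and method names are hypothetical, the cache is a plain in-process dictionary rather than the global cache described above (so the update race is not addressed), and the `#!` and `!<symlink>` prefixes merely stand in for whatever the real exec and symlink tests look at.

```python
import os

SYMLINK_COOKIE = b"!<symlink>"  # assumed marker for a symlink stored in a file


class PartialStatCache:
    """Cache exec/symlink flags per path, validated by modification time."""

    def __init__(self):
        self._entries = {}   # path -> (mtime_ns, is_exec, is_symlink)
        self.file_reads = 0  # how many times file contents were read

    def _probe(self, path):
        """One read that answers both questions (stands in for ReadFile)."""
        self.file_reads += 1
        with open(path, "rb") as f:
            head = f.read(len(SYMLINK_COOKIE))
        return head.startswith(b"#!"), head.startswith(SYMLINK_COOKIE)

    def flags(self, path):
        """Return (is_exec, is_symlink), reading the file only when stale."""
        mtime = os.stat(path).st_mtime_ns  # always check the actual file
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1], cached[2]
        is_exec, is_symlink = self._probe(path)
        self._entries[path] = (mtime, is_exec, is_symlink)
        return is_exec, is_symlink
```

Repeated calls on an unchanged file cost one stat() each and no reads; only a changed modification time triggers a fresh read.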

Regrettably I don't have time to look at this (and don't know how it is
actually implemented now) but this should provide quite a big win for
cygwin.

Regards,

	Bernard

--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel:	+33 (0) 1 47 68 80 80
Fax:	+33 (0) 1 47 88 97 85
e-mail:	dautrevaux@microprocess.com
		b.dautrevaux@usa.net
-------------------------------------------- 

* RE: Optimizing away "ReadFile" calls when Make calls stat()
From: Bernard Dautrevaux @ 2001-02-14  4:46 UTC
  To: 'cygwin@cygwin.com'

> -----Original Message-----
> From: Christopher Faylor [ mailto:cgf@redhat.com ]
> Sent: Tuesday, February 13, 2001 11:29 PM
> To: cygwin@cygwin.com
> Subject: Re: Optimizing away "ReadFile" calls when Make calls stat()
> 
> 
> On Tue, Feb 13, 2001 at 05:13:49PM -0500, Puttkammer, Roman wrote:
> >
> >> -----Original Message-----
> >> From: jfaith@lineo.com [ mailto:jfaith@lineo.com ]
> >> ...
> >> script just did "make --version > /dev/null" one thousand times
> >> ...
> >> Linux: 3 sec.
> >> VMWare running Linux: 9 sec.
> >> DOS (batch file) 18 sec.
> >> Cygwin: 30 sec.
> >
> >AFAIK, fork() tends to be much slower on windows than on most unixes
> >such as solaris or linux.
> 
> There is no real fork on generic Win32.  Cygwin emulates the fork
> call and it is, as a result, very slow.
> 

AFAIK, cygwin is not the only one at fault here; the raw Win32 CreateProcess()
call is also quite slow. In our cross-development toolset we only use a
"spawn" call implemented directly on top of CreateProcess, and we see a more
than tenfold performance loss between "fork/exec" on Linux and "spawn" on
NT :-)
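For anyone wanting to compare process-creation cost on their own machines, a small harness along these lines might do. It is a Python sketch, and `seconds_per_launch` is a hypothetical helper name. Python's subprocess module goes through CreateProcess on Windows, but the interpreter's own startup time is included in each launch, so the figures are only meaningful relative to each other and not to the numbers quoted in this thread.

```python
import subprocess
import sys
import time


def seconds_per_launch(argv, runs=20):
    """Average wall-clock seconds to start argv and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Launch a do-nothing child so that process creation dominates the timing.
    cost = seconds_per_launch([sys.executable, "-c", "pass"])
    print(f"{cost * 1000:.1f} ms per launch")
```

Running the same script on each platform, with the same child command, gives a like-for-like figure for the launch overhead.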

Regards,

	Bernard


* RE: Optimizing away "ReadFile" calls when Make calls stat()
From: Bernard Dautrevaux @ 2001-02-16  9:24 UTC
  To: 'cygwin@cygwin.com'

> -----Original Message-----
> From: Christopher Faylor [ mailto:cgf@redhat.com ]
> Sent: Friday, February 16, 2001 6:00 PM
> To: Cygwin-L
> Subject: Re: Optimizing away "ReadFile" calls when Make calls stat()
> 
> 
> On Fri, Feb 16, 2001 at 10:59:49AM -0500, Larry Hall (RFK Partners, Inc) wrote:
> >At 04:34 AM 2/16/2001, Warren Young wrote:
> >>"Charles S. Wilson" wrote:
> >> >
> >> > If I were porting an old app from unix to cygwin, and wanted to tune
> >> > performance, I'd much rather do this:
> >>
> >>Both you and Jonathan have understood my intent perfectly.
> >>
> >>Christopher, please do consider this proposal.  It's easy to implement
> >>-- probably just a few tweaks on Egor's patch -- and it makes it easy to
> >>gain performance with straightforward patches to affected programs.
> >>It'd be nice if we can make Cygwin faster, but this proposal has an
> >>inherent advantage: the calling process _knows_ what it wants, whereas
> >>Cygwin can only guess or anticipate.
> >>
> >>Egor, Jonathan, maybe some benchmarks would help convince Christopher of
> >>the patch's utility.
> >
> >Chris sent some email about this yesterday.  He's looking at the possibility
> >of eliminating this problem without changing the API.
> 
> Nice to see that *someone* is paying attention.


I, for one, am more than just paying attention; I just thought it might be
pointless to add more traffic to an already quite long thread.

I'm REALLY waiting to see your solution; I have set out my own ideas on how
to do that, but I trust you to have found a solution a lot better than mine :-)


Thanks in advance for the good work,

	Bernard


Thread overview: 50+ messages
2001-02-13 10:36 Optimizing away "ReadFile" calls when Make calls stat() Jonathan Kamens
2001-02-13 10:56 ` Larry Hall (RFK Partners, Inc)
2001-02-13 11:01   ` jik-cygwin
2001-02-13 11:14     ` Larry Hall (RFK Partners, Inc)
2001-02-13 11:18       ` jik-cygwin
2001-02-13 11:26         ` Larry Hall (RFK Partners, Inc)
2001-02-13 11:35     ` DJ Delorie
2001-02-13 11:46       ` jik-cygwin
2001-02-13 11:54         ` DJ Delorie
2001-02-13 11:56           ` Jonathan Kamens
2001-02-13 12:06             ` DJ Delorie
2001-02-13 12:31             ` Larry Hall (RFK Partners, Inc)
2001-02-13 12:22           ` Christopher Faylor
2001-02-13 12:50             ` DJ Delorie
2001-02-14  0:12             ` Egor Duda
2001-02-14  0:17               ` Robert Collins
2001-02-15 11:47               ` Warren Young
2001-02-15 13:14                 ` Larry Hall (RFK Partners, Inc)
2001-02-15 14:17                   ` Charles S. Wilson
2001-02-16  1:34                     ` Warren Young
2001-02-16  8:07                       ` Larry Hall (RFK Partners, Inc)
2001-02-16  9:00                         ` Christopher Faylor
2001-02-15 14:17                   ` Christopher Faylor
2001-02-15 14:19                   ` Jonathan Kamens
2001-02-16  1:14                 ` Egor Duda
2001-02-16  1:29                   ` Warren Young
2001-02-13 15:28         ` Warren Young
2001-02-14  0:48           ` Lothan
2001-02-13 11:12   ` Earnie Boyd
2001-02-13 11:46   ` Christopher Faylor
2001-02-13 11:09 ` Earnie Boyd
2001-02-13 11:15   ` jik-cygwin
2001-02-13 11:48     ` Earnie Boyd
2001-02-13 11:54       ` jik-cygwin
2001-02-13 12:25         ` DJ Delorie
2001-02-13 12:50           ` Larry Hall (RFK Partners, Inc)
2001-02-13 12:51             ` DJ Delorie
2001-02-13 13:37             ` jfaith
2001-02-13 13:50             ` Mumit Khan
2001-02-13 14:13               ` DJ Delorie
2001-02-13 12:11       ` DJ Delorie
2001-02-13 11:24 ` Eric M. Monsler
2001-02-13 11:28   ` jik-cygwin
2001-02-13 12:04     ` Eric M. Monsler
2001-02-13 14:15 Puttkammer, Roman
2001-02-13 14:28 ` Christopher Faylor
2001-02-14  2:41 Bernard Dautrevaux
2001-02-14  4:46 Bernard Dautrevaux
2001-02-16  9:24 Bernard Dautrevaux
2001-02-16 10:17 ` Christopher Faylor
