public inbox for cygwin@cygwin.com
* RE: OT: possible project/research project
@ 2002-03-19 23:29 Robert Collins
  2002-03-20 21:00 ` Gary R. Van Sickle
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Collins @ 2002-03-19 23:29 UTC (permalink / raw)
  To: Gary R. Van Sickle, cygwin



> -----Original Message-----
> From: Gary R. Van Sickle [mailto:g.r.vansickle@worldnet.att.net] 
> Sent: Wednesday, March 20, 2002 1:52 PM

> 
> I don't see it that the source of the problem is the 
> implementation of fork/vfork; the way I see it the very 
> *concept* of forking makes little to no sense.  I've written 
> a lot of code, and not once have I thought to myself, "ok, 
> now what I want to do here is duplicate the current process 
> in almost exactly its current state."  Maybe it made more 
> sense back in the day, or maybe I'm missing something, but it 
> seems to me there's a lot more efficient ways to do 
> multithreading/multi"process"ing/IPC/etc (or better yet avoid 
> them altogether) these days.

Well, most high-performance systems use a combination of MT, MP and IPC.
Look at IIS for instance (not that I like MS :}). IIS uses in-process
filters to allow modularity and extensibility, much like Apache now does
with modules (i.e. consider the php module vs the php cgi). In such
cases performance and scalability go up dramatically. However, there is
a maintenance cost - it's harder to keep a system well designed the
more tightly coupled it is.

MT/MP and IPC will (IMO) always have a place, because of the loose
coupling they allow. However, COM & CORBA also allow loose coupling AND
in-process behaviour, so a happy medium can be found.

The issue at hand though, is twofold:
1) Minimise the changes needed to make a proxy for a program. I.e.
imagine if GCC and cc1plus.exe lived in-process. That would remove 2Mb
of disk IO for each compile. However the _only_ chance of getting such a
program proxied would be a minimalistic, non-intrusive approach, or
keeping a patched branch :[.
2) Make the context saving and restoring as low-overhead as possible.
(if this is > spawn() + wait, there is no point).
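
(As a purely illustrative sketch of what "living in-process" could look
like: the tool gets built as a DLL exporting a renamed entry point, and
the driver loads and calls it instead of spawning a .exe. The library
handle and the "tool_main" symbol below are invented for illustration -
nothing like this exists in gcc or cygwin today.)

/* Hypothetical sketch: run a "librarised" tool in-process via dlopen()
 * instead of spawning a separate cc1plus.exe.  Assumes the tool has
 * been rebuilt as a DLL exporting tool_main() - an invented symbol. */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*tool_main_t)(int argc, char **argv);

int run_in_process(const char *dll, int argc, char **argv)
{
    void *handle = dlopen(dll, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 127;
    }
    tool_main_t entry = (tool_main_t) dlsym(handle, "tool_main");
    if (!entry) {
        fprintf(stderr, "no tool_main in %s\n", dll);
        dlclose(handle);
        return 127;
    }
    int rc = entry(argc, argv);         /* the "proxied" run */
    dlclose(handle);
    return rc;
}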

Rob


* RE: OT: possible project/research project
  2002-03-19 23:29 OT: possible project/research project Robert Collins
@ 2002-03-20 21:00 ` Gary R. Van Sickle
  2002-03-21  0:54   ` Jesper Eskilson
  0 siblings, 1 reply; 30+ messages in thread
From: Gary R. Van Sickle @ 2002-03-20 21:00 UTC (permalink / raw)
  To: cygwin

> The issue at hand though, is twofold:
> 1) Minimise the changes needed to make a proxy for a program. I.e.
> imagine if GCC and cc1plus.exe lived in-process. That would remove 2Mb
> of disk IO for each compile. However the _only_ chance of getting such a
> program proxied would be a minimalistic, non-intrusive approach, or
> keeping a patched branch :[.
> 2) Make the context saving and restoring as low-overhead as possible.
> (if this is > spawn() + wait, there is no point).
>
> Rob

My thinking on this matter (and I've been cogitating about it for some time
actually) takes a slightly different tack.  My basic ideas for a "modernized sh"
are:

1.  Eliminate as much fork()ing as possible, ideally all of it.
2.  Get some concurrency going.

#1 is basically the same as what you propose, though I'm not sure I'm wild about
the DLL idea; if everything's a builtin, why not just statically link?

#2 I think could be a significant win even for Unix folk.  Basically I'm
thinking along the lines of a "pipelined shell", e.g.:

# Why should this...:
rm //a/bunch/of/files/out/on/a/super/slow/server/*
# ...block this:
gcc hello.c

Obviously you're never going to be able to take advantage of all
non-dependencies, but as a wise man once told me, "you can't win if you don't
enter".

--
Gary R. Van Sickle
Brewer.  Patriot.



* Re: OT: possible project/research project
  2002-03-20 21:00 ` Gary R. Van Sickle
@ 2002-03-21  0:54   ` Jesper Eskilson
  0 siblings, 0 replies; 30+ messages in thread
From: Jesper Eskilson @ 2002-03-21  0:54 UTC (permalink / raw)
  To: cygwin

"Gary R. Van Sickle" <g.r.vansickle@worldnet.att.net> writes:

> # Why should this...:
> rm //a/bunch/of/files/out/on/a/super/slow/server/*
> # ...block this:
> gcc hello.c
> 
> Obviously you're never going to be able to take advantage of all
> non-dependencies, but as a wise man once told me, "you can't win if you don't
> enter".

make -j?

-- 
/Jesper



* RE: OT: possible project/research project
@ 2002-03-21 21:10 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-21 21:10 UTC (permalink / raw)
  To: cygwin



> -----Original Message-----
> From: Christopher Faylor [mailto:cygwin@cygwin.com] 
> Sent: Friday, March 22, 2002 3:54 AM
> 
> Can we take this discussion somewhere else?  I don't really 
> see how it relates to cygwin.

Sure. Given the apparent interest I was about to start looking for a
mailing-list home... but ksh93 is more than close enough to what I was
thinking about, so consider this the end of the thread.
 
Rob


* Re: OT: possible project/research project
  2002-03-21  5:03 Robert Collins
@ 2002-03-21  9:29 ` Christopher Faylor
  0 siblings, 0 replies; 30+ messages in thread
From: Christopher Faylor @ 2002-03-21  9:29 UTC (permalink / raw)
  To: cygwin

On Thu, Mar 21, 2002 at 11:37:26PM +1100, Robert Collins wrote:
>
>
>> -----Original Message-----
>> From: Jesper Eskilson [mailto:jojo@virtutech.se] 
>> Sent: Thursday, March 21, 2002 11:27 PM
>> To: Robert Collins
>> Cc: cygwin@cygwin.com
>> Subject: Re: OT: possible project/research project
>> 
>> 
>> "Robert Collins" <robert.collins@itdomain.com.au> writes:
>> 
>> > make -j serialises at directory borders (at a minimum).  You might 
>> > like to review the 'recursive make considered harmful' 
>> paper (if you 
>> > haven't already).
>> 
>> 'make -j' and recursive make are orthogonal issues.
>
>Depending on context, I agree.

Can we take this discussion somewhere else?  I don't really see how it
relates to cygwin.

I've already suggested that there are shells around that do some
of what you're proposing.  I've also said that, regardless, it would
be very unlikely that I would be interested in this for cygwin.

Discussions about make -j, the awfulness of fork, etc. really don't have
much bearing on cygwin.

Please continue the discussion in private email.

cgf
--
Please do not send me personal email with cygwin questions.
Use the resources at http://cygwin.com/ .


* RE: OT: possible project/research project
@ 2002-03-21  5:03 Robert Collins
  2002-03-21  9:29 ` Christopher Faylor
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Collins @ 2002-03-21  5:03 UTC (permalink / raw)
  To: Jesper Eskilson; +Cc: cygwin



> -----Original Message-----
> From: Jesper Eskilson [mailto:jojo@virtutech.se] 
> Sent: Thursday, March 21, 2002 11:27 PM
> To: Robert Collins
> Cc: cygwin@cygwin.com
> Subject: Re: OT: possible project/research project
> 
> 
> "Robert Collins" <robert.collins@itdomain.com.au> writes:
> 
> > make -j serialises at directory borders (at a minimum).  You might 
> > like to review the 'recursive make considered harmful' 
> paper (if you 
> > haven't already).
> 
> 'make -j' and recursive make are orthogonal issues.

Depending on context, I agree.
Rob


* Re: OT: possible project/research project
  2002-03-21  4:27 Robert Collins
@ 2002-03-21  4:53 ` Jesper Eskilson
  0 siblings, 0 replies; 30+ messages in thread
From: Jesper Eskilson @ 2002-03-21  4:53 UTC (permalink / raw)
  To: Robert Collins; +Cc: cygwin

"Robert Collins" <robert.collins@itdomain.com.au> writes:

> make -j serialises at directory borders (at a minimum).  You might like
> to review the 'recursive make considered harmful' paper (if you haven't
> already).

'make -j' and recursive make are orthogonal issues.

-- 
/Jesper



* RE: OT: possible project/research project
@ 2002-03-21  4:27 Robert Collins
  2002-03-21  4:53 ` Jesper Eskilson
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Collins @ 2002-03-21  4:27 UTC (permalink / raw)
  To: Jesper Eskilson, cygwin



> -----Original Message-----
> From: Jesper Eskilson [mailto:jojo@virtutech.se] 
> Sent: Thursday, March 21, 2002 4:52 PM
> To: cygwin@cygwin.com
> Subject: Re: OT: possible project/research project
> 
> 
> "Gary R. Van Sickle" <g.r.vansickle@worldnet.att.net> writes:
> 
> > # Why should this...:
> > rm //a/bunch/of/files/out/on/a/super/slow/server/*
> > # ...block this:
> > gcc hello.c
> > 
> > Obviously you're never going to be able to take advantage of all 
> > non-dependencies, but as a wise man once told me, "you can't win if 
> > you don't enter".
> 
> make -j?

make -j serialises at directory borders (at a minimum).
You might like to review the 'recursive make considered harmful' paper
(if you haven't already). 

Such an approach allows hugely decreased make times in the general case:
it reduces forks() and dependency recalculations, and increases
parallelism (partly by reduced serialisation of non-dependencies).

I routinely do this with automake - I have one, or at most three,
Makefile.am's for a project, even quite a large one, and make -j3 just
flies along, even with lots and lots of little modular directories and
dependencies turned on.

Rob


* RE: OT: possible project/research project
@ 2002-03-21  2:55 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-21  2:55 UTC (permalink / raw)
  To: Gary R. Van Sickle, cygwin



> -----Original Message-----
> From: Gary R. Van Sickle [mailto:g.r.vansickle@worldnet.att.net] 
> Sent: Thursday, March 21, 2002 2:32 PM

> The thing is, a lot of work *has* been done to make fork as 
> efficient as possible.  But there's a limit on how fast you 
> can create a new process and duplicate the current one into 
> it.  And Windows doesn't go out of its way to help.

Also worth considering is the implicit overhead (both with and without
optimisations such as copy-on-write) of duplicating a process and then
replacing the image vs creating a process from an image.

In reverse order, a) loading the image is the same in both cases:
library path searches, permission checks, prologue code, and then you
are in business. So however fast one of these is made, the other can
match.

b) A process really is just a bunch of kernel structures: memory map +
page tables, environment block, permissions block, ownership,
threads... and actual virtual memory.

Duplicating a process requires (in general terms):
Copying all the kernel structures.
Fixing them up - adjusting the child relationship, pid etc.
Assigning new virtual memory (may be the same via COW) mapped to the
original addresses in the process.

Note that the amount of work is directly related to the size of the
process. Forking 4Gb of VM is always going to be more work than forking
20kb of VM.

Creating a process requires (in general terms):
Creating empty kernel structures with appropriate values.
Assigning the first VM page.
Mapping the image in - fixing up library requirements etc. (from a)).

So the work unique to creating a process is:
Creating empty kernel structures with appropriate values.
Assigning the first VM page.

To compare, then: you have a near-constant, trivial new-process overhead
(spawn) vs a variable overhead that, barring corner cases, is almost
guaranteed to be greater (fork + exec).
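
(In code, the two patterns being compared look roughly like this - a
sketch only, with error handling trimmed; posix_spawnp() stands in for
"spawn" generically here, since cygwin doesn't ship it yet:)

/* Pattern 1: fork + exec - duplicate the whole parent, then throw the
 * copy away by replacing it with the new image. */
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;
static char *ls_argv[] = { "ls", "-l", NULL };

int launch_via_fork(void)
{
    int status = -1;
    pid_t pid = fork();                /* copy/COW the parent's state */
    if (pid == 0) {
        execvp("ls", ls_argv);         /* ...then discard the copy    */
        _exit(127);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);
    return status;
}

/* Pattern 2: spawn - build the new process directly from the image. */
int launch_via_spawn(void)
{
    int status = -1;
    pid_t pid;
    if (posix_spawnp(&pid, "ls", NULL, NULL, ls_argv, environ) == 0)
        waitpid(pid, &status, 0);
    return status;
}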

Hmm, which seems more efficient? 

Lol,
Rob


* RE: OT: possible project/research project
@ 2002-03-21  2:40 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-21  2:40 UTC (permalink / raw)
  To: Gary R. Van Sickle, cygwin



> -----Original Message-----
> From: Gary R. Van Sickle [mailto:g.r.vansickle@worldnet.att.net] 
> Sent: Thursday, March 21, 2002 2:25 PM

> #1 is basically the same as what you propose, though I'm not 
> sure I'm wild about the DLL idea; if everything's a builtin, 
> why not just statically link?

Several reasons.
1) I'm a user and I want foo-version of bar. How do I get that?
2) I'm in cmd.exe and want to run bar - does it still exist?
3) I'm running setup.exe and ls has been updated - do I download an
updated fileutils package or an updated bash package?
4) I'm a user and I want to add a new builtin.
 
One thing I'm considering doing is contributing a posix_spawn and
posix_spawnp to cygwin, to enhance the vfork capability. That should
allow patches to be contributed upstream to gcc etc. to remove fork -
because posix_spawn is a standard, it wouldn't be seen as a cygwin hack.

Rob


* RE: OT: possible project/research project
       [not found] <5.1.0.14.2.20020320191522.02498050@pop3.cris.com>
@ 2002-03-20 21:48 ` Gary R. Van Sickle
  0 siblings, 0 replies; 30+ messages in thread
From: Gary R. Van Sickle @ 2002-03-20 21:48 UTC (permalink / raw)
  To: Randall R Schulz, Cygwin mailing list

> Gary,
>
> You labelled yourself a patriot.

I quoted the label of a beer bottle.  Samuel Adams to be precise.

> I just pointed out some relevant wisdom.

Indeed.  But not the relevant wisdom you thought you had.

> If you perceive that to be namecalling, so be it. It's the sort of baseless
> conclusion I expect from someone who admires patriotism.
>

Or drinks beer?

> To the best of my ability to discern it, there is no connection between the
> impoverished and gravely mistaken notion of patriotism and software process
> control models. If you can see one, please share it with me.
>

Well, I think the fork() concept qualifies as "impoverished and gravely
mistaken".  So I got that goin' for me.

> Randy
>

PS: In the future, if you have any insults or namecalling you feel you need to
send my way, please do so in a public forum where others can enjoy your sarcasm.
I have no desire to converse privately with someone who hates his country as
much as he hates himself.

My apologies to the list for bothering to respond to Mr. Schulz' bait.

--
Gary R. Van Sickle
Brewer.  Patriot.



* RE: OT: possible project/research project
  2002-03-20  1:58       ` Stephano Mariani
  2002-03-20  9:16         ` Christopher Faylor
@ 2002-03-20 20:07         ` Gary R. Van Sickle
  1 sibling, 0 replies; 30+ messages in thread
From: Gary R. Van Sickle @ 2002-03-20 20:07 UTC (permalink / raw)
  To: cygwin

> I would certainly agree with you about that, but the fact remains, a lot
> of code, that cygwin exists to ease the porting of, uses it. If the work
> was done on fork itself, it would help speed up a lot more than just
> configure (or similar) scripts.
>
> Stephano Mariani

The thing is, a lot of work *has* been done to make fork as efficient as
possible.  But there's a limit on how fast you can create a new process and
duplicate the current one into it.  And Windows doesn't go out of its way to
help.

--
Gary R. Van Sickle
Brewer.  Patriot.




* RE: OT: possible project/research project
  2002-03-19 23:04       ` Randall R Schulz
@ 2002-03-20 19:41         ` Gary R. Van Sickle
  0 siblings, 0 replies; 30+ messages in thread
From: Gary R. Van Sickle @ 2002-03-20 19:41 UTC (permalink / raw)
  To: cygwin

> Sir,
>
> We await your improved model for process control and the operating system
> that implements it.
>

Senor,

Well wait no longer!  These days, by gosh, we got everything from spawns to
execs to named synchronization objects to... dare I say it?... yes, even
threads!  Gone are the days when alls a guy could do was fork dozens of exact
duplicates of the process he was already running when all he wanted was a little
concurrency!

Oh, and don't worry, ALL the OSes gots 'em!

> Randall Schulz
> Mountain View, CA USA
>
> Patriotism is the last refuge of a scoundrel.
>   -- Samuel Johnson

I'm a "scoundrel"?  That's the best you got Randy?  Heeheeeehehehee!

And you're so smitten with fork that you need to start "namecalling"?

--
Gary R. Van Sickle
Brewer.  Patriot.




* RE: OT: possible project/research project
@ 2002-03-20 18:11 Joshua Daniel Franklin
  0 siblings, 0 replies; 30+ messages in thread
From: Joshua Daniel Franklin @ 2002-03-20 18:11 UTC (permalink / raw)
  To: cygwin


Just to get a little more off-topic...

> Don't forget this:
> 
> "... this is the best of all possible worlds."
>   -- Voltaire

Maybe this was a joke, but you *do* realize that this was taken from
a work of fiction? (_Candide_, which was a satire of Gottfried Wilhelm
Leibnitz' philosophy.) Leibnitz claimed, in _Theodicee_, that the presence of
evil in God's Creation does not mean that God is not perfect, which was
lampooned into 

1. God created the world
2. God is perfect
3. Creation is as perfect as possible

In other words, neither Voltaire nor Leibnitz ever really claimed that "this
is the best of all possible worlds."

Perhaps this area would make an interesting research project, too.



* RE: OT: possible project/research project
@ 2002-03-20 11:44 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-20 11:44 UTC (permalink / raw)
  To: Randall R Schulz, cygwin

Randall..

> -----Original Message-----
> From: Randall R Schulz [mailto:rrschulz@cris.com] 
> Sent: Thursday, March 21, 2002 2:47 AM
> To: Robert Collins; cygwin@cygwin.com
> Subject: RE: OT: possible project/research project

> >No - sounds like you haven't been paying attention. In my very first 
> >email I pointed out that this was not an acceptable 
> approach, and that 
> >committing changes upstream would be the only meaningful way 
> of doing 
> >this.
> 
> Are you saying you think you're going to convince the 
> maintainers of these 
> special programs that have been endowed with the ability to operate 
> parasitically in your special version of the shell to let you 
> put these 
> changes into their mainline code bases? Good luck!
 
I'm saying that not doing that makes the maintenance untenable - at
first, second and third glance. If it's not a good enough model with
real enough potential for them to agree, then it's not worth doing.
 
> >Nearly everyone here does - most scripts have #!/bin/sh in 
> the header.
> 
> Perhaps. I do, but only until I want to use a BASH feature 
> that ash doesn't 
> have.

> >"The best is the enemy of the good."
> >- Voltaire "
> 
> Yes, yes. I've been around long enough to have heard all of these.
> 
> Don't forget this:
> 
> "... this is the best of all possible worlds."
>   -- Voltaire

Yup... and Murphy was an optimist!

Oh, for the objectivity thing... yes, I'm defending it, but I'm not in
love with it per se - is that more acceptable?

Rob


* Re: OT: possible project/research project
  2002-03-20  1:58       ` Stephano Mariani
@ 2002-03-20  9:16         ` Christopher Faylor
  2002-03-20 20:07         ` Gary R. Van Sickle
  1 sibling, 0 replies; 30+ messages in thread
From: Christopher Faylor @ 2002-03-20  9:16 UTC (permalink / raw)
  To: cygwin

On Wed, Mar 20, 2002 at 09:37:42AM -0000, Stephano Mariani wrote:
>I would certainly agree with you about that, but the fact remains, a
>lot of code, that cygwin exists to ease the porting of, uses it.  If
>the work was done on fork itself, it would help speed up a lot more
>than just configure (or similar) scripts.

That is making the oft-repeated assumption that there actually *is*
something that could be done to fork.

At this point, I'd even take ideas over patches.  However, I'd like
informed ideas, not "Say, I've heard that you can do copy-on-write
on Windows -- just make fork use that."

FWIW, /bin/sh uses vfork these days, which is basically just a spawn
call with some fd-table copying overhead.
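
(For anyone who hasn't seen the idiom: vfork is only useful when the
child execs, or _exits, immediately - roughly this minimal sketch:)

/* The classic vfork + exec pattern: no copy of the parent's address
 * space is made; the child execs straight away. */
#include <sys/wait.h>
#include <unistd.h>

int run(char *const argv[])
{
    int status = -1;
    pid_t pid = vfork();
    if (pid == 0) {              /* child: borrows the parent briefly   */
        execvp(argv[0], argv);
        _exit(127);              /* exec failed; must _exit, never exit */
    }
    if (pid > 0)
        waitpid(pid, &status, 0);
    return status;
}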

cgf
--
Please do not send me personal email with cygwin questions.
Use the resources at http://cygwin.com/ .


* Re: OT: possible project/research project
  2002-03-19 23:20 Robert Collins
@ 2002-03-20  9:00 ` Christopher Faylor
  0 siblings, 0 replies; 30+ messages in thread
From: Christopher Faylor @ 2002-03-20  9:00 UTC (permalink / raw)
  To: cygwin

On Wed, Mar 20, 2002 at 06:04:44PM +1100, Robert Collins wrote:
>In fact cgf has had a copy-on-write fork() for cygwin in alpha-quality
>IIRC. I'd love to do some perf tests with that, and in fact on my todo
>list is cygwin profiling. Time however, is the killer.

This keeps coming up.  Maybe it should be added to the FAQ.

The copy-on-write semantics available in the Windows API are not
adequate for fork.  Copy-on-write doesn't work right on 9x/Me and, even
on NT, there is no way, AFAICT, to handle something like process A
forking process B, which then forks process C.  In that scenario process
C needs a copy of process B's memory, not process A's.  That is not
possible to do, AFAICT, without actually copying memory, which defeats
the purpose.

This is all with Win32 APIs.  It may be possible to do something with
NT-specific code, as Robert mentioned, but I ran into problems even
there, too.

If people are under the impression that we haven't given fork
performance a lot of thought, they're wrong.  *I* could be wrong (and
would be thrilled to be wrong) but I don't think that improving fork
performance is an afternoon's work for some skilled programmer.

As far as the one-shell-for-all proposal goes, there are already a few
of these things kicking around.  They're used for Linux boot disks,
where it is more advantageous to have one large executable with lots of
builtin functionality rather than one shell with lots of programs.

I sincerely doubt that I would ever be convinced to allow something
like this into the cygwin distribution.  It seems like it flies in the
face of cygwin's goal and would just be YA thing to confuse people.

cgf
--
Please do not send me personal email with cygwin questions.
Use the resources at http://cygwin.com/ .


* RE: OT: possible project/research project
       [not found] <FC169E059D1A0442A04C40F86D9BA76062E0@itdomain003.itdomain.net.au>
@ 2002-03-20  8:24 ` Randall R Schulz
  0 siblings, 0 replies; 30+ messages in thread
From: Randall R Schulz @ 2002-03-20  8:24 UTC (permalink / raw)
  To: Robert Collins, cygwin

Rob,

More...


At 01:33 2002-03-20, Robert Collins wrote:
>Randall,
>responses inline..
>
> > -----Original Message-----
> > From: Randall R Schulz [mailto:rrschulz@cris.com]
> > Sent: Wednesday, March 20, 2002 7:34 PM
>
> > >Well we still have that basic separate - bash's builtin's
> > for example.
> > >If
> > >it's not builtin, it needs a sub process.
> >
> > That's not quite right. Built-ins still need sub-processes if
> > they're going
> > to operate in a pipeline or are enclosed within parentheses.
>
>Ok. So if it's not builtin, or it's a builtin that needs to be
>pipelined/parentisised it requires a sub-process. That sounds like
>something that a patch to the relevant shell might provide some easy
>wins.

Eh? What's to be patched? The shell built-ins already do this. They 
wouldn't work if they didn't.


> > >sub process's after all) -  but we have the source so....
> >
> > How will your magical push_context protect from wild pointer
> > references, e.g.?
>
>If that becomes a problem, I'd suggest that dll's get loaded on page
>boundaries and we protect the non-permitted address space with
>read-only, and install an exception handler that unprotects and restores
>context. It may be that handling that is not worth the development time
>- so reliability could be an issue.

The Win32 API allows read/write protections to be altered dynamically 
within a process?

This alone will require operating on many, probably a majority, of the
process's page-table entries, which is going to cost something like as
much as a vfork or a copy-on-write fork.


> > >The fork()/exec() model bites. Sorry, but it does. fork()
> > based servers
> > >for instance run into the galloping herd - and scale very
> > badly. The other
> > >use for fork -the fork/exec combination is better achieved
> > with spawn()
> > >which is designed to do just that one job well. It also
> > happens to work
> > >very well on cygwin, and I see no reason to change that. So
> > spawned apps
> > >will remain completely separated and independent.
> >
> > Servers are not shells. Why should they fork at all? That's
> > what threads
> > are for. It's also why CGI (without something like mod_perl)
> > is not a good
> > thing and the Java server model has significant advantages.
>
>Exactly... my point is that the fork/exec model has no innate use.
>vfork/execve does - which is what spawn (look under posix_spawn() for
>the offical spawn these days) accomplishes.

Vfork() is a hack that goes back to the first BSD ports of Unix to the Vax. 
The proper way to do it is transparently, with copy-on-write in the fork() 
call.


> > Are you planning on incorporating your scheme into every
> > program that runs
> > sub-processes on a regular basis? How likely is it that what
> > works in one
> > shell will work in another or in a server?
>
>No. I'm not trying to create a new operating environment, I'm trying to
>address a common-case issue. If I can get certain configure scripts to
>run in under 30 minutes on my machine here, I'd be very happy. As for
>portability to different shells, or even to servers, I'd suggest that
>keeping the API very simply and clean - much like the sub process model
>is simple and clean would encourage such re-use.

So have you profiled the code to know how much of the time in build goes 
into forking? If you lowered the cost to zero, how much would you save?

You're just not going to get simpler and cleaner than the fork/exec model! 
Likewise for encouraging re-use.


> > I don't know the details of spawn(). How does it accomplish
> > I/O redirection?
>
>int posix_spawn(pid_t *restrict pid, const char *restrict path,
>const posix_spawn_file_actions_t *file_actions,
>const posix_spawnattr_t *restrict attrp,
>char *const argv[restrict], char *const envp[restrict]);
>
>Is the prototype. If file_actions is null, the the new process gets a
>copy of the parents fd table. If it's not null, then it provides the fd
>table for the new process.
>
> > Obviously if you add something, the old stuff isn't
> > (necessarily) lost. I'm
> > just saying that the fork/exec process model is simple,
> > elegant, available,
> > universal and fully functional in all POSIX systems. Your
> > model is a horse
> > of another color and any given command that would avail itself of the
> > supposed benefits of your scheme must be recast into a library that
> > conforms to the requirements of your embedded task model.
>
>Yes. Which is a significant impediment right from the word go. Which
>should go some way to explaining my ambivalence on this idea. However
>the building blocks to use this model are present and functional on all
>POSIX systems, so there's no reason to assume we couldn't 'make it
>work'.

What are these "building blocks?"

Let me be clear. I think this probably _can_ be done (it's a SMOP, after 
all), but that it shouldn't be done, because it's not worth doing from 
the perspective of a rational, complete and accurate cost / benefit 
analysis, including both the up-front programming costs and the ongoing 
maintenance costs.


> > It doesn't prevent it, but to avail ones self of the putative
> > benefits of
> > your proposed scheme, a significantly different programming
> > model has to be
> > learned and used. All for what? A tiny incremental
> > improvement in program
> > start-up times on a single platform and one or two
> > pre-ordained shells?
>
>Huh? That's an assumption. I'd hope I could achieve librarisation as
>simply as casting main to lib_main, and providing link time replacements
>for exit() and _exit() and fatal(). Then the real-binary doesn't use
>those link time replacements.

What about the C runtime startup actions? Heap (malloc) initialization? 
Standard I/O table initialization? Probably lots of other startup actions 
about which I know nothing, whose code assumes it is operating in a 
post-exec() context.

Offhand, it seems like this is a hairy beast indeed.


> > How much time do they save? That's for you to claim and
> > substantiate. I'm
> > not trying to justify or validate your project, I'm trying to
> > repudiate it.
>
>I can tell. I'm not trying to defend it, as that assumes that it is
>defendable. I'm discussing it in a neutral (ish) light, I hope. I am
>trying to provide responses to the specific points you make as part of
>that discussion.

Please be intellectually honest. You've got an idea and you're trying to 
defend it. There's nothing wrong with that. If you didn't have an 
Australian email address, I'd think you were deluded by that 
all-too-American desire for "objectivity."


> > But consider this: By the time you complete this task, the
> > upward march of
> > system speeds (CPU and I/O) will probably have done more to improve
> > elapsed-time performance of command invocation than your
> > improvements are
> > going to achieve.
>
>Straw poll, who here has and uses a machine more than 2 years old right
>now? My hand goes up, as does my girlfriends, and my firewall. (My PC
>happens to be a dual processor, but still). Also, consider that as
>system speeds increase, so does the functionality. We may find MS
>polling internet servers on process startup or something equally
>ridiculous that drastically increase process startup speed. Certainly
>system policies now play a part, as each process startup has to be
>tested against an arbitrarily long list of rules. And don't talk about
>virus scanners.

I knew you'd bring that up, but it's not a valid argument. You don't have 
to have the latest hardware to be climbing the curve of rising system power 
that is happening throughout the computer industry. As a group, users are 
trading up as better hardware becomes available, even if they're trading up 
one or two years behind the curve of the latest and fastest.


> > And five staff-minutes per user per month? You think that's
> > significant?
> > What would you do with those five minutes spread throughout
> > the month?
> > That's right: Nothing, 'cause you'd get it in
> > fraction-of-a-second parcels.
>
>Well that's an assumption. For me, I'd get it running configure scripts,
>which is in far bigger chunks than fraction of a second.

But the gain is still incremental. And both the current cost and that of 
the "new and improved scheme" are still just unknowns.


> > Lastly, you'll have to have an ongoing effort to port changes
> > from the
> > stand-alone original versions of the commands to your
> > embedded counterparts.
>
>No - sounds like you haven't been paying attention. In my very first
>email I pointed out that this was not an acceptable approach, and that
>committing changes upstream would be the only meaningful way of doing
>this.

Are you saying you think you're going to convince the maintainers of these 
special programs that have been endowed with the ability to operate 
parasitically in your special version of the shell to let you put these 
changes into their mainline code bases? Good luck!


> > >I'd guess at ash, as that's the smallest shell we have, but if it's
> > >easier
> > >with bash, then I see no reason not to - as this would be a /bin/sh
> > >replacement - if the benefits were to be realised.
> >
> > How many people use such a bare-bones shell? Unless you
> > modify them all,
> > there will be a sizeable user contingent that does not
> > benefit from your
> > efforts.
>
>Nearly everyone here does - most scripts have #!/bin/sh in the header.

Perhaps. I do, but only until I want to use a BASH feature that ash doesn't 
have.


> > I think you need a good technical justification for the effort you'll
> > expend relative to the benefits you're going to gain and the
> > detriments
> > you're going to incur.
>
>Absolutely. The problem domain needs further refinement, a lit search is
>needed, some rough test cases /mock upss to provide a rule-of-thumb idea
>about the potential returns, cygwin needs serious profiling to
>understand if my assumptions about performance are correct. Lotsa work
>to do this right.
>
> > As with all optimizations, you must measure the cost of the
> > current code
> > and that of replacement. In this case, you could possibly
> > mock up a test
> > jig that did DLL loading and compare that with the cost of
> > fork / exec. But
> > that would not include the unknown costs of your putative
> > push_context /
> > pop_context mechanism.
>
>Absolutely. In fact
>"
>Rules of Optimization:
>Rule 1: Don't do it.
>Rule 2 (for experts only): Don't do it yet.
>- M.A. Jackson
>
>"More computing sins are committed in the name of efficiency (without
>necessarily achieving it) than for any other single reason - including
>blind stupidity."
>- W.A. Wulf
>
>"We should forget about small efficiencies, say about 97% of the time:
>premature optimization is the root of all evil."
>- Donald Knuth
>
>"The best is the enemy of the good."
>- Voltaire "

Yes, yes. I've been around long enough to have heard all of these.

Don't forget this:

"... this is the best of all possible worlds."
  -- Voltaire


>With assembly credit to
>http://www-2.cs.cmu.edu/~jch/java/optimization.html
>
> > "The proof of the pudding is in the eating." So until you've
> > done it, you
> > won't know for an empirical fact if it's a win and if so how
> > much of a win
> > it is.
>
>Sure.
>
>Rob


Randall Schulz
Mountain View, CA USA



* RE: OT: possible project/research project
  2002-03-19 19:17     ` Gary R. Van Sickle
@ 2002-03-20  1:58       ` Stephano Mariani
  2002-03-20  9:16         ` Christopher Faylor
  2002-03-20 20:07         ` Gary R. Van Sickle
  0 siblings, 2 replies; 30+ messages in thread
From: Stephano Mariani @ 2002-03-20  1:58 UTC (permalink / raw)
  To: 'Gary R. Van Sickle', cygwin

I would certainly agree with you about that, but the fact remains that a
lot of code that cygwin exists to ease the porting of uses it. If the
work were done on fork itself, it would help speed up a lot more than
just configure (or similar) scripts.

Stephano Mariani

> -----Original Message-----
> From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com] On Behalf
> Of Gary R. Van Sickle
> Sent: Wednesday, 20 March 2002 2:52 AM
> To: cygwin@cygwin.com
> Subject: RE: OT: possible project/research project
> 
> > -----Original Message-----
> > From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com] On Behalf
> > Of Stephano Mariani
> > Sent: Tuesday, March 19, 2002 7:34 PM
> > To: 'Randall R Schulz'; 'Robert Collins'; cygwin@cygwin.com
> > Subject: RE: OT: possible project/research project
> >
> >
> > I am no cygwin expert, or windows expert, but isn't the effort better
> > spent getting the cygwin fork/vfork to work faster?
> >
> > Stephano Mariani
> >
> > PS: Please do not fry me if this is a stupid suggestion or not possible
> > because of an obvious flaw, I simply fail to see why the source of the
> > problem is not being targeted.
> >
> 
> I don't see it that the source of the problem is the implementation of
> fork/vfork; the way I see it the very *concept* of forking makes little
> to no sense.  I've written a lot of code, and not once have I thought to
> myself, "ok, now what I want to do here is duplicate the current process
> in almost exactly its current state."  Maybe it made more sense back in
> the day, or maybe I'm missing something, but it seems to me there's a
> lot more efficient ways to do multithreading/multi"process"ing/IPC/etc
> (or better yet avoid them altogether) these days.
> 
> --
> Gary R. Van Sickle
> Brewer.  Patriot.
> 





* RE: OT: possible project/research project
@ 2002-03-20  1:54 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-20  1:54 UTC (permalink / raw)
  To: Randall R Schulz, cygwin

Randall,
responses inline..

> -----Original Message-----
> From: Randall R Schulz [mailto:rrschulz@cris.com] 
> Sent: Wednesday, March 20, 2002 7:34 PM

> >Well we still have that basic separate - bash's builtin's 
> for example. 
> >If
> >it's not builtin, it needs a sub process.
> 
> That's not quite right. Built-ins still need sub-processes if 
> they're going 
> to operate in a pipeline or are enclosed within parentheses.

Ok. So if it's not builtin, or it's a builtin that needs to be
pipelined/parenthesised, it requires a sub-process. That sounds like
something where a patch to the relevant shell might provide some easy
wins.
 
> >sub process's after all) -  but we have the source so....
> 
> How will your magical push_context protect from wild pointer 
> references, e.g.?

If that becomes a problem, I'd suggest that DLLs get loaded on page
boundaries, we mark the non-permitted address space read-only, and we
install an exception handler that unprotects it and restores context. It
may be that handling that is not worth the development time - so
reliability could be an issue.
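
(For concreteness, the Win32 call involved would be VirtualProtect(); a
minimal sketch of flipping a region's protection and back - the address
and size are placeholders, and the exception handler itself is left out:)

/* Sketch: toggle protection on a page-aligned region and restore it.
 * addr/size are placeholders for wherever the "non-permitted" DLLs live. */
#include <windows.h>
#include <stdio.h>

void protect_region(void *addr, SIZE_T size)
{
    DWORD old;
    if (!VirtualProtect(addr, size, PAGE_READONLY, &old))
        fprintf(stderr, "VirtualProtect failed: %lu\n", GetLastError());
}

void unprotect_region(void *addr, SIZE_T size)
{
    DWORD old;
    if (!VirtualProtect(addr, size, PAGE_READWRITE, &old))
        fprintf(stderr, "VirtualProtect failed: %lu\n", GetLastError());
}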
 
> >The fork()/exec() model bites. Sorry, but it does. fork() 
> based servers
> >for instance run into the galloping herd - and scale very 
> badly. The other 
> >use for fork -the fork/exec combination is better achieved 
> with spawn() 
> >which is designed to do just that one job well. It also 
> happens to work 
> >very well on cygwin, and I see no reason to change that. So 
> spawned apps 
> >will remain completely separated and independent.
> 
> Servers are not shells. Why should they fork at all? That's 
> what threads 
> are for. It's also why CGI (without something like mod_perl) 
> is not a good 
> thing and the Java server model has significant advantages.

Exactly... my point is that the fork/exec model has no innate use.
vfork/execve does - which is what spawn (look under posix_spawn() for
the official spawn these days) accomplishes.
 
> Are you planning on incorporating your scheme into every 
> program that runs 
> sub-processes on a regular basis? How likely is it that what 
> works in one 
> shell will work in another or in a server?

No. I'm not trying to create a new operating environment, I'm trying to
address a common-case issue. If I can get certain configure scripts to
run in under 30 minutes on my machine here, I'd be very happy. As for
portability to different shells, or even to servers, I'd suggest that
keeping the API very simple and clean - much like the sub-process model
is simple and clean - would encourage such re-use.
 
> I don't know the details of spawn(). How does it accomplish 
> I/O redirection?

int posix_spawn(pid_t *restrict pid, const char *restrict path,
                const posix_spawn_file_actions_t *file_actions,
                const posix_spawnattr_t *restrict attrp,
                char *const argv[restrict], char *const envp[restrict]);

is the prototype. If file_actions is null, then the new process gets a
copy of the parent's fd table. If it's not null, then it provides the fd
table for the new process.
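
(To make the redirection part concrete, a minimal sketch against the
POSIX interface only - cygwin doesn't provide it yet - where the child's
stdout is pointed at a file before the image starts. The command and
filename are arbitrary examples.)

/* Sketch: redirect the child's stdout to out.txt via file_actions. */
#include <fcntl.h>
#include <spawn.h>
#include <sys/wait.h>

extern char **environ;

int spawn_redirected(void)
{
    char *argv[] = { "ls", "-l", NULL };
    posix_spawn_file_actions_t fa;
    pid_t pid;
    int status = -1;

    posix_spawn_file_actions_init(&fa);
    /* in the child, open out.txt as fd 1 (stdout) before the image runs */
    posix_spawn_file_actions_addopen(&fa, 1, "out.txt",
                                     O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (posix_spawnp(&pid, "ls", &fa, NULL, argv, environ) == 0)
        waitpid(pid, &status, 0);
    posix_spawn_file_actions_destroy(&fa);
    return status;
}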

> Obviously if you add something, the old stuff isn't 
> (necessarily) lost. I'm 
> just saying that the fork/exec process model is simple, 
> elegant, available, 
> universal and fully functional in all POSIX systems. Your 
> model is a horse 
> of another color and any given command that would avail itself of the 
> supposed benefits of your scheme must be recast into a library that 
> conforms to the requirements of your embedded task model.

Yes. Which is a significant impediment right from the word go. Which
should go some way to explaining my ambivalence on this idea. However
the building blocks to use this model are present and functional on all
POSIX systems, so there's no reason to assume we couldn't 'make it
work'.

> It doesn't prevent it, but to avail ones self of the putative 
> benefits of 
> your proposed scheme, a significantly different programming 
> model has to be 
> learned and used. All for what? A tiny incremental 
> improvement in program 
> start-up times on a single platform and one or two 
> pre-ordained shells?

Huh? That's an assumption. I'd hope I could achieve librarisation as
simply as recasting main as lib_main, and providing link-time
replacements for exit(), _exit() and fatal(). The real binary then just
doesn't use those link-time replacements.
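
(Purely as a sketch of what such a link-time exit() replacement might
look like - lib_main and shell_run_builtin are invented names, and a
real librarised tool would need far more care with global state:)

/* Hypothetical sketch: make the tool's exit() unwind back into the
 * shell instead of terminating the whole process. */
#include <setjmp.h>

static jmp_buf tool_exit_jmp;          /* where exit() should "return" to */
static int     tool_exit_code;

/* link-time replacement for exit()/_exit() inside the librarised tool */
void exit(int code)
{
    tool_exit_code = code;
    longjmp(tool_exit_jmp, 1);
}

int lib_main(int argc, char **argv);   /* the tool's renamed main() */

int shell_run_builtin(int argc, char **argv)
{
    if (setjmp(tool_exit_jmp) != 0)
        return tool_exit_code;         /* the tool called exit(code) */
    return lib_main(argc, argv);       /* or it returned normally */
}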
 
> How much time do they save? That's for you to claim and 
> substantiate. I'm 
> not trying to justify or validate your project, I'm trying to 
> repudiate it.

I can tell. I'm not trying to defend it, as that assumes that it is
defendable. I'm discussing it in a neutral (ish) light, I hope. I am
trying to provide responses to the specific points you make as part of
that discussion.
 
> But consider this: By the time you complete this task, the 
> upward march of 
> system speeds (CPU and I/O) will probably have done more to improve 
> elapsed-time performance of command invocation than your 
> improvements are 
> going to achieve.

Straw poll: who here has and uses a machine more than 2 years old right
now? My hand goes up, as does my girlfriend's, and my firewall. (My PC
happens to be a dual processor, but still.) Also, consider that as
system speeds increase, so does the functionality. We may find MS
polling internet servers on process startup, or something equally
ridiculous that drastically slows process startup. Certainly
system policies now play a part, as each process startup has to be
tested against an arbitrarily long list of rules. And don't talk about
virus scanners.
 
> And five staff-minutes per user per month? You think that's 
> significant? 
> What would you do with those five minutes spread throughout 
> the month? 
> That's right: Nothing, 'cause you'd get it in 
> fraction-of-a-second parcels.

Well, that's an assumption. For me, I'd get it running configure
scripts, which comes in far bigger chunks than fractions of a second.
 
> Lastly, you'll have to have an ongoing effort to port changes 
> from the 
> stand-alone original versions of the commands to your 
> embedded counterparts.

No - sounds like you haven't been paying attention. In my very first
email I pointed out that this was not an acceptable approach, and that
committing changes upstream would be the only meaningful way of doing
this.
 
> >I'd guess at ash, as that's the smallest shell we have, but if it's 
> >easier
> >with bash, then I see no reason not to - as this would be a /bin/sh 
> >replacement - if the benefits were to be realised.
> 
> How many people use such a bare-bones shell? Unless you 
> modify them all, 
> there will be a sizeable user contingent that does not 
> benefit from your 
> efforts.

Nearly everyone here does - most scripts have #!/bin/sh in the header.

> I think you need a good technical justification for the effort you'll 
> expend relative to the benefits you're going to gain and the 
> detriments 
> you're going to incur.

Absolutely. The problem domain needs further refinement, a lit search is
needed, some rough test cases / mock-ups to provide a rule-of-thumb idea
about the potential returns, cygwin needs serious profiling to
understand if my assumptions about performance are correct. Lotsa work
to do this right.
 
> As with all optimizations, you must measure the cost of the 
> current code 
> and that of replacement. In this case, you could possibly 
> mock up a test 
> jig that did DLL loading and compare that with the cost of 
> fork / exec. But 
> that would not include the unknown costs of your putative 
> push_context / 
> pop_context mechanism.

Absolutely. In fact 
"
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
- M.A. Jackson

"More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity."
- W.A. Wulf

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."
- Donald Knuth

"The best is the enemy of the good."
- Voltaire "
 
With assembly credit to
http://www-2.cs.cmu.edu/~jch/java/optimization.html

> "The proof of the pudding is in the eating." So until you've 
> done it, you 
> won't know for an empirical fact if it's a win and if so how 
> much of a win 
> it is.

Sure.

Rob


* RE: OT: possible project/research project
       [not found] <FC169E059D1A0442A04C40F86D9BA76062DD@itdomain003.itdomain.net.au>
@ 2002-03-20  0:53 ` Randall R Schulz
  0 siblings, 0 replies; 30+ messages in thread
From: Randall R Schulz @ 2002-03-20  0:53 UTC (permalink / raw)
  To: Robert Collins, cygwin

Robert,

Responses interposed below.


At 22:55 2002-03-19, Robert Collins wrote:
> > -----Original Message-----
> > From: Randall R Schulz [mailto:rrschulz@cris.com]
> > Sent: Wednesday, March 20, 2002 12:15 PM
>
> >
> > Robert,
> >
> > This idea isn't really new.
>
>I don't recall claiming it as 'new' .. just an idea.  :} (ok, pedant mode 
>off).
>
> > The problem is that you're creating a huge project that
> > creates no new
> > functionality and that has horrendous maintainence issues, as you say.
>
>Yah. That's the crux. I've no interest in creating such a project.
>
> > The library conversion idea is kind of a throwback to
> > pre-Unix days or to
> > systems like VMS (if I recall and understand it properly). In
> > these systems
> > there were "blessed" commands understood by the command
> > interpreter and
> > endowed with a more direct means of invocation. Other
> > commands required
> > full sub-process creation.
>
>Well we still have that basic separation - bash's builtins for example.
>If it's not builtin, it needs a sub-process.

That's not quite right. Built-ins still need sub-processes if they're going 
to operate in a pipeline or are enclosed within parentheses.


> > I trust it's your intent that the user will see no obvious
> > differences in
> > invoking these programs, but you may find full transparency harder to
> > achieve than you expect. Will the full range of shell
> > features be available
> > to these specially integrated commands?
>
>That is the design goal, should such a project be attempted by me.
>
> > Will you be able to
> > pipe into and
> > out of them?
>
>Yes.
>
> > Will they work within parentheses?
>
>Yes.
>
> > In
> > procedures?
>
>Yes.
>
> > Will you
> > allow all shell features (pipes, say) are applied to
> > arbitrary combinations
> > of conventional and integrated commands?
>
>Yes.
>
>Before you think I've got plans to big for my boots, consider that if we 
>leverage an existing shell, all those feature work out-of-process now. 
>proxying each feature into a library-capable equivalent one at a time 
>would allow a serious fallback mechanism for any functionality gaps.
>
> > In your example of a `backquote command` (which I prefer to
> > invoke via $( ... ) using BASH)
>
>Not being a shell afficiondo, I'm happy to be educated: what is the key 
>difference between `...` and $(...)?

Same functionality, but $( ) constructs nest.


> > you'd be exposed to any unintended
> > side-effects within
> > the backquote command. Side-effects like file descriptor alterations,
> > changes in signal dispositions, receipt of signals or
> > exceptions (expected
> > or the result of a programming error).
>
>Well that's the point of the 'push_context' pseudo command, to save all 
>that information, and then restore it. It may need some 'OS' co-operation 
>to fully achieve that - ie stdin being kept intact (although I imagine 
>that the librarised commands would be given a virtual stdin, as they are 
>sub process's after all) -  but we have the source so....

How will your magical push_context protect from wild pointer references, e.g.?


> > The beauty of the fork/exec model with entirely separated
> > programs _is_
> > their self-containedness and the complete independence and
> > isolation each
> > of the programs gets from each other and from the program(s)
> > that invoke
> > them.
>
>The fork()/exec() model bites. Sorry, but it does. fork() based servers 
>for instance run into the galloping herd - and scale very badly. The other 
>use for fork -the fork/exec combination is better achieved with spawn() 
>which is designed to do just that one job well. It also happens to work 
>very well on cygwin, and I see no reason to change that. So spawned apps 
>will remain completely separated and independent.

Servers are not shells. Why should they fork at all? That's what threads 
are for. It's also why CGI (without something like mod_perl) is not a good 
thing and the Java server model has significant advantages.

Are you planning on incorporating your scheme into every program that runs 
sub-processes on a regular basis? How likely is it that what works in one 
shell will work in another or in a server?

I don't know the details of spawn(). How does it accomplish I/O redirection?


> > It is also nice in that it is a very simple programming
> > model for commands, both built-in and end-user-supplied, that run
> > within it.
>
>I don't see how this idea detracts from that. Do you think that the 
>presence of a librarised 'ls' command (for instance) will prevent the user 
>adding perl to their system? Or replacing ls? Either scenario is abhorrent 
>to me.

Obviously if you add something, the old stuff isn't (necessarily) lost. I'm 
just saying that the fork/exec process model is simple, elegant, available, 
universal and fully functional in all POSIX systems. Your model is a horse 
of another color and any given command that would avail itself of the 
supposed benefits of your scheme must be recast into a library that 
conforms to the requirements of your embedded task model.


> > It is probably less platform-specific than a scheme that demands use of
> > dynamically-linked / shared libraries.
>
>Ermm, I guarantee I'll be using libtool if I do this...
>
> > The Unix shell and process model may be somewhat costly of computing
> > resources (but only marginally so), especially as I said without
> > copy-on-write behavior in the fork call, but that rather
> > modest down-side is more than made up for by independence, modularity, and
> > open-endedness of the scheme.
>
>I grant that independence, modularity and open-endedness are wonderful 
>things. Can you please describe how what I have suggested prevents any one 
>of the above? The whole point of a librarised approach is to make the 
>shell modular. That also grants open-endedness for free. As for 
>independence, if none of the libraries are available, then the whole thing 
>would run as a normal shell, with no in-process behaviour.

It doesn't prevent it, but to avail oneself of the putative benefits of 
your proposed scheme, a significantly different programming model has to be 
learned and used. All for what? A tiny incremental improvement in program 
start-up times on a single platform and one or two pre-ordained shells?


> > I can't see how all the work your idea implies just for the
> > sake of some incremental performance improvements is going to be 
> worthwhile.
>
>Well that's arguable :}. If it takes 100 mythical man-months to create 
>this beast and librarises the top 20 shell tools.... how many users can use 
>this, and how much time do they save? Let's say they save 5 mythical 
>man-minutes per month per user. Well we have ??? thousand users, so I 
>think it'd pay back its time investment quick smart.

How much time do they save? That's for you to claim and substantiate. I'm 
not trying to justify or validate your project; I'm trying to repudiate it.

But consider this: By the time you complete this task, the upward march of 
system speeds (CPU and I/O) will probably have done more to improve 
elapsed-time performance of command invocation than your improvements are 
going to achieve.

Note, too, that it's not valid to measure system resources or elapsed time 
saved by adding up the savings of each individual user. Below a threshold of 
gain _per user_ your work is for naught because it is imperceptible.

And five staff-minutes per user per month? You think that's significant? 
What would you do with those five minutes spread throughout the month? 
That's right: Nothing, 'cause you'd get it in fraction-of-a-second parcels.

Lastly, you'll have to have an ongoing effort to port changes from the 
stand-alone original versions of the commands to your embedded counterparts.


> > By the way, which shell will you do this for? BASH, TCSH,
> > Ash? More than one?
>
>I'd guess at ash, as that's the smallest shell we have, but if it's easier 
>with bash, then I see no reason not to - as this would be a /bin/sh 
>replacement - if the benefits were to be realised.

How many people use such a bare-bones shell? Unless you modify them all, 
there will be a sizeable user contingent that does not benefit from your 
efforts.


> > Please feel free to prove me wrong, of course.
>
>Well, I've got to complete my review before I decide what I think of the 
>idea. Until then its just an idea. Also I don't feel the urge to 'prove 
>something' on this:}.

I think you need a good technical justification for the effort you'll 
expend relative to the benefits you're going to gain and the detriments 
you're going to incur.

As with all optimizations, you must measure the cost of the current code 
and that of the replacement. In this case, you could possibly mock up a test 
jig that did DLL loading and compare that with the cost of fork / exec. But 
that would not include the unknown costs of your putative push_context / 
pop_context mechanism.
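
Such a jig could be as crude as the following (purely illustrative - the
DLL name and the execute() entry point are your hypothetical convention,
not anything that exists today):

    /* Crude timing jig: N in-process calls through dlopen()/execute()
     * versus N fork()/exec()/wait() round trips of a trivial command.
     * "cygtrue.dll" and its execute() export are hypothetical. */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        enum { ITERS = 1000 };
        char *argv_true[] = { "true", NULL };
        double t0;
        int i;

        t0 = now();
        for (i = 0; i < ITERS; i++) {
            void *h = dlopen("./cygtrue.dll", RTLD_NOW);
            if (h) {
                int (*execute)(int, char **) =
                    (int (*)(int, char **)) dlsym(h, "execute");
                if (execute)
                    execute(1, argv_true);
                dlclose(h);
            }
        }
        printf("dlopen path:    %.3f s for %d calls\n", now() - t0, ITERS);

        t0 = now();
        for (i = 0; i < ITERS; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                execlp("true", "true", (char *) NULL);
                _exit(127);
            }
            waitpid(pid, NULL, 0);
        }
        printf("fork/exec path: %.3f s for %d calls\n", now() - t0, ITERS);
        return 0;
    }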

"The proof of the pudding is in the eating." So until you've done it, you 
won't know for an empirical fact if it's a win and if so how much of a win 
it is.


>Rob


Randall Schulz
Mountain View, CA USA


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: OT: possible project/research project
@ 2002-03-19 23:50 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-19 23:50 UTC (permalink / raw)
  To: Matthew Smith, Cygwin



> -----Original Message-----
> From: Matthew Smith [mailto:matts@bluesguitar.org] 
> Sent: Wednesday, March 20, 2002 11:46 AM
> To: Cygwin
> Subject: Re: OT: possible project/research project
> 
> 
> Robert:
> 
>     I'm not sure what I could do, but if you're willing to be 
> the project leader, and hand out work to do, I'd be more than 
> happy to pitch in and help.  Sounds like you have a pretty 
> good idea of what needs to be done.
> 
> cheers,
> -Matt Smith

Thanks Matt. I appreciate the offer, and I'd be happy to lead such a
project. I intend to do a reasonable literature review first, and
hopefully bat the idea around some more before starting such a behemoth.
I hope you don't mind if I come back to you later though :].

If there is a reasonable interest in discussing this in detail -
design/review only - I'll look at getting a mailing list somewhere for
it. 

Rob

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: OT: possible project/research project
@ 2002-03-19 23:20 Robert Collins
  2002-03-20  9:00 ` Christopher Faylor
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Collins @ 2002-03-19 23:20 UTC (permalink / raw)
  To: Stephano Mariani, Randall R Schulz, cygwin



> -----Original Message-----
> From: Stephano Mariani [mailto:sk.mail@btinternet.com] 
> Sent: Wednesday, March 20, 2002 12:34 PM
> To: 'Randall R Schulz'; Robert Collins; cygwin@cygwin.com
> Subject: RE: OT: possible project/research project
> 
> 
> I am no cygwin expert, or windows expert, but isn't the 
> effort better spent getting the cygwin fork/vfork to work faster?
> 
> Stephano Mariani
> 
> PS: Please do not fry me if this is a stupid suggestion or 
> not possible because of an obvious flaw, I simply fail to see 
> why the source of the problem is not being targeted.

Fry Fry!

Seriously though, reducing the overhead of fork() is a great idea. (BTW:
vfork is a different beast, it's ~ spawn(), and that's OK.)
Unfortunately that requires kernel-level coding for NT, and/or kernel-level
object modification to win9x on-the-fly. IOW it's going to be
unreliable for one (9x) and horribly complexify cygwin's innards for the
other (NT).

In fact cgf has had a copy-on-write fork() for cygwin at alpha quality,
IIRC. I'd love to do some perf tests with that, and in fact cygwin
profiling is on my todo list. Time, however, is the killer.

Rob

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: OT: possible project/research project
@ 2002-03-19 23:14 Robert Collins
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Collins @ 2002-03-19 23:14 UTC (permalink / raw)
  To: Randall R Schulz, cygwin

> -----Original Message-----
> From: Randall R Schulz [mailto:rrschulz@cris.com] 
> Sent: Wednesday, March 20, 2002 12:15 PM

> 
> Robert,
> 
> This idea isn't really new. 

I don't recall claiming it as 'new' .. just an idea.  :} (ok, pedant
mode off).

> The problem is that you're creating a huge project that 
> creates no new 
> functionality and that has horrendous maintainence issues, as you say.

Yah. That's the crux. I've no interest in creating such a project.
 
> The library conversion idea is kind of a throwback to 
> pre-Unix days or to 
> systems like VMS (if I recall and understand it properly). In 
> these systems 
> there were "blessed" commands understood by the command 
> interpreter and 
> endowed with a more direct means of invocation. Other 
> commands required 
> full sub-process creation.

Well we still have that basic separation - bash's builtins, for example.
If it's not a builtin, it needs a sub-process.
 
> I trust it's your intent that the user will see no obvious 
> differences in 
> invoking these programs, but you may find full transparency harder to 
> achieve than you expect. Will the full range of shell 
> features be available 
> to these specially integrated commands? 

That is the design goal, should such a project be attempted by me.

> Will you be able to 
> pipe into and 
> out of them? 

Yes.

> Will they work within parentheses? 

Yes.

> In 
> procedures? 

Yes.

> Will you 
> allow all shell features (pipes, say) are applied to 
> arbitrary combinations 
> of conventional and integrated commands?

Yes.

Before you think I've got plans too big for my boots, consider that if we
leverage an existing shell, all those features work out-of-process now.
Proxying each feature into a library-capable equivalent one at a time
would give us a serious fallback mechanism for any functionality gaps.

> In your example of a `backquote command` (which I prefer to 
> invoke via $( ... ) using BASH) 

Not being a shell aficionado, I'm happy to be educated: what is the key
difference between `...` and $(...)?

> you'd be exposed to any unintended 
> side-effects within 
> the backquote command. Side-effects like file descriptor alterations, 
> changes in signal dispositions, receipt of signals or 
> exceptions (expected 
> or the result of a programming error).

Well that's the point of the 'push_context' pseudo-command: to save all
that information, and then restore it. It may need some 'OS'
co-operation to fully achieve that - i.e. stdin being kept intact
(although I imagine that the librarised commands would be given a
virtual stdin, as they are sub-processes after all) - but we have the
source so....
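
Roughly, the state push_context/pop_context would have to juggle looks
like this - a sketch only, and it deliberately says nothing about the
genuinely hard parts (globals, heap state, wild pointers) inside the
librarised command:

    /* Sketch only: state a push_context()/pop_context() pair might save
     * and restore around an in-process command.  Covers descriptors,
     * signal dispositions, cwd and umask, nothing more. */
    #include <fcntl.h>
    #include <signal.h>
    #include <sys/stat.h>
    #include <unistd.h>

    struct shell_context {
        int saved_fd[3];                  /* dups of stdin/stdout/stderr  */
        struct sigaction saved_sig[NSIG]; /* NSIG assumed from <signal.h> */
        char saved_cwd[4096];
        mode_t saved_umask;
    };

    void push_context(struct shell_context *c)
    {
        int i;
        for (i = 0; i < 3; i++)
            c->saved_fd[i] = fcntl(i, F_DUPFD, 10);
        for (i = 1; i < NSIG; i++)
            sigaction(i, NULL, &c->saved_sig[i]);
        getcwd(c->saved_cwd, sizeof c->saved_cwd);
        c->saved_umask = umask(0);
        umask(c->saved_umask);
    }

    void pop_context(const struct shell_context *c)
    {
        int i;
        for (i = 0; i < 3; i++)
            if (c->saved_fd[i] >= 0) {
                dup2(c->saved_fd[i], i);
                close(c->saved_fd[i]);
            }
        for (i = 1; i < NSIG; i++)
            sigaction(i, &c->saved_sig[i], NULL); /* fails harmlessly for
                                                     SIGKILL/SIGSTOP */
        chdir(c->saved_cwd);
        umask(c->saved_umask);
    }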
 
> The beauty of the fork/exec model with entirely separated 
> programs _is_ 
> their self-containedness and the complete independence and 
> isolation each 
> of the programs gets from each other and from the program(s) 
> that invoke 
> them. 

The fork()/exec() model bites. Sorry, but it does. fork()-based servers,
for instance, run into the thundering herd problem and scale very badly. The
other use for fork - the fork/exec combination - is better achieved with
spawn(), which is designed to do just that one job well. It also happens
to work very well on cygwin, and I see no reason to change that. So
spawned apps will remain completely separated and independent.
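
As an aside, I/O redirection around a spawn-style call is just descriptor
juggling in the parent. A rough sketch, assuming the DOS-style
spawnvp()/_P_WAIT interface from <process.h> and glossing over error
handling:

    /* Illustration: the moral equivalent of "ls -l > listing.txt" done
     * with a spawn-style call instead of fork()/exec(). */
    #include <fcntl.h>
    #include <process.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *args[] = { "ls", "-l", NULL };
        int out = open("listing.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        int saved_stdout = dup(STDOUT_FILENO);
        int status;

        if (out < 0 || saved_stdout < 0) {
            perror("setup");
            return 1;
        }
        dup2(out, STDOUT_FILENO);           /* redirect before spawning  */
        close(out);
        status = spawnvp(_P_WAIT, "ls", (const char * const *) args);
        dup2(saved_stdout, STDOUT_FILENO);  /* put the shell's fd 1 back */
        close(saved_stdout);
        fprintf(stderr, "ls exited with %d\n", status);
        return 0;
    }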

> It is also nice in that it is a very simple programming 
> model for commands, both built-in and end-user-supplied, that run 
> within it.

I don't see how this idea detracts from that. Do you think that the
presence of a librarised 'ls' command (for instance) will prevent the
user adding perl to their system? Or replacing ls? Either scenario is
abhorrent to me.

> It is probably less platform-specific than a scheme that demands use of
> dynamically-linked / shared libraries.

Ermm, I guarantee I'll be using libtool if I do this... 
 
> The Unix shell and process model may be somewhat costly of computing 
> resources (but only marginally so), especially as I said without 
> copy-on-write behavior in the fork call, but that rather 
> modest down-side is more than made up for by independence, modularity, and
> open-endedness of the scheme.

I grant that independence, modularity and open-endedness are wonderful
things. Can you please describe how what I have suggested prevents any
one of the above? The whole point of a librarised approach is to make
the shell modular. That also grants open-endedness for free. As for
independence, if none of the libraries are available, then the whole
thing would run as a normal shell, with no in-process behaviour. 
 
> I can't see how all the work your idea implies just for the
> sake of some incremental performance improvements is going to be worthwhile.

Well that's arguable :}. If it takes 100 mythical man-months to create
this beast and librarises the top 20 shell tools.... how many users can
use this, and how much time do they save? Let's say they save 5 mythical
man-minutes per month per user. Well we have ??? thousand users, so I
think it'd pay back its time investment quick smart.
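
To put entirely made-up numbers on that: 100 man-months at ~160 hours each
is roughly 960,000 minutes of effort. If, hypothetically, 20,000 users each
saved 5 minutes a month, that's 100,000 minutes a month recovered, so the
aggregate time would be repaid inside a year. The user count is pure
guesswork, of course.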

> By the way, which shell will you do this for? BASH, TCSH, 
> Ash? More than one?

I'd guess at ash, as that's the smallest shell we have, but if it's
easier with bash, then I see no reason not to - as this would be a
/bin/sh replacement - if the benefits were to be realised.
 
> Please feel free to prove me wrong, of course.

Well, I've got to complete my review before I decide what I think of the
idea. Until then it's just an idea. Also I don't feel the urge to 'prove
something' on this :}.

Rob

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: OT: possible project/research project
       [not found]     ` <NCBBIHCHBLCMLBLOBONKMEOGCLAA.g.r.vansickle@worldnet.att.ne t>
@ 2002-03-19 23:04       ` Randall R Schulz
  2002-03-20 19:41         ` Gary R. Van Sickle
  0 siblings, 1 reply; 30+ messages in thread
From: Randall R Schulz @ 2002-03-19 23:04 UTC (permalink / raw)
  To: Gary R. Van Sickle, cygwin

Sir,

We await your improved model for process control and the operating system 
that implements it.

Randall Schulz
Mountain View, CA USA

Patriotism is the last refuge of a scoundrel.
  -- Samuel Johnson


At 18:51 2002-03-19, Gary R. Van Sickle wrote:
>I don't see it that the source of the problem is the implementation of 
>fork/vfork; the way I see it the very *concept* of forking makes little to 
>no sense.  I've written a lot of code, and not once have I thought to 
>myself, "ok, now what I want to do here is duplicate the current process 
>in almost exactly its current state."  Maybe it made more sense back in 
>the day, or maybe I'm missing something, but it seems to me there's a lot 
>more efficient ways to do multithreading/multi"process"ing/IPC/etc (or 
>better yet avoid them altogether) these days.
>
>--
>Gary R. Van Sickle
>Brewer.  Patriot.


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: OT: possible project/research project
  2002-03-19 17:47   ` Stephano Mariani
@ 2002-03-19 19:17     ` Gary R. Van Sickle
  2002-03-20  1:58       ` Stephano Mariani
       [not found]     ` <NCBBIHCHBLCMLBLOBONKMEOGCLAA.g.r.vansickle@worldnet.att.ne t>
  1 sibling, 1 reply; 30+ messages in thread
From: Gary R. Van Sickle @ 2002-03-19 19:17 UTC (permalink / raw)
  To: cygwin

> -----Original Message-----
> From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com]On Behalf
> Of Stephano Mariani
> Sent: Tuesday, March 19, 2002 7:34 PM
> To: 'Randall R Schulz'; 'Robert Collins'; cygwin@cygwin.com
> Subject: RE: OT: possible project/research project
>
>
> I am no cygwin expert, or windows expert, but isn't the effort better
> spent getting the cygwin fork/vfork to work faster?
>
> Stephano Mariani
>
> PS: Please do not fry me if this is a stupid suggestion or not possible
> because of an obvious flaw, I simply fail to see why the source of the
> problem is not being targeted.
>

I don't see it that the source of the problem is the implementation of
fork/vfork; the way I see it the very *concept* of forking makes little to no
sense.  I've written a lot of code, and not once have I thought to myself, "ok,
now what I want to do here is duplicate the current process in almost exactly
its current state."  Maybe it made more sense back in the day, or maybe I'm
missing something, but it seems to me there's a lot more efficient ways to do
multithreading/multi"process"ing/IPC/etc (or better yet avoid them altogether)
these days.

--
Gary R. Van Sickle
Brewer.  Patriot.


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: OT: possible project/research project
  2002-03-19 17:32 ` Randall R Schulz
@ 2002-03-19 17:47   ` Stephano Mariani
  2002-03-19 19:17     ` Gary R. Van Sickle
       [not found]     ` <NCBBIHCHBLCMLBLOBONKMEOGCLAA.g.r.vansickle@worldnet.att.ne t>
  0 siblings, 2 replies; 30+ messages in thread
From: Stephano Mariani @ 2002-03-19 17:47 UTC (permalink / raw)
  To: 'Randall R Schulz', 'Robert Collins', cygwin

I am no cygwin expert, or windows expert, but isn't the effort better
spent getting the cygwin fork/vfork to work faster?

Stephano Mariani

PS: Please do not fry me if this is a stupid suggestion or not possible
because of an obvious flaw, I simply fail to see why the source of the
problem is not being targeted.

> -----Original Message-----
> From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com] On Behalf
> Of Randall R Schulz
> Sent: Wednesday, 20 March 2002 1:15 AM
> To: Robert Collins; cygwin@cygwin.com
> Subject: Re: OT: possible project/research project
> 
> Robert,
> 
> This idea isn't really new. I remember people talking about it back in the
> System 6, System 7 and 32v days, when programs were starting to get bigger,
> disks were still pretty slow, main store rather small and there was not yet
> a copy-on-write fork(2) or a vfork(2). (Not to mention the meager control
> flow in the pre-Bourne shell that used a sub-process to effect a seek(2) on
> the standard input being interpreted by the shell running the script!!!)
> 
> The problem is that you're creating a huge project that creates no new
> functionality and that has horrendous maintainence issues, as you say.
> 
> The library conversion idea is kind of a throwback to pre-Unix days or to
> systems like VMS (if I recall and understand it properly). In these systems
> there were "blessed" commands understood by the command interpreter and
> endowed with a more direct means of invocation. Other commands required
> full sub-process creation.
> 
> I trust it's your intent that the user will see no obvious differences in
> invoking these programs, but you may find full transparency harder to
> achieve than you expect. Will the full range of shell features be available
> to these specially integrated commands? Will you be able to pipe into and
> out of them? Will they work within parentheses? In procedures? Will you
> allow all shell features (pipes, say) are applied to arbitrary combinations
> of conventional and integrated commands?
> 
> In your example of a `backquote command` (which I prefer to invoke via $(
> ... ) using BASH) you'd be exposed to any unintended side-effects within
> the backquote command. Side-effects like file descriptor alterations,
> changes in signal dispositions, receipt of signals or exceptions (expected
> or the result of a programming error).
> 
> The beauty of the fork/exec model with entirely separated programs _is_
> their self-containedness and the complete independence and isolation each
> of the programs gets from each other and from the program(s) that invoke
> them. It is also nice in that it is a very simple programming model for
> commands, both built-in and end-user-supplied, that run within it. It is
> probably less platform-specific than a scheme that demands use of
> dynamically-linked / shared libraries.
> 
> The Unix shell and process model may be somewhat costly of computing
> resources (but only marginally so), especially as I said without
> copy-on-write behavior in the fork call, but that rather modest down-side
> is more than made up for by independence, modularity, and open-endedness of
> the scheme.
> 
> I can't see how all the work your idea implies just for the sake of some
> incremental performance improvements is going to be worthwhile.
> 
> By the way, which shell will you do this for? BASH, TCSH, Ash? More than
> one?
> 
> Please feel free to prove me wrong, of course.
> 
> Randall Schulz
> Mountain View, CA USA
> 
> 
> At 22:50 2002-03-18, Robert Collins wrote:
> >Just a curiousity...
> >
> >I've a mental concept I've been batting around for a while - about how can
> >we drastically increase configure and related script performance on
> >cygwin...
> >
> >AFAICT the largest performance issue is fork() and exec(). File access is
> >quite fast, as is networking. Unix sockets are a go slow given Ralf's
> >testing :p but that's about it.
> >
> >So, what I'm thinking could be done is:
> >Create a new shell. For the most common current causes of fork()/exec(),
> >make those commands internal. Specifically, make all expression evaluation
> >(such as `basename foo`) done in-process (i.e. C-style
> >code:{save_context();evalute (expression);pop_context(result);}, only
> >spawning commands where they are not internal. (Currently, AFAIK, ash and
> >bash use sub-shells quite commonly).
> >
> >Now that would be a maintenance and coding nightmare - repeating lots of
> >other folk's work, and having to get bug compatability as well.... no
> >thanks.
> >
> >What if, instead of rewriting all those helper commands, we
> >*) Make each one into a library - ie cygshellbasename0.dll. - with a
> >well-defined interface (say execute (int argc, char **argv), AND no ABI
> >changes!
> >*) Replace the current binary with a façade that uses the .dll.
> >*) in the shell, look for the library *before* calling the binary, thus
> >saving a spawn()
> >*) Ideally, adapt an existing shell rather than starting new (I'm not a
> >reinvent-ze-wheel) kinda guy.
> >
> >Now I imagine that if done _properly_ the upstream authors won't object
> >too much to librarization, so the amount of code to be written is
> >significantly shrunk.
> >
> >I've not seen a specific project to accomplish this (in
> >google/freshmeat/sourceforge) - but I figure that cygwin is _such_ a prime
> >platform for it that if one exists, and I'd be repeating work, I'll find
> >someone who knows it here....
> >
> >Anyway, this is (obviously) a long-term proposition, but if two or three
> >folk from here would be interested in collaborating on such a project...
> >
> >Cheers,
> >Rob
> 
> 
> --
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
> Bug reporting:         http://cygwin.com/bugs.html
> Documentation:         http://cygwin.com/docs.html
> FAQ:                   http://cygwin.com/faq/




--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: OT: possible project/research project
       [not found] <FC169E059D1A0442A04C40F86D9BA76062DA@itdomain003.itdomain. net.au>
@ 2002-03-19 17:32 ` Randall R Schulz
  2002-03-19 17:47   ` Stephano Mariani
  0 siblings, 1 reply; 30+ messages in thread
From: Randall R Schulz @ 2002-03-19 17:32 UTC (permalink / raw)
  To: Robert Collins, cygwin

Robert,

This idea isn't really new. I remember people talking about it back in the 
System 6, System 7 and 32v days, when programs were starting to get bigger, 
disks were still pretty slow, main store rather small and there was not yet 
a copy-on-write fork(2) or a vfork(2). (Not to mention the meager control 
flow in the pre-Bourne shell that used a sub-process to effect a seek(2) on 
the standard input being interpreted by the shell running the script!!!)

The problem is that you're creating a huge project that creates no new 
functionality and that has horrendous maintenance issues, as you say.

The library conversion idea is kind of a throwback to pre-Unix days or to 
systems like VMS (if I recall and understand it properly). In these systems 
there were "blessed" commands understood by the command interpreter and 
endowed with a more direct means of invocation. Other commands required 
full sub-process creation.

I trust it's your intent that the user will see no obvious differences in 
invoking these programs, but you may find full transparency harder to 
achieve than you expect. Will the full range of shell features be available 
to these specially integrated commands? Will you be able to pipe into and 
out of them? Will they work within parentheses? In procedures? Will you 
allow all shell features (pipes, say) to be applied to arbitrary combinations 
of conventional and integrated commands?

In your example of a `backquote command` (which I prefer to invoke via $( 
... ) using BASH) you'd be exposed to any unintended side-effects within 
the backquote command. Side-effects like file descriptor alterations, 
changes in signal dispositions, receipt of signals or exceptions (expected 
or the result of a programming error).

The beauty of the fork/exec model with entirely separated programs _is_ 
their self-containedness and the complete independence and isolation each 
of the programs gets from each other and from the program(s) that invoke 
them. It is also nice in that it is a very simple programming model for 
commands, both built-in and end-user-supplied, that run within it. It is 
probably less platform-specific than a scheme that demands use of 
dynamically-linked / shared libraries.

The Unix shell and process model may be somewhat costly of computing 
resources (but only marginally so), especially as I said without 
copy-on-write behavior in the fork call, but that rather modest down-side 
is more than made up for by independence, modularity, and open-endedness of 
the scheme.

I can't see how all the work your idea implies just for the sake of some 
incremental performance improvements is going to be worthwhile.

By the way, which shell will you do this for? BASH, TCSH, Ash? More than one?

Please feel free to prove me wrong, of course.

Randall Schulz
Mountain View, CA USA


At 22:50 2002-03-18, Robert Collins wrote:
>Just a curiousity...
>
>I've a mental concept I've been batting around for a while - about how can 
>we drastically increase configure and related script performance on cygwin...
>
>AFAICT the largest performance issue is fork() and exec(). File access is 
>quite fast, as is networking. Unix sockets are a go slow given Ralf's 
>testing :p but that's about it.
>
>So, what I'm thinking could be done is:
>Create a new shell. For the most common current causes of fork()/exec(), 
>make those commands internal. Specifically, make all expression evaluation 
>(such as `basename foo`) done in-process (i.e. C-style 
>code:{save_context();evalute (expression);pop_context(result);}, only 
>spawning commands where they are not internal. (Currently, AFAIK, ash and 
>bash use sub-shells quite commonly).
>
>Now that would be a maintenance and coding nightmare - repeating lots of 
>other folk's work, and having to get bug compatability as well.... no thanks.
>
>What if, instead of rewriting all those helper commands, we
>*) Make each one into a library - ie cygshellbasename0.dll. - with a 
>well-defined interface (say execute (int argc, char **argv), AND no ABI 
>changes!
>*) Replace the current binary with a façade that uses the .dll.
>*) in the shell, look for the library *before* calling the binary, thus 
>saving a spawn()
>*) Ideally, adapt an existing shell rather than starting new (I'm not a 
>reinvent-ze-wheel) kinda guy.
>
>Now I imagine that if done _properly_ the upstream authors won't object 
>too much to librarization, so the amount of code to be written is 
>significantly shrunk.
>
>I've not seen a specific project to accomplish this (in 
>google/freshmeat/sourceforge) - but I figure that cygwin is _such_ a prime 
>platform for it that if one exists, and I'd be repeating work, I'll find 
>someone who knows it here....
>
>Anyway, this is (obviously) a long-term proposition, but if two or three 
>folk from here would be interested in collaborating on such a project...
>
>Cheers,
>Rob


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: OT: possible project/research project
  2002-03-18 23:17 Robert Collins
@ 2002-03-19 17:27 ` Matthew Smith
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Smith @ 2002-03-19 17:27 UTC (permalink / raw)
  To: Cygwin

Robert:

    I'm not sure what I could do, but if you're willing to be the project
leader, and hand out work to do, I'd be more than happy to pitch in and
help.  Sounds like you have a pretty good idea of what needs to be done.

cheers,
-Matt Smith

>I've not seen a specific project to accomplish this (in
>google/freshmeat/sourceforge) - but I figure that cygwin is _such_ a prime
>platform for it that if one exists, and I'd be repeating work, I'll find
>someone who knows it here....
>
>Anyway, this is (obviously) a long-term proposition, but if two or three
>folk from here would be interested in collaborating on such a project...
>
>Cheers,
>Rob





--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* OT: possible project/research project
@ 2002-03-18 23:17 Robert Collins
  2002-03-19 17:27 ` Matthew Smith
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Collins @ 2002-03-18 23:17 UTC (permalink / raw)
  To: cygwin

Just a curiosity...

I've a mental concept I've been batting around for a while - about how we can drastically increase configure and related script performance on cygwin...

AFAICT the largest performance issue is fork() and exec(). File access is quite fast, as is networking. Unix sockets are a go-slow given Ralf's testing :p but that's about it.

So, what I'm thinking could be done is:
Create a new shell. For the most common current causes of fork()/exec(), make those commands internal. Specifically, make all expression evaluation (such as `basename foo`) done in-process (i.e. C-style code: {save_context(); evaluate(expression); pop_context(result);}), only spawning commands where they are not internal. (Currently, AFAIK, ash and bash use sub-shells quite commonly.)

Now that would be a maintenance and coding nightmare - repeating lots of other folks' work, and having to get bug compatibility as well.... no thanks.

What if, instead of rewriting all those helper commands, we
*) Make each one into a library - i.e. cygshellbasename0.dll - with a well-defined interface (say execute(int argc, char **argv)), AND no ABI changes!
*) Replace the current binary with a façade that uses the .dll.
*) In the shell, look for the library *before* calling the binary, thus saving a spawn() (a rough sketch of that lookup follows below).
*) Ideally, adapt an existing shell rather than starting anew (I'm not a reinvent-ze-wheel kinda guy).
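
A rough sketch of the lookup the shell would do (everything here - the
cygshell<name>0.dll naming and the execute() export - is just the proposed
convention, none of it exists yet):

    /* Proposed lookup, sketched: before spawning /bin/<name>, try to
     * find cygshell<name>0.dll and an execute() entry point in it.
     * Purely hypothetical - the DLL naming and the execute(argc, argv)
     * signature are the proposal itself, not an existing interface. */
    #include <dlfcn.h>
    #include <process.h>
    #include <stdio.h>

    typedef int (*cmd_fn)(int argc, char **argv);

    int run_command(const char *name, int argc, char **argv)
    {
        char dll[256];
        void *handle;

        snprintf(dll, sizeof dll, "cygshell%s0.dll", name);
        handle = dlopen(dll, RTLD_NOW);
        if (handle != NULL) {
            cmd_fn execute = (cmd_fn) dlsym(handle, "execute");
            if (execute != NULL) {
                /* push_context()/pop_context() would bracket this call */
                int rc = execute(argc, argv);
                dlclose(handle);
                return rc;
            }
            dlclose(handle);
        }
        /* No library found: fall back to the normal spawn. */
        return spawnvp(_P_WAIT, name, (const char * const *) argv);
    }

Today a call like run_command("basename", argc, argv) would fall straight
through to spawnvp(), since no such DLL exists - which is exactly the
fallback behaviour I want.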

Now I imagine that if done _properly_ the upstream authors won't object too much to librarization, so the amount of code to be written is significantly shrunk.

I've not seen a specific project to accomplish this (in google/freshmeat/sourceforge) - but I figure that cygwin is _such_ a prime platform for it that if one exists, and I'd be repeating work, I'll find someone who knows it here....

Anyway, this is (obviously) a long-term proposition, but if two or three folk from here would be interested in collaborating on such a project...

Cheers,
Rob

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2002-03-22  5:08 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-03-19 23:29 OT: possible project/research project Robert Collins
2002-03-20 21:00 ` Gary R. Van Sickle
2002-03-21  0:54   ` Jesper Eskilson
  -- strict thread matches above, loose matches on Subject: below --
2002-03-21 21:10 Robert Collins
2002-03-21  5:03 Robert Collins
2002-03-21  9:29 ` Christopher Faylor
2002-03-21  4:27 Robert Collins
2002-03-21  4:53 ` Jesper Eskilson
2002-03-21  2:55 Robert Collins
2002-03-21  2:40 Robert Collins
     [not found] <5.1.0.14.2.20020320191522.02498050@pop3.cris.com>
2002-03-20 21:48 ` Gary R. Van Sickle
2002-03-20 18:11 Joshua Daniel Franklin
2002-03-20 11:44 Robert Collins
     [not found] <FC169E059D1A0442A04C40F86D9BA76062E0@itdomain003.itdomain. net.au>
2002-03-20  8:24 ` Randall R Schulz
2002-03-20  1:54 Robert Collins
     [not found] <FC169E059D1A0442A04C40F86D9BA76062DD@itdomain003.itdomain. net.au>
2002-03-20  0:53 ` Randall R Schulz
2002-03-19 23:50 Robert Collins
2002-03-19 23:20 Robert Collins
2002-03-20  9:00 ` Christopher Faylor
2002-03-19 23:14 Robert Collins
     [not found] <FC169E059D1A0442A04C40F86D9BA76062DA@itdomain003.itdomain. net.au>
2002-03-19 17:32 ` Randall R Schulz
2002-03-19 17:47   ` Stephano Mariani
2002-03-19 19:17     ` Gary R. Van Sickle
2002-03-20  1:58       ` Stephano Mariani
2002-03-20  9:16         ` Christopher Faylor
2002-03-20 20:07         ` Gary R. Van Sickle
     [not found]     ` <NCBBIHCHBLCMLBLOBONKMEOGCLAA.g.r.vansickle@worldnet.att.ne t>
2002-03-19 23:04       ` Randall R Schulz
2002-03-20 19:41         ` Gary R. Van Sickle
2002-03-18 23:17 Robert Collins
2002-03-19 17:27 ` Matthew Smith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).