public inbox for gcc@gcc.gnu.org
* distributed compilation with gcc: distcc
@ 2002-07-22  7:55 Martin Pool
  2002-07-22 17:30 ` Aldy Hernandez
  0 siblings, 1 reply; 14+ messages in thread
From: Martin Pool @ 2002-07-22  7:55 UTC (permalink / raw)
  To: gcc

I've written a small wrapper around gcc that allows the work of
compilation to be distributed across several machines on a network.

  http://distcc.samba.org/

Scalability is quite good: with 3 equal machines, compilation is
typically 2.5 times faster.  (Obviously the number will vary depending
on the code, machines, network, etc.)  In other words, a project that
normally builds in 45 minutes might be built in about 18, just using
spare machines that you may already have in your office.
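
As a quick check of the arithmetic, here is a minimal sketch (the
function names are made up for illustration).  A 45-minute build at
the measured 2.5x comes to 18 minutes; a perfect 3x on three machines
would give 15.

```shell
# Estimated wall-clock build time for a given baseline and speedup.
# usage: build_minutes BASELINE_MINUTES SPEEDUP
build_minutes() {
    awk -v t="$1" -v s="$2" 'BEGIN { printf "%.1f\n", t / s }'
}

# Parallel efficiency in percent: speedup divided by machine count.
# usage: efficiency_pct SPEEDUP MACHINES
efficiency_pct() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.0f\n", 100 * s / n }'
}

build_minutes 45 2.5    # measured 2.5x on 3 machines
build_minutes 45 3      # ideal 3x on 3 machines
efficiency_pct 2.5 3    # how close 2.5x is to ideal
```

So 2.5x on three machines is roughly 83% of the ideal speedup.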

distcc does not require shared filesystems, a kernel patch, root
access, a gcc patch, and in general does not require modifications to
your source or Makefiles.  It has a reasonably complete manual and a
test suite.  It supports cross-compilation and clusters of
heterogeneous machines.

Anyone who remembers this post will be relieved to hear that distcc is
considerably smaller and simpler.

  http://gcc.gnu.org/ml/gcc/2001-02/msg00378.html

It's early days, but distcc seems to successfully build several large
projects including the Linux kernel, KDE, Samba and GARNOME.  I hear
it works well with hairy C++ ASN.1 code, and can build gcc.

Please check it out if you have a chance, and let me know what you
think.

-- 
Martin 
(i'm not subscribed; please cc me on replies)

* Re: distributed compilation with gcc: distcc
  2002-07-22  7:55 distributed compilation with gcc: distcc Martin Pool
@ 2002-07-22 17:30 ` Aldy Hernandez
  2002-07-22 18:34   ` Diego Novillo
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Aldy Hernandez @ 2002-07-22 17:30 UTC (permalink / raw)
  To: Martin Pool; +Cc: gcc

>>>>> "Martin" == Martin Pool <mbp@samba.org> writes:

 > I've written a small wrapper around gcc that allows the work of
 > compilation to be distributed across several machines on a network.

 >   http://distcc.samba.org/

 > access, a gcc patch, and in general does not require modifications to
 > your source or Makefiles.  It has a reasonably complete manual and a

hi martin.

I've been playing with distcc, and it requires some infrastructure
work before it can really satisfy our needs.

Only a minimal part of gcc is built with the system compiler.  For
example, once a minimal compiler is built, the rest of the bootstrap
process uses the new compiler.  So not only do Makefiles have to be
tweaked, but distcc has to be altered to either handle a path to the
new compiler, or be sent the entire [new] compiler and assembler as
part of the compilation request (in case you don't want to use NFS).

Bootstraps of gcc generally take 2 hours on my system, and distcc will
only speed up the first stage, which just takes about 15 minutes.
What ends up happening is that a "make -jxx", will cause the first
stage to be built distributed, and the rest of the build (since it's
using the new compiler and distcc knows nothing about it) will end up
thrashing on the parent build host.
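
To put rough numbers on that, here is a small sketch.  It assumes the
2.5x figure from the announcement applies to stage 1 and that every
other stage runs at its usual speed on the build host; the function
names are illustrative only.

```shell
# Total bootstrap time when only stage 1 is distributed.
# usage: bootstrap_minutes TOTAL_MINUTES STAGE1_MINUTES STAGE1_SPEEDUP
bootstrap_minutes() {
    awk -v total="$1" -v stage1="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (total - stage1) + stage1 / s }'
}

# Overall speedup of the whole bootstrap under the same assumptions.
overall_speedup() {
    awk -v total="$1" -v stage1="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", total / ((total - stage1) + stage1 / s) }'
}

bootstrap_minutes 120 15 2.5
overall_speedup 120 15 2.5
```

Under those assumptions a 2 hour bootstrap drops to about 111
minutes, which is why the later stages are the ones that matter.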

I'd be delighted to use distcc for building gcc, if it could handle
bootstraps.  Hint hint.  ;-)

Aldy

* Re: distributed compilation with gcc: distcc
  2002-07-22 17:30 ` Aldy Hernandez
@ 2002-07-22 18:34   ` Diego Novillo
  2002-07-22 18:58     ` Aldy Hernandez
  2002-07-23  0:42   ` Martin Pool
  2002-07-23  2:10   ` Martin Pool
  2 siblings, 1 reply; 14+ messages in thread
From: Diego Novillo @ 2002-07-22 18:34 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Martin Pool, gcc

On Mon, 2002-07-22 at 16:34, Aldy Hernandez wrote:

> What ends up happening is that a "make -jxx", will cause the first
> stage to be built distributed, and the rest of the build (since it's
> using the new compiler and distcc knows nothing about it) will end up
> thrashing on the parent build host.
> 

When distributing the build, it would also be interesting if the partial
builds can be done on local filesystems.  Even if you distribute the
build itself, hammering on a shared fileserver will slow you down
significantly.

Of course, I have not thought about this in detail.  I'm not even sure
if it's realistic to think about building stages 1-3 in separate file
systems.

Another thing to think about: do you need the build machines to have the
exact same set of include and library files?


Diego.

* Re: distributed compilation with gcc: distcc
  2002-07-22 18:34   ` Diego Novillo
@ 2002-07-22 18:58     ` Aldy Hernandez
  2002-07-22 22:17       ` Martin Pool
  0 siblings, 1 reply; 14+ messages in thread
From: Aldy Hernandez @ 2002-07-22 18:58 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Martin Pool, gcc


> Another thing to think about: do you need the build machines to have the
> exact same set of include and library files?

correct me if i'm wrong, but i think distcc sends preprocessed files to
the slaves, so header files don't matter.  and the link is done on the
host machine, so it'll use the libraries in the host.

aldy

* Re: distributed compilation with gcc: distcc
  2002-07-22 18:58     ` Aldy Hernandez
@ 2002-07-22 22:17       ` Martin Pool
  2002-07-22 22:55         ` Aldy Hernandez
  0 siblings, 1 reply; 14+ messages in thread
From: Martin Pool @ 2002-07-22 22:17 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Diego Novillo, gcc

On 22 Jul 2002, Aldy Hernandez <aldyh@redhat.com> wrote:
> 
> > Another thing to think about: do you need the build machines to have the
> > exact same set of include and library files?
> 
> correct me if i'm wrong, but i think distcc sends preprocessed files to
> the slaves, so header files don't matter.  and the link is done on the
> host machine, so it'll use the libraries in the host.

That is correct.  One of my design goals was to avoid forcing people
to have all the machines exactly in sync -- I think that is too much
hassle in most cases.  Even if you tried to achieve it, you might get
weird failures where the libraries or headers were not exactly the
same.
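
The split described above can be sketched as three steps per source
file.  This only prints the commands; it is an illustration of the
idea, not what distcc literally executes, and the function name is
made up.

```shell
# Print, without running anything, where each step for one C file
# would happen: preprocess locally, compile remotely, link locally.
# usage: plan_compile FILE.c
plan_compile() {
    src=$1
    base=${src%.c}
    echo "local:     gcc -E $src -o $base.i"
    echo "volunteer: gcc -c $base.i -o $base.o"
    echo "local:     gcc $base.o -o $base"
}

plan_compile hello.c
```

Only the middle step needs anything from the volunteer, and all it
needs there is a compiler that accepts the preprocessed source.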

The biggest dependency on the clients is that they must have a version
of gcc that is "sufficiently reasonable" to compile the preprocessed
source into an object file.  To start with, they must have an
appropriate cross compiler if the target architecture is not native.

There are also more incompatibilities between gcc versions than I
naively expected before I started doing this.  For example, for C++ in
particular, using object files from different versions of g++ is
pretty risky.  Another example is that some Linux kernels use gcc
features or behaviours that are only present in recent compilers.
FreeBSD and Linux object files also seem to be almost, but not really,
compatible.

I'm sure all that is old news to gcc people but it has surprised some
people using distcc.

Fortunately gcc has very nice support for having multiple versions
installed at the same time, so by using -b and -V you can avoid the
problems.  (Failing that, specify something like "distcc
gcc-x86-linux-2.95.4".)

-- 
Martin 
(please cc me on replies)

* Re: distributed compilation with gcc: distcc
  2002-07-22 22:17       ` Martin Pool
@ 2002-07-22 22:55         ` Aldy Hernandez
  2002-07-23  3:34           ` Hook in gcc Sathiskanna
  2002-08-01  5:26           ` distributed compilation with gcc: distcc Martin Pool
  0 siblings, 2 replies; 14+ messages in thread
From: Aldy Hernandez @ 2002-07-22 22:55 UTC (permalink / raw)
  To: Martin Pool; +Cc: Diego Novillo, gcc


> The biggest dependency on the clients is that they must have a version
> of gcc that is "sufficiently reasonable" to compile the preprocessed

hmm, this is fine [most of the time] for building gcc because this
only affects the first stage build.   but that's the point of my other
post-- we'd all probably be delighted to use this, if it could handle
parallelizing the other stages.

in which case, you're going to have to design a mechanism to use the
recently built compiler to rebuild itself, target libraries, and 
other languages (c++, java, etc).  (see my previous email)

aldy

* Re: distributed compilation with gcc: distcc
  2002-07-22 17:30 ` Aldy Hernandez
  2002-07-22 18:34   ` Diego Novillo
@ 2002-07-23  0:42   ` Martin Pool
  2002-07-23  2:10   ` Martin Pool
  2 siblings, 0 replies; 14+ messages in thread
From: Martin Pool @ 2002-07-23  0:42 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc

On 22 Jul 2002, Aldy Hernandez <aldyh@redhat.com> wrote:
> >>>>> "Martin" == Martin Pool <mbp@samba.org> writes:
> 
>  > I've written a small wrapper around gcc that allows the work of
>  > compilation to be distributed across several machines on a network.
> 
>  >   http://distcc.samba.org/
> 
>  > access, a gcc patch, and in general does not require modifications to
>  > your source or Makefiles.  It has a reasonably complete manual and a
> 
> hi martin.
> 
> I've been playing with distcc, and it requires some infrastructure
> work before it can really satisfy our needs.

I haven't built gcc from scratch in years.  I will try it out under
distcc when I get a chance.  In principle, I think it would be
straightforward to make distcc use the new compiler.  I'm looking at
the gcc Makefile in viewcvs now, but it is a bit complex.

> Only a minimal part of gcc is built with the system compiler.  For
> example, once a minimal compiler is built, the rest of the bootstrap
> process uses the new compiler.  

How does that work if you're building a cross-compiler?  (Obviously it
does; I just don't understand.)  Do you build a native minimal gcc
first, and then use it to build the cross?

> So not only do Makefiles have to be tweaked, but distcc has to be
> altered to either handle a path to the new compiler, or be sent the
> entire [new] compiler and assembler as part of the compilation
> request (in case you don't want to use NFS).

distcc will already happily accept a fully-qualified name for the
compiler, as long as the binary can be found under that name on all
systems.  So you can do

  CC="distcc `pwd`/gcc-bootstrapped"

or whatever the name is.  

If the bootstrapped gcc requires some environment variables to be set
to find its libraries that may be more of a problem, since those
variables will not be passed to the daemon.  Ideally they would be set
by command line values instead.  Failing that you could use a shell
script to set them.
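
A minimal sketch of such a wrapper script follows.  The helper name,
the variable and the paths are all illustrative; which variables (if
any) the bootstrapped compiler actually needs is an open question
here.

```shell
# Write a wrapper that sets one environment variable and then execs
# the real compiler with the original arguments.
# usage: make_wrapper WRAPPER_PATH VAR VALUE REAL_COMPILER
make_wrapper() {
    cat > "$1" <<EOF
#!/bin/sh
$2='$3'
export $2
exec '$4' "\$@"
EOF
    chmod +x "$1"
}

# Example (not run here; paths are hypothetical):
#   make_wrapper /build/gcc/xgcc-wrapped GCC_EXEC_PREFIX /build/gcc/ /build/gcc/xgcc
#   CC="distcc /build/gcc/xgcc-wrapped"
```

The wrapper has to exist under the same name on every machine, just
like the compiler itself.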

Is the intermediate gcc just a single binary, or does it require
libraries and spec files like the real gcc?  If so, copying it across
will be a little more complex.  Is there already something in the
Makefile that knows what all the files are?  (Something like
install-bootstrap?)

Obviously the volunteer machines need to get access to the newly built
compiler.  I can see three ways to do that:

 1: have all machines share the build directory over NFS

 2: copy the new compiler onto all machines using rsync, scp, or
    whatever

 3: add a mechanism to distcc for copying arbitrary files

I don't know if NFS is practical in the environment of a typical gcc
maintainer.  If it is, then using distcc should be straightforward:
just set CC to give an absolute path to the compiler. 

#2 will require a little bit of distcc-specific Makefile tweaking to
copy the new compiler onto all machines.  (I guess you could keep the
distcc-specific parts out by just having a Make variable run as a hook
before switching compilers; that might be considered more tasteful.)
There are a few details about choosing an appropriate place to put the
new compiler on the remote system.
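
As a rough sketch of what such a hook might run, the function below
prints one rsync command per volunteer.  The host names and the stage
directory are placeholders, and it assumes the same path is usable
(and its parent directory exists) on every machine.

```shell
# Print the rsync command that would copy a stage directory to each
# volunteer.  Nothing is executed; pipe the output to sh to run it.
# usage: sync_cmds STAGE_DIR HOST...
sync_cmds() {
    dir=$1
    shift
    for host in "$@"; do
        echo "rsync -a $dir/ $host:$dir/"
    done
}

sync_cmds /build/gcc/stage1 vol1 vol2
```

Printing the commands first makes it easy to check the destination
paths before anything is copied.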

#3 is possible but it feels a bit like bloat, so I'd rather not do it
unless #1 and #2 are impractical.  Possibly it will be the practical
way to choose the right filenames though.

> I'd be delighted to use distcc for building gcc, if it could handle
> bootstraps.  Hint hint.  ;-)

I would be delighted to have distcc be useful to the gcc maintainers.
distcc wouldn't be nearly as much fun without gcc. :-)

-- 
Martin 

* Re: distributed compilation with gcc: distcc
  2002-07-22 17:30 ` Aldy Hernandez
  2002-07-22 18:34   ` Diego Novillo
  2002-07-23  0:42   ` Martin Pool
@ 2002-07-23  2:10   ` Martin Pool
  2002-07-23 10:19     ` Alexandre Oliva
  2002-07-23 17:22     ` Ben Elliston
  2 siblings, 2 replies; 14+ messages in thread
From: Martin Pool @ 2002-07-23  2:10 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc, distcc

[cc'd distcc list]

I just looked at  

  http://subversions.gnu.org/cgi-bin/viewcvs/gcc/gcc/gcc/Makefile.in?rev=1.912&content-type=text/vnd.viewcvs-markup

which makes things a bit clearer than the parent Makefile.  

So it seems like we have to at least have the $(build_tooldir), and
the $(builddir)/stage[1234] directories shared or copied across all
the machines.  In addition, for the special case of gcc which runs its
own output, all the volunteers will need to be compatible
architectures and OSs.

I suspect the easiest thing will be to share the builddir over NFS.
Is that possible?

-- 
Martin 

* Hook in gcc
  2002-07-22 22:55         ` Aldy Hernandez
@ 2002-07-23  3:34           ` Sathiskanna
  2002-08-01  5:26           ` distributed compilation with gcc: distcc Martin Pool
  1 sibling, 0 replies; 14+ messages in thread
From: Sathiskanna @ 2002-07-23  3:34 UTC (permalink / raw)
  To: gcc


What is hook in gcc??

* Re: distributed compilation with gcc: distcc
  2002-07-23  2:10   ` Martin Pool
@ 2002-07-23 10:19     ` Alexandre Oliva
  2002-07-23 17:22     ` Ben Elliston
  1 sibling, 0 replies; 14+ messages in thread
From: Alexandre Oliva @ 2002-07-23 10:19 UTC (permalink / raw)
  To: Martin Pool; +Cc: Aldy Hernandez, gcc, distcc

On Jul 22, 2002, Martin Pool <mbp@samba.org> wrote:

> I suspect the easiest thing will be to share the builddir over NFS.

I agree.  I've been playing with distcc (along with ccache) and found
it to significantly speed up builds of gcc and gdb using 3-6 machines
at home (one of my favorite examples was that an all-gcc all-gdb build
went down from 7 to just short of 2 minutes, using 4 build machines,
and that without ccache!)

There are two problems with extending this to typical builds of gcc:

- when we build target libraries, we use the just-built xgcc as the
  compiler (except in the case of Canadian crosses), but the
  definition of CC_FOR_TARGET does not contain distcc in it, and it's
  not easy to put it in.  This means all of libgcc, libstdc++,
  libjava, etc, get compiled on the build machine.  If you have a
  distcc farm and tell make -j to use that many machines, your build
  machine will thrash when it gets to the point of building target
  libraries, because it won't be able to share the work with other
  machines.  In the case of Canadian crosses, the problem may get more
  complicated, because the PATH (to which one has presumably added the
  directory containing the cross tools necessary to build the Canadian
  cross) is not passed to the remote distccd, so it is likely to fail
  to find the cross tool.

- when we bootstrap gcc natively, we use stage1/xgcc and stage2/xgcc
  (with relative pathnames) to build the next stage, so distcc can't
  be used as it is now, since it expects compiler pathnames to be
  found in the PATH; it does not `cd' to a directory remotely before
  starting the compiler.  Also, the CC passed to every stage's build
  does not contain distcc (*), and there's no easy way to do it right
  now.  So, the same thrashing problem occurs when we bootstrap gcc
  with make -jN.


My suggestion to alleviate this problem involves 3 tasks:

- add a flag to distcc (and modify its protocol accordingly) to enable
  it to send a (partial?) PATH over to the daemon, such that one
  doesn't have to restart the daemon remotely to find cross compilers
  for Canadian crosses.  i.e., distcc -P /my/toolchain/bin would
  make sure the intended compiler is used for the entire build.

- add a flag to distcc (and modify its protocol accordingly) to tell
  distccd to chdir to given directory before running the command.
  Ideally, this flag should take an argument that specifies a
  transform pattern to be applied to the CWD, such that say a local
  pathname can be turned into a network-visible automounted pathname
  (i.e., distcc -C s:^:/net/`uname -n`: causes /local/tmp/mybuild/gcc
  to be referenced as /net/<hostname>/local/tmp/mybuild/gcc, assuming
  /net is where automount or amd does host mounts).  In some cases,
  such as pathnames that already are network-uniform (a shared /home),
  this would not be necessary, so we may want to decouple the request
  to chdir from pathname transforms, but I don't think it's worth it.

- introduce hooks in the GCC Makefiles to let one set prefixes for
  CC_FOR_TARGET and CC for stages, such that distcc can be easily
  prepended, along with the additional flags introduced above.  The
  latter would be used for native builds and bootstraps, as well as
  non-Canadian crosses (even though it wouldn't hurt Canadian crosses
  too), whereas the former would be mostly useful for Canadian
  crosses, but it would also help all other builds by ensuring that
  the intended toolchain is used for the initial build.
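
The pathname transform in the second item can be tried out with plain
sed.  This is only a sketch of the proposed behaviour: the -C flag is
a suggestion in this message, not an existing distcc option, and the
function name is made up.

```shell
# Apply the transform s:^:/net/HOST: to a directory name, turning a
# local path into its automounted, network-visible equivalent.
# usage: net_path HOST DIR
net_path() {
    printf '%s\n' "$2" | sed "s:^:/net/$1:"
}

net_path "$(uname -n)" /local/tmp/mybuild/gcc
net_path buildhost /local/tmp/mybuild/gcc
```

The second call prints /net/buildhost/local/tmp/mybuild/gcc, which is
the directory the remote side would chdir to.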


* nor ccache, but that would be pointless: the timestamp change in the
  compiler driver invalidates the cache; but then, when using ccache
  and distcc, ccache thinks distcc is the compiler driver, so I think
  we may actually end up with false matches.  Perhaps there should be
  a way to tell ccache to compute a checksum of a file compiler, and
  give it that checksum in the command line, to be used instead of
  taking info from the compiler driver.

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                 aoliva@{redhat.com, gcc.gnu.org}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist                Professional serial bug killer

* Re: distributed compilation with gcc: distcc
  2002-07-23  2:10   ` Martin Pool
  2002-07-23 10:19     ` Alexandre Oliva
@ 2002-07-23 17:22     ` Ben Elliston
  2002-07-23 18:21       ` Tom Tromey
  2002-07-24  7:26       ` Martin Pool
  1 sibling, 2 replies; 14+ messages in thread
From: Ben Elliston @ 2002-07-23 17:22 UTC (permalink / raw)
  To: Martin Pool; +Cc: Aldy Hernandez, gcc, distcc

>>>>> "Martin" == Martin Pool <mbp@samba.org> writes:

  Martin> I suspect the easiest thing will be to share the builddir over NFS.
  Martin> Is that possible?

That depends entirely on the user's own preference.  I, for one, would
have no objections sharing the builddir over NFS, but I wonder how the
net performance will be after paying that penalty.

Ben

* Re: distributed compilation with gcc: distcc
  2002-07-23 17:22     ` Ben Elliston
@ 2002-07-23 18:21       ` Tom Tromey
  2002-07-24  7:26       ` Martin Pool
  1 sibling, 0 replies; 14+ messages in thread
From: Tom Tromey @ 2002-07-23 18:21 UTC (permalink / raw)
  To: Ben Elliston; +Cc: Martin Pool, Aldy Hernandez, gcc, distcc

>>>>> "Ben" == Ben Elliston <bje@redhat.com> writes:

Ben> That depends entirely on the user's own preference.  I, for one,
Ben> would have no objections sharing the builddir over NFS, but I
Ben> wonder how the net performance will be after paying that penalty.

If you really want to speed up gcc builds using distcc, you'll
eventually have to address the libgcj build.  In this case you'll
basically have to accept a shared directory structure.  A single
compilation in libgcj requires a lot of different files -- it isn't
like C where you can finesse the problem by running the preprocessor.

Tom

* Re: distributed compilation with gcc: distcc
  2002-07-23 17:22     ` Ben Elliston
  2002-07-23 18:21       ` Tom Tromey
@ 2002-07-24  7:26       ` Martin Pool
  1 sibling, 0 replies; 14+ messages in thread
From: Martin Pool @ 2002-07-24  7:26 UTC (permalink / raw)
  To: Ben Elliston; +Cc: Aldy Hernandez, gcc, distcc

On 23 Jul 2002, Ben Elliston <bje@redhat.com> wrote:
> >>>>> "Martin" == Martin Pool <mbp@samba.org> writes:
> 
>   Martin> I suspect the easiest thing will be to share the builddir over NFS.
>   Martin> Is that possible?
> 
> That depends entirely on the user's own preference.  I, for one, would
> have no objections sharing the builddir over NFS, but I wonder how the
> net performance will be after paying that penalty.

It ought to be pretty good, if the machines have enough memory to
cache the built compiler and so on, because once the files are
generated they won't change.

-- 
Martin 

* Re: distributed compilation with gcc: distcc
  2002-07-22 22:55         ` Aldy Hernandez
  2002-07-23  3:34           ` Hook in gcc Sathiskanna
@ 2002-08-01  5:26           ` Martin Pool
  1 sibling, 0 replies; 14+ messages in thread
From: Martin Pool @ 2002-08-01  5:26 UTC (permalink / raw)
  To: gcc; +Cc: distcc

On 22 Jul 2002, Aldy Hernandez <aldyh@redhat.com> wrote:
> 
> > The biggest dependency on the clients is that they must have a version
> > of gcc that is "sufficiently reasonable" to compile the preprocessed
> 
> hmm, this is fine [most of the time] for building gcc because this
> only affects the first stage build.   but that's the point of my other
> post-- we'd all probably be delighted to use this, if it could handle
> parallelizing the other stages.
> 
> in which case, you're going to have to design a mechanism to use the
> recently built compiler to rebuild itself, target libraries, and 
> other languages (c++, java, etc).  (see my previous email)

Java's model for this kind of thing is obviously very different from C.
I think you could probably do something broadly similar to distcc, but
the details would be quite different, and it perhaps should be a
separate program.  I don't use Java much these days so I probably will
not do it any time soon.

-- 
Martin 
