public inbox for overseers@sourceware.org
* Re: ftp mirrors
  2000-12-30  6:08 ` ftp mirrors Jim Kingdon
@ 2000-06-09 14:53   ` Jim Kingdon
  2000-12-30  6:08   ` Andrew Cagney
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Kingdon @ 2000-06-09 14:53 UTC (permalink / raw)
  To: DJ Delorie; +Cc: overseers

(in response to a sourcemaster@cygnus.com query)

> Do we support rsync at this time?  Is there some resource page for
> people mirroring from sourceware?

Yes, we have rsync set up for both CVS and FTP.  Feel free to tell
people like mirror sites (my only reason for reluctance is that rsync
could conceivably be a bandwidth hog if it got popular but I'm not all
that worried about that happening).

See http://sourceware.cygnus.com/sourceware/rsync.html for details.

* Re: ftp mirrors
  2000-12-30  6:08   ` Andrew Cagney
@ 2000-06-09 22:05     ` Andrew Cagney
  2000-12-30  6:08     ` Jason Molenda
  1 sibling, 0 replies; 20+ messages in thread
From: Andrew Cagney @ 2000-06-09 22:05 UTC (permalink / raw)
  To: Jim Kingdon; +Cc: DJ Delorie, overseers

Jim Kingdon wrote:
> 
> (in response to a sourcemaster@cygnus.com query)
> 
> > Do we support rsync at this time?  Is there some resource page for
> > people mirroring from sourceware?
> 
> Yes, we have rsync set up for both CVS and FTP.  Feel free to tell
> people like mirror sites (my only reason for reluctance is that rsync
> could conceivably be a bandwidth hog if it got popular but I'm not all
> that worried about that happening).
> 
> See http://sourceware.cygnus.com/sourceware/rsync.html for details.

And (shameless plug) cvsup :-)

With rsync, I've wondered about creating a hidden shadow of the FTP
area that didn't contain gz/bz files.  rsync works best against
uncompressed archives.  Giving official shadows access to that should
result in a significant reduction in bandwidth (at a cost of CPU
performance at the remote end).

	Andrew

* Re: ftp mirrors
  2000-12-30  6:08     ` Jason Molenda
@ 2000-06-10  1:02       ` Jason Molenda
  2000-12-30  6:08       ` Andrew Cagney
  2000-12-30  6:08       ` Jason Molenda
  2 siblings, 0 replies; 20+ messages in thread
From: Jason Molenda @ 2000-06-10  1:02 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Jim Kingdon, DJ Delorie, overseers

On Sat, Jun 10, 2000 at 03:02:59PM +1000, Andrew Cagney wrote:

> And (shameless plug) cvsup :-)

For Buddha's sake Cagney, you'd read your mail with cvsup if you
could figure out how. :-)

> With rsync, I've wondered about creating a hidden shadow of the FTP
> area that didn't contain gz/bz files.  

We don't have the disk space for that.  (try a df on sourceware and
imagine what the 3GB of ftp would expand to when uncompressed)

More importantly, rsync has support for this problem already.

       dont compress
              The "dont compress" option  allows  you  to  select
              filenames  based  on  wildcard patterns that should
              not be compressed during transfer.  Compression  is
              expensive  in  terms  of CPU usage so it is usually
              good to not try to compress files that  won't  com-
              press well, such as already compressed files.

              The  "dont compress" option takes a space separated
              list of  case-insensitive  wildcard  patterns.  Any
              source  filename  matching one of the patterns will
              not be compressed during transfer.

              The default setting is

              *.gz *.tgz *.zip *.z *.rpm *.deb

Looks like setting 'dont compress' to include .bz2 would be enough.
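
A minimal sketch of the matching rule the man page excerpt describes:
a space-separated list of case-insensitive wildcard patterns, checked
against each source filename.  This only illustrates the rule; it is
not rsync's own code, and the filenames in the loop are made up.

```python
from fnmatch import fnmatchcase

# Pattern list from the thread: the man-page default plus *.bz2.
DONT_COMPRESS = "*.gz *.tgz *.zip *.z *.rpm *.deb *.bz2".split()


def skips_compression(filename, patterns=DONT_COMPRESS):
    """Return True if filename matches any pattern, ignoring case."""
    name = filename.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


# Hypothetical filenames, chosen only to exercise the patterns.
for name in ["gdb-5.0.tar.gz", "gdb-5.0.tar.bz2", "GDB.TAR.Z",
             "ChangeLog", "gdb-5.0.tar"]:
    print(name, skips_compression(name))
```

Lower-casing both sides is one simple way to get case-insensitive
matching; a plain .tar has no matching pattern, so it would still be
compressed in transit.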


J

* Re: ftp mirrors
  2000-12-30  6:08       ` Jason Molenda
@ 2000-06-10  1:08         ` Jason Molenda
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Molenda @ 2000-06-10  1:08 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Jim Kingdon, DJ Delorie, overseers

On Sat, Jun 10, 2000 at 01:01:32AM -0700, Jason Molenda wrote:

> More importantly, rsync has support for this problem already.

>        dont compress

>               The default setting is
> 
>               *.gz *.tgz *.zip *.z *.rpm *.deb
> 
> Looks like setting 'dont compress' to include .bz2 would be enough.


Har, I'm so cool I had already done this long ago and forgot all
about it. :-)     From infra/bin/generate-rsyncd-conf.sh:

  dont compress = *.gz *.tgz *.zip *.z *.rpm *.deb *.bz2


Jason

* Re: ftp mirrors
  2000-12-30  6:08       ` Andrew Cagney
@ 2000-06-10  4:19         ` Andrew Cagney
  2000-12-30  6:08         ` Andrew Cagney
  2000-12-30  6:08         ` Jason Molenda
  2 siblings, 0 replies; 20+ messages in thread
From: Andrew Cagney @ 2000-06-10  4:19 UTC (permalink / raw)
  To: Jason Molenda; +Cc: Jim Kingdon, DJ Delorie, overseers

Jason Molenda wrote:

> We don't have the disk space for that.  (try a df on sourceware and
> imagine what the 3GB of ftp would expand to when uncompressed)
> 
> More importantly, rsync has support for this problem already.
> 
>        dont compress
>               The "dont compress" option  allows  you  to  select
>               filenames  based  on  wildcard patterns that should
>               not be compressed during transfer.  Compression  is
>               expensive  in  terms  of CPU usage so it is usually
>               good to not try to compress files that  won't  com-
>               press well, such as already compressed files.
> 
>               The  "dont compress" option takes a space separated
>               list of  case-insensitive  wildcard  patterns.  Any
>               source  filename  matching one of the patterns will
>               not be compressed during transfer.
> 
>               The default setting is
> 
>               *.gz *.tgz *.zip *.z *.rpm *.deb
> 
> Looks like setting 'dont compress' to include .bz2 would be enough.

That isn't what I'm thinking of.  Given a gzip file, rsync sees
effectively random data.  Given the uncompressed file, rsync can
recognize internal sections.  Tarballs, for instance, change very
little; however, because they are compressed, rsync can't see that.

	Andrew

* Re: ftp mirrors
  2000-12-30  6:08         ` Jason Molenda
@ 2000-06-10  9:39           ` Jason Molenda
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Molenda @ 2000-06-10  9:39 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Jim Kingdon, DJ Delorie, overseers

On Sat, Jun 10, 2000 at 09:17:11PM +1000, Andrew Cagney wrote:

> That isn't what I'm thinking of.  Given a gzip file, rsync sees
> effectively random data.

Right.

> Given the uncompressed file, rsync can
> recognize internal sections.

??  Rsync recognizes blocks of data.  It doesn't interpret the
syntax of a .c file, run an SGML parser on an .html file, or try
to parse the English sentences in a .txt file.  A block of random
data and a block of human-readable data are little different to
rsync.
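
To make "blocks of data" concrete, here is a sketch of the weak
rolling checksum as described in the rsync technical report: two
16-bit sums over a window of bytes that can be updated in constant
time as the window slides by one byte.  The window length of 16 and
the sample inputs are arbitrary choices for illustration, and this is
not taken from rsync's source.

```python
M = 1 << 16  # modulus used by the weak checksum in the technical report


def weak_checksum(block):
    """Compute the (a, b) pair directly over one block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data, n):
    """Yield the (a, b) pair for every n-byte window in data."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield a, b
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        yield a, b


# The same arithmetic runs on source text and on arbitrary bytes alike.
text = b"int main(void) { return 0; }\n" * 8
noise = bytes((i * 37 + 11) % 256 for i in range(len(text)))
for label, data in (("text", text), ("noise", noise)):
    rolled = list(rolling_checksums(data, 16))
    direct = [weak_checksum(data[k:k + 16]) for k in range(len(data) - 15)]
    print(label, len(rolled), rolled == direct)
```

Nothing in the checksum looks at what the bytes mean, which is the
point being made above.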

> Tarballs, for instance, change very
> little; however, because they are compressed, rsync can't see that.

How is a tarball any different from any other file?
How often does someone modify an existing tarball, anyway?
Rsync will run a checksum or timestamp check on the tarball, see
that the checksum of the server version matches that of the client
version, and move on.

Are you making this all up on your own - assuming that there is a
problem - or is there some actual evidence of a problem?  I don't
mean to be harsh, but I seem to be engaged in an intellectual
discussion of the tone "I bet rsync is slow doing foo and we should
change how we do things.  Someone should prove to me otherwise."

Jason

* Re: ftp mirrors
  2000-12-30  6:08         ` Andrew Cagney
@ 2000-06-10 18:03           ` Andrew Cagney
  2000-12-30  6:08           ` Jason Molenda
  1 sibling, 0 replies; 20+ messages in thread
From: Andrew Cagney @ 2000-06-10 18:03 UTC (permalink / raw)
  To: Jason Molenda; +Cc: overseers

Jason Molenda wrote:

> ??  Rsync recognizes blocks of data.  It doesn't interpret the
> syntax of a .c file, run an SGML parser on an .html file, or try
> to parse the English sentences in a .txt file.  A block of random
> data and a block of human-readable data are little different to
> rsync.

Given two nightly snapshots, there are very few differences.  Once the
tarball has gone through gzip, however, all similarity is lost.

It's a lot more efficient to rsync the uncompressed tarball than it is
to download the compressed version (well, it is for me :-).
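
A rough way to check the similarity claim, as a sketch under stated
assumptions: the two "snapshots" below are synthetic text differing in
one line, the 256-byte block size is arbitrary, and zlib stands in for
gzip.  The function counts how many aligned blocks of the old data
turn up at any byte offset in the new data, which is the kind of match
a block-based search could reuse.

```python
import zlib

BLOCK = 256  # illustrative block size, not rsync's actual choice


def shared_block_fraction(old, new, block=BLOCK):
    """Fraction of old's aligned blocks that occur at any offset in new."""
    windows = {new[i:i + block] for i in range(len(new) - block + 1)}
    blocks = [old[i:i + block] for i in range(0, len(old) - block + 1, block)]
    if not blocks:
        return 0.0
    return sum(b in windows for b in blocks) / len(blocks)


# Two hypothetical snapshots: identical except for one line near the start.
lines = [f"file-{i:05d}.c: revision {i * 7919 % 1000}\n" for i in range(3000)]
old_tar = "".join(lines).encode()
lines[10] = "file-00010.c: revision CHANGED-IN-TODAYS-SNAPSHOT\n"
new_tar = "".join(lines).encode()

plain = shared_block_fraction(old_tar, new_tar)
packed = shared_block_fraction(zlib.compress(old_tar, 9),
                               zlib.compress(new_tar, 9))
print(f"uncompressed: {plain:.1%} of old blocks found in new")
print(f"compressed:   {packed:.1%} of old blocks found in new")
```

This says nothing about whether rsync will pick yesterday's file as
the basis for today's; it only measures how much reusable data exists
in each form.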

> Are you making this all up on your own - assuming that there is a
> problem - or is there some actual evidence of a problem?  I don't
> mean to be harsh, but I seem to be engaged in an intellectual
> discussion of the tone "I bet rsync is slow doing foo and we should
> change how we do things.  Someone should prove to me otherwise."

Not so much a problem of ``rsync is slow'' but rather a question of
whether there is a better way.

One issue raised by individual testers during the gdb 5.0 release
process was the logistics of repeatedly dragging down 10MB tarballs.
Next time around I'll have the uncompressed tarball available.  It
occurs to me that this could be scaled :-)

	enjoy,
		Andrew

* Re: ftp mirrors
  2000-12-30  6:08           ` Jason Molenda
@ 2000-06-10 23:55             ` Jason Molenda
  2000-12-30  6:08             ` Jim Kingdon
  1 sibling, 0 replies; 20+ messages in thread
From: Jason Molenda @ 2000-06-10 23:55 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: overseers

On Sun, Jun 11, 2000 at 11:02:43AM +1000, Andrew Cagney wrote:

> Given two nightly snapshots, there are very few differences.  Once the
> tarball has gone through gzip, however, all similarity is lost.

I guess I don't understand this.

You're suggesting that rsync is clever enough to see two files on
the server, foo-2000-06-09 and foo-2000-06-10, see that the client
already has foo-2000-06-09, and make the leap that the -09 and -10
files are probably pretty close to each other, so do a diff between
the server's -09 and -10 and send that diff?

I've never heard of that - do you have something to back it up?  I
must admit to being a little incredulous.

> It's a lot more efficient to rsync the uncompressed tarball than it is
> to download the compressed version (well, it is for me :-).

I'd have to see a clear example of this before I believed it - it
just doesn't make any sense.  You saw a file, which existed in both
.tar and .tar.gz format on sourceware, and neither existed on your
local host, and rsync'ing both of them to your system resulted in
the .tar file downloading noticeably faster than the .tar.gz file?

I'm sorry, but I just can't take that on faith.  Can you outline a
little more explicitly what you're measuring here?

Think about it.  The .tar file is 30MB on disk.  The .tar.gz, which
was gzipped with gzip -9 presumably, is 10MB.  The default gzip -3
gives you, say, a file length of 13MB.

If rsync sends the .tar.gz, it sends 10MB of data.

If rsync sends the .tar file without compression, it sends 30MB of data.

If rsync sends the .tar file with compression, it sends 10-13MB of data.

Where's the gain?
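
The three cases above, written out as arithmetic.  The sizes are the
ones quoted in this message (13MB being the upper guess for on-the-fly
compression), and the comparison only covers a full transfer where the
client has no earlier copy to work from.

```python
TAR_MB = 30             # uncompressed .tar on disk
TAR_GZ_MB = 10          # .tar.gz made with gzip -9
TAR_ON_THE_FLY_MB = 13  # guess for the .tar compressed during transfer

scenarios = [
    ("send .tar.gz as-is", TAR_GZ_MB),
    ("send .tar, no compression", TAR_MB),
    ("send .tar, compressed in transit", TAR_ON_THE_FLY_MB),
]

best = min(sent for _, sent in scenarios)
for label, sent in scenarios:
    print(f"{label}: {sent} MB ({sent / best:.1f}x the smallest)")
```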


> One issue raised by individual testers during the gdb 5.0 release
> process was the logistics of repeatedly dragging down 10MB tarballs.

Let me introduce you to my good friend, "diff". :-)

If someone is following a release process, they either need to (a)
use CVS, (b) download diffs back to the last fully downloaded
snapshot they have, or (c) have an amazingly fast net connection.

If they don't have an amazingly fast net connection, and they're
downloading snapshots every day *and complaining about it*, then
the solution here is to educate them on the magic of patch, not
putting uncompressed files on the ftp server.  Fix the developer's
problem at the right place -- at the developer.
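
For a sense of scale, a small sketch comparing a unified diff with the
full file it updates.  The file contents and names are invented, and
Python's difflib is used here only as a stand-in for running diff on
two snapshots.

```python
import difflib

# Hypothetical source file as it appears in two consecutive snapshots.
old = [f"line {i}\n" for i in range(2000)]
new = list(old)
new[1000] = "line 1000, fixed in today's snapshot\n"

patch = "".join(difflib.unified_diff(
    old, new,
    fromfile="snapshot-06-09/file.c",
    tofile="snapshot-06-10/file.c",
))
full_size = len("".join(new))

print(patch)
print(f"full file: {full_size} bytes, diff: {len(patch)} bytes")
```

The diff carries the changed line plus a few lines of context, so its
size tracks the size of the change rather than the size of the file.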

> Next time around I'll have the uncompressed tarball available.  It
> occurs to me that this could be scaled :-)

Of course I don't maintain sourceware any longer, but the disk usage for
having uncompressed tar files, and having dozens of people download a file
that is 4x larger than necessary -- for no gain -- is simply unacceptable.
We don't have the disk space, we don't have the bandwidth.

But as I said, I don't run sourceware any longer, so it's out of my hands.

Jason

* Re: ftp mirrors
  2000-12-30  6:08             ` Jim Kingdon
@ 2000-06-11  8:24               ` Jim Kingdon
  2000-12-30  6:08               ` Jason Molenda
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Kingdon @ 2000-06-11  8:24 UTC (permalink / raw)
  To: overseers

> You're suggesting that rsync is clever enough to see two files on
> the server, foo-2000-06-09 and foo-2000-06-10, see that the client
> already has foo-2000-06-09, and make the leap that the -09 and -10
> files are probably pretty close to each other, so do a diff between
> the server's -09 and -10 and send that diff?

If rsync just concatenates the files or something equivalent it should
be able to get this result without a special case (at least some of
the time).  See

http://rsync.samba.org/rsync/tech_report/

especially the section on "checksum searching".  Of course it would be
good if someone would actually test it to verify this happens in real
world cases like snapshots.
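
A sketch of the idea in question, assuming the previous snapshot is
somehow offered as the basis for the new one.  It is a simplification:
it hashes a window at every offset with SHA-256 instead of using a
rolling weak checksum plus a strong one, the 64-byte block size is
arbitrary, and the snapshot contents are synthetic.  Whether rsync
pairs differently named files like this is not something the sketch
can answer.

```python
import hashlib

BLOCK = 64  # illustrative block size


def signature(basis, block=BLOCK):
    """Map the digest of each aligned block in basis to its offset."""
    sig = {}
    for off in range(0, len(basis) - block + 1, block):
        sig.setdefault(hashlib.sha256(basis[off:off + block]).digest(), off)
    return sig


def delta(target, sig, block=BLOCK):
    """Encode target as ('copy', offset) and ('data', bytes) instructions."""
    ops, literal, i = [], bytearray(), 0
    while i < len(target):
        chunk = target[i:i + block]
        off = None
        if len(chunk) == block:
            off = sig.get(hashlib.sha256(chunk).digest())
        if off is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", off))
            i += block
        else:
            literal.append(target[i])
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(basis, ops, block=BLOCK):
    """Rebuild the target from the basis and the instruction list."""
    out = bytearray()
    for kind, arg in ops:
        out += basis[arg:arg + block] if kind == "copy" else arg
    return bytes(out)


# Synthetic stand-ins for foo-2000-06-09 and foo-2000-06-10.
lines = [f"line {i}\n" for i in range(1000)]
old_snapshot = "".join(lines).encode()
lines[500] = "line 500, changed overnight\n"
new_snapshot = "".join(lines).encode()

ops = delta(new_snapshot, signature(old_snapshot))
literal_bytes = sum(len(arg) for kind, arg in ops if kind == "data")
print(f"new snapshot: {len(new_snapshot)} bytes, sent literally: {literal_bytes}")
print("rebuilt correctly:", apply_delta(old_snapshot, ops) == new_snapshot)
```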

There is hair and possible adverse effects in the cases where rsync
can't do its magic, but I guess at least in the case of cvsup, we
(well, Jeff anyway :-)) were willing to deal with the hair if Andrew
did the work of actually maintaining the thing.

* Re: ftp mirrors
  2000-12-30  6:08               ` Jason Molenda
@ 2000-06-11 15:08                 ` Jason Molenda
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Molenda @ 2000-06-11 15:08 UTC (permalink / raw)
  To: Jim Kingdon; +Cc: overseers

On Sun, Jun 11, 2000 at 11:24:09AM -0400, Jim Kingdon wrote:

> > You're suggesting that rsync is clever enough to see two files on
> > the server, foo-2000-06-09 and foo-2000-06-10, see that the client
> > already has foo-2000-06-09, and make the leap that the -09 and -10
> > files are probably pretty close to each other, so do a diff between
> > the server's -09 and -10 and send that diff?
> 
> If rsync just concatenates the files or something equivalent it should
> be able to get this result without a special case (at least some of
> the time).  See
> 
> http://rsync.samba.org/rsync/tech_report/
> 
> especially the section on "checksum searching".  Of course it would be
> good if someone would actually test it to verify this happens in real
> world cases like snapshots.

rsync doesn't concatenate files like that AFAIK - the whole discussion
about checksum search is only relevant when a file that exists
on both the server and the client has changed and rsync wants to
find the blocks that changed.

Most importantly, all of this is irrelevant to Andrew's reported
problem.  Andrew says that a developer complained that they had to
download full tarballs every day while following a release process.

I am at a loss to guess why Andrew made a leap from that problem
to the solution of having a copy of the ftp dir with uncompressed
versions of the tarballs, and offering rsync access to that.  I'm
unable to understand (a) what this has to do with the problem Andrew
described (was the developer actually rsync'ing the gdb ftp dir?
I don't think I've ever seen anyone download snapshots via rsync),
(b) why the developer shouldn't just download diffs, and finally,
(c) what benefit, exactly, is gained by having uncompressed tarballs
with rsync.

Maybe Andrew is just throwing out non sequiturs to frustrate me or
yank on my chain - I'm at a loss as to what the point of this discussion
is.  God forbid an ftp mirror site mirrored the uncompressed versions
of the ftp directory; the communication between the mirror site
and sourceware would be compressed, but all of the files in the
mirror site's ftp dir would be uncompressed, just like the uncompressed
dir on sourceware.  It'd be a huge disaster.

Jason
