public inbox for cygwin-talk@cygwin.com
* Compressing hippos really fast
@ 2008-03-04 15:43 Corinna Vinschen
From: Corinna Vinschen @ 2008-03-04 15:43 UTC
  To: cygwin-talk

Hi,


does anybody know of a compression tool which is, above all, capable of
compressing really fast?  The compression ratio is only a mild concern;
it's more important that the tool does not act as a bottleneck
when compressing files which are poorly compressible.  Unfortunately,
the usual compression tools are more interested in good compression
than in good speed when streaming lots of data.

Here are a couple of disks which are supposed to be backed up.  Right
now this is done using a script which creates tar.gz archives of all
disks.  Some of these disks are quite big and contain many files which
are already compressed.  It turns out that gzipping these disks is *the*
bottleneck when backing up.

When not compressing, tar creates archives at 37MB/s.  When creating
tar.gz archives, the compression takes so much time that the speed drops
to 6MB/s.  Using gzip --fast doesn't help much.  bzip is a lot
slower than gzip.

So the question is, does anybody know a compression tool which can be
used with tar and which doesn't slow down the backup by a factor of 6?  It
would be cool to have a tool which is as quick as the hardware
compression used in modern tape drives, but that's just dreaming...


May the hippos be with you,
Corinna

* RE: Compressing hippos really fast
@ 2008-03-04 18:35 Phil Betts
From: Phil Betts @ 2008-03-04 18:35 UTC
  To: cygwin-talk

Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:

> Hi,
> 
> 
> does anybody know of a compression tool which is, above all, capable
> of compressing really fast?  The compression ratio is only a mild
> concern; it's more important that the tool does not act as a
> bottleneck when compressing files which are poorly compressible.
> Unfortunately, the usual compression tools are more interested in
> good compression than in good speed when streaming lots of data.
> 
> Here are a couple of disks which are supposed to be backed up.  Right
> now this is done using a script which creates tar.gz archives of all
> disks.  Some of these disks are quite big and contain many files which
> are already compressed.  It turns out that gzipping these disks is
> *the* bottleneck when backing up.
> 
> When not compressing, tar creates archives at 37MB/s.  When creating
> tar.gz archives, the compression takes so much time that the speed
> drops to 6MB/s.  Using gzip --fast doesn't help much.  bzip is a
> lot slower than gzip.
> 
> So the question is, does anybody know a compression tool which can be
> used with tar and which doesn't slow down the backup by a factor of 6?
> It would be cool to have a tool which is as quick as the hardware
> compression used in modern tape drives, but that's just dreaming...
> 
> 
> May the hippos be with you,
> Corinna

I had this problem ages ago.  My solution was to run two backups:
one uncompressed, including only files matching *.gz, *.t[bg]z, *.[zZ],
*.bz2, *.zip etc., and one for the remainder, which was piped
through gzip.
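
For illustration only, the two-backup idea might be sketched in Python
roughly like this.  It is not the script I actually used: the names
split_backup and is_precompressed are made up for the example, the
pattern list covers only the globs named above, and it assumes the two
output archives live outside the directory being backed up.  Empty
directories are skipped.

```python
import fnmatch
import os
import tarfile

# Globs from the message above; a real list would be longer (assumption).
COMPRESSED_PATTERNS = ["*.gz", "*.t[bg]z", "*.[zZ]", "*.bz2", "*.zip"]


def is_precompressed(name):
    """Return True if the file name matches one of the compressed-file globs."""
    base = os.path.basename(name)
    return any(fnmatch.fnmatchcase(base, pat) for pat in COMPRESSED_PATTERNS)


def split_backup(root, plain_tar, gzip_tar, level=1):
    """Store already-compressed files in plain_tar, the rest in gzip_tar.

    Both archive paths are assumed to be outside root. Returns how many
    files went into each archive.
    """
    counts = {"plain": 0, "gzip": 0}
    with tarfile.open(plain_tar, "w") as plain, \
            tarfile.open(gzip_tar, "w:gz", compresslevel=level) as gz:
        for dirpath, _dirnames, filenames in os.walk(root):
            for fn in sorted(filenames):
                path = os.path.join(dirpath, fn)
                arcname = os.path.relpath(path, root)
                if is_precompressed(fn):
                    # Already compressed: store as-is, no gzip pass.
                    plain.add(path, arcname=arcname, recursive=False)
                    counts["plain"] += 1
                else:
                    gz.add(path, arcname=arcname, recursive=False)
                    counts["gzip"] += 1
    return counts
```

Matching on file names is a heuristic: a misnamed file simply ends up in
the other archive, so nothing is lost either way.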

Even a fast compression algorithm is just wasting time trying to
compress previously compressed files.  Also, as most compressors work
on some variant of Lempel-Ziv, if they're fed a mixture of
compressible and incompressible data, the incompressible data
flushes the dictionary, making the compression of the compressible
part worse.
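
A quick way to see the first point is the sketch below, which uses
Python's zlib module at its fastest level.  Random bytes stand in for an
already-compressed file here (that substitution is an assumption), and
the helper name ratio is made up for the example.  It only shows that
such data does not shrink; it does not measure the dictionary effect.

```python
import os
import zlib


def ratio(data, level=1):
    """Compressed size divided by original size at the given zlib level."""
    return len(zlib.compress(data, level)) / len(data)


# Highly repetitive text compresses to a small fraction of its size.
text = b"May the hippos be with you. " * 4000
# Random bytes behave like already-compressed data: nothing left to squeeze.
noise = os.urandom(len(text))

print("text  ratio: %.3f" % ratio(text))
print("noise ratio: %.3f" % ratio(noise))
```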

Phil

