public inbox for cygwin-talk@cygwin.com
* RE: Compressing hippos really fast
@ 2008-03-04 18:35 Phil Betts
  2008-03-04 18:57 ` Lee D. Rothstein
  2008-03-05 10:05 ` Corinna Vinschen
  0 siblings, 2 replies; 13+ messages in thread
From: Phil Betts @ 2008-03-04 18:35 UTC (permalink / raw)
  To: cygwin-talk

Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:

> Hi,
> 
> 
> does anybody know about a compression tool which is above all capable
> of compressing really fast?  The compression ratio is only a mild
> concern, it's rather more important that the tool is not acting as a
> bottleneck when compressing files which are badly compressible.
> Unfortunately the usual compression tools are more interested in
> good compression than in good speed when streaming lots of data.
> 
> Here are a couple of disks which are supposed to be backed up.  Right
> now this is done using a script which creates tar.gz archives of all
> disks.  Some of these disks are quite big and contain many files which
> are already compressed.  It turns out that gzipping these disks is
> *the* bottleneck when backing up.
> 
> When not compressing, tar creates archives at 37MB/s.  When creating
> tar.gz archives, the compression takes so much time that the speed
> goes down to 6MB/s.  Using gzip --fast doesn't help much.  bzip2 is a
> lot slower than gzip.
> 
> So the question is, does anybody know a compression tool which can be
> used with tar, which doesn't slow down the backup by a factor of 6? 
> It would be cool to have a tool which is as quick as the hardware
> compression used in modern tape drives, but that's just dreaming...
> 
> 
> May the hippos be with you,
> Corinna

I had this problem ages ago.  My solution was to run two backups:
one uncompressed, including only files matching the globs *.gz,
*.t[bg]z, *.[zZ], *.bz2, *.zip etc., and one for the remainder, which
was piped through gzip.
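
The two-backup split described above could be sketched in Python roughly as follows. This is only an illustration, not the script that was actually used: the function names and the choice of the standard tarfile module are assumptions, the glob list is the one from the message, and the archive paths are assumed to live outside the directory being backed up. Symlinks and empty directories are ignored here.

```python
import fnmatch
import os
import tarfile

# Glob patterns taken from the message; extend the tuple for other formats.
COMPRESSED_GLOBS = ("*.gz", "*.t[bg]z", "*.[zZ]", "*.bz2", "*.zip")


def is_precompressed(name):
    """Return True if the file name matches one of the compressed-file globs."""
    base = os.path.basename(name)
    return any(fnmatch.fnmatchcase(base, pat) for pat in COMPRESSED_GLOBS)


def split_backup(src_dir, plain_tar, gzip_tar):
    """Write pre-compressed files to plain_tar and all other files to gzip_tar.

    Hypothetical helper: both archive paths must be outside src_dir.
    """
    with tarfile.open(plain_tar, "w") as plain, \
            tarfile.open(gzip_tar, "w:gz") as gz:
        for root, _dirs, files in os.walk(src_dir):
            for fname in sorted(files):
                path = os.path.join(root, fname)
                arcname = os.path.relpath(path, src_dir)
                # Already-compressed files skip gzip entirely.
                target = plain if is_precompressed(fname) else gz
                target.add(path, arcname=arcname, recursive=False)
```

Calling split_backup("/data", "/backup/stored.tar", "/backup/rest.tar.gz") would then produce the two archives in one pass over the tree (the paths are placeholders).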

Even a fast compression algorithm is just wasting time trying to
compress previously compressed files.  And as most compressors work
on some variant of Lempel-Ziv, if they're fed a mixture of
compressible and incompressible data, the incompressible data
flushes the dictionary, making the compression of the compressible
part worse.
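
The first half of that claim is easy to check with a small experiment. The sketch below uses Python's zlib module (the same DEFLATE algorithm as gzip) and random bytes as a stand-in for already-compressed file content; the helper name and the sample data are made up for illustration. It shows only the wasted effort on incompressible input, not the dictionary-flushing effect.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Highly repetitive text: compresses very well.
compressible = b"May the hippos be with you. " * 40000
# Random bytes: a stand-in for .gz/.zip content, essentially incompressible.
incompressible = os.urandom(len(compressible))

print("text:   %.3f" % compression_ratio(compressible))
print("random: %.3f" % compression_ratio(incompressible))
```

The random input should come out at a ratio of roughly 1.0, that is, no smaller than it went in, while still costing the compressor CPU time.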

Phil

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Compressing hippos really fast
  2008-03-04 18:35 Compressing hippos really fast Phil Betts
@ 2008-03-04 18:57 ` Lee D. Rothstein
  2008-03-05 10:05 ` Corinna Vinschen
  1 sibling, 0 replies; 13+ messages in thread
From: Lee D. Rothstein @ 2008-03-04 18:57 UTC (permalink / raw)
  To: The Vulgar and Unprofessional Cygwin-Talk List

Sounds like he needs data-dedupe. Google "data de-duplication" for an 
array of vendors.



* Re: Compressing hippos really fast
  2008-03-04 18:35 Compressing hippos really fast Phil Betts
  2008-03-04 18:57 ` Lee D. Rothstein
@ 2008-03-05 10:05 ` Corinna Vinschen
  1 sibling, 0 replies; 13+ messages in thread
From: Corinna Vinschen @ 2008-03-05 10:05 UTC (permalink / raw)
  To: cygwin-talk

On Mar  4 18:35, Phil Betts wrote:
> I had this problem ages ago.  My solution was to run two backups:
> one uncompressed, including only files matching the globs *.gz,
> *.t[bg]z, *.[zZ], *.bz2, *.zip etc., and one for the remainder, which
> was piped through gzip.

I guess that's the way to go in the long run.


Corinna


* Re: Compressing hippos really fast
  2008-03-05 10:03       ` Corinna Vinschen
@ 2008-03-08 22:04         ` Robert Pendell
  0 siblings, 0 replies; 13+ messages in thread
From: Robert Pendell @ 2008-03-08 22:04 UTC (permalink / raw)
  To: cygwin-talk

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Corinna Vinschen wrote:
| On Mar  4 16:23, Buchbinder, Barry (NIH/NIAID) [E] wrote:
|> Corinna Vinschen wrote:
|>> does anybody know about a compression tool which is above all
|>> capable of compressing really fast?
|> I know that this is very "un-Linuxy", but if you don't really
|> need tar, you might consider zip.
|
| Urgh.  I'm not sure that's really an option.  I never tried to use
| zip on a terabyte of data...
|
|
| Corinna
|
I don't think that would work.  I think the Linux version is based on
Info-ZIP, and that one was limited to a 2GB archive size.

- --
Robert Pendell
shinji@elite-systems.org

Thawte Web of Trust Notary
CAcert Assurer
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0w1bs1pR2j1qW+sRAuNcAJ41P5hGXU+ruPuK0lQbue36We5gHgCfaIqK
PA277TzK78wmnB7llQMwQoA=
=rlR8
-----END PGP SIGNATURE-----


* Re: Compressing hippos really fast
  2008-03-04 21:24     ` Buchbinder, Barry (NIH/NIAID) [E]
@ 2008-03-05 10:03       ` Corinna Vinschen
  2008-03-08 22:04         ` Robert Pendell
  0 siblings, 1 reply; 13+ messages in thread
From: Corinna Vinschen @ 2008-03-05 10:03 UTC (permalink / raw)
  To: cygwin-talk

On Mar  4 16:23, Buchbinder, Barry (NIH/NIAID) [E] wrote:
> Corinna Vinschen wrote:
> > does anybody know about a compression tool which is above all
> > capable of compressing really fast?
> 
> I know that this is very "un-Linuxy", but if you don't really
> need tar, you might consider zip.

Urgh.  I'm not sure that's really an option.  I never tried to use
zip on a terabyte of data...


Corinna


* RE: Compressing hippos really fast
  2008-03-04 16:45   ` Corinna Vinschen
  2008-03-04 16:58     ` Dave Korn
  2008-03-04 17:01     ` Igor Peshansky
@ 2008-03-04 21:24     ` Buchbinder, Barry (NIH/NIAID) [E]
  2008-03-05 10:03       ` Corinna Vinschen
  2 siblings, 1 reply; 13+ messages in thread
From: Buchbinder, Barry (NIH/NIAID) [E] @ 2008-03-04 21:24 UTC (permalink / raw)
  To: cygwin-talk

Corinna Vinschen wrote:
> does anybody know about a compression tool which is above all
> capable of compressing really fast?

I know that this is very "un-Linuxy", but if you don't really
need tar, you might consider zip.

The -u (update) option will only compress changed files.  The
rest of the time is devoted to copying the already compressed
part of the zip and doing the date comparisons.  (One might
let find do the latter.)

And to adopt a previous suggestion, the -n option will just
store (no compression) files with certain suffixes.
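
For comparison, the same store-by-suffix idea can be sketched with Python's standard zipfile module instead of the zip command line. This is an illustrative assumption, not something from the thread: the suffix list and the function name are hypothetical, and the update behaviour of -u is not reproduced.

```python
import os
import zipfile

# Hypothetical suffix list, playing the role of the suffixes given to zip -n.
STORE_SUFFIXES = (".gz", ".tgz", ".tbz", ".bz2", ".zip", ".Z")


def zip_tree(src_dir, zip_path):
    """Zip src_dir, storing already-compressed files and deflating the rest."""
    # allowZip64=True (the module default) permits archives beyond the
    # classic ZIP size limits.
    with zipfile.ZipFile(zip_path, "w", allowZip64=True) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in sorted(files):
                path = os.path.join(root, fname)
                if fname.endswith(STORE_SUFFIXES):
                    method = zipfile.ZIP_STORED      # copy bytes as-is
                else:
                    method = zipfile.ZIP_DEFLATED    # normal compression
                zf.write(path, os.path.relpath(path, src_dir),
                         compress_type=method)
```

As with the tar variant, the archive path is assumed to be outside src_dir so the archive does not end up inside itself.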

In my experience, zip -u doesn't seem particularly fast
(though that is with cygwin over a busy LAN on busy servers),
but it might be worth an experiment.  (And be prepared for
the initial creation of the zip archives being slow.)

Good luck.

- Barry


* Re: Compressing hippos really fast
  2008-03-04 16:45   ` Corinna Vinschen
  2008-03-04 16:58     ` Dave Korn
@ 2008-03-04 17:01     ` Igor Peshansky
  2008-03-04 21:24     ` Buchbinder, Barry (NIH/NIAID) [E]
  2 siblings, 0 replies; 13+ messages in thread
From: Igor Peshansky @ 2008-03-04 17:01 UTC (permalink / raw)
  To: The Cygwin-Talk Maiming List

On Tue, 4 Mar 2008, Corinna Vinschen wrote:

> On Mar  4 16:34, Owen Rees wrote:
>
> > ...but a passing hippo suggested that lzop <http://www.lzop.org/>
> > would be worth investigating.
>
> Dave Korn is a hippo?
>
> http://cygwin.com/ml/cygwin-talk/2008-q1/msg00096.html

On the internet, nobody knows if you're a hippo.
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_	    pechtcha@cs.nyu.edu | igor@watson.ibm.com
ZZZzz /,`.-'`'    -.  ;-;;,_		Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'		old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"That which is hateful to you, do not do to your neighbor.  That is the whole
Torah; the rest is commentary.  Go and study it." -- Rabbi Hillel


* RE: Compressing hippos really fast
  2008-03-04 16:45   ` Corinna Vinschen
@ 2008-03-04 16:58     ` Dave Korn
  2008-03-04 17:01     ` Igor Peshansky
  2008-03-04 21:24     ` Buchbinder, Barry (NIH/NIAID) [E]
  2 siblings, 0 replies; 13+ messages in thread
From: Dave Korn @ 2008-03-04 16:58 UTC (permalink / raw)
  To: 'wheeeeesssssplatKERSQUELCH!'

On 04 March 2008 16:45, maybe.i'll.try@rolling.it.back.one.version wrote:

> On Mar  4 16:34, Owen Rees wrote:
>> --On 04 March 2008 16:43 +0100 Corinna Vinschen wrote:
>> 
>>> does anybody know about a compression tool which is above all capable of
>>> compressing really fast?
>> 
>> I have not used it myself, but a passing hippo suggested that lzop
>> <http://www.lzop.org/> would be worth investigating.
> 
> Dave Korn is a hippo?


   Ssshhhhh!  You're giving my secret away!

<wallows>

    cheers,
      DaveK

-- 
Can't think of a witty .sigline today....


* Re: Compressing hippos really fast
  2008-03-04 16:33 ` Dave Korn
@ 2008-03-04 16:46   ` Corinna Vinschen
  0 siblings, 0 replies; 13+ messages in thread
From: Corinna Vinschen @ 2008-03-04 16:46 UTC (permalink / raw)
  To: cygwin-talk

On Mar  4 16:33, a hippo wrote:
>   It should be possible to use lzop as an inline filter compressor, more or
> less as a direct drop-in replacement for gzip
> 
> http://www.lzop.org/

Thanks, I'll give it a try.


Corinna


* Re: Compressing hippos really fast
  2008-03-04 16:35 ` Owen Rees
@ 2008-03-04 16:45   ` Corinna Vinschen
  2008-03-04 16:58     ` Dave Korn
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Corinna Vinschen @ 2008-03-04 16:45 UTC (permalink / raw)
  To: cygwin-talk

On Mar  4 16:34, Owen Rees wrote:
> --On 04 March 2008 16:43 +0100 Corinna Vinschen wrote:
>
>> does anybody know about a compression tool which is above all capable of
>> compressing really fast?
>
> I have not used it myself, but a passing hippo suggested that lzop 
> <http://www.lzop.org/> would be worth investigating.

Dave Korn is a hippo?

http://cygwin.com/ml/cygwin-talk/2008-q1/msg00096.html

;)


Thanks,
Corinna


* Re: Compressing hippos really fast
  2008-03-04 15:43 Corinna Vinschen
  2008-03-04 16:33 ` Dave Korn
@ 2008-03-04 16:35 ` Owen Rees
  2008-03-04 16:45   ` Corinna Vinschen
  1 sibling, 1 reply; 13+ messages in thread
From: Owen Rees @ 2008-03-04 16:35 UTC (permalink / raw)
  To: The Vulgar and Unprofessional Cygwin-Talk List

--On 04 March 2008 16:43 +0100 Corinna Vinschen wrote:

> does anybody know about a compression tool which is above all capable of
> compressing really fast?

I have not used it myself, but a passing hippo suggested that lzop 
<http://www.lzop.org/> would be worth investigating.

-- 
Owen Rees
========================================================
Hewlett-Packard Limited.   Registered No: 690597 England
Registered Office:  Cain Road, Bracknell, Berks RG12 1HN


* RE: Compressing hippos really fast
  2008-03-04 15:43 Corinna Vinschen
@ 2008-03-04 16:33 ` Dave Korn
  2008-03-04 16:46   ` Corinna Vinschen
  2008-03-04 16:35 ` Owen Rees
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Korn @ 2008-03-04 16:33 UTC (permalink / raw)
  To: 'argh quotefix keeps picking the wrong name'

On 04 March 2008 15:43, oh.no.my@quotefix.is.borken wrote:

> does anybody know about a compression tool which is above all capable of
> compressing really fast?  The compression ratio is only a mild concern,
> it's rather more important that the tool is not acting as a bottleneck
> when compressing files which are badly compressible.  Unfortunately
> the usual compression tools are more interested in good compression
> than in good speed when streaming lots of data.

  Hmm, I came across something much like this lately while researching
compression: take a look at LZO.  It focusses more on uncompression speed
but is also supposed to be fairly fast for compression. 

http://www.oberhumer.com/opensource/lzo/

  It should be possible to use lzop as an inline filter compressor, more or
less as a direct drop-in replacement for gzip

http://www.lzop.org/
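
Using a compressor as an inline filter means piping tar's output through it. A rough Python sketch of that pipeline follows; it assumes tar and the chosen compressor (lzop here, gzip works the same way) are installed and on the PATH, and the function names are made up for illustration.

```python
import subprocess


def pipeline_commands(src_dir, compressor="lzop"):
    """Return the argv lists for: tar -cf - -C src_dir . | compressor -c"""
    tar_cmd = ["tar", "-cf", "-", "-C", src_dir, "."]
    comp_cmd = [compressor, "-c"]  # -c: write compressed data to stdout
    return tar_cmd, comp_cmd


def run_backup(src_dir, out_path, compressor="lzop"):
    """Stream a tar archive of src_dir through the compressor into out_path."""
    tar_cmd, comp_cmd = pipeline_commands(src_dir, compressor)
    with open(out_path, "wb") as out:
        tar = subprocess.Popen(tar_cmd, stdout=subprocess.PIPE)
        comp = subprocess.Popen(comp_cmd, stdin=tar.stdout, stdout=out)
        tar.stdout.close()  # lets tar see a broken pipe if the compressor dies
        comp_rc = comp.wait()
        tar_rc = tar.wait()
    if tar_rc != 0 or comp_rc != 0:
        raise RuntimeError("backup failed: tar=%d, %s=%d"
                           % (tar_rc, compressor, comp_rc))
```

Because only the compressor name changes, swapping gzip for lzop in an existing script of this shape should be a one-word edit, which is what "drop-in replacement" amounts to here.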



    cheers,
      DaveK

-- 
Can't think of a witty .sigline today....


* Compressing hippos really fast
@ 2008-03-04 15:43 Corinna Vinschen
  2008-03-04 16:33 ` Dave Korn
  2008-03-04 16:35 ` Owen Rees
  0 siblings, 2 replies; 13+ messages in thread
From: Corinna Vinschen @ 2008-03-04 15:43 UTC (permalink / raw)
  To: cygwin-talk

Hi,


does anybody know about a compression tool which is above all capable of
compressing really fast?  The compression ratio is only a mild concern,
it's rather more important that the tool is not acting as a bottleneck
when compressing files which are badly compressible.  Unfortunately
the usual compression tools are more interested in good compression
than in good speed when streaming lots of data.

Here are a couple of disks which are supposed to be backed up.  Right
now this is done using a script which creates tar.gz archives of all
disks.  Some of these disks are quite big and contain many files which
are already compressed.  It turns out that gzipping these disks is *the*
bottleneck when backing up.

When not compressing, tar creates archives at 37MB/s.  When creating
tar.gz archives, the compression takes so much time that the speed goes
down to 6MB/s.  Using gzip --fast doesn't help much.  bzip2 is a lot
slower than gzip.
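
To put those two rates in perspective, here is a small back-of-the-envelope calculation. The 1 TB data size is an assumption chosen for illustration (the rates are the ones quoted above), and 1 MB is taken as 10**6 bytes.

```python
def hours_to_back_up(total_bytes, mb_per_second):
    """Hours needed to stream total_bytes at the given rate (1 MB = 10**6 bytes)."""
    return total_bytes / (mb_per_second * 10**6) / 3600


ONE_TB = 10**12  # assumed data size, for illustration only

print("uncompressed tar: %.1f h" % hours_to_back_up(ONE_TB, 37))  # about 7.5 h
print("tar.gz:           %.1f h" % hours_to_back_up(ONE_TB, 6))   # about 46.3 h
```

At these rates the compressed run takes roughly six times as long, which is the factor mentioned in the question.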

So the question is, does anybody know a compression tool which can be
used with tar, which doesn't slow down the backup by a factor of 6?  It
would be cool to have a tool which is as quick as the hardware
compression used in modern tape drives, but that's just dreaming...


May the hippos be with you,
Corinna


Thread overview: 13+ messages
2008-03-04 18:35 Compressing hippos really fast Phil Betts
2008-03-04 18:57 ` Lee D. Rothstein
2008-03-05 10:05 ` Corinna Vinschen
  -- strict thread matches above, loose matches on Subject: below --
2008-03-04 15:43 Corinna Vinschen
2008-03-04 16:33 ` Dave Korn
2008-03-04 16:46   ` Corinna Vinschen
2008-03-04 16:35 ` Owen Rees
2008-03-04 16:45   ` Corinna Vinschen
2008-03-04 16:58     ` Dave Korn
2008-03-04 17:01     ` Igor Peshansky
2008-03-04 21:24     ` Buchbinder, Barry (NIH/NIAID) [E]
2008-03-05 10:03       ` Corinna Vinschen
2008-03-08 22:04         ` Robert Pendell
