From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 24805 invoked by alias); 4 Mar 2008 18:35:41 -0000
Received: (qmail 24798 invoked by uid 22791); 4 Mar 2008 18:35:40 -0000
X-Spam-Check-By: sourceware.org
Received: from asc-mail.ascribe.com (HELO mail-bridge.ascribe.com) (87.102.48.98)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 04 Mar 2008 18:35:16 +0000
Received: from core-email.int.ascribe.com (core-email.int.ascribe.com [10.0.100.71])
    by mail-bridge.ascribe.com (Postfix) with ESMTP id 84572E0976
    for ; Tue, 4 Mar 2008 18:34:15 +0000 (GMT)
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Subject: RE: Compressing hippos really fast
Date: Tue, 04 Mar 2008 18:35:00 -0000
Message-ID: <5E25AF06EFB9EA4A87C19BC98F5C87533F02DF@core-email.int.ascribe.com>
From: "Phil Betts"
To:
X-IsSubscribed: yes
Mailing-List: contact cygwin-talk-help@cygwin.com; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: cygwin-talk-owner@cygwin.com
Reply-To: The Vulgar and Unprofessional Cygwin-Talk List
X-SW-Source: 2008-q1/txt/msg00102.txt.bz2

Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:

> Hi,
>
>
> does anybody know about a compression tool which is above all capable
> of compressing really fast?  The compression ratio is only a mild
> concern, it's rather more important that the tool is not acting as
> bottleneck when compressing files which are badly compressable.
> Unfortunately the usual compression tools are rather interested in a
> good compression than in a good speed when streaming lots of data.
>
> Here are a couple of disks which are supposed to be backed up.  Right
> now this is done using a script which creats tar.gz archives of all
> disks.  Some of this disks are quite big and contains many files which
> are already compressed.
> It turns out that gzipping these disks is
> *the* bottleneck when backing up.
>
> When not compressing, tar creates archives with 37MB/s.  When creating
> tar.gz archives, the compression takes so much time that the speed
> goes down to 6MB/s.  Using gzip --fast doesn't help much.  bzip is a
> lot slower than gzip.
>
> So the question is, does anybody know a compression tool which can be
> used with tar, which doesn't slow down the backup by a factor of 6?
> It would be cool to have a tool which is as quick as the hardware
> compression used in modern tape drives, but that's just dreaming...
>
>
> May the hippos be with you,
> Corinna

I had this problem ages ago.  My solution was to run two backups:
one uncompressed, including only files matching globs such as *.gz,
*.t[bg]z, *.[zZ], *.bz2, *.zip, etc., and one for the remainder, which
was piped through gzip.

Even a fast compression algorithm just wastes time trying to
compress previously compressed files.  And as most compressors work
on some variant of Lempel-Ziv, if they're fed a mixture of
compressible and incompressible data, the incompressible data
flushes the dictionary, making the compression of the compressible
part worse.

Phil
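
The two-backup split described above could be sketched roughly as
follows.  This is only an illustration, not the script that was
actually used: it swaps Python's tarfile module in for tar piped
through gzip, and the names COMPRESSED_PATTERNS, is_precompressed,
split_paths and backup are made up for the example.  The glob list is
the one from the message and would need extending for other formats.

```python
import fnmatch
import os
import tarfile

# Glob patterns taken from the message; add more for other compressed formats.
COMPRESSED_PATTERNS = ("*.gz", "*.t[bg]z", "*.[zZ]", "*.bz2", "*.zip")


def is_precompressed(name):
    """Return True if the file name matches one of the compressed-file globs."""
    base = os.path.basename(name)
    # fnmatchcase is case-sensitive on every platform, so *.[zZ] behaves as written.
    return any(fnmatch.fnmatchcase(base, pat) for pat in COMPRESSED_PATTERNS)


def split_paths(paths):
    """Partition paths into (already compressed, everything else), keeping order."""
    stored, deflated = [], []
    for path in paths:
        (stored if is_precompressed(path) else deflated).append(path)
    return stored, deflated


def backup(root, plain_tar, gzip_tar):
    """Write a plain tar of the compressed files and a tar.gz of the rest."""
    # Collect the file list before either archive is created.
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    ]
    stored, deflated = split_paths(paths)
    with tarfile.open(plain_tar, "w") as tar:      # no compression
        for path in stored:
            tar.add(path)
    with tarfile.open(gzip_tar, "w:gz") as tar:    # gzip-compressed
        for path in deflated:
            tar.add(path)
    return len(stored), len(deflated)
```

Something like backup("/data", "/backup/stored.tar", "/backup/rest.tar.gz")
would then produce the two archives; the paths here are placeholders,
and the archives are best written outside the tree being backed up.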