From: Achim Gratz
To: cygwin-apps@cygwin.com
Subject: Dedup x86/x86_64 --> noarch
Date: Sat, 16 Apr 2016 10:04:00 -0000
Message-ID: <87zistg99v.fsf@Rainer.invalid>

After a discussion on IRC about de-duping the noarch content out of package
files (where I was told this would be too difficult), I've just tried what
would happen for two of my packages, maxima and perl.  Maxima is practically
a noarch package, save for the clisp memory image.  Perl has gobs and gobs
of non-arch-specific files mixed in with quite a bit of arch-specific stuff.

I've used hashdeep for finding the dupes since it is really fast.  It
compares hashes, so the files are only de-duped if they are bit-for-bit
identical.

  # package to compare (plain sh/bash assignment)
  p=perl
  # reference
  ( cd $p.x86/inst ; hashdeep -c sha256 -lr * ) > $p.x86.hash
  # matching files
  ( cd $p.x86_64/inst ; hashdeep -c sha256 -k ../../$p.x86.hash -mlr * )
  # non-matching files
  ( cd $p.x86_64/inst ; hashdeep -c sha256 -k ../../$p.x86.hash -xlr * )

For Maxima, there are a few files that should be identical, but aren't:
these are leakages from the build environment that I'll have to patch out
later (one of these leakages is actually a bug, affecting parts of the
documentation).

For Perl, the gzip-compressed man pages are flagged as different, because
gzip leaks the timestamp (but that could be avoided using the -n option to
gzip in cygport).  Fixing that, the documentation packages for Perl are
completely shared between the two arches (well, duh), but even the binary
packages perl, perl-debuginfo and perl_base would share about a quarter of
their content (so they'd need to be split into something like perl_base /
perl_base-common).

Looking at the current repo content we'd save about 30 GB from the dedup of
the src and doc packages alone and probably about 20 GB from dedup in the
remaining packages.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Q+, Q and microQ:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
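
The gzip timestamp leak mentioned above can be reproduced on a toy tree.
What follows is only a sketch under assumptions: it uses find, sha256sum
and comm (GNU coreutils) as a stand-in for hashdeep, the directory and file
names (x86, x86_64, man/demo.1) are made up, and unlike hashdeep -m it
treats a file as shared only when both the relative path and the hash
agree.

```shell
#!/bin/sh
# Sketch only: all paths and file names are invented for the demonstration.
set -eu
export LC_ALL=C            # stable sort order, as comm requires

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Two fake install trees holding the same man page with different mtimes,
# standing in for builds done at different times on the two arches.
mkdir -p x86/man x86_64/man
for arch in x86 x86_64; do
    printf '.TH DEMO 1\n' > "$arch/man/demo.1"
done
touch -t 201604010000 x86/man/demo.1
touch -t 201604160000 x86_64/man/demo.1

# hash_tree DIR: print "sha256  ./relative/path" for every file, sorted.
hash_tree() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort )
}

# count_shared [GZIP_OPTIONS]: compress the man page in both trees with the
# given options and print how many .gz files have identical path and content.
count_shared() {
    for arch in x86 x86_64; do
        gzip "$@" -c "$arch/man/demo.1" > "$arch/man/demo.1.gz"
    done
    hash_tree x86 > x86.hash
    hash_tree x86_64 > x86_64.hash
    # comm -12 keeps only lines present in both sorted lists.
    comm -12 x86.hash x86_64.hash | grep -c '\.gz$' || true
}

default=$(count_shared)      # plain gzip stores the input file's mtime
fixed=$(count_shared -n)     # -n omits the original name and timestamp

echo "shared .gz files with plain gzip: $default"
echo "shared .gz files with gzip -n:    $fixed"
```

With GNU gzip the plain run should report no shared .gz file and the -n
run should report one, since the stored mtime is the only thing that
differs between the two archives.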