public inbox for cygwin-apps@cygwin.com
From: Achim Gratz <Stromeko@nexgo.de>
To: cygwin-apps@cygwin.com
Subject: Dedup x86/x86_64 --> noarch
Date: Sat, 16 Apr 2016 10:04:00 -0000
Message-ID: <87zistg99v.fsf@Rainer.invalid>


After a discussion on IRC about de-duping the noarch content out of
package files (where I was told this would be too difficult), I've just
tried what would happen for two of my packages, maxima and perl.  Maxima
is practically a noarch package, save for the clisp memory image.  Perl
has gobs and gobs of non-arch-specific files mixed in with quite a bit
of arch-specific stuff.  I've used hashdeep to find the dupes since it
is really fast; as it compares content hashes, files only count as dupes
if they are bit-for-bit identical.

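set p=perl
# reference: hash every file of the x86 install tree
( cd $p.x86/inst    ; hashdeep -c sha256 -lr * ) > $p.x86.hash
# matching files: x86_64 files whose hash is in the x86 reference set
( cd $p.x86_64/inst ; hashdeep -c sha256 -k ../../$p.x86.hash -mlr * )
# non-matching files: x86_64 files whose hash is not in the reference set
( cd $p.x86_64/inst ; hashdeep -c sha256 -k ../../$p.x86.hash -xlr * )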
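The hashdeep matching mode used above (-m/-x) looks each file up by its hash in the known set, so an identical file at a different path also counts as a match. For splitting out a noarch package the file must also sit at the same relative path in both trees. A minimal sketch of such a path-aware comparison, assuming GNU find/xargs/sha256sum/comm instead of hashdeep; the function names `hash_tree`, `compare_trees`, `shared_files` and `arch_files` are hypothetical.

```shell
#!/bin/sh
# Sketch: split an x86_64 install tree into files that are bit-for-bit
# identical to the file at the same relative path in the x86 tree
# ("shared") and everything else ("arch-specific").
# Assumes GNU coreutils/findutils; all function names are hypothetical.
set -eu

# hash_tree DIR: one "<sha256> <relative path>" line per regular file,
# normalised (text/binary marker dropped) and sorted for comm(1)
hash_tree() {
    (cd "$1" && find . -type f -print0 | xargs -0 -r sha256sum) |
        sed 's/^\([0-9a-f]\{64\}\) [ *]/\1 /' | LC_ALL=C sort
}

# compare_trees COMM_FLAGS REF CAND: print the selected paths, sorted
compare_trees() {
    _ref=$(mktemp); _cand=$(mktemp)
    hash_tree "$2" > "$_ref"
    hash_tree "$3" > "$_cand"
    # a line is "<hash> <path>", so equal lines mean same path AND content;
    # the path starts at column 66 (64 hex digits plus one space)
    LC_ALL=C comm "$1" "$_ref" "$_cand" | cut -c66- | LC_ALL=C sort
    rm -f "$_ref" "$_cand"
}

# paths present in both trees with identical content (noarch candidates)
shared_files() { compare_trees -12 "$1" "$2"; }
# paths in CAND that are missing from REF or differ from it
arch_files()   { compare_trees -13 "$1" "$2"; }
```

Usage, with the directory layout from the commands above: `shared_files perl.x86/inst perl.x86_64/inst`.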

For Maxima, there are a few files that should be identical, but aren't:
these are leakages from the build environment that I'll have to patch
out later (one of these leakages is actually a bug, affecting parts of
the documentation).

For Perl, the gzip-compressed man-pages are flagged as different,
because gzip stores the file's time-stamp in its header (but that could
be avoided by passing the -n option to gzip in cygport).  With that
fixed, the documentation packages for Perl are completely shared between
the two arches (well, duh), but even the binary packages perl,
perl-debuginfo and perl_base would share about a quarter of their
content (so they'd need to be split into something like
perl_base / perl_base-common).
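The time-stamp leak can be reproduced with one man page installed at the same path in two trees whose copies differ only in mtime, as happens when the two arches are built at different times. A minimal sketch, assuming GNU gzip and sha256sum; the file contents, dates and the `gz_hash` helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: identical man page content, same file name, different mtimes.
# Compare the gzip output with and without -n (--no-name).
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/x86" "$tmp/x86_64"
printf '.TH PERL 1\n' > "$tmp/x86/perl.1"
cp "$tmp/x86/perl.1" "$tmp/x86_64/perl.1"
touch -t 201604010000 "$tmp/x86/perl.1"     # "built" on 2016-04-01
touch -t 201604150000 "$tmp/x86_64/perl.1"  # "built" on 2016-04-15

# gz_hash [GZIP_OPTS...] FILE: SHA-256 of the gzip output for FILE
gz_hash() {
    gzip -c "$@" | sha256sum | cut -d' ' -f1
}

# default: gzip records the original name and mtime in the header
plain_x86=$(gz_hash "$tmp/x86/perl.1")
plain_x86_64=$(gz_hash "$tmp/x86_64/perl.1")
# -n: name and mtime are not stored, so only the content matters
n_x86=$(gz_hash -n "$tmp/x86/perl.1")
n_x86_64=$(gz_hash -n "$tmp/x86_64/perl.1")

echo "without -n: $([ "$plain_x86" = "$plain_x86_64" ] && echo identical || echo different)"
echo "with -n:    $([ "$n_x86" = "$n_x86_64" ] && echo identical || echo different)"
```

With GNU gzip this reports the compressed files as different without -n and identical with it, which is what would let hashdeep match the compressed man-pages across arches.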

Looking at the current repo content, we'd save about 30GB from the dedup
of the src and doc packages alone and probably about 20GB from dedup in
the remaining packages.
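For a single package, one rough way to gauge the potential saving is to sum the sizes of the x86_64 files that are bit-for-bit identical to the x86 file at the same relative path, since only one copy of those would need to be stored. A minimal sketch, assuming POSIX find/cmp/wc/awk; `dedup_bytes` is a hypothetical helper, and it counts installed (uncompressed) bytes, which is not the same as the compressed archive sizes in the repo.

```shell
#!/bin/sh
# Sketch: bytes that deduplication would save for one package, i.e. the
# total size of files in CAND that are identical to the file at the same
# relative path in REF. dedup_bytes is a hypothetical helper.
set -eu

# dedup_bytes REF CAND
dedup_bytes() {
    _ref=$(cd "$1" && pwd)   # absolute path, since we cd into CAND below
    (cd "$2" && find . -type f -exec sh -c '
        ref=$1; shift
        for f in "$@"; do
            # cmp -s is silent and exits 0 only if the files are identical
            if [ -f "$ref/$f" ] && cmp -s "$ref/$f" "$f"; then
                wc -c < "$f"
            fi
        done' sh "$_ref" {} +) |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Usage: `dedup_bytes perl.x86/inst perl.x86_64/inst`. Comparing with cmp instead of hashes keeps the check strictly bit-for-bit and avoids writing intermediate hash files.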


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Q+, Q and microQ:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds


Thread overview: 15+ messages
2016-04-16 10:04 Achim Gratz [this message]
2016-04-18 19:45 ` Achim Gratz
2016-04-23 10:51 ` Jon Turney
2016-04-23 11:19   ` Achim Gratz
2016-04-23 14:19   ` Achim Gratz
2016-04-23 15:32     ` Corinna Vinschen
2016-04-23 15:43       ` Achim Gratz
2016-05-09 14:38         ` Jon Turney
2016-05-09 16:43           ` Achim Gratz
2016-05-09 14:18     ` Jon Turney
2016-05-09 16:45       ` Achim Gratz
2016-05-09 22:41 ` Andrew Schulman
2016-05-10  5:44   ` Achim Gratz
2016-05-10  6:20     ` Andrew Schulman
2016-05-11 18:59       ` Jon Turney
