public inbox for cygwin-apps@cygwin.com
From: Jon Turney <jon.turney@dronecode.org.uk>
To: cygwin-apps@cygwin.com
Subject: Re: Dedup x86/x86_64 --> noarch
Date: Sat, 23 Apr 2016 10:51:00 -0000
Message-ID: <571B539D.4050304@dronecode.org.uk>
In-Reply-To: <87zistg99v.fsf@Rainer.invalid>

On 16/04/2016 11:03, Achim Gratz wrote:
> After a discussion on IRC about de-duping the noarch content out of
> package files (where I was told this would be too difficult), I've just

I think it was more along the lines of 'not yet' :)

In any case, we need noarch support in calm before deduplicating arch
packages to noarch becomes useful.

I think I have implemented the changes to calm to support all-or-nothing 
noarch (i.e. where all packages produced from a source package must be 
noarch), so if you can nominate a suitable, unimportant perl package, we 
can test it with that, initially.

(This wasn't quite as straightforward as just looking in another 
directory for packages, as the upload validation becomes more complex: 
we must check that consistent package sets result for both x86 and 
x86_64 before we can move noarch packages)
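
A minimal sketch of what such a consistency check might look like. This
is not calm's actual code: the function name, the {filename: digest}
input format, and the reading of "consistent" as "same files with the
same content on both arches" are all assumptions made for illustration.

```python
def consistent_noarch_sets(x86, x86_64):
    """Compare two hypothetical {filename: digest} mappings, one per arch.

    Returns (ok, problems). ok is True only when both arches offer the
    same files with the same digests, i.e. the set could safely be
    moved to a shared noarch location under the assumption above.
    """
    problems = []
    for name in sorted(set(x86) | set(x86_64)):
        if name not in x86:
            problems.append(f"{name}: missing from x86")
        elif name not in x86_64:
            problems.append(f"{name}: missing from x86_64")
        elif x86[name] != x86_64[name]:
            problems.append(f"{name}: contents differ between arches")
    return (not problems, problems)
```

Returning the list of problems rather than a bare boolean lets the
uploader be told exactly which file blocks the move.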

To make full use of this, cygport upload will need a feature to upload 
noarch packages from dist/ to noarch/ rather than <arch>/.

On 18/04/2016 20:44, Achim Gratz wrote:
>> Looking at the current repo content we'd save about 30GB from the dedup
>> of the src and doc packages alone and probably about 20GB from dedup in
>> the remaining packages.
>
> I've implemented some POC code and deduped my Cygwin mirror (it is
> missing most of KDE and the cross-Cygwin compilation toolchains).  This
> took a solid 12 hours of flat out 400% CPU load on my SandyBridge laptop
> and ballooned the page file to 21GiB.  But it also removed almost
> exactly a third from the repo's size (going from 81.2GiB to 51.4GiB), so
> projected to the full repo it's slightly more than my original estimate.

Thanks.  It's very useful to have some numbers.
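
As a quick check of the quoted figures (only the two sizes come from the
message above; the variable names are illustrative):

```python
before_gib = 81.2   # mirror size before dedup, from the quoted message
after_gib = 51.4    # mirror size after dedup, from the quoted message

saved_gib = before_gib - after_gib
saved_fraction = saved_gib / before_gib

print(f"saved {saved_gib:.1f} GiB, {saved_fraction:.1%} of the mirror")
```

This works out to a little over a third of the mirror's size.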

I don't think this distinguishes between packages which are (or should 
be) marked ARCH="noarch" in the cygport, and those where the build 
products happen to be identical and can be deduped?

I would guess that this saving is dominated by some very large, 
data-only noarch packages, but who knows?
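
To illustrate the distinction, here is a sketch of a purely
content-based dedup scan. It is not Achim's POC code, and the
per-arch directory layout and function names are assumptions. A scan
like this can only report files that happen to be byte-identical across
arches; it knows nothing about whether the cygport declares
ARCH="noarch".

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def identical_across_arches(x86_root, x86_64_root):
    """List relative paths present under both roots with identical content.

    Both roots are assumed to be per-arch trees of a local mirror.
    """
    x86_root, x86_64_root = Path(x86_root), Path(x86_64_root)
    matches = []
    for p in sorted(x86_root.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(x86_root)
        other = x86_64_root / rel
        if other.is_file() and file_digest(p) == file_digest(other):
            matches.append(rel.as_posix())
    return matches
```

Telling the two cases apart would need extra input, such as the ARCH
setting from each package's cygport, which this sketch does not read.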

(Also, looking forward, perhaps cygport needs a separate command to 
build the source package, rather than building it for each arch and then 
deduping it?)
