public inbox for dwz@sourceware.org
From: Jakub Jelinek <jakub@redhat.com>
To: Tom de Vries <tdevries@suse.de>
Cc: dwz@sourceware.org, mark@klomp.org
Subject: Re: [RFC] Allow parallel multifile with -p -e
Date: Fri, 26 Mar 2021 17:47:38 +0100
Message-ID: <20210326164738.GW1179226@tucnak>
In-Reply-To: <20210326164049.GA29676@delia.home>

On Fri, Mar 26, 2021 at 05:40:51PM +0100, Tom de Vries wrote:
> This gives us reproducible compression:
> ...
> $ ls -la j1/*
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/1
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/2
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/3
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/4
> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j1/5
> $ ls -la j4/*
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/1
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/2
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/3
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/4
> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j4/5
> ...
> 
> But it doesn't give reproducible results:
> ...
> $ md5sum j1/*
> e6e655f7b5d1078672c8b0da99ab8c41  j1/1
> e6e655f7b5d1078672c8b0da99ab8c41  j1/2
> d833aa3ad6ad35597e1b7d0635b401cf  j1/3
> d833aa3ad6ad35597e1b7d0635b401cf  j1/4
> d5282aa9d065f1d00fd7a46c54ebde8d  j1/5
> $ md5sum j4/*
> de1645ce60bba6f345b2334825deb01f  j4/1
> de1645ce60bba6f345b2334825deb01f  j4/2
> ac2f16c50cf3d31be1f42f35ced4a091  j4/3
> ac2f16c50cf3d31be1f42f35ced4a091  j4/4
> 7fc3cd2c2514c8bf1f23348a27025b8d  j4/5
> ...
> 
> The temporary multifile section contributions happen in random
> order, so consequently the multifile layout will be different, and the
> files referring to the multifile will be different.

What I meant is that each fork should use different temporary filenames
for the multifiles, and once all children are done, merge them. How to merge
depends on how exactly the work is distributed among the forks: if e.g. for
4 forks the first fork gets the first quarter of the files, the second fork
the second quarter etc., then just merge them in that order; otherwise more
work would be needed to make the merging reproducible.
Then generate the multifile in a single process, and then again
in multiple forks work on the individual files against the multifile.
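
A rough sketch of the merge step, in Python rather than dwz's C, and only
a model of the idea: the function names and the "contribution(...)" strings
are made up for illustration and do not correspond to anything in dwz.
It assumes each fork gets a contiguous slice of the input files and that
the parent merges the per-fork results by fork index, not by the order in
which the children happen to finish.

```python
import random


def split_contiguous(files, jobs):
    """Give worker i the i-th contiguous slice of files (first quarter, second quarter, ...)."""
    base, extra = divmod(len(files), jobs)
    chunks, start = [], 0
    for i in range(jobs):
        end = start + base + (1 if i < extra else 0)
        chunks.append(files[start:end])
        start = end
    return chunks


def worker_contribution(chunk):
    """Hypothetical stand-in for one fork filling its own temporary multifile sections."""
    return [f"contribution({name})" for name in chunk]


def merge_in_worker_order(results):
    """results maps worker index -> contributions; merge by index, not by completion order."""
    merged = []
    for index in sorted(results):
        merged.extend(results[index])
    return merged


def collect(files, jobs, seed):
    """Simulate children finishing in an arbitrary order, then merge deterministically."""
    chunks = split_contiguous(files, jobs)
    finish_order = list(range(jobs))
    random.Random(seed).shuffle(finish_order)
    results = {}
    for index in finish_order:
        results[index] = worker_contribution(chunks[index])
    return merge_in_worker_order(results)


files = ["1", "2", "3", "4"]
# Same merged sequence for one job and for four jobs, whatever the finish order.
assert collect(files, 1, seed=0) == collect(files, 4, seed=1) == collect(files, 4, seed=2)
```

With a contiguous split, concatenating the per-fork results in index order
gives the same sequence as a single process walking the files in order,
which is why no extra sorting is needed in that case.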

	Jakub

