public inbox for dwz@sourceware.org
From: Tom de Vries <tdevries@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: dwz@sourceware.org, mark@klomp.org
Subject: Re: [RFC] Allow parallel multifile with -p -e
Date: Fri, 26 Mar 2021 17:55:16 +0100
Message-ID: <00290ad6-b33b-460a-1c2c-987571b358fc@suse.de>
In-Reply-To: <20210326164738.GW1179226@tucnak>

On 3/26/21 5:47 PM, Jakub Jelinek wrote:
> On Fri, Mar 26, 2021 at 05:40:51PM +0100, Tom de Vries wrote:
>> This gives us reproducible compression:
>> ...
>> $ ls -la j1/*
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/1
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/4
>> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j1/5
>> $ ls -la j4/*
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/1
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/4
>> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j4/5
>> ...
>>
>> But it doesn't give reproducible results:
>> ...
>> $ md5sum j1/*
>> e6e655f7b5d1078672c8b0da99ab8c41  j1/1
>> e6e655f7b5d1078672c8b0da99ab8c41  j1/2
>> d833aa3ad6ad35597e1b7d0635b401cf  j1/3
>> d833aa3ad6ad35597e1b7d0635b401cf  j1/4
>> d5282aa9d065f1d00fd7a46c54ebde8d  j1/5
>> $ md5sum j4/*
>> de1645ce60bba6f345b2334825deb01f  j4/1
>> de1645ce60bba6f345b2334825deb01f  j4/2
>> ac2f16c50cf3d31be1f42f35ced4a091  j4/3
>> ac2f16c50cf3d31be1f42f35ced4a091  j4/4
>> 7fc3cd2c2514c8bf1f23348a27025b8d  j4/5
>> ...
>>
>> The contributions to the temporary multifile sections are written in a
>> nondeterministic order, so the multifile layout differs between runs,
>> and so do the files that refer to the multifile.
> 
> What I meant is that each fork should use different temporary filenames
> for the multifiles and, once all children are done, merge them.  How to
> merge depends on how exactly the work is distributed among the forks: if
> e.g. for 4 forks the first fork gets the first quarter of the files, the
> second the second quarter, etc., then just merge them in that order;
> otherwise more work would be needed to make the merging reproducible.

Hi,

yes, I understood your comments in bugzilla.  I just wanted to see how
far I could get _without_ solving the reproducibility problem.

> Then generate the multifile in a single process, and then again
> in multiple forks work on the individual files against the multifile.

Yeah, that bit I haven't gotten to yet, but that doesn't look very
difficult.
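
For illustration, the merge-in-a-fixed-order idea quoted above could be
sketched roughly as follows.  This is a toy model in Python, not dwz code:
the helper names (split_contiguous, contribution, merge_in_worker_order) are
hypothetical, the "contribution" is just the file names joined together, and
the shuffle only simulates forks finishing in an arbitrary order.  It assumes
each worker gets a contiguous slice of the input files.

```python
import hashlib
import random

def split_contiguous(files, n_workers):
    """Give worker i the i-th contiguous slice of the file list."""
    base, extra = divmod(len(files), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks

def contribution(chunk):
    """Hypothetical stand-in for one fork's temporary multifile data."""
    return b"".join(name.encode() + b"\0" for name in chunk)

def merge_in_worker_order(done):
    """Concatenate per-worker data by worker index, not completion order."""
    return b"".join(data for _, data in sorted(done))

def run(files, n_workers, seed):
    chunks = split_contiguous(files, n_workers)
    done = [(i, contribution(c)) for i, c in enumerate(chunks)]
    random.Random(seed).shuffle(done)  # simulate forks finishing in any order
    return merge_in_worker_order(done)

files = ["1", "2", "3", "4"]
digests = {hashlib.md5(run(files, n, seed)).hexdigest()
           for n in (1, 4) for seed in range(10)}
print(len(digests))  # prints 1: same bytes for 1 or 4 workers, any finish order
```

Because the merge key is the worker index rather than the completion time,
the merged data in this model is the same for one worker and for four.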

Thanks,
- Tom


Thread overview: 4+ messages
2021-03-26 16:40 Tom de Vries
2021-03-26 16:47 ` Jakub Jelinek
2021-03-26 16:55   ` Tom de Vries [this message]
2021-03-30  9:42   ` [PATCH] " Tom de Vries
