From: Tom de Vries <tdevries@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: dwz@sourceware.org, mark@klomp.org
Subject: Re: [RFC] Allow parallel multifile with -p -e
Date: Fri, 26 Mar 2021 17:55:16 +0100
Message-ID: <00290ad6-b33b-460a-1c2c-987571b358fc@suse.de>
In-Reply-To: <20210326164738.GW1179226@tucnak>
On 3/26/21 5:47 PM, Jakub Jelinek wrote:
> On Fri, Mar 26, 2021 at 05:40:51PM +0100, Tom de Vries wrote:
>> This gives us reproducible compression:
>> ...
>> $ ls -la j1/*
>> -rwxr-xr-x 1 vries users 11432 Mar 26 17:16 j1/1
>> -rwxr-xr-x 1 vries users 11432 Mar 26 17:16 j1/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/4
>> -rw-r--r-- 1 vries users 64543 Mar 26 17:16 j1/5
>> $ ls -la j4/*
>> -rwxr-xr-x 1 vries users 11432 Mar 26 17:16 j4/1
>> -rwxr-xr-x 1 vries users 11432 Mar 26 17:16 j4/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/4
>> -rw-r--r-- 1 vries users 64543 Mar 26 17:16 j4/5
>> ...
>>
>> But it doesn't give reproducible results:
>> ...
>> $ md5sum j1/*
>> e6e655f7b5d1078672c8b0da99ab8c41 j1/1
>> e6e655f7b5d1078672c8b0da99ab8c41 j1/2
>> d833aa3ad6ad35597e1b7d0635b401cf j1/3
>> d833aa3ad6ad35597e1b7d0635b401cf j1/4
>> d5282aa9d065f1d00fd7a46c54ebde8d j1/5
>> $ md5sum j4/*
>> de1645ce60bba6f345b2334825deb01f j4/1
>> de1645ce60bba6f345b2334825deb01f j4/2
>> ac2f16c50cf3d31be1f42f35ced4a091 j4/3
>> ac2f16c50cf3d31be1f42f35ced4a091 j4/4
>> 7fc3cd2c2514c8bf1f23348a27025b8d j4/5
>> ...
>>
>> The temporary multifile section contributions happen in random
>> order, so consequently the multifile layout will be different, and the
>> files referring to the multifile will be different.
>
> What I meant is that each fork should use different temporary filenames
> for the multifiles; once all children are done, merge them (this depends on
> how exactly the work is distributed among the forks: if e.g. for 4 forks
> the first fork gets the first quarter of the files, the second the second
> quarter, etc., then just merge them in that order; otherwise more work
> would be needed to make the merging reproducible).
Hi,
yes, I understood your comments in bugzilla. I just wanted to see how
far I got _without_ solving the reproducibility problem.
> Then generate the multifile in a single process, and then again
> work on the individual files against the multifile in multiple forks.
Yeah, that bit I haven't gotten to yet, but that doesn't look very
difficult.
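
For illustration only, here is a minimal Python sketch of the in-order
merge suggested above. It is not dwz code: threads stand in for the
forks, and split_contiguous, contribute and build_multifile are
hypothetical names. Each worker gets a contiguous slice of the inputs
and fills its own buffer, and the parent concatenates the buffers in
worker order, so the result should not depend on which worker finishes
first.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_contiguous(files, jobs):
    """Give worker i the i-th contiguous slice of the input list."""
    base, extra = divmod(len(files), jobs)
    chunks, start = [], 0
    for i in range(jobs):
        end = start + base + (1 if i < extra else 0)
        chunks.append(files[start:end])
        start = end
    return chunks


def contribute(chunk):
    """Hypothetical stand-in for one fork filling its own temporary multifile."""
    return b"".join(name.encode() + b"\0" for name in chunk)


def build_multifile(files, jobs):
    """Run the workers in parallel, then merge their parts in worker order."""
    chunks = split_contiguous(files, jobs)
    parts = [None] * jobs
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(contribute, c): i for i, c in enumerate(chunks)}
        for fut in as_completed(futures):       # completion order is arbitrary
            parts[futures[fut]] = fut.result()  # each part keeps its own slot
    return b"".join(parts)                      # merge strictly by worker index


files = ["1", "2", "3", "4"]
j1 = hashlib.md5(build_multifile(files, 1)).hexdigest()
j4 = hashlib.md5(build_multifile(files, 4)).hexdigest()
print(j1 == j4)
```

In this sketch the one-job and four-job runs produce byte-identical
output, so their checksums match, which is the property the j1/j4
comparison above is missing.
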
Thanks,
- Tom