Subject: Re: [RFC] Allow parallel multifile with -p -e
To: Jakub Jelinek
Cc: dwz@sourceware.org, mark@klomp.org
References: <20210326164049.GA29676@delia.home> <20210326164738.GW1179226@tucnak>
From: Tom de Vries
Message-ID: <00290ad6-b33b-460a-1c2c-987571b358fc@suse.de>
Date: Fri, 26 Mar 2021 17:55:16 +0100
In-Reply-To: <20210326164738.GW1179226@tucnak>

On 3/26/21 5:47 PM, Jakub Jelinek wrote:
> On Fri, Mar 26, 2021 at 05:40:51PM +0100, Tom de Vries wrote:
>> This gives us reproducible compression:
>> ...
>> $ ls -la j1/*
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/1
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/4
>> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j1/5
>> $ ls -la j4/*
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/1
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/4
>> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j4/5
>> ...
>>
>> But it doesn't give reproducible results:
>> ...
>> $ md5sum j1/*
>> e6e655f7b5d1078672c8b0da99ab8c41  j1/1
>> e6e655f7b5d1078672c8b0da99ab8c41  j1/2
>> d833aa3ad6ad35597e1b7d0635b401cf  j1/3
>> d833aa3ad6ad35597e1b7d0635b401cf  j1/4
>> d5282aa9d065f1d00fd7a46c54ebde8d  j1/5
>> $ md5sum j4/*
>> de1645ce60bba6f345b2334825deb01f  j4/1
>> de1645ce60bba6f345b2334825deb01f  j4/2
>> ac2f16c50cf3d31be1f42f35ced4a091  j4/3
>> ac2f16c50cf3d31be1f42f35ced4a091  j4/4
>> 7fc3cd2c2514c8bf1f23348a27025b8d  j4/5
>> ...
>>
>> The temporary multifile section contributions happen in random
>> order, so consequently the multifile layout will be different, and the
>> files referring to the multifile will be different.
>
> What I meant is that each fork should use different temporary filenames
> for the multifiles, once all childs are done, merge them (depends on how
> exactly is the work distributed among the forks, if e.g. for 4 forks
> first fork gets first quarter of files, second second quarter etc., then
> just merge them in the order, otherwise more work would be needed to make
> the merging reproduceable.

Hi,

yes, I understood your comments in bugzilla.  I just wanted to see how far
I got _without_ solving the reproducibility problem.

> Then on generate in a single process the multifile, and then again
> in multiple forks work on the individual files against the multifile.

Yeah, that bit I haven't gotten to yet, but that doesn't look very
difficult.

Thanks,
- Tom
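
For illustration only, here is a minimal sketch of the scheme Jakub describes: each fork writes to its own temporary file, and the parent merges those files in fork order rather than completion order. It is not dwz code. The names build_multifile, split_contiguous and contribution are hypothetical, and contribution() is a stand-in for whatever one input file adds to the temporary multifile sections. It assumes the "first fork gets first quarter" distribution, so a plain in-order concatenation is enough.

```python
import os
import tempfile


def split_contiguous(items, jobs):
    """Split items into `jobs` contiguous chunks, preserving input order."""
    base, extra = divmod(len(items), jobs)
    chunks, start = [], 0
    for i in range(jobs):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def contribution(name):
    """Hypothetical stand-in for one file's multifile contribution."""
    return f"<{name}>".encode()


def build_multifile(items, jobs):
    """Fork `jobs` children, one private temporary file per child,
    then concatenate the temporary files in fork-index order."""
    with tempfile.TemporaryDirectory() as tmpdir:
        paths, pids = [], []
        for i, chunk in enumerate(split_contiguous(items, jobs)):
            path = os.path.join(tmpdir, f"multifile.{i}")
            paths.append(path)
            pid = os.fork()
            if pid == 0:
                # Child: write only to its own file, then leave via
                # os._exit so the parent's cleanup code is not run twice.
                status = 1
                try:
                    with open(path, "wb") as f:
                        for name in chunk:
                            f.write(contribution(name))
                    status = 0
                finally:
                    os._exit(status)
            pids.append(pid)

        # Children may finish in any order; wait for all of them.
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
                raise RuntimeError(f"child {pid} failed")

        # Merge by fork index, so the layout does not depend on timing.
        merged = b""
        for path in paths:
            with open(path, "rb") as f:
                merged += f.read()
        return merged
```

With this distribution, build_multifile(files, 1) and build_multifile(files, 4) return the same bytes for the same input list. A different work distribution (for example, children picking up files as they become free) would need each contribution tagged with its input index and sorted before merging.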