From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 26 Mar 2021 17:47:38 +0100
From: Jakub Jelinek
To: Tom de Vries
Cc: dwz@sourceware.org, mark@klomp.org
Subject: Re: [RFC] Allow parallel multifile with -p -e
Message-ID: <20210326164738.GW1179226@tucnak>
Reply-To: Jakub Jelinek
References: <20210326164049.GA29676@delia.home>
In-Reply-To: <20210326164049.GA29676@delia.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: Dwz mailing list

On Fri, Mar 26, 2021 at 05:40:51PM +0100, Tom de Vries wrote:
> This gives us reproducible compression:
> ...
> $ ls -la j1/*
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/1
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/2
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/3
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/4
> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j1/5
> $ ls -la j4/*
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/1
> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/2
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/3
> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/4
> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j4/5
> ...
>
> But it doesn't give reproducible results:
> ...
> $ md5sum j1/*
> e6e655f7b5d1078672c8b0da99ab8c41  j1/1
> e6e655f7b5d1078672c8b0da99ab8c41  j1/2
> d833aa3ad6ad35597e1b7d0635b401cf  j1/3
> d833aa3ad6ad35597e1b7d0635b401cf  j1/4
> d5282aa9d065f1d00fd7a46c54ebde8d  j1/5
> $ md5sum j4/*
> de1645ce60bba6f345b2334825deb01f  j4/1
> de1645ce60bba6f345b2334825deb01f  j4/2
> ac2f16c50cf3d31be1f42f35ced4a091  j4/3
> ac2f16c50cf3d31be1f42f35ced4a091  j4/4
> 7fc3cd2c2514c8bf1f23348a27025b8d  j4/5
> ...
>
> The temporary multifile section contributions happen in random
> order, so consequently the multifile layout will be different, and the
> files referring to the multifile will be different.

What I meant is that each fork should use different temporary filenames
for the multifiles; once all children are done, merge them. How to merge
depends on how exactly the work is distributed among the forks: if e.g.
for 4 forks the first fork gets the first quarter of the files, the
second the second quarter etc., then just merge them in that order;
otherwise more work would be needed to make the merging reproducible.
Then generate the multifile in a single process, and then again work in
multiple forks on the individual files against the multifile.

	Jakub
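
To illustrate the merge step only, here is a rough Python sketch (not dwz
code, which is C). It assumes the contiguous split described above, and
all names in it (split_contiguous, child_work, multifile.partN, ...) are
made up for illustration. Each fork writes its contribution to its own
temporary file, and the parent concatenates those files by fork index, so
the result should not depend on which child finishes first.

```python
import os
import tempfile


def split_contiguous(items, nforks):
    """Split items into nforks contiguous chunks, in input order."""
    base, extra = divmod(len(items), nforks)
    chunks, start = [], 0
    for i in range(nforks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def child_work(chunk, tmp_path):
    """Hypothetical stand-in for one fork's multifile contribution."""
    with open(tmp_path, "w") as out:
        for name in chunk:
            out.write(f"contribution from {name}\n")


def run_forks(files, nforks, workdir):
    """Fork one child per chunk; each child writes its own temporary file."""
    tmp_paths, pids = [], []
    for idx, chunk in enumerate(split_contiguous(files, nforks)):
        tmp_path = os.path.join(workdir, f"multifile.part{idx}")
        tmp_paths.append(tmp_path)
        pid = os.fork()  # Unix only
        if pid == 0:
            status = 1
            try:
                child_work(chunk, tmp_path)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError(f"child {pid} failed")
    return tmp_paths


def merge_in_order(tmp_paths, merged_path):
    """Concatenate per-fork files by fork index, not by completion time."""
    with open(merged_path, "w") as out:
        for path in tmp_paths:
            with open(path) as part:
                out.write(part.read())


def merged_output(files, nforks):
    """Run the whole sketch and return the merged text."""
    with tempfile.TemporaryDirectory() as workdir:
        parts = run_forks(files, nforks, workdir)
        merged = os.path.join(workdir, "multifile.merged")
        merge_in_order(parts, merged)
        with open(merged) as f:
            return f.read()
```

Under these assumptions, merged_output(files, 1) and merged_output(files, 4)
return the same text. With any other work distribution (e.g. children
picking files from a shared queue) the per-fork files would need to be
sorted or keyed by input file before merging.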