From: Jon Turney
To: "cygwin-apps@cygwin.com", Christian Franke
Subject: Re: [PATCH setup 2/2] Detect filename collisions between packages
Date: Thu, 27 Apr 2023 17:11:28 +0100
Message-ID: <358f3794-ea6c-d771-731b-34ab9bffde9b@dronecode.org.uk>
References: <20230423144330.3107-1-jon.turney@dronecode.org.uk> <20230423144330.3107-3-jon.turney@dronecode.org.uk>

On 24/04/2023 17:26, Christian Franke via Cygwin-apps wrote:
> Jon Turney via Cygwin-apps wrote:
>> Detect filename collisions between packages
>> Don't check filenames under /etc/postinstall/ for collisions
>> Report when filename collisions exist
>> Add option '--collisions' to enable
>
> IMO a useful enhancement.

:)

>> Notes:
>>
>> Reading file catalog from a package is moderately expensive in terms of
>> I/O: To extract all the filenames from a tar archive, we need to seek to
>> every file header, and to seek forward through a compressed file, we
>> must examine every intervening byte to decompress it.
>>
>> This adds a fourth(!) pass through each archive (one to checksum it, one
>> to extract files, another one (I added in dbfd1a64 without thinking too
>> deeply about it) to extract symlinks), and now one to check for filename
>> collisions).
>>
>> Using std::set_intersection() on values from std::map() here is probably
>> a mistake. It's simple to write, but the performance is not good.
>
> A faster alternative which avoids set_intersection calls in a loop is
> possibly to use one large data structure which maps filenames to sets of
> packages. Using multimap instead of the straightforward
> map> needs possibly less memory (not tested). But
> for multimap it is required that file/package name pairs are not
> inserted twice.
>
> I attached a small standalone POC source file using multimap. It would
> also detect collisions in the already installed packages.

Thanks for the ideas.

It seems I really didn't think that carefully about this...

It seems like building a map from each filename to the set of package
names which contain it, and then finding all the filenames where that
set has more than one member, might be a better implementation.

[...]

> Is the new file filemanifest.h required at all? It could be reduced to
> the following in install.cc:
>
> #include
> ...
> typedef std::map FileManifest;
> // or more modern (C++11):
> // using FileManifest = std::map;

I think I had some idea to put the (de)serialization of the file
manifests for installed packages into that class as well, but never got
around to it (these need to be considered in the collision assessment,
as well as newly installed packages).
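
For illustration only, the filename-to-package-set idea discussed above could be sketched roughly as follows. This is not code from setup or from the attached POC: the type names PackageFiles and FileOwners, the helpers is_exempt() and find_collisions(), and the assumption that paths are stored with a leading "/" are all hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for this sketch; not taken from the setup sources.
using PackageFiles = std::map<std::string, std::vector<std::string>>; // package -> filenames
using FileOwners = std::map<std::string, std::set<std::string>>;      // filename -> packages

// Filenames under /etc/postinstall/ are not checked for collisions.
// Assumption: paths are absolute and start with "/".
static bool is_exempt(const std::string &path)
{
  const std::string prefix = "/etc/postinstall/";
  return path.compare(0, prefix.size(), prefix) == 0;
}

// Build one filename -> owning-packages map in a single pass, then keep
// only the filenames claimed by more than one package.
FileOwners find_collisions(const PackageFiles &packages)
{
  FileOwners owners;
  for (const auto &pkg : packages)
    for (const auto &file : pkg.second)
      if (!is_exempt(file))
        owners[file].insert(pkg.first); // std::set ignores duplicate pairs

  FileOwners collisions;
  for (const auto &entry : owners)
    if (entry.second.size() > 1)
      collisions.insert(entry);
  return collisions;
}
```

Since std::set silently ignores a repeated insert, the same file/package pair can be added twice without harm, which is the property the multimap variant would have to enforce separately. Manifests of already installed packages could presumably be fed into the same PackageFiles input, so that they take part in the collision check alongside newly installed packages.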