public inbox for cygwin-apps@cygwin.com
 help / color / mirror / Atom feed
* [PATCH setup 0/2] Detect filename collisions between packages
@ 2023-04-23 14:43 Jon Turney
  2023-04-23 14:43 ` [PATCH setup 1/2] Add underlying() method to io_stream class Jon Turney
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jon Turney @ 2023-04-23 14:43 UTC (permalink / raw)
  To: cygwin-apps; +Cc: Jon Turney

This is a woefully underoptimized implementation of detecting filename
collisions between packages, so it's hidden behind the command line option
'--collisions' to enable it.

A good implementation probably (i) collects the filenames at the same time
as checksumming the archive, or has them attached to it somehow by the
packaging process, and (ii) uses a libsolv string pool to hold the filenames
(so we only need to compute the intersection and unions of sets of integers,
which is hopefully much faster)

Nevertheless, this labouriously identifies some obvious collisions which I
might bring to the attention of package maintainers when I have some time to
do so...

Jon Turney (2):
  Add underlying() method to io_stream class
  Detect filename collisions between packages

 compress.h       |  13 +++
 compress_bz.cc   |   2 -
 compress_bz.h    |   2 -
 compress_gz.h    |   2 -
 compress_xz.cc   |   2 -
 compress_xz.h    |   2 -
 compress_zstd.cc |   2 -
 compress_zstd.h  |   2 -
 filemanifest.h   |  29 ++++++
 install.cc       | 254 +++++++++++++++++++++++++++++++++++++++--------
 io_stream.h      |  31 ++----
 res/en/res.rc    |   3 +
 resource.h       |   3 +
 13 files changed, 266 insertions(+), 81 deletions(-)
 create mode 100644 filemanifest.h

-- 
2.39.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH setup 1/2] Add underlying() method to io_stream class
  2023-04-23 14:43 [PATCH setup 0/2] Detect filename collisions between packages Jon Turney
@ 2023-04-23 14:43 ` Jon Turney
  2023-04-23 14:43 ` [PATCH setup 2/2] Detect filename collisions between packages Jon Turney
  2023-04-24 16:46 ` [PATCH setup 0/2] " Corinna Vinschen
  2 siblings, 0 replies; 12+ messages in thread
From: Jon Turney @ 2023-04-23 14:43 UTC (permalink / raw)
  To: cygwin-apps; +Cc: Jon Turney

Add underlying() method to io_stream class.  This saves having to pass
around both original and decompressed stream, just so you can call
tell() on the original stream.

For a non-compressed stream, it simply returns that io_stream object.

In the case of a compressed stream, this returns the io_stream object
for the underlying file stream.

Move the original and owns_original members up to compress class from
it's subclasses to facilitate that.

Also clean up some cruft in io_stream.h
---
 compress.h       | 13 +++++++++++++
 compress_bz.cc   |  2 --
 compress_bz.h    |  2 --
 compress_gz.h    |  2 --
 compress_xz.cc   |  2 --
 compress_xz.h    |  2 --
 compress_zstd.cc |  2 --
 compress_zstd.h  |  2 --
 io_stream.h      | 31 ++++++-------------------------
 9 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/compress.h b/compress.h
index 1692fa6..5e84408 100644
--- a/compress.h
+++ b/compress.h
@@ -21,6 +21,11 @@
 class compress:public io_stream
 {
 public:
+  compress () :
+    original(NULL),
+    owns_original(true)
+  { };
+
   /* Get a decompressed stream from a normal stream. If this function returns non-null
    * a valid compress header was found. The io_stream pointer passed to decompress
    * should be discarded. The old io_stream will be automatically closed when the 
@@ -48,6 +53,14 @@ public:
   virtual const char *next_file_name () = 0;
   /* if you are still needing these hints... give up now! */
   virtual ~compress () = 0;
+
+  virtual io_stream *underlying ()
+  {
+    return original;
+  }
+protected:
+  io_stream *original;
+  bool owns_original;
 };
 
 #endif /* SETUP_COMPRESS_H */
diff --git a/compress_bz.cc b/compress_bz.cc
index d801de1..f8a1199 100644
--- a/compress_bz.cc
+++ b/compress_bz.cc
@@ -26,7 +26,6 @@
 compress_bz::compress_bz (io_stream * parent) : peeklen (0), position (0)
 {
   /* read only via this constructor */
-  original = 0;
   lasterr = 0;
   if (!parent || parent->error ())
     {
@@ -34,7 +33,6 @@ compress_bz::compress_bz (io_stream * parent) : peeklen (0), position (0)
       return;
     }
   original = parent;
-  owns_original = true;
   init_state();
 }
 
diff --git a/compress_bz.h b/compress_bz.h
index 9ab59d7..6ccccfc 100644
--- a/compress_bz.h
+++ b/compress_bz.h
@@ -56,8 +56,6 @@ public:
   /* if you are still needing these hints... give up now! */
     virtual ~ compress_bz ();
 private:
-  io_stream *original;
-  bool owns_original;
   char peekbuf[512];
   size_t peeklen;
   int lasterr;
diff --git a/compress_gz.h b/compress_gz.h
index c7ae14b..05b4b7b 100644
--- a/compress_gz.h
+++ b/compress_gz.h
@@ -67,8 +67,6 @@ private:
   void putLong (unsigned long);
   void destroy ();
   int do_flush (int);
-  io_stream *original;
-  bool owns_original;
   const char *openmode;
   /* from zlib */
   z_stream stream;
diff --git a/compress_xz.cc b/compress_xz.cc
index b91c8a4..5c34a54 100644
--- a/compress_xz.cc
+++ b/compress_xz.cc
@@ -45,8 +45,6 @@ le64dec(const void *pp)
  */
 compress_xz::compress_xz (io_stream * parent)
 :
-  original(NULL),
-  owns_original(true),
   peeklen(0),
   lasterr(0),
   compression_type (COMPRESSION_UNKNOWN)
diff --git a/compress_xz.h b/compress_xz.h
index 9be7822..48e620d 100644
--- a/compress_xz.h
+++ b/compress_xz.h
@@ -44,8 +44,6 @@ public:
 private:
   compress_xz () {};
 
-  io_stream *original;
-  bool owns_original;
   char peekbuf[512];
   size_t peeklen;
   int lasterr;
diff --git a/compress_zstd.cc b/compress_zstd.cc
index 8d6e51f..ba0e491 100644
--- a/compress_zstd.cc
+++ b/compress_zstd.cc
@@ -24,8 +24,6 @@
  */
 compress_zstd::compress_zstd (io_stream * parent)
 :
-  original(NULL),
-  owns_original(true),
   lasterr(0)
 {
   /* read only */
diff --git a/compress_zstd.h b/compress_zstd.h
index b9b541e..c30d1a8 100644
--- a/compress_zstd.h
+++ b/compress_zstd.h
@@ -39,8 +39,6 @@ public:
 private:
   compress_zstd () {};
 
-  io_stream *original;
-  bool owns_original;
   int lasterr;
   void create ();
   void destroy ();
diff --git a/io_stream.h b/io_stream.h
index f19e134..68915b1 100644
--- a/io_stream.h
+++ b/io_stream.h
@@ -47,15 +47,6 @@ typedef enum
 }
 path_type_t;
 
-typedef enum
-{
-  IO_STREAM_INVALID,
-  IO_STREAM_STREAM,
-  IO_STREAM_COMPRESS,
-  IO_STREAM_ARCHIVE
-}
-io_stream_type_t;
-
 typedef enum
 {
   IO_STREAM_SYMLINK,
@@ -143,27 +134,17 @@ public:
   virtual int error () = 0;
   /* hmm, yet another for the guessing books */
   virtual char *gets (char *, size_t len);
-  /* what sort of stream is this?
-   * known types are:
-   * IO_STREAM_INVALID - not a valid stream.
-   * IO_STREAM_STREAM - just another stream.
-   * IO_STREAM_COMPRESS - a compressed or compressing stream.
-   * IO_STREAM_ARCHIVE - an archive of some sort, with > 0 files.
-   * this is a crutch for real runtime type evaluation.
-   */
-  /* Find out the next stream name -
-   * ie for foo.tar.gz, at offset 0, next_file_name = foo.tar
-   * for foobar that is an archive, next_file_name is the next
-   * extractable filename.
-   */
-//  virtual const char* next_file_name() = NULL;
-  /* if you are still needing these hints... give up now! */
+
   virtual ~io_stream () = 0;
 
   io_stream& operator << (io_stream&);
   virtual void operator << (std::string) {}
   virtual void operator << (const char *) {}
-  
+
+  virtual io_stream *underlying ()
+  {
+    return this;
+  }
 protected:
   void operator= (const io_stream &);
   io_stream() {};
-- 
2.39.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH setup 2/2] Detect filename collisions between packages
  2023-04-23 14:43 [PATCH setup 0/2] Detect filename collisions between packages Jon Turney
  2023-04-23 14:43 ` [PATCH setup 1/2] Add underlying() method to io_stream class Jon Turney
@ 2023-04-23 14:43 ` Jon Turney
  2023-04-24 16:26   ` Christian Franke
  2023-04-24 16:46 ` [PATCH setup 0/2] " Corinna Vinschen
  2 siblings, 1 reply; 12+ messages in thread
From: Jon Turney @ 2023-04-23 14:43 UTC (permalink / raw)
  To: cygwin-apps; +Cc: Jon Turney

Detect filename collisions between packages
Don't check filenames under /etc/postinstall/ for collisions
Report when filename collisions exist
Add option '--collisions' to enable

Notes:

Reading file catalog from a package is moderately expensive in terms of
I/O: To extract all the filenames from a tar archive, we need to seek to
every file header, and to seek forward through a compressed file, we
must examine every intervening byte to decompress it.

This adds a fourth(!) pass through each archive (one to checksum it, one
to extract files, another one (I added in dbfd1a64 without thinking too
deeply about it) to extract symlinks), and now one to check for filename
collisions).

Using std::set_intersection() on values from std::map() here is probably
a mistake. It's simple to write, but the performance is not good.
---
 filemanifest.h |  29 ++++++
 install.cc     | 254 +++++++++++++++++++++++++++++++++++++++++--------
 res/en/res.rc  |   3 +
 resource.h     |   3 +
 4 files changed, 247 insertions(+), 42 deletions(-)
 create mode 100644 filemanifest.h

diff --git a/filemanifest.h b/filemanifest.h
new file mode 100644
index 0000000..cc6fb81
--- /dev/null
+++ b/filemanifest.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2022 Jon Turney
+ *
+ *     This program is free software; you can redistribute it and/or modify
+ *     it under the terms of the GNU General Public License as published by
+ *     the Free Software Foundation; either version 2 of the License, or
+ *     (at your option) any later version.
+ *
+ *     A copy of the GNU General Public License can be found at
+ *     http://www.gnu.org/
+ *
+ */
+
+/*
+  FileManifest
+
+  A serializable set of filenames
+  each filename is associated with the package name that owns it
+
+  Used when working with the install package file manifests in /etc/setup
+*/
+
+#include <string>
+#include <unordered_map>
+#include <algorithm>
+
+class FileManifest: public std::map<std::string, std::string>
+{
+};
diff --git a/install.cc b/install.cc
index 628dbd0..bcdbf8d 100644
--- a/install.cc
+++ b/install.cc
@@ -58,6 +58,7 @@
 #include "package_meta.h"
 #include "package_version.h"
 #include "package_source.h"
+#include "filemanifest.h"
 
 #include "threebar.h"
 #include "Exception.h"
@@ -71,6 +72,7 @@ static off_t package_bytes = 0;
 
 static BoolOption NoReplaceOnReboot (false, 'r', "no-replaceonreboot", IDS_HELPTEXT_NO_REPLACEONREBOOT);
 static BoolOption NoWriteRegistry (false, '\0', "no-write-registry", IDS_HELPTEXT_NO_WRITE_REGISTRY);
+static BoolOption Collisions (false, '\0', "collisions", IDS_HELPTEXT_CHECK_COLLISIONS);
 
 struct std_dirs_t {
   const char *name;
@@ -101,6 +103,32 @@ private:
   const std::string stratum;
 };
 
+// check if this is a collidable filename (i.e. a filename we should check for collisions)
+static bool
+collidableFilename(const std::string &fn)
+{
+  // ignore reserved paths (starting with '.')
+  if (fn[0] == '.')
+    return false;
+
+  // ignore directories
+  if (fn.back() == '/')
+    return false;
+
+  // ignore scripts under /etc/postinstall
+  // XXX: only should apply to non-perpetual scripts
+  //
+  // cygport allows multiple install packages made from the same source to
+  // contain the same post-install script, but this could still be a problem if we
+  // ever have two packages with different scripts with the same name...
+  //
+  // XXX: possibly use isAScript, but that could be written more efficiently...
+  if (fn.rfind("etc/postinstall/", 0) == 0)
+    return false;
+
+  return true;
+}
+
 class Installer
 {
   public:
@@ -116,15 +144,17 @@ class Installer
     void installOne (packagemeta &pkg, const packageversion &ver,
                      packagesource &source,
                      const std::string& , const std::string&, HWND );
+    void manifestOne(packagesource &source, const std::string& name, FileManifest &manifest);
     int errors;
   private:
     bool extract_replace_on_reboot(archive *, const std::string&,
                                    const std::string&, std::string);
+
+    archive *_openOne(packagesource &source);
     bool _installOne (packagemeta &pkgm,
                       const std::string& prefixURL,
                       const std::string& prefixPath,
                       HWND owner,
-                      io_stream *pkgfile,
                       archive *tarstream,
                       io_stream *lst,
                       bool symlink_phase);
@@ -210,6 +240,38 @@ Installer::preremoveOne (packagemeta & pkg)
     try_run_script ("/etc/preremove/", pkg.name, exts[i]);
 }
 
+// utility class for reading an installed package's file manifest
+class FileManifestReader
+{
+public:
+  FileManifestReader(std::string pkgname)
+  {
+    listfile = io_stream::open ("cygfile:///etc/setup/" + pkgname + ".lst.gz", "rb", 0);
+    listdata = compress::decompress (listfile);
+  }
+
+  ~FileManifestReader()
+  {
+    delete listdata;
+  }
+
+  bool readline(std::string &line)
+  {
+    line.reserve(CYG_PATH_MAX);
+    const char *sz = listdata->gets (&line[0], CYG_PATH_MAX);
+
+    // NULL indicates EOF
+    if (sz == NULL)
+      return false;
+
+    return true;
+  }
+
+private:
+  io_stream *listfile;
+  io_stream *listdata;
+};
+
 void
 Installer::uninstallOne (packagemeta & pkg)
 {
@@ -222,18 +284,10 @@ Installer::uninstallOne (packagemeta & pkg)
 
   std::set<std::string> dirs;
 
-  io_stream *listfile = io_stream::open ("cygfile:///etc/setup/" + pkg.name + ".lst.gz", "rb", 0);
-  io_stream *listdata = compress::decompress (listfile);
-
-  while (listdata)
+  FileManifestReader reader(pkg.name);
+  std::string line;
+  while (reader.readline(line))
     {
-      char getfilenamebuffer[CYG_PATH_MAX];
-      const char *sz = listdata->gets (getfilenamebuffer, sizeof (getfilenamebuffer));
-      if (sz == NULL)
-        break;
-
-      std::string line(sz);
-
       /* Insert the paths of all parent directories of line into dirs. */
       size_t idx = line.length();
       while ((idx = line.find_last_of('/', idx-1)) != std::string::npos)
@@ -271,7 +325,6 @@ Installer::uninstallOne (packagemeta & pkg)
     }
 
   /* Remove the listing file */
-  delete listdata;
   io_stream::remove ("cygfile:///etc/setup/" + pkg.name + ".lst.gz");
 
   /* An STL set maintains itself in sorted order. Thus, iterating over it
@@ -432,19 +485,13 @@ Installer::extract_replace_on_reboot (archive *tarstream, const std::string& pre
 }
 
 static char all_null[512];
+#define EMPTY_ARCHIVE (archive *)-1
 
-/* install one source at a given prefix. */
-void
-Installer::installOne (packagemeta &pkgm, const packageversion &ver,
-                       packagesource &source,
-                       const std::string& prefixURL,
-                       const std::string& prefixPath,
-                       HWND owner)
+archive *
+Installer::_openOne(packagesource &source)
 {
   if (!source.Canonical())
-    return;
-  Progress.SetText1 (IDS_PROGRESS_INSTALL);
-  Progress.SetText2 ((pkgm.name + "-" + ver.Canonical_version()).c_str());
+    return NULL;
 
   io_stream *pkgfile = NULL;
 
@@ -452,7 +499,7 @@ Installer::installOne (packagemeta &pkgm, const packageversion &ver,
     {
       note (NULL, IDS_ERR_OPEN_READ, source.Canonical (), "Unknown filename");
       ++errors;
-      return;
+      return NULL;
     }
 
   if (!io_stream::exists (source.Cached ())
@@ -460,10 +507,9 @@ Installer::installOne (packagemeta &pkgm, const packageversion &ver,
     {
       note (NULL, IDS_ERR_OPEN_READ, source.Cached (), "No such file");
       ++errors;
-      return;
+      return NULL;
     }
 
-
   /* At this point pkgfile is an opened io_stream to either a .tar.bz2 file,
      a .tar.gz file, a .tar.lzma file, or just a .tar file.  Try it first as
      a compressed file and if that fails try opening it as a tar directly.
@@ -485,20 +531,20 @@ Installer::installOne (packagemeta &pkgm, const packageversion &ver,
         {
           /* Decompression succeeded but we couldn't grok it as a valid tar
              archive.  */
+          archive *result = NULL;
           char c[512];
-	  ssize_t len;
+          ssize_t len;
           if ((len = try_decompress->peek (c, 512)) < 0
-	      || !memcmp (c, all_null, len))
+              || !memcmp (c, all_null, len))
             /* In some cases, maintainers have uploaded bzipped
                0-byte files as dummy packages instead of empty tar files.
                This is common enough that we should not treat this as an
                error condition.
-	       Same goes for tar archives consisting of a big block of
-	       all zero bytes (the famous 46 bytes tar archives). */
-	    {
-	      if (ver.Type () == package_binary)
-		pkgm.installed = ver;
-	    }
+               Same goes for tar archives consisting of a big block of
+               all zero bytes (the famous 46 bytes tar archives). */
+            {
+              result = EMPTY_ARCHIVE;
+            }
           else
             {
               note (NULL, IDS_ERR_OPEN_READ, source.Cached (),
@@ -506,7 +552,7 @@ Installer::installOne (packagemeta &pkgm, const packageversion &ver,
               ++errors;
             }
           delete try_decompress;
-          return;
+          return result;
         }
     }
   else if ((tarstream = archive::extract (pkgfile)) == NULL)
@@ -516,6 +562,31 @@ Installer::installOne (packagemeta &pkgm, const packageversion &ver,
       note (NULL, IDS_ERR_OPEN_READ, source.Cached (),
             "Unrecognisable file format");
       ++errors;
+      return NULL;
+    }
+
+  return tarstream;
+}
+
+/* install one source at a given prefix. */
+void
+Installer::installOne (packagemeta &pkgm, const packageversion &ver,
+                       packagesource &source,
+                       const std::string& prefixURL,
+                       const std::string& prefixPath,
+                       HWND owner)
+{
+  Progress.SetText1 (IDS_PROGRESS_INSTALL);
+  Progress.SetText2 ((pkgm.name + "-" + ver.Canonical_version()).c_str());
+
+  archive *tarstream = _openOne(source);
+  if (!tarstream)
+    return;
+
+  if (tarstream == EMPTY_ARCHIVE)
+    {
+      if (ver.Type () == package_binary)
+        pkgm.installed = ver;
       return;
     }
 
@@ -549,12 +620,12 @@ Installer::installOne (packagemeta &pkgm, const packageversion &ver,
   Log (LOG_PLAIN) << "Extracting from " << source.Cached () << endLog;
 
   bool error_in_this_package = _installOne(pkgm, prefixURL, prefixPath, owner,
-                                           pkgfile, tarstream, lst, false);
+                                           tarstream, lst, false);
   if (tarstream->seek(0, IO_SEEK_SET) == -1)
     Log (LOG_PLAIN) << "Error rewinding to extract symlinks" << source.Cached () << endLog;
 
   error_in_this_package |= _installOne(pkgm, prefixURL, prefixPath, owner,
-                                       pkgfile, tarstream, lst, true);
+                                       tarstream, lst, true);
 
   if (lst)
     delete lst;
@@ -575,7 +646,6 @@ Installer::_installOne (packagemeta &pkgm,
                         const std::string& prefixURL,
                         const std::string& prefixPath,
                         HWND owner,
-                        io_stream *pkgfile,
                         archive *tarstream,
                         io_stream *lst,
                         bool symlink_phase)
@@ -764,13 +834,39 @@ Installer::_installOne (packagemeta &pkgm,
 
           break;
         }
-      progress (pkgfile->tell ());
+      // progress is reported against the length of underlying file who's total
+      // size is always known in advance (unlike the that of the tar archive, if
+      // it's compressed)
+      progress (tarstream->underlying()->tell ());
       num_installs++;
     }
 
   return error_in_this_package;
 }
 
+void
+Installer::manifestOne(packagesource &source, const std::string &name, FileManifest &manifest)
+{
+  archive *tarstream = _openOne(source);
+  if (!tarstream)
+    return;
+
+  if (tarstream == EMPTY_ARCHIVE)
+    return;
+
+  std::string fn;
+  while ((fn = tarstream->next_file_name ()).size ())
+    {
+      // should be ignored?
+      if (collidableFilename(fn))
+        manifest[fn] = name;
+
+      tarstream->skip_file();
+    }
+
+  delete tarstream;
+}
+
 static void
 check_for_old_cygwin (HWND owner)
 {
@@ -802,10 +898,72 @@ check_for_old_cygwin (HWND owner)
 }
 
 static void
-do_install_thread (HINSTANCE h, HWND owner)
+check_for_collisions (Installer &myInstaller, std::vector <packageversion> &install_q)
 {
-  int i;
+  /* Build file manifest for currently installed packages  ... */
+  Progress.SetText1(IDS_PROGRESS_CONFLICTS);
+  Progress.SetBar1(0);
+
+  FileManifest manifest;
+  for (packagedb::packagecollection::iterator i = packagedb::packages.begin ();
+       i != packagedb::packages.end ();
+       ++i)
+    {
+      Progress.SetBar2(std::distance(packagedb::packages.begin(), i), packagedb::packages.size());
+
+      packagemeta & pkgm = *(i->second);
+      /* installed and not getting uninstalled (perhaps to upgrade or reinstall) */
+      if (pkgm.installed && (pkgm.desired != pkgm.installed))
+        {
+          FileManifestReader reader(pkgm.name);
+
+          std::string line;
+          while (reader.readline(line))
+            {
+              if (collidableFilename(line))
+                manifest[line] = pkgm.name;
+            }
+        }
+    }
+
+  /* ... then check for conflicts with each package to install ... */
+  for (std::vector <packageversion>::iterator i = install_q.begin ();
+       i != install_q.end (); ++i)
+  {
+    Progress.SetBar2(std::distance(install_q.begin(), i), install_q.size());
+
+    packageversion & pkg = *i;
+    FileManifest newmanifest;
+    myInstaller.manifestOne(*pkg.source(), i->Name(), newmanifest);
+
+    FileManifest intersect;
+    std::set_intersection(manifest.begin(), manifest.end(),
+                          newmanifest.begin(), newmanifest.end(),
+                          std::inserter(intersect, intersect.begin()),
+                          manifest.value_comp());
+
+    if (!intersect.empty())
+      {
+        Log (LOG_PLAIN) << "package " << i->Name() << " has " << intersect.size() << " conflicts" << endLog;
+        for (FileManifest::iterator j = intersect.begin();
+             j != intersect.end();
+             j++)
+          {
+            Log (LOG_PLAIN) << "packages " << i->Name() << " and " << j->second << " both contain file /" << j->first << endLog;
+          }
+        ++myInstaller.errors;
+      }
+    else
+      {
+        /* ... adding each packages file manifest as we go */
+        manifest.insert(newmanifest.begin(), newmanifest.end());
+      }
+  }
+}
 
+static void
+do_install_thread (HINSTANCE h, HWND owner)
+{
   num_installs = 0, num_uninstalls = 0;
   rebootneeded = false;
 
@@ -813,7 +971,7 @@ do_install_thread (HINSTANCE h, HWND owner)
                        std::string("file://") + std::string(get_root_dir()),
 		       0755);
 
-  for (i = 0; Installer::StandardDirs[i].name; i++)
+  for (int i = 0; Installer::StandardDirs[i].name; i++)
   {
     std::string p = cygpath (Installer::StandardDirs[i].name);
     if (p.size())
@@ -907,6 +1065,16 @@ do_install_thread (HINSTANCE h, HWND owner)
       Progress.SetBar2 (md5sum_total_bytes_sofar, md5sum_total_bytes);
   }
 
+  if (Collisions)
+    {
+      check_for_collisions(myInstaller, install_q);
+
+      if (myInstaller.errors)
+        {
+          fatal (owner, IDS_CONFLICT);
+        }
+    }
+
   /* start with uninstalls - remove files that new packages may replace */
   Progress.SetBar2(0);
   myInstaller.preremovePerpetual ("0");
@@ -934,6 +1102,7 @@ do_install_thread (HINSTANCE h, HWND owner)
     Progress.SetBar2(std::distance(uninstall_q.begin(), i) + 1, uninstall_q.size());
   }
 
+  /* installs */
   for (std::vector <packageversion>::iterator i = install_q.begin ();
        i != install_q.end (); ++i)
   {
@@ -956,6 +1125,7 @@ do_install_thread (HINSTANCE h, HWND owner)
     }
   }
 
+  /* source package installs */
   for (std::vector <packageversion>::iterator i = sourceinstall_q.begin ();
        i != sourceinstall_q.end (); ++i)
   {
diff --git a/res/en/res.rc b/res/en/res.rc
index e2f04fd..e2e4418 100644
--- a/res/en/res.rc
+++ b/res/en/res.rc
@@ -649,6 +649,8 @@ BEGIN
     IDS_FILE_INUSE_KILL "&Kill Processes"
     IDS_FILE_INUSE_MSG "Unable to extract"
     IDS_USER_URL_TOOLTIP "Additional site URL, path or UNC path"
+    IDS_PROGRESS_CONFLICTS "Checking for filename conflicts..."
+    IDS_CONFLICT "Filename conflicts"
 END
 
 //
@@ -664,6 +666,7 @@ BEGIN
     IDS_HELPTEXT_ALLOW_UNSUPPORTED_WINDOWS "Allow old, unsupported Windows versions"
     IDS_HELPTEXT_ARCH "Architecture to install (x86_64 or x86)"
     IDS_HELPTEXT_CATEGORIES "Specify categories to install"
+    IDS_HELPTEXT_CHECK_COLLISIONS "Check for filename collisions between packages"
     IDS_HELPTEXT_COMPACTOS "Compress installed files with Compact OS (xpress4k, xpress8k, xpress16k, lzx)"
     IDS_HELPTEXT_DELETE_ORPHANS "Remove orphaned packages"
     IDS_HELPTEXT_DISABLE_ANTIVIRUS "Disable known or suspected buggy anti virus software packages during execution"
diff --git a/resource.h b/resource.h
index a08489c..72ff554 100644
--- a/resource.h
+++ b/resource.h
@@ -108,6 +108,8 @@
 #define IDS_VIEW_UNNEEDED                1212
 #define IDS_UNSUPPORTED_WINDOWS_ARCH     1213
 #define IDS_USER_URL_TOOLTIP             1214
+#define IDS_PROGRESS_CONFLICTS           1215
+#define IDS_CONFLICT                     1216
 
 #define IDS_HELPTEXT_COMPACTOS           1500
 #define IDS_HELPTEXT_PUBKEY              1501
@@ -158,6 +160,7 @@
 #define IDS_HELPTEXT_HEADER              1546
 #define IDS_HELPTEXT_FOOTER              1547
 #define IDS_HELPTEXT_NO_WRITE_REGISTRY   1548
+#define IDS_HELPTEXT_CHECK_COLLISIONS    1549
 
 // Dialogs
 
-- 
2.39.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 2/2] Detect filename collisions between packages
  2023-04-23 14:43 ` [PATCH setup 2/2] Detect filename collisions between packages Jon Turney
@ 2023-04-24 16:26   ` Christian Franke
  2023-04-27 16:11     ` Jon Turney
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Franke @ 2023-04-24 16:26 UTC (permalink / raw)
  To: Jon Turney, cygwin-apps

[-- Attachment #1: Type: text/plain, Size: 2883 bytes --]

Jon Turney via Cygwin-apps wrote:
> Detect filename collisions between packages
> Don't check filenames under /etc/postinstall/ for collisions
> Report when filename collisions exist
> Add option '--collisions' to enable

IMO a useful enhancement.


> Notes:
>
> Reading file catalog from a package is moderately expensive in terms of
> I/O: To extract all the filenames from a tar archive, we need to seek to
> every file header, and to seek forward through a compressed file, we
> must examine every intervening byte to decompress it.
>
> This adds a fourth(!) pass through each archive (one to checksum it, one
> to extract files, another one (I added in dbfd1a64 without thinking too
> deeply about it) to extract symlinks), and now one to check for filename
> collisions).
>
> Using std::set_intersection() on values from std::map() here is probably
> a mistake. It's simple to write, but the performance is not good.

A faster alternative which avoids set_intersection calls in a loop is 
possibly to use one large data structure which maps filenames to sets of 
packages. Using multimap<string, string> instead of the straightforward 
map<string, set<string>> needs possibly less memory (not tested). But 
for multimap it is required that file/package name pairs are not 
inserted twice.

I attached a small standalone POC source file using multimap. It would 
also detect collisions in the already installed packages.


> ...
> diff --git a/filemanifest.h b/filemanifest.h
> new file mode 100644
> index 0000000..cc6fb81
> --- /dev/null
> +++ b/filemanifest.h
> @@ -0,0 +1,29 @@
> +/*
> + * Copyright (c) 2022 Jon Turney
> + *
> + *     This program is free software; you can redistribute it and/or modify
> + *     it under the terms of the GNU General Public License as published by
> + *     the Free Software Foundation; either version 2 of the License, or
> + *     (at your option) any later version.
> + *
> + *     A copy of the GNU General Public License can be found at
> + *http://www.gnu.org/

"SPDX-License-Identifier: GPL-2.0-or-later" is possibly a more brief and 
modern way to say this in new code.


> + *
> + */
> +
> +/*
> +  FileManifest
> +
> +  A serializable set of filenames
> +  each filename is associated with the package name that owns it
> +
> +  Used when working with the install package file manifests in /etc/setup
> +*/
> +
> +#include <string>
> +#include <unordered_map>

This should be <map> as std::unordered_map is not used in the code.


> +#include <algorithm>
> +
> +class FileManifest: public std::map<std::string, std::string>
> +{
> +};

Is the new file filemanifest.h required at all? It could be reduced to 
the following in install.cc:

#include <map>
...
typedef std::map<std::string, std::string> FileManifest;
// or more modern (C++11):
// using FileManifest = std::map<std::string, std::string>;


-- 
Regards,
Christian


[-- Attachment #2: check_for_conflicts.cc --]
[-- Type: text/plain, Size: 1450 bytes --]

#include <cstdio>
#include <map>
#include <string>

using FileManifest = std::multimap<std::string, std::string>;

int main()
{
  FileManifest manifest;

  // read from /etc/setup/*.lst.gz
  const FileManifest installed {
    {"/usr/bin/file11", "package1"}, // #1 package2
    {"/usr/bin/file12", "package1"}, // #2 package3
    {"/usr/bin/file21", "package2"},
    {"/usr/bin/file22", "package2"}, // #3 package3, package4
    {"/usr/bin/file11", "package2"}  // #1 package1
  };
  manifest.insert(installed.begin(), installed.end());

  // read from tar files to be installed
  const FileManifest install {
    {"/usr/bin/file31", "package3"},
    {"/usr/bin/file12", "package3"}, // #2 package1
    {"/usr/bin/file22", "package3"}, // #3 package2, package4
    {"/usr/bin/file41", "package4"},
    {"/usr/bin/file42", "package4"},
    {"/usr/bin/file22", "package4"}  // #3 package2, package3
  };
  manifest.insert(install.begin(), install.end());
  
  for (auto i = manifest.cbegin(), end = manifest.cend(); i != end; ) {
    auto j = i;
    if (++j != end && j->first == i->first) {
      std::printf("Packages");
      j = i;
      do
        std::printf(" %s", j->second.c_str());
      while (++j != end && j->first == i->first);
      std::printf(" contain file %s\n", i->first.c_str());
    }
  //else
    //std::printf("Package %s contains file %s\n",
    //            i->second.c_str(), i->first.c_str());
    i = j;
  }
  return 0;
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 0/2] Detect filename collisions between packages
  2023-04-23 14:43 [PATCH setup 0/2] Detect filename collisions between packages Jon Turney
  2023-04-23 14:43 ` [PATCH setup 1/2] Add underlying() method to io_stream class Jon Turney
  2023-04-23 14:43 ` [PATCH setup 2/2] Detect filename collisions between packages Jon Turney
@ 2023-04-24 16:46 ` Corinna Vinschen
  2023-04-24 18:16   ` Achim Gratz
  2 siblings, 1 reply; 12+ messages in thread
From: Corinna Vinschen @ 2023-04-24 16:46 UTC (permalink / raw)
  To: cygwin-apps

Hi Jon,

On Apr 23 15:43, Jon Turney via Cygwin-apps wrote:
> This is a woefully underoptimized implementation of detecting filename
> collisions between packages, so it's hidden behind the command line option
> '--collisions' to enable it.
> 
> A good implementation probably (i) collects the filenames at the same time
> as checksumming the archive, or has them attached to it somehow by the
> packaging process, and (ii) uses a libsolv string pool to hold the filenames
> (so we only need to compute the intersection and unions of sets of integers,
> which is hopefully much faster)
> 
> Nevertheless, this labouriously identifies some obvious collisions which I
> might bring to the attention of package maintainers when I have some time to
> do so...

Theoretically, wouldn't this be better handled in the upload process, i.
e. by calm, rather than during the download process?

Calm could create a database containing all the files from the tar
archives it uploads, and compare that against the newly uploaded files
on the fly.  So it could just refuse to upload packages with files
already present in another package.  I don't think such a database would
take too much space on sware.  This may be much less time consuming and
it would prevent the collisions before they even occur.

There's probably another problem in terms of different file lists in
different package versions, though...


Corinna

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 0/2] Detect filename collisions between packages
  2023-04-24 16:46 ` [PATCH setup 0/2] " Corinna Vinschen
@ 2023-04-24 18:16   ` Achim Gratz
  2023-04-27 16:11     ` Jon Turney
  0 siblings, 1 reply; 12+ messages in thread
From: Achim Gratz @ 2023-04-24 18:16 UTC (permalink / raw)
  To: cygwin-apps

Corinna Vinschen via Cygwin-apps writes:
> Calm could create a database containing all the files from the tar
> archives it uploads, and compare that against the newly uploaded files
> on the fly.

That already exists as the basis for package grep, albeit in the form of
a buiinch of text files.

[…]
> There's probably another problem in terms of different file lists in
> different package versions, though...

That is probably not too onerous to check for, but files moving from one
package to another is a different story.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 2/2] Detect filename collisions between packages
  2023-04-24 16:26   ` Christian Franke
@ 2023-04-27 16:11     ` Jon Turney
  2023-04-27 17:20       ` Christian Franke
  0 siblings, 1 reply; 12+ messages in thread
From: Jon Turney @ 2023-04-27 16:11 UTC (permalink / raw)
  To: cygwin-apps, Christian Franke

On 24/04/2023 17:26, Christian Franke via Cygwin-apps wrote:
> Jon Turney via Cygwin-apps wrote:
>> Detect filename collisions between packages
>> Don't check filenames under /etc/postinstall/ for collisions
>> Report when filename collisions exist
>> Add option '--collisions' to enable
> 
> IMO a useful enhancement.

:)

>> Notes:
>>
>> Reading file catalog from a package is moderately expensive in terms of
>> I/O: To extract all the filenames from a tar archive, we need to seek to
>> every file header, and to seek forward through a compressed file, we
>> must examine every intervening byte to decompress it.
>>
>> This adds a fourth(!) pass through each archive (one to checksum it, one
>> to extract files, another one (I added in dbfd1a64 without thinking too
>> deeply about it) to extract symlinks), and now one to check for filename
>> collisions).
>>
>> Using std::set_intersection() on values from std::map() here is probably
>> a mistake. It's simple to write, but the performance is not good.
> 
> A faster alternative which avoids set_intersection calls in a loop is 
> possibly to use one large data structure which maps filenames to sets of 
> packages. Using multimap<string, string> instead of the straightforward 
> map<string, set<string>> needs possibly less memory (not tested). But 
> for multimap it is required that file/package name pairs are not 
> inserted twice.
> 
> I attached a small standalone POC source file using multimap. It would 
> also detect collisions in the already installed packages.

Thanks for the ideas. It seems I really didn't think that carefully 
about this...

It seems like maybe building a map from filename to the set of package 
names which contain it, and then finding all the filenames where that 
set has more than one member would be a possible better implementation.

[...]
> Is the new file filemanifest.h required at all? It could be reduced to 
> the following in install.cc:
> 
> #include <map>
> ...
> typedef std::map<std::string, std::string> FileManifest;
> // or more modern (C++11):
> // using FileManifest = std::map<std::string, std::string>;

I think I had some idea to put the (de)serialization of the file 
manifests for installed packages into that class as well, but never got 
around to it (these need to be considered in the collision assessment as 
well as newly installed packages)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 0/2] Detect filename collisions between packages
  2023-04-24 18:16   ` Achim Gratz
@ 2023-04-27 16:11     ` Jon Turney
  2023-04-28  5:51       ` Brian Inglis
  0 siblings, 1 reply; 12+ messages in thread
From: Jon Turney @ 2023-04-27 16:11 UTC (permalink / raw)
  To: cygwin-apps

On 24/04/2023 19:16, Achim Gratz via Cygwin-apps wrote:
> Corinna Vinschen via Cygwin-apps writes:
>> Calm could create a database containing all the files from the tar
>> archives it uploads, and compare that against the newly uploaded files
>> on the fly.
> 
> That already exists as the basis for package grep, albeit in the form of
> a buiinch of text files.
> 
> […]
>> There's probably another problem in terms of different file lists in
>> different package versions, though...
> 
> That is probably not too onerous to check for, but files moving from one
> package to another is a different story.

This is certainly doable.

(There's a few more wrinkles, as calm doesn't presently concern itself 
with trying to work out what the solver would install if you asked to 
install "everything", considering 'obsoletes:', and maybe even explicit 
'conflicts:' markers in hints, etc.)

I think this functionality needs to exist in setup as well, though, as 
calm can't possibly have knowledge of packages you might be installing 
from 3rd party overlay package repositories.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 2/2] Detect filename collisions between packages
  2023-04-27 16:11     ` Jon Turney
@ 2023-04-27 17:20       ` Christian Franke
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Franke @ 2023-04-27 17:20 UTC (permalink / raw)
  To: Jon Turney, cygwin-apps

Jon Turney wrote:
> ...
>>> Using std::set_intersection() on values from std::map() here is 
>>> probably
>>> a mistake. It's simple to write, but the performance is not good.
>>
>> A faster alternative which avoids set_intersection calls in a loop is 
>> possibly to use one large data structure which maps filenames to sets 
>> of packages. Using multimap<string, string> instead of the 
>> straightforward map<string, set<string>> needs possibly less memory 
>> (not tested). But for multimap it is required that file/package name 
>> pairs are not inserted twice.
>>
>> I attached a small standalone POC source file using multimap. It 
>> would also detect collisions in the already installed packages.
>
> Thanks for the ideas. It seems I really didn't think that carefully 
> about this...
>
> It seems like maybe building a map from filename to the set of package 
> names which contain it, and then finding all the filenames where that 
> set has more than one member would be a possible better implementation.

Of course this is the more obvious method and easier to implement. It is 
somewhat slower and needs more memory.

Meantime I did a quick artificial test with 10000 "packages" with 100 
"files" each and 2 collisions. This resulted in a (multi)map size if 
1000000(+2), total size of strings was ~80MB. Results:

Data structure: execution time, memory (working set)
multimap<string, string>: 1.8 seconds, 238MB
map<string, set<string>>: 2.0 seconds, 318MB

The execution time is needed for the map insertions. The final scan for 
collisions is very fast in both cases.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 0/2] Detect filename collisions between packages
  2023-04-27 16:11     ` Jon Turney
@ 2023-04-28  5:51       ` Brian Inglis
  2023-04-30 18:25         ` Jon Turney
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Inglis @ 2023-04-28  5:51 UTC (permalink / raw)
  To: cygwin-apps

On 2023-04-27 10:11, Jon Turney via Cygwin-apps wrote:
> On 24/04/2023 19:16, Achim Gratz via Cygwin-apps wrote:
>> Corinna Vinschen via Cygwin-apps writes:
>>> Calm could create a database containing all the files from the tar
>>> archives it uploads, and compare that against the newly uploaded files
>>> on the fly.
>>
>> That already exists as the basis for package grep, albeit in the form of
>> a buiinch of text files.
>>
>> […]
>>> There's probably another problem in terms of different file lists in
>>> different package versions, though...
>>
>> That is probably not too onerous to check for, but files moving from one
>> package to another is a different story.
> 
> This is certainly doable.
> 
> (There's a few more wrinkles, as calm doesn't presently concern itself with 
> trying to work out what the solver would install if you asked to install 
> "everything", considering 'obsoletes:', and maybe even explicit 'conflicts:' 
> markers in hints, etc.)
> 
> I think this functionality needs to exist in setup as well, though, as calm 
> can't possibly have knowledge of packages you might be installing from 3rd party 
> overlay package repositories.

Please make any of these conflict messages warnings only, as few packages use 
alternatives, and there may well be benign duplication, e.g. multiple language 
versions, as we normally get complaints about conflicts.

Alternatively, what about a report like Deprecated shared library packages 
https://cygwin.com/packages/reports/deprecated_so.html?

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 0/2] Detect filename collisions between packages
  2023-04-28  5:51       ` Brian Inglis
@ 2023-04-30 18:25         ` Jon Turney
  2023-05-04  4:14           ` Brian Inglis
  0 siblings, 1 reply; 12+ messages in thread
From: Jon Turney @ 2023-04-30 18:25 UTC (permalink / raw)
  To: Brian Inglis, cygwin-apps

On 28/04/2023 06:51, Brian Inglis via Cygwin-apps wrote:
> On 2023-04-27 10:11, Jon Turney via Cygwin-apps wrote:
[...]
>> I think this functionality needs to exist in setup as well, though, as 
>> calm can't possibly have knowledge of packages you might be installing 
>> from 3rd party overlay package repositories.
> 
> Please make any of these conflict messages warnings only, as few 
> packages use alternatives, and there may well be benign duplication, 

Your mention of 'alternatives' makes no sense to me.

The alternatives symlinks are not (and should not be) part of the 
package, but created or updated by postinstall scripts.

(It seems like it's impossible to make them work sensibly otherwise, as 
the link would be that from the most recently installed package (which 
could be any of the parallel installable alternatives), not the highest 
priority one.)

> e.g. multiple language versions, as we normally get complaints about 
> conflicts.

I don't know what this refers to.  Can you give an example?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH setup 0/2] Detect filename collisions between packages
  2023-04-30 18:25         ` Jon Turney
@ 2023-05-04  4:14           ` Brian Inglis
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Inglis @ 2023-05-04  4:14 UTC (permalink / raw)
  To: cygwin-apps

On 2023-04-30 12:25, Jon Turney wrote:
> On 28/04/2023 06:51, Brian Inglis wrote:
>> On 2023-04-27 10:11, Jon Turney wrote:
> [...]
>>> I think this functionality needs to exist in setup as well, though, as calm 
>>> can't possibly have knowledge of packages you might be installing from 3rd 
>>> party overlay package repositories.
>>
>> Please make any of these conflict messages warnings only, as few packages use 
>> alternatives, and there may well be benign duplication, 
> 
> Your mention of 'alternatives' makes no sense to me.
> 
> The alternatives symlinks are not (and should not be) part of the package, but 
> created or updated by postinstall scripts.
> 
> (It seems like it's impossible to make them work sensibly otherwise, as the link 
> would be that from the most recently installed package (which could be any of 
> the parallel installable alternatives), not the highest priority one.)
> 
>> e.g. multiple language versions, as we normally get complaints about conflicts.
> 
> I don't know what this refers to.  Can you give an example?

As only a few packages use alternatives, and there may be multiple versions of 
packages for different language versions, e.g. python3... there may be some 
duplicate driver/selector file paths in some packages for different versions if 
they may be installed in parallel, and later versions do not obsolete earlier.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-05-04  4:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-23 14:43 [PATCH setup 0/2] Detect filename collisions between packages Jon Turney
2023-04-23 14:43 ` [PATCH setup 1/2] Add underlying() method to io_stream class Jon Turney
2023-04-23 14:43 ` [PATCH setup 2/2] Detect filename collisions between packages Jon Turney
2023-04-24 16:26   ` Christian Franke
2023-04-27 16:11     ` Jon Turney
2023-04-27 17:20       ` Christian Franke
2023-04-24 16:46 ` [PATCH setup 0/2] " Corinna Vinschen
2023-04-24 18:16   ` Achim Gratz
2023-04-27 16:11     ` Jon Turney
2023-04-28  5:51       ` Brian Inglis
2023-04-30 18:25         ` Jon Turney
2023-05-04  4:14           ` Brian Inglis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).