public inbox for cygwin-apps@cygwin.com
* [PATCH setup] Add new option '--compact-os'
@ 2021-05-08 20:03 Christian Franke
  2021-05-12 15:14 ` Jon Turney
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Franke @ 2021-05-08 20:03 UTC (permalink / raw)
  To: cygwin-apps

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

This experimental patch makes it possible to reduce the disk footprint
of a Cygwin installation on Windows 10.

The I/O control call itself is in a new file, compactos.cc, because it
must not be built with WIN32_LEAN_AND_MEAN (as used by win32.h), and
some required definitions are still missing from the current headers.

Test results with 64-bit Cygwin (disk space used without / with
--compact-os):

Base installation: 135MiB / 66.1MiB (-51%)
Installation with g++, MinGW, Perl, Python, TeX, ...: 2.19GiB / 854MiB
(-62%)

Base installation with NTFS compression: 78.7MiB (but NTFS compression
results in significant file fragmentation; Compact OS does not)
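For reference, the percentages are the share of disk space saved
relative to the uncompressed install. A minimal C++ sketch that
reproduces them from the figures above, assuming 1GiB = 1024MiB (the
helper name is illustrative, not part of setup):

```cpp
#include <cassert>
#include <cmath>

// Percentage of disk space saved when `before` shrinks to `after`.
// Both values must use the same unit (e.g. MiB).
double savings_percent(double before, double after)
{
  return 100.0 * (before - after) / before;
}

// Needed to compare the 2.19GiB and 854MiB figures (assumes binary units).
constexpr double kMiBPerGiB = 1024.0;
```

By the same formula, the NTFS-compressed base installation (78.7MiB)
saves about 42%.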

-- 
Regards,
Christian


[-- Attachment #2: 0001-Add-new-option-compact-os.patch --]
[-- Type: text/plain, Size: 6858 bytes --]

From 05b92f618371ea05d381aacf842f74fb2083d027 Mon Sep 17 00:00:00 2001
From: Christian Franke <christian.franke@t-online.de>
Date: Sat, 8 May 2021 21:25:07 +0200
Subject: [PATCH] Add new option '--compact-os'.

If specified, Compact OS LZX compression is applied to files below
/bin, /sbin and /usr.  DLL files are excluded because rebase will
open these files again for writing.
---
 Makefile.am          |  2 ++
 compactos.cc         | 62 ++++++++++++++++++++++++++++++++++++++++++++
 compactos.h          | 26 +++++++++++++++++++
 io_stream_cygfile.cc | 46 ++++++++++++++++++++++++++++++--
 io_stream_cygfile.h  |  2 ++
 5 files changed, 136 insertions(+), 2 deletions(-)
 create mode 100644 compactos.cc
 create mode 100644 compactos.h

diff --git a/Makefile.am b/Makefile.am
index d10ad6b..63e96da 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -108,6 +108,8 @@ inilint_SOURCES = \
 	archive_tar_file.cc \
 	choose.cc \
 	choose.h \
+	compactos.cc \
+	compactos.h \
 	compress.cc \
 	compress.h \
 	compress_bz.cc \
diff --git a/compactos.cc b/compactos.cc
new file mode 100644
index 0000000..2f1d1df
--- /dev/null
+++ b/compactos.cc
@@ -0,0 +1,62 @@
+//
+// compactos.cc
+//
+// Copyright (C) 2021 Christian Franke
+//
+// SPDX-License-Identifier: MIT
+//
+
+#include "compactos.h"
+
+#ifndef FSCTL_SET_EXTERNAL_BACKING
+#define FSCTL_SET_EXTERNAL_BACKING \
+  CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 195, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
+#endif
+
+#ifndef WOF_CURRENT_VERSION
+#define WOF_CURRENT_VERSION 1
+#define WOF_PROVIDER_FILE 2
+
+typedef struct _WOF_EXTERNAL_INFO {
+  DWORD Version;
+  DWORD Provider;
+} WOF_EXTERNAL_INFO;
+
+#endif
+
+#ifndef FILE_PROVIDER_CURRENT_VERSION
+#define FILE_PROVIDER_CURRENT_VERSION 1
+
+typedef struct _FILE_PROVIDER_EXTERNAL_INFO_V1 {
+  DWORD Version;
+  DWORD Algorithm;
+  DWORD Flags;
+} FILE_PROVIDER_EXTERNAL_INFO_V1;
+
+#endif
+
+#ifndef ERROR_COMPRESSION_NOT_BENEFICIAL
+#define ERROR_COMPRESSION_NOT_BENEFICIAL 344
+#endif
+
+int CompactOsCompressFile(HANDLE h, DWORD algorithm)
+{
+  struct {
+    WOF_EXTERNAL_INFO Wof;
+    FILE_PROVIDER_EXTERNAL_INFO_V1 FileProvider;
+  } wfp;
+  wfp.Wof.Version = WOF_CURRENT_VERSION;
+  wfp.Wof.Provider = WOF_PROVIDER_FILE;
+  wfp.FileProvider.Version = FILE_PROVIDER_CURRENT_VERSION;
+  wfp.FileProvider.Algorithm = algorithm;
+  wfp.FileProvider.Flags = 0;
+
+  if (!DeviceIoControl(h, FSCTL_SET_EXTERNAL_BACKING, &wfp, sizeof(wfp), 0, 0, 0, 0))
+    {
+      if (GetLastError() != ERROR_COMPRESSION_NOT_BENEFICIAL)
+        return -1;
+      return 0;
+    }
+
+  return 1;
+}
diff --git a/compactos.h b/compactos.h
new file mode 100644
index 0000000..c1470f1
--- /dev/null
+++ b/compactos.h
@@ -0,0 +1,26 @@
+//
+// compactos.h
+//
+// Copyright (C) 2021 Christian Franke
+//
+// SPDX-License-Identifier: MIT
+//
+
+#ifndef COMPACTOS_H
+#define COMPACTOS_H
+
+#ifndef _INC_WINDOWS
+#include <windows.h>
+#endif
+
+#ifndef FILE_PROVIDER_COMPRESSION_XPRESS4K
+#define FILE_PROVIDER_COMPRESSION_XPRESS4K  0
+#define FILE_PROVIDER_COMPRESSION_LZX       1
+#define FILE_PROVIDER_COMPRESSION_XPRESS8K  2
+#define FILE_PROVIDER_COMPRESSION_XPRESS16K 3
+#endif
+
+// Returns: 1=compressed, 0=not compressed, -1=error
+int CompactOsCompressFile(HANDLE h, DWORD algorithm);
+
+#endif // COMPACTOS_H
diff --git a/io_stream_cygfile.cc b/io_stream_cygfile.cc
index 2d0716f..6be2940 100644
--- a/io_stream_cygfile.cc
+++ b/io_stream_cygfile.cc
@@ -18,6 +18,9 @@
 #include "filemanip.h"
 #include "mkdir.h"
 #include "mount.h"
+#include "compactos.h"
+
+#include "getopt++/BoolOption.h"
 
 #include <stdlib.h>
 #include <errno.h>
@@ -27,6 +30,7 @@
 #include "IOStreamProvider.h"
 #include "LogSingleton.h"
 
+static BoolOption CompactOsOption (false, '\0', "compact-os", "Compress installed files with Compact OS LZX");
 
 /* completely private iostream registration class */
 class CygFileProvider : public IOStreamProvider
@@ -59,7 +63,8 @@ CygFileProvider CygFileProvider::theInstance = CygFileProvider();
 
 
 std::string io_stream_cygfile::cwd("/");
-  
+bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);
+
 // Normalise a unix style path relative to 
 // cwd.
 std::string
@@ -120,7 +125,22 @@ get_root_dir_now ()
   read_mounts (std::string ());
 }
 
-io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms) : fp(), lasterr (0), fname(), wname (NULL)
+static bool
+compactos_is_useless (const std::string& name)
+{
+  const char * const p = name.c_str();
+  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
+    return true; /* File is not in R/O tree. */
+  const size_t len = name.size(); /* >= 5 */
+  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so"))
+    return true; /* Rebase will open file for writing which uncompresses the file. */
+  if (!strcmp (p + (len - 3), ".gz") || !strcmp (p + (len - 3), ".xz"))
+    return true; /* File is already compressed. */
+  return false;
+}
+
+io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms)
+: fp(), lasterr (0), fname(), wname (NULL), compact_os_algorithm(-1)
 {
   errno = 0;
   if (!name.size())
@@ -153,6 +173,10 @@ io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string
 	Log (LOG_TIMESTAMP) << "io_stream_cygfile: fopen(" << name << ") failed " << errno << " "
 	  << strerror(errno) << endLog;
       }
+
+      if (mode[0] == 'w' && compact_os_is_available && CompactOsOption
+	  && !compactos_is_useless (name))
+	compact_os_algorithm = FILE_PROVIDER_COMPRESSION_LZX; /* best */
     }
 }
 
@@ -367,6 +391,24 @@ io_stream_cygfile::set_mtime (time_t mtime)
 		   FILE_ATTRIBUTE_NORMAL | FILE_FLAG_BACKUP_SEMANTICS, 0);
   if (h == INVALID_HANDLE_VALUE)
     return 1;
+
+  if (compact_os_algorithm >= 0)
+    {
+      /* Compact OS must be applied during last GENERIC_WRITE access
+	 and before SetFileTime(). */
+      int rc = CompactOsCompressFile (h, compact_os_algorithm);
+      if (rc < 0)
+	{
+	  DWORD err = GetLastError();
+	  Log (LOG_TIMESTAMP) << "Compact OS disabled after error " << err
+			      << " on " << fname << endLog;
+	  compact_os_is_available = false;
+	}
+      else
+	Log (LOG_BABBLE) << "Compact OS algorithm " << compact_os_algorithm
+			 << (rc == 0 ? " not " : " ") << "applied to " << fname << endLog;
+    }
+
   SetFileTime (h, 0, 0, &ftime);
   CloseHandle (h);
   return 0;
diff --git a/io_stream_cygfile.h b/io_stream_cygfile.h
index 1ece242..b977909 100644
--- a/io_stream_cygfile.h
+++ b/io_stream_cygfile.h
@@ -61,7 +61,9 @@ private:
   std::string fname;
   wchar_t *wname;
   wchar_t *w_str ();
+  int compact_os_algorithm;
   static std::string cwd;
+  static bool compact_os_is_available;
 };
 
 #endif /* SETUP_IO_STREAM_CYGFILE_H */
-- 
2.31.1
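The FSCTL code and input buffer defined in compactos.cc can be checked
without a Windows build. The sketch below restates the CTL_CODE()
formula from <winioctl.h> and mirrors the two structs with fixed-width
types. The constant and struct names are illustrative stand-ins, not
the real SDK identifiers, and the layout assumption (no padding between
the 32-bit fields) is checked with static_assert.

```cpp
#include <cassert>
#include <cstdint>

// Restatement of CTL_CODE() from <winioctl.h>: device type in bits 16..31,
// required access in bits 14..15, function in bits 2..13, method in bits 0..1.
constexpr std::uint32_t ctl_code(std::uint32_t device_type, std::uint32_t function,
                                 std::uint32_t method, std::uint32_t access)
{
  return (device_type << 16) | (access << 14) | (function << 2) | method;
}

constexpr std::uint32_t kFileDeviceFileSystem = 0x00000009; // FILE_DEVICE_FILE_SYSTEM
constexpr std::uint32_t kMethodBuffered = 0;                 // METHOD_BUFFERED
constexpr std::uint32_t kFileSpecialAccess = 0;              // FILE_SPECIAL_ACCESS (== FILE_ANY_ACCESS)

// Same arguments as the patch's FSCTL_SET_EXTERNAL_BACKING fallback define.
constexpr std::uint32_t kFsctlSetExternalBacking =
    ctl_code(kFileDeviceFileSystem, 195, kMethodBuffered, kFileSpecialAccess);

// Mirrors of the structs sent as one buffer; DWORD is 32 bits on Windows.
struct WofExternalInfo {            // WOF_EXTERNAL_INFO
  std::uint32_t Version;            // WOF_CURRENT_VERSION (1)
  std::uint32_t Provider;           // WOF_PROVIDER_FILE (2)
};

struct FileProviderExternalInfoV1 { // FILE_PROVIDER_EXTERNAL_INFO_V1
  std::uint32_t Version;            // FILE_PROVIDER_CURRENT_VERSION (1)
  std::uint32_t Algorithm;          // FILE_PROVIDER_COMPRESSION_*
  std::uint32_t Flags;              // 0
};

struct WofFileProviderInput {       // the anonymous 'wfp' struct in the patch
  WofExternalInfo Wof;
  FileProviderExternalInfoV1 FileProvider;
};

// Five consecutive 32-bit fields, no padding expected.
static_assert(sizeof(WofFileProviderInput) == 20, "unexpected input buffer size");
```

With these inputs the control code comes out as 0x0009030C, and the
buffer passed to DeviceIoControl() is 20 bytes.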


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-08 20:03 [PATCH setup] Add new option '--compact-os' Christian Franke
@ 2021-05-12 15:14 ` Jon Turney
  2021-05-12 17:50   ` Christian Franke
  2021-05-12 18:04   ` Corinna Vinschen
  0 siblings, 2 replies; 14+ messages in thread
From: Jon Turney @ 2021-05-12 15:14 UTC (permalink / raw)
  To: cygwin-apps, Christian Franke

On 08/05/2021 21:03, Christian Franke wrote:
> This experimental patch allows to reduce the footprint of a Cygwin 
> installation on Windows 10.

Thanks.

> The I/O-control itself is in a new file compactos.cc because 
> WIN32_LEAN_AND_MEAN (win32.h) must not be used and some required 

Yeah, I think "Any include of <windows.h> should be through this file" 
is more a guideline than a rule, since we already break it in other 
places :)

> definitions are still missing in the current headers.

Let me encourage you to submit those to MinGW-w64 so they end up in the 
w32api package.

> Test results with 64bit Cygwin (Disk space used without / with 
> --compact-os):
> 
> Base installation: 135MiB / 66,1 MiB (-51%)
> Installation with g++, Mingw, Perl, Python, Tex, ...:  2.19GiB / 854MiB 
> (-62%)
> 
> Base installation with NTFS compression: 78.7MiB (results in significant 
> file fragmentation, Compact OS does not)

Nice.

A few minor comments.

> Date: Sat, 8 May 2021 21:25:07 +0200
> Subject: [PATCH] Add new option '--compact-os'.
[...]
> --- /dev/null
> +++ b/compactos.cc
> @@ -0,0 +1,62 @@
> +//
> +// compactos.cc
> +//
> +// Copyright (C) 2021 Christian Franke
> +//
> +// SPDX-License-Identifier: MIT
> +//
> +
> +#include "compactos.h"
> +
> +#ifndef FSCTL_SET_EXTERNAL_BACKING

There should be a comment here saying "not yet provided by w32api" or 
similar.

> diff --git a/compactos.h b/compactos.h
> new file mode 100644
> index 0000000..c1470f1
> --- /dev/null
> +++ b/compactos.h
> @@ -0,0 +1,26 @@
> +//
> +// compactos.h
> +//
> +// Copyright (C) 2021 Christian Franke
> +//
> +// SPDX-License-Identifier: MIT
> +//
> +
> +#ifndef COMPACTOS_H
> +#define COMPACTOS_H
> +
> +#ifndef _INC_WINDOWS

I hope windows.h already has its own include guard?

> +#include <windows.h>
> +#endif
[...]
> +bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);

The documentation seems a bit vague, but are we really expecting this to 
work on Windows 10 1507?

> +
>  // Normalise a unix style path relative to 
>  // cwd.
>  std::string
> @@ -120,7 +125,22 @@ get_root_dir_now ()
>    read_mounts (std::string ());
>  }
>  
> -io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms) : fp(), lasterr (0), fname(), wname (NULL)
> +static bool
> +compactos_is_useless (const std::string& name)

Something like 'compression_useful' might be a bit clearer?

> +{
> +  const char * const p = name.c_str();
> +  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
> +    return true; /* File is not in R/O tree. */
> +  const size_t len = name.size(); /* >= 5 */
> +  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so"))
> +    return true; /* Rebase will open file for writing which uncompresses the file. */
> +  if (!strcmp (p + (len - 3), ".gz") || !strcmp (p + (len - 3), ".xz"))
> +    return true; /* File is already compressed. */

Is this an assertion that there are no .bz2, .lzma, .zst etc. files in 
the install?


* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 15:14 ` Jon Turney
@ 2021-05-12 17:50   ` Christian Franke
  2021-05-12 18:35     ` ASSI
  2021-05-13 14:55     ` Jon Turney
  2021-05-12 18:04   ` Corinna Vinschen
  1 sibling, 2 replies; 14+ messages in thread
From: Christian Franke @ 2021-05-12 17:50 UTC (permalink / raw)
  To: cygwin-apps

Jon Turney wrote:
> On 08/05/2021 21:03, Christian Franke wrote:
> ...
>
>> definitions are still missing in the current headers.
>
> Let me encourage you to submit those to MinGW-w64 so they end up in 
> the w32api package.

Done: https://sourceforge.net/p/mingw-w64/mailman/message/37280923/


>
>> Test results with 64bit Cygwin (Disk space used without / with 
>> --compact-os):
>>
>> Base installation: 135MiB / 66,1 MiB (-51%)
>> Installation with g++, Mingw, Perl, Python, Tex, ...:  2.19GiB / 
>> 854MiB (-62%)
>>
>> Base installation with NTFS compression: 78.7MiB (results in 
>> significant file fragmentation, Compact OS does not)
>
> Nice.
>
> A few minor comments.
>
>
>> ...
>> @@ -0,0 +1,62 @@
>> +//
>> +// compactos.cc
>> +//
>> +// Copyright (C) 2021 Christian Franke
>> +//
>> +// SPDX-License-Identifier: MIT
>> +//
>> +
>> +#include "compactos.h"
>> +
>> +#ifndef FSCTL_SET_EXTERNAL_BACKING
>
> There should be a comment here saying "not yet provided by w32api" or 
> similar.

... or we wait for a release of w32api headers with the patch mentioned 
above :-)


>
>> ...
>> +#ifndef COMPACTOS_H
>> +#define COMPACTOS_H
>> +
>> +#ifndef _INC_WINDOWS
>
> I hope windows.h already has it's own include guard?

Yes, the _INC_WINDOWS check should be removed.


>
> ...
>> +bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);
>
> The documentation seems a bit vague, but are we really expecting this 
> to work on Windows 10 1507?

Not tested with 1507. On an old 1511 VBox VM, the command 'compact /C
/EXE:LZX' works, so this I/O control should work as well.
(BTW, caution: 'compact /C /EXE:...' does not preserve the last write
time; IMO this is a bug.)


>
>> ...
>> -io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms) : fp(), lasterr (0), fname(), wname (NULL)
>> +static bool
>> +compactos_is_useless (const std::string& name)
>
> Something like 'compression_useful' might be a bit clearer?

I intentionally chose 'useless' because its negation is only
'possibly_useful': compression might still "fail" with
ERROR_COMPRESSION_NOT_BENEFICIAL.
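In other words, a file that passes the filter can still come back with
ERROR_COMPRESSION_NOT_BENEFICIAL, which CompactOsCompressFile() treats
as "not compressed" rather than as an error. A portable sketch of that
three-way outcome, taking the DeviceIoControl() success flag and the
GetLastError() value as plain inputs (the enum and function names are
hypothetical, not part of setup):

```cpp
#include <cassert>
#include <cstdint>

// ERROR_COMPRESSION_NOT_BENEFICIAL, as defined by the patch.
constexpr std::uint32_t kErrorCompressionNotBeneficial = 344;

enum class CompactOsResult { Compressed, NotBeneficial, Failed };

// Classifies the result of the FSCTL_SET_EXTERNAL_BACKING call.
CompactOsResult classify(bool ioctl_succeeded, std::uint32_t last_error)
{
  if (ioctl_succeeded)
    return CompactOsResult::Compressed;
  if (last_error == kErrorCompressionNotBeneficial)
    return CompactOsResult::NotBeneficial; // file left as is, not an error
  return CompactOsResult::Failed;
}

// The patch's integer convention: 1=compressed, 0=not compressed, -1=error.
int to_patch_return_code(CompactOsResult r)
{
  switch (r) {
    case CompactOsResult::Compressed:    return 1;
    case CompactOsResult::NotBeneficial: return 0;
    case CompactOsResult::Failed:        return -1;
  }
  return -1; // not reached; silences missing-return warnings
}
```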


>
>> +{
>> +  const char * const p = name.c_str();
>> +  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
>> +    return true; /* File is not in R/O tree. */
>> +  const size_t len = name.size(); /* >= 5 */
>> +  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so"))
>> +    return true; /* Rebase will open file for writing which uncompresses the file. */
>> +  if (!strcmp (p + (len - 3), ".gz") || !strcmp (p + (len - 3), ".xz"))
>> +    return true; /* File is already compressed. */
>
> Is this an assertion that there are no .bz2, .lzma, .zst etc. files in 
> the install?
>

No, but such files are rare in packages (source packages aside). The
.bz2 extension occurs more often, so it should probably be added.
Adding every compression format is IMO not worth the effort.

Applying the compression to all files would even be safe: any file that
is too small or not compressible results in
ERROR_COMPRESSION_NOT_BENEFICIAL, and any later open for write access
silently uncompresses the file.
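Since the extension filter is only an optimization, it can stay simple
as long as it is safe for short names. A sketch of the same checks
using std::string comparisons that test the length before comparing
each suffix, with .bz2 added as suggested. The function names are
hypothetical and this is not the code committed to setup:

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// True if `s` ends with `suffix`; the length check comes first, so a
// name shorter than the suffix is never indexed out of range.
static bool ends_with(const std::string& s, const std::string& suffix)
{
  return s.size() >= suffix.size()
         && s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

static bool starts_with(const std::string& s, const std::string& prefix)
{
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Variant of compactos_is_useless(): true means "skip Compact OS".
bool compaction_is_useless(const std::string& name)
{
  if (!(starts_with(name, "/bin/") || starts_with(name, "/sbin/")
        || starts_with(name, "/usr/")))
    return true;                         // not in the read-only trees
  if (ends_with(name, ".dll") || ends_with(name, ".so"))
    return true;                         // rebase may reopen these for writing
  for (const char* ext : {".bz2", ".gz", ".xz"})
    if (ends_with(name, ext))
      return true;                       // already compressed
  return false;
}
```

Anything the list misses is still handled by the kernel, which returns
ERROR_COMPRESSION_NOT_BENEFICIAL and leaves the file alone.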



* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 15:14 ` Jon Turney
  2021-05-12 17:50   ` Christian Franke
@ 2021-05-12 18:04   ` Corinna Vinschen
  2021-05-13 14:42     ` Christian Franke
  1 sibling, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2021-05-12 18:04 UTC (permalink / raw)
  To: cygwin-apps

On May 12 16:14, Jon Turney wrote:
> On 08/05/2021 21:03, Christian Franke wrote:
> [...]
> > +bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);
> 
> The documentation seems a bit vague, but are we really expecting this to
> work on Windows 10 1507?

I think this could even work under 8.1 from what I can see on MSDN.

> > +{
> > +  const char * const p = name.c_str();
> > +  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
> > +    return true; /* File is not in R/O tree. */
> > +  const size_t len = name.size(); /* >= 5 */
> > +  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so"))
> > +    return true; /* Rebase will open file for writing which uncompresses the file. */
> > +  if (!strcmp (p + (len - 3), ".gz") || !strcmp (p + (len - 3), ".xz"))
> > +    return true; /* File is already compressed. */
> 
> Is this an assertion that there are no .bz2, .lzma, .zst etc. files in the
> install?

Another question is this: FILE_PROVIDER_COMPRESSION_LZX
"This algorithm is designed to be highly compact, and provides for small
 footprint for infrequently accessed data."

When running a shell script, certain executables (especially coreutils,
gawk, sed, grep, find) are not so very infrequently accessed.  Is this
compression really feasible for these binaries?  Did you compare shell
script performance with non-compressed, XPRESS16K and LZX compressed
/bin dir?

You know how Cygwin is already slow...


Thanks,
Corinna


* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 17:50   ` Christian Franke
@ 2021-05-12 18:35     ` ASSI
  2021-05-12 18:48       ` Achim Gratz
  2021-05-13 14:55     ` Jon Turney
  1 sibling, 1 reply; 14+ messages in thread
From: ASSI @ 2021-05-12 18:35 UTC (permalink / raw)
  To: cygwin-apps

Christian Franke writes:
>> Is this an assertion that there are no .bz2, .lzma, .zst etc. files
>> in the install?
>>
>
> No, but there are only a few occurrences in packages (except src
> packages). Extension .bz2 occurs more often, so it should possibly be 
> added. Adding all compression formats is IMO not worth the effort.

For ZStandard that assertion is patently false already.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldUserWavetables


* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 18:35     ` ASSI
@ 2021-05-12 18:48       ` Achim Gratz
  2021-05-13 15:09         ` Christian Franke
  0 siblings, 1 reply; 14+ messages in thread
From: Achim Gratz @ 2021-05-12 18:48 UTC (permalink / raw)
  To: cygwin-apps

ASSI writes:
> Christian Franke writes:
>>> Is this an assertion that there are no .bz2, .lzma, .zst etc. files
>>> in the install?
>>>
>>
>> No, but there are only a few occurrences in packages (except src
>> packages). Extension .bz2 occurs more often, so it should possibly be 
>> added. Adding all compression formats is IMO not worth the effort.
>
> For ZStandard that assertion is patently false already.

Oh wait, you are talking about the files that get installed, not the
packages.  Forget that comment, then.

Deciding whether and how to compress based on the extension is prone
to miss a lot of real-world cases, though, so you'd hope that this is
recognized by the Compact OS code itself instead of being worked around?


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada


* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 18:04   ` Corinna Vinschen
@ 2021-05-13 14:42     ` Christian Franke
  2021-05-13 14:45       ` Christian Franke
  2021-05-17 10:17       ` Corinna Vinschen
  0 siblings, 2 replies; 14+ messages in thread
From: Christian Franke @ 2021-05-13 14:42 UTC (permalink / raw)
  To: cygwin-apps

Corinna Vinschen via Cygwin-apps wrote:
> On May 12 16:14, Jon Turney wrote:
>> On 08/05/2021 21:03, Christian Franke wrote:
>> [...]
>>> +bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);
>> The documentation seems a bit vague, but are we really expecting this to
>> work on Windows 10 1507?
> I think this could even work under 8.1 from what I can see on MSDN.

I skipped all Win8*, so I didn't test with 8.1 :-)

This page says "Available starting with Windows 10":
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_provider_external_info_v0

It also says "Header: ntifs.h" but in recent "Windows Kits" all required 
defines are in winioctl.h.

These defines are enabled even for '>= _WIN32_WINNT_WIN7'. According to 
a test I did some time ago, Win7 could not read these files.


>
>>> +{
>>> +  const char * const p = name.c_str();
>>> +  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
>>> +    return true; /* File is not in R/O tree. */
>>> +  const size_t len = name.size(); /* >= 5 */
>>> +  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so"))
>>> +    return true; /* Rebase will open file for writing which uncompresses the file. */
>>> +  if (!strcmp (p + (len - 3), ".gz") || !strcmp (p + (len - 3), ".xz"))
>>> +    return true; /* File is already compressed. */
>> Is this an assertion that there are no .bz2, .lzma, .zst etc. files in the
>> install?
> Another question is this: FILE_PROVIDER_COMPRESSION_LZX
> "This algorithm is designed to be highly compact, and provides for small
>   footprint for infrequently accessed data."
>
> When running a shell script, certain executables (especially coreutils,
> gawk, sed, grep, find) are not so very infrequently accessed.  Is this
> compression really feasible for these binaries?  Did you compare shell
> script performance with non-compressed, XPRESS16K and LZX compressed
> /bin dir?

Good point. Now I did a test with a ./configure script run after reboot: 
There was significant difference with /bin/*.exe (only) uncompressed, 
NTFS-, XPRESS16K- or LZX-compressed. Time was always around 23s.

Here is a read speed test with fast and slow storage on a 10+ year old
i7-2600K (4C/8T). The 256MiB test file was generated by concatenating
various EXE files. Every file access was the first one after a reboot.
AV (Defender) was turned off:


  Compression MiB      T1     T2   T3,T4
  ======================================
  None        256   0.69s  10.1s  <0.02s
  NTFS        159   1.03s   8.1s  <0.02s
  XPRESS4K    138   -
  XPRESS8K    128   -
  XPRESS16K   123   0.64s   5.4s  <0.02s
  LZX          97   0.79s   4.8s  <0.02s

T1,T2: Read whole file: time dd if=FILE bs=FILESIZE of=/dev/null
T3,T4: Read last byte: time dd if=FILE bs=1 skip=FILESIZE-1 of=/dev/null

T1,T3: SATA SSD, raw read speed with dd bs=1M: ~520MB/s
T2,T4: USB3 flash drive via USB2, raw read speed: ~27MB/s


As expected, compression improves the 'virtual' read speed on slow
storage. Otherwise, the effect depends on storage speed, CPU speed,
system load, ...
Unexpectedly (for me), even LZX seems to be suitable for the random
reads that occur when EXE files are preloaded or paged in.

If the files were already cached, all read times were similar: ~0.135s 
for the whole file.
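To make the 'virtual read speed' point concrete: the effective rate is
the logical file size (256MiB) divided by the read time, while the
physical rate divides the on-disk size from the MiB column by the same
time. A small sketch of that arithmetic (function names are
illustrative):

```cpp
#include <cassert>
#include <cmath>

// File data delivered per second, independent of on-disk compression.
double effective_mib_per_s(double logical_mib, double seconds)
{
  return logical_mib / seconds;
}

// Bytes actually read from the device per second (compressed size).
double physical_mib_per_s(double on_disk_mib, double seconds)
{
  return on_disk_mib / seconds;
}
```

For the USB flash drive (T2), this gives about 25MiB/s for the
uncompressed file versus about 53MiB/s of file data with LZX, while only
about 20MiB/s is read from the device itself.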

For more flexibility, I will provide a new version of the patch with 
'--compact-os ALGORITHM' option.

Thanks,
Christian



* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-13 14:42     ` Christian Franke
@ 2021-05-13 14:45       ` Christian Franke
  2021-05-17 10:17       ` Corinna Vinschen
  1 sibling, 0 replies; 14+ messages in thread
From: Christian Franke @ 2021-05-13 14:45 UTC (permalink / raw)
  To: cygwin-apps

Christian Franke wrote:
> Corinna Vinschen via Cygwin-apps wrote:
>> ...
>> When running a shell script, certain executables (especially coreutils,
>> gawk, sed, grep, find) are not so very infrequently accessed. Is this
>> compression really feasible for these binaries?  Did you compare shell
>> script performance with non-compressed, XPRESS16K and LZX compressed
>> /bin dir?
>
> Good point. Now I did a test with a ./configure script run after 
> reboot: There was significant difference with /bin/*.exe (only) 
> uncompressed, NTFS-, XPRESS16K- or LZX-compressed. Time was always 
> around 23s.

Of course this should be: "... . There was *no* significant difference 
...", sorry.



* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 17:50   ` Christian Franke
  2021-05-12 18:35     ` ASSI
@ 2021-05-13 14:55     ` Jon Turney
  2021-05-14  7:27       ` Christian Franke
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Turney @ 2021-05-13 14:55 UTC (permalink / raw)
  To: cygwin-apps

On 12/05/2021 18:50, Christian Franke wrote:
> Jon Turney wrote:
>> On 08/05/2021 21:03, Christian Franke wrote:
>> ...
>>> +#include "compactos.h"
>>> +
>>> +#ifndef FSCTL_SET_EXTERNAL_BACKING
>>
>> There should be a comment here saying "not yet provided by w32api" or 
>> similar.
> 
> ... or we wait for a release of w32api headers with the patch mentioned 
> above :-)

No, I think this way is better, since I build the setup releases on
Fedora, and so have no control over when the w32api package I'm
building against gets updated

(and furthermore it's an old Fedora at the moment, since the x86 MinGW 
toolchain in recent Fedora isn't built with SJLJ exception handling...)


* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-12 18:48       ` Achim Gratz
@ 2021-05-13 15:09         ` Christian Franke
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Franke @ 2021-05-13 15:09 UTC (permalink / raw)
  To: cygwin-apps

Achim Gratz wrote:
> ...
> Deciding the compression and compression type by the extension is prone
> to miss a lot of real-world things, though, so you'd hope that that was
> recognized by the compact OS code itself instead of working around it?

Yes. The compression only succeeds if the number of clusters could be 
reduced. Otherwise it fails with ERROR_COMPRESSION_NOT_BENEFICIAL and 
leaves the file as is.
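The rule can be modeled as a cluster-count comparison. This is a
simplified sketch, assuming a 4KiB cluster size (the common NTFS
default) and ignoring the WOF provider's internal chunking and
metadata; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Allocation units needed for `bytes`, rounding up.
constexpr std::uint64_t clusters_for(std::uint64_t bytes, std::uint64_t cluster_size)
{
  return (bytes + cluster_size - 1) / cluster_size;
}

// Compression only pays off if the compressed data needs fewer clusters.
constexpr bool compression_beneficial(std::uint64_t original_bytes,
                                      std::uint64_t compressed_bytes,
                                      std::uint64_t cluster_size = 4096)
{
  return clusters_for(compressed_bytes, cluster_size)
         < clusters_for(original_bytes, cluster_size);
}
```

Under this model, a 3000-byte file occupies one cluster whether or not
it compresses, so it would be left as is.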

Regards,
Christian



* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-13 14:55     ` Jon Turney
@ 2021-05-14  7:27       ` Christian Franke
  2021-05-14  7:55         ` Christian Franke
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Franke @ 2021-05-14  7:27 UTC (permalink / raw)
  To: cygwin-apps

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

Jon Turney wrote:
> On 12/05/2021 18:50, Christian Franke wrote:
>> Jon Turney wrote:
>>> On 08/05/2021 21:03, Christian Franke wrote:
>>> ...
>>>> +#include "compactos.h"
>>>> +
>>>> +#ifndef FSCTL_SET_EXTERNAL_BACKING
>>>
>>> There should be a comment here saying "not yet provided by w32api" 
>>> or similar.
>>
>> ... or we wait for a release of w32api headers with the patch 
>> mentioned above :-)
>
> No, I think this way is better, since I build the setup releases on 
> Fedora, and so don't have any control about when the w32api package 
> I'm building against gets updated
>
> (and furthermore it's an old Fedora at the moment, since the x86 MinGW 
> toolchain in recent Fedora isn't built with SJLJ exception handling...)
>

I see. BTW: Mingw-w64 upstream pushed my patch yesterday.

Attached is a new patch for setup that also allows selecting the
compression algorithm.
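The name-to-value mapping done by CompactOsStringOption::Process() in
the attached patch can be sketched as a free function. This is an
illustrative restatement, not the patch's code: the function name is
hypothetical, and an empty std::optional stands in for the option
returning Failed. As in the patch, names are matched case-sensitively.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Values from the patch's compactos.h (FILE_PROVIDER_COMPRESSION_*).
constexpr int kXpress4k = 0;
constexpr int kLzx = 1;
constexpr int kXpress8k = 2;
constexpr int kXpress16k = 3;

// Returns the algorithm value for a '--compact-os' argument, or nothing
// for an unknown name.
std::optional<int> parse_compact_os_algorithm(const std::string& name)
{
  if (name == "xpress4k")  return kXpress4k;
  if (name == "xpress8k")  return kXpress8k;
  if (name == "xpress16k") return kXpress16k;
  if (name == "lzx")       return kLzx;
  return std::nullopt;
}
```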


[-- Attachment #2: 0001-Add-new-option-compact-os-ALGORITHM.patch --]
[-- Type: text/plain, Size: 8108 bytes --]

From d65db8dcbe3b07a06adbf5484dcb5ab98e165b04 Mon Sep 17 00:00:00 2001
From: Christian Franke <christian.franke@t-online.de>
Date: Fri, 14 May 2021 09:10:06 +0200
Subject: [PATCH] Add new option '--compact-os ALGORITHM'.

If specified, selected Compact OS compression algorithm is applied
to files below /bin, /sbin and /usr.  Most DLL files are excluded
because rebase will open these files again for writing.
---
 Makefile.am          |  2 +
 compactos.cc         | 63 +++++++++++++++++++++++++++++++
 compactos.h          | 25 +++++++++++++
 io_stream_cygfile.cc | 88 +++++++++++++++++++++++++++++++++++++++++++-
 io_stream_cygfile.h  |  2 +
 5 files changed, 178 insertions(+), 2 deletions(-)
 create mode 100644 compactos.cc
 create mode 100644 compactos.h

diff --git a/Makefile.am b/Makefile.am
index d10ad6b..63e96da 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -108,6 +108,8 @@ inilint_SOURCES = \
 	archive_tar_file.cc \
 	choose.cc \
 	choose.h \
+	compactos.cc \
+	compactos.h \
 	compress.cc \
 	compress.h \
 	compress_bz.cc \
diff --git a/compactos.cc b/compactos.cc
new file mode 100644
index 0000000..9ed2a73
--- /dev/null
+++ b/compactos.cc
@@ -0,0 +1,63 @@
+//
+// compactos.cc
+//
+// Copyright (C) 2021 Christian Franke
+//
+// SPDX-License-Identifier: MIT
+//
+
+#include "compactos.h"
+
+/* Not yet provided by w32api headers. */
+#ifndef FSCTL_SET_EXTERNAL_BACKING
+#define FSCTL_SET_EXTERNAL_BACKING \
+  CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 195, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
+#endif
+
+#ifndef WOF_CURRENT_VERSION
+#define WOF_CURRENT_VERSION 1
+
+typedef struct _WOF_EXTERNAL_INFO {
+  DWORD Version;
+  DWORD Provider;
+} WOF_EXTERNAL_INFO;
+
+#endif
+
+#ifndef WOF_PROVIDER_FILE
+#define WOF_PROVIDER_FILE 2
+#define FILE_PROVIDER_CURRENT_VERSION 1
+
+typedef struct _FILE_PROVIDER_EXTERNAL_INFO_V1 {
+  DWORD Version;
+  DWORD Algorithm;
+  DWORD Flags;
+} FILE_PROVIDER_EXTERNAL_INFO_V1;
+
+#endif
+
+#ifndef ERROR_COMPRESSION_NOT_BENEFICIAL
+#define ERROR_COMPRESSION_NOT_BENEFICIAL 344
+#endif
+
+int CompactOsCompressFile(HANDLE h, DWORD algorithm)
+{
+  struct {
+    WOF_EXTERNAL_INFO Wof;
+    FILE_PROVIDER_EXTERNAL_INFO_V1 FileProvider;
+  } wfp;
+  wfp.Wof.Version = WOF_CURRENT_VERSION;
+  wfp.Wof.Provider = WOF_PROVIDER_FILE;
+  wfp.FileProvider.Version = FILE_PROVIDER_CURRENT_VERSION;
+  wfp.FileProvider.Algorithm = algorithm;
+  wfp.FileProvider.Flags = 0;
+
+  if (!DeviceIoControl(h, FSCTL_SET_EXTERNAL_BACKING, &wfp, sizeof(wfp), 0, 0, 0, 0))
+    {
+      if (GetLastError() != ERROR_COMPRESSION_NOT_BENEFICIAL)
+        return -1;
+      return 0;
+    }
+
+  return 1;
+}
diff --git a/compactos.h b/compactos.h
new file mode 100644
index 0000000..f187718
--- /dev/null
+++ b/compactos.h
@@ -0,0 +1,25 @@
+//
+// compactos.h
+//
+// Copyright (C) 2021 Christian Franke
+//
+// SPDX-License-Identifier: MIT
+//
+
+#ifndef COMPACTOS_H
+#define COMPACTOS_H
+
+#include <windows.h>
+
+/* Not yet provided by w32api headers. */
+#ifndef FILE_PROVIDER_COMPRESSION_XPRESS4K
+#define FILE_PROVIDER_COMPRESSION_XPRESS4K  0
+#define FILE_PROVIDER_COMPRESSION_LZX       1
+#define FILE_PROVIDER_COMPRESSION_XPRESS8K  2
+#define FILE_PROVIDER_COMPRESSION_XPRESS16K 3
+#endif
+
+// Returns: 1=compressed, 0=not compressed, -1=error
+int CompactOsCompressFile(HANDLE h, DWORD algorithm);
+
+#endif // COMPACTOS_H
diff --git a/io_stream_cygfile.cc b/io_stream_cygfile.cc
index 2d0716f..97e70db 100644
--- a/io_stream_cygfile.cc
+++ b/io_stream_cygfile.cc
@@ -18,6 +18,9 @@
 #include "filemanip.h"
 #include "mkdir.h"
 #include "mount.h"
+#include "compactos.h"
+
+#include "getopt++/StringOption.h"
 
 #include <stdlib.h>
 #include <errno.h>
@@ -27,6 +30,45 @@
 #include "IOStreamProvider.h"
 #include "LogSingleton.h"
 
+/* Option '--compact-os ALGORITHM' */
+class CompactOsStringOption : public StringOption
+{
+public:
+  CompactOsStringOption ();
+  virtual Result Process (char const *optarg, int prefixIndex) /* override */;
+  operator int () const { return intval; }
+private:
+  int intval;
+};
+
+CompactOsStringOption::CompactOsStringOption ()
+: StringOption ("", '\0', "compact-os",
+    "Compress installed files with Compact OS "
+    "(xpress4k, xpress8k, xpress16k, lzx)", false),
+  intval (-1)
+{
+}
+
+Option::Result CompactOsStringOption::Process (char const *optarg, int prefixIndex)
+{
+  Result res = StringOption::Process (optarg, prefixIndex);
+  if (res != Ok)
+    return res;
+  const std::string& strval = *this;
+  if (strval == "xpress4k")
+    intval = FILE_PROVIDER_COMPRESSION_XPRESS4K;
+  else if (strval == "xpress8k")
+    intval = FILE_PROVIDER_COMPRESSION_XPRESS8K;
+  else if (strval == "xpress16k")
+    intval = FILE_PROVIDER_COMPRESSION_XPRESS16K;
+  else if (strval == "lzx")
+    intval = FILE_PROVIDER_COMPRESSION_LZX;
+  else
+    return Failed;
+  return Ok;
+}
+
+static CompactOsStringOption CompactOsOption;
 
 /* completely private iostream registration class */
 class CygFileProvider : public IOStreamProvider
@@ -59,7 +101,8 @@ CygFileProvider CygFileProvider::theInstance = CygFileProvider();
 
 
 std::string io_stream_cygfile::cwd("/");
-  
+bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);
+
 // Normalise a unix style path relative to 
 // cwd.
 std::string
@@ -120,7 +163,26 @@ get_root_dir_now ()
   read_mounts (std::string ());
 }
 
-io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms) : fp(), lasterr (0), fname(), wname (NULL)
+static bool
+compactos_is_useless (const std::string& name)
+{
+  const char * const p = name.c_str();
+  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
+    return true; /* File is not in R/O tree. */
+  const size_t len = name.size(); /* >= 5 */
+  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so")) {
+    if (!strcmp (p + (len - 11), "cygwin1.dll") || strstr (p + 5, "/sys-root/mingw/"))
+      return false; /* Ignored by rebase. */
+    return true; /* Rebase will open file for writing which uncompresses the file. */
+  }
+  if (!strcmp (p + (len - 4), ".bz2") || !strcmp (p + (len - 3), ".gz")
+      || !strcmp (p + (len - 3), ".xz"))
+    return true; /* File is already compressed. */
+  return false;
+}
+
+io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms)
+: fp(), lasterr (0), fname(), wname (NULL), compact_os_algorithm(-1)
 {
   errno = 0;
   if (!name.size())
@@ -153,6 +215,10 @@ io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string
 	Log (LOG_TIMESTAMP) << "io_stream_cygfile: fopen(" << name << ") failed " << errno << " "
 	  << strerror(errno) << endLog;
       }
+
+      if (mode[0] == 'w' && compact_os_is_available && CompactOsOption >= 0
+	  && !compactos_is_useless (name))
+	compact_os_algorithm = CompactOsOption;
     }
 }
 
@@ -367,6 +433,24 @@ io_stream_cygfile::set_mtime (time_t mtime)
 		   FILE_ATTRIBUTE_NORMAL | FILE_FLAG_BACKUP_SEMANTICS, 0);
   if (h == INVALID_HANDLE_VALUE)
     return 1;
+
+  if (compact_os_algorithm >= 0)
+    {
+      /* Compact OS must be applied after last WriteFile()
+	 and before SetFileTime(). */
+      int rc = CompactOsCompressFile (h, compact_os_algorithm);
+      if (rc < 0)
+	{
+	  DWORD err = GetLastError();
+	  Log (LOG_TIMESTAMP) << "Compact OS disabled after error " << err
+			      << " on " << fname << endLog;
+	  compact_os_is_available = false;
+	}
+      else
+	Log (LOG_BABBLE) << "Compact OS algorithm " << compact_os_algorithm
+			 << (rc == 0 ? " not" : "") << " applied to " << fname << endLog;
+    }
+
   SetFileTime (h, 0, 0, &ftime);
   CloseHandle (h);
   return 0;
diff --git a/io_stream_cygfile.h b/io_stream_cygfile.h
index 1ece242..b977909 100644
--- a/io_stream_cygfile.h
+++ b/io_stream_cygfile.h
@@ -61,7 +61,9 @@ private:
   std::string fname;
   wchar_t *wname;
   wchar_t *w_str ();
+  int compact_os_algorithm;
   static std::string cwd;
+  static bool compact_os_is_available;
 };
 
 #endif /* SETUP_IO_STREAM_CYGFILE_H */
-- 
2.31.1

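The `--compact-os ALGORITHM` handler in the patch above maps each accepted name to one of the `FILE_PROVIDER_COMPRESSION_*` values from `compactos.h`. The option's internal value stays -1 when the option is not given. The following is a portable sketch of that mapping outside setup's getopt++ framework, so it can be tried with any C++17 compiler. The function name `parse_compact_os_algorithm` is hypothetical, and returning `std::nullopt` stands in for `Process()` returning `Failed`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Values as defined in compactos.h (FILE_PROVIDER_COMPRESSION_*).
constexpr int kXpress4k = 0;
constexpr int kLzx = 1;
constexpr int kXpress8k = 2;
constexpr int kXpress16k = 3;

// Hypothetical helper mirroring CompactOsStringOption::Process():
// returns the algorithm number, or std::nullopt for an unknown name
// (the case in which the real option handler returns Failed).
std::optional<int> parse_compact_os_algorithm(const std::string& name)
{
  if (name == "xpress4k")
    return kXpress4k;
  if (name == "xpress8k")
    return kXpress8k;
  if (name == "xpress16k")
    return kXpress16k;
  if (name == "lzx")
    return kLzx;
  return std::nullopt;  // unknown algorithm name
}
```

As in the patch, the comparison is case-sensitive, so `LZX` would be rejected.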

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-14  7:27       ` Christian Franke
@ 2021-05-14  7:55         ` Christian Franke
  2021-07-18 13:44           ` Jon Turney
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Franke @ 2021-05-14  7:55 UTC (permalink / raw)
  To: cygwin-apps

[-- Attachment #1: Type: text/plain, Size: 1042 bytes --]

Christian Franke wrote:
> Jon Turney wrote:
>> On 12/05/2021 18:50, Christian Franke wrote:
>>> Jon Turney wrote:
>>>> On 08/05/2021 21:03, Christian Franke wrote:
>>>> ...
>>>>> +#include "compactos.h"
>>>>> +
>>>>> +#ifndef FSCTL_SET_EXTERNAL_BACKING
>>>>
>>>> There should be a comment here saying "not yet provided by w32api" 
>>>> or similar.
>>>
>>> ... or we wait for a release of w32api headers with the patch 
>>> mentioned above :-)
>>
>> No, I think this way is better, since I build the setup releases on 
>> Fedora, and so don't have any control over when the w32api package 
>> I'm building against gets updated
>>
>> (and furthermore it's an old Fedora at the moment, since the x86 
>> MinGW toolchain in recent Fedora isn't built with SJLJ exception 
>> handling...)
>>
>
> I see. BTW: Mingw-w64 upstream pushed my patch yesterday.
>
> Attached is a new patch for setup which also allows to select the 
> compression algorithm.
>

Sorry - I missed a possible segfault in the check for cygwin1.dll.

Fixed version attached.

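The crash in the earlier version comes from `!strcmp (p + (len - 11), "cygwin1.dll")`. Because `len` is a `size_t`, a short DLL name such as `/bin/a.dll` (10 characters) makes `len - 11` wrap around, so the pointer no longer points into the string. This is undefined behaviour and may crash. The fixed patch guards the test with `len >= 5 + 11`. Below is a sketch of the same idea written as a bounds-checked suffix helper, together with the DLL part of `compactos_is_useless()`. The names `has_suffix` and `dll_is_skipped` are hypothetical and not part of setup.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if `s` ends with `suffix`. Checking the
// length first avoids the unsigned wrap-around of `len - n` that made
// the first version read outside the string for short names.
bool has_suffix(const std::string& s, const char* suffix)
{
  const std::size_t n = std::strlen(suffix);
  return s.size() >= n && s.compare(s.size() - n, n, suffix) == 0;
}

// Sketch of the DLL rule in compactos_is_useless(): cygwin1.dll and
// MinGW DLLs are ignored by rebase, so they may be compressed; other
// .dll/.so files are skipped because rebase reopens them for writing.
bool dll_is_skipped(const std::string& name)
{
  if (!(has_suffix(name, ".dll") || has_suffix(name, ".so")))
    return false;  // not a DLL; other rules decide
  if (has_suffix(name, "cygwin1.dll")
      || name.find("/sys-root/mingw/", 5) != std::string::npos)
    return false;  // ignored by rebase
  return true;
}
```

For example, `has_suffix("/bin/a.dll", "cygwin1.dll")` simply returns false instead of reading before the start of the string.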


[-- Attachment #2: 0001-Add-new-option-compact-os-ALGORITHM.patch --]
[-- Type: text/plain, Size: 8129 bytes --]

From 109190447edfb1abab61942c3661def62014fe24 Mon Sep 17 00:00:00 2001
From: Christian Franke <christian.franke@t-online.de>
Date: Fri, 14 May 2021 09:50:12 +0200
Subject: [PATCH] Add new option '--compact-os ALGORITHM'.

If specified, the selected Compact OS compression algorithm is applied
to files below /bin, /sbin and /usr.  Most DLL files are excluded
because rebase will open these files again for writing.
---
 Makefile.am          |  2 +
 compactos.cc         | 63 +++++++++++++++++++++++++++++++
 compactos.h          | 25 +++++++++++++
 io_stream_cygfile.cc | 89 +++++++++++++++++++++++++++++++++++++++++++-
 io_stream_cygfile.h  |  2 +
 5 files changed, 179 insertions(+), 2 deletions(-)
 create mode 100644 compactos.cc
 create mode 100644 compactos.h

diff --git a/Makefile.am b/Makefile.am
index d10ad6b..63e96da 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -108,6 +108,8 @@ inilint_SOURCES = \
 	archive_tar_file.cc \
 	choose.cc \
 	choose.h \
+	compactos.cc \
+	compactos.h \
 	compress.cc \
 	compress.h \
 	compress_bz.cc \
diff --git a/compactos.cc b/compactos.cc
new file mode 100644
index 0000000..9ed2a73
--- /dev/null
+++ b/compactos.cc
@@ -0,0 +1,63 @@
+//
+// compactos.cc
+//
+// Copyright (C) 2021 Christian Franke
+//
+// SPDX-License-Identifier: MIT
+//
+
+#include "compactos.h"
+
+/* Not yet provided by w32api headers. */
+#ifndef FSCTL_SET_EXTERNAL_BACKING
+#define FSCTL_SET_EXTERNAL_BACKING \
+  CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 195, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
+#endif
+
+#ifndef WOF_CURRENT_VERSION
+#define WOF_CURRENT_VERSION 1
+
+typedef struct _WOF_EXTERNAL_INFO {
+  DWORD Version;
+  DWORD Provider;
+} WOF_EXTERNAL_INFO;
+
+#endif
+
+#ifndef WOF_PROVIDER_FILE
+#define WOF_PROVIDER_FILE 2
+#define FILE_PROVIDER_CURRENT_VERSION 1
+
+typedef struct _FILE_PROVIDER_EXTERNAL_INFO_V1 {
+  DWORD Version;
+  DWORD Algorithm;
+  DWORD Flags;
+} FILE_PROVIDER_EXTERNAL_INFO_V1;
+
+#endif
+
+#ifndef ERROR_COMPRESSION_NOT_BENEFICIAL
+#define ERROR_COMPRESSION_NOT_BENEFICIAL 344
+#endif
+
+int CompactOsCompressFile(HANDLE h, DWORD algorithm)
+{
+  struct {
+    WOF_EXTERNAL_INFO Wof;
+    FILE_PROVIDER_EXTERNAL_INFO_V1 FileProvider;
+  } wfp;
+  wfp.Wof.Version = WOF_CURRENT_VERSION;
+  wfp.Wof.Provider = WOF_PROVIDER_FILE;
+  wfp.FileProvider.Version = FILE_PROVIDER_CURRENT_VERSION;
+  wfp.FileProvider.Algorithm = algorithm;
+  wfp.FileProvider.Flags = 0;
+
+  if (!DeviceIoControl(h, FSCTL_SET_EXTERNAL_BACKING, &wfp, sizeof(wfp), 0, 0, 0, 0))
+    {
+      if (GetLastError() != ERROR_COMPRESSION_NOT_BENEFICIAL)
+        return -1;
+      return 0;
+    }
+
+  return 1;
+}
diff --git a/compactos.h b/compactos.h
new file mode 100644
index 0000000..f187718
--- /dev/null
+++ b/compactos.h
@@ -0,0 +1,25 @@
+//
+// compactos.h
+//
+// Copyright (C) 2021 Christian Franke
+//
+// SPDX-License-Identifier: MIT
+//
+
+#ifndef COMPACTOS_H
+#define COMPACTOS_H
+
+#include <windows.h>
+
+/* Not yet provided by w32api headers. */
+#ifndef FILE_PROVIDER_COMPRESSION_XPRESS4K
+#define FILE_PROVIDER_COMPRESSION_XPRESS4K  0
+#define FILE_PROVIDER_COMPRESSION_LZX       1
+#define FILE_PROVIDER_COMPRESSION_XPRESS8K  2
+#define FILE_PROVIDER_COMPRESSION_XPRESS16K 3
+#endif
+
+// Returns: 1=compressed, 0=not compressed, -1=error
+int CompactOsCompressFile(HANDLE h, DWORD algorithm);
+
+#endif // COMPACTOS_H
diff --git a/io_stream_cygfile.cc b/io_stream_cygfile.cc
index 2d0716f..a9150e7 100644
--- a/io_stream_cygfile.cc
+++ b/io_stream_cygfile.cc
@@ -18,6 +18,9 @@
 #include "filemanip.h"
 #include "mkdir.h"
 #include "mount.h"
+#include "compactos.h"
+
+#include "getopt++/StringOption.h"
 
 #include <stdlib.h>
 #include <errno.h>
@@ -27,6 +30,45 @@
 #include "IOStreamProvider.h"
 #include "LogSingleton.h"
 
+/* Option '--compact-os ALGORITHM' */
+class CompactOsStringOption : public StringOption
+{
+public:
+  CompactOsStringOption ();
+  virtual Result Process (char const *optarg, int prefixIndex) /* override */;
+  operator int () const { return intval; }
+private:
+  int intval;
+};
+
+CompactOsStringOption::CompactOsStringOption ()
+: StringOption ("", '\0', "compact-os",
+    "Compress installed files with Compact OS "
+    "(xpress4k, xpress8k, xpress16k, lzx)", false),
+  intval (-1)
+{
+}
+
+Option::Result CompactOsStringOption::Process (char const *optarg, int prefixIndex)
+{
+  Result res = StringOption::Process (optarg, prefixIndex);
+  if (res != Ok)
+    return res;
+  const std::string& strval = *this;
+  if (strval == "xpress4k")
+    intval = FILE_PROVIDER_COMPRESSION_XPRESS4K;
+  else if (strval == "xpress8k")
+    intval = FILE_PROVIDER_COMPRESSION_XPRESS8K;
+  else if (strval == "xpress16k")
+    intval = FILE_PROVIDER_COMPRESSION_XPRESS16K;
+  else if (strval == "lzx")
+    intval = FILE_PROVIDER_COMPRESSION_LZX;
+  else
+    return Failed;
+  return Ok;
+}
+
+static CompactOsStringOption CompactOsOption;
 
 /* completely private iostream registration class */
 class CygFileProvider : public IOStreamProvider
@@ -59,7 +101,8 @@ CygFileProvider CygFileProvider::theInstance = CygFileProvider();
 
 
 std::string io_stream_cygfile::cwd("/");
-  
+bool io_stream_cygfile::compact_os_is_available = (OSMajorVersion () >= 10);
+
 // Normalise a unix style path relative to 
 // cwd.
 std::string
@@ -120,7 +163,27 @@ get_root_dir_now ()
   read_mounts (std::string ());
 }
 
-io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms) : fp(), lasterr (0), fname(), wname (NULL)
+static bool
+compactos_is_useless (const std::string& name)
+{
+  const char * const p = name.c_str();
+  if (!(!strncmp (p, "/bin/", 5) || !strncmp (p, "/sbin/", 6) || !strncmp (p, "/usr/", 5)))
+    return true; /* File is not in R/O tree. */
+  const size_t len = name.size(); /* >= 5 */
+  if (!strcmp (p + (len - 4), ".dll") || !strcmp (p + (len - 3), ".so")) {
+    if ((len >= 5 + 11 && !strcmp (p + (len - 11), "cygwin1.dll"))
+	|| strstr (p + 5, "/sys-root/mingw/"))
+      return false; /* Ignored by rebase. */
+    return true; /* Rebase will open file for writing which uncompresses the file. */
+  }
+  if (!strcmp (p + (len - 4), ".bz2") || !strcmp (p + (len - 3), ".gz")
+      || !strcmp (p + (len - 3), ".xz"))
+    return true; /* File is already compressed. */
+  return false;
+}
+
+io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string& mode, mode_t perms)
+: fp(), lasterr (0), fname(), wname (NULL), compact_os_algorithm(-1)
 {
   errno = 0;
   if (!name.size())
@@ -153,6 +216,10 @@ io_stream_cygfile::io_stream_cygfile (const std::string& name, const std::string
 	Log (LOG_TIMESTAMP) << "io_stream_cygfile: fopen(" << name << ") failed " << errno << " "
 	  << strerror(errno) << endLog;
       }
+
+      if (mode[0] == 'w' && compact_os_is_available && CompactOsOption >= 0
+	  && !compactos_is_useless (name))
+	compact_os_algorithm = CompactOsOption;
     }
 }
 
@@ -367,6 +434,24 @@ io_stream_cygfile::set_mtime (time_t mtime)
 		   FILE_ATTRIBUTE_NORMAL | FILE_FLAG_BACKUP_SEMANTICS, 0);
   if (h == INVALID_HANDLE_VALUE)
     return 1;
+
+  if (compact_os_algorithm >= 0)
+    {
+      /* Compact OS must be applied after last WriteFile()
+	 and before SetFileTime(). */
+      int rc = CompactOsCompressFile (h, compact_os_algorithm);
+      if (rc < 0)
+	{
+	  DWORD err = GetLastError();
+	  Log (LOG_TIMESTAMP) << "Compact OS disabled after error " << err
+			      << " on " << fname << endLog;
+	  compact_os_is_available = false;
+	}
+      else
+	Log (LOG_BABBLE) << "Compact OS algorithm " << compact_os_algorithm
+			 << (rc == 0 ? " not" : "") << " applied to " << fname << endLog;
+    }
+
   SetFileTime (h, 0, 0, &ftime);
   CloseHandle (h);
   return 0;
diff --git a/io_stream_cygfile.h b/io_stream_cygfile.h
index 1ece242..b977909 100644
--- a/io_stream_cygfile.h
+++ b/io_stream_cygfile.h
@@ -61,7 +61,9 @@ private:
   std::string fname;
   wchar_t *wname;
   wchar_t *w_str ();
+  int compact_os_algorithm;
   static std::string cwd;
+  static bool compact_os_is_available;
 };
 
 #endif /* SETUP_IO_STREAM_CYGFILE_H */
-- 
2.31.1

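`CompactOsCompressFile()` passes a `WOF_EXTERNAL_INFO` header, immediately followed by a `FILE_PROVIDER_EXTERNAL_INFO_V1` record, as the input buffer of `FSCTL_SET_EXTERNAL_BACKING`. It treats `ERROR_COMPRESSION_NOT_BENEFICIAL` (344) as "not compressed" rather than as an error. Both structures consist only of 32-bit `DWORD`s, so the buffer should be 20 bytes with no padding. The portable sketch below reproduces that layout with `uint32_t` so it can be checked without Windows headers. The struct names, `make_buffer()` and `classify_result()` are hypothetical stand-ins. The constants are copied from the fallback definitions in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors WOF_EXTERNAL_INFO (DWORD is 32 bits on Windows).
struct WofExternalInfo {
  std::uint32_t Version;   // WOF_CURRENT_VERSION = 1
  std::uint32_t Provider;  // WOF_PROVIDER_FILE = 2
};

// Mirrors FILE_PROVIDER_EXTERNAL_INFO_V1.
struct FileProviderExternalInfoV1 {
  std::uint32_t Version;    // FILE_PROVIDER_CURRENT_VERSION = 1
  std::uint32_t Algorithm;  // FILE_PROVIDER_COMPRESSION_* (0..3)
  std::uint32_t Flags;      // always 0 in the patch
};

// Input buffer as assembled in CompactOsCompressFile().
struct WofFileBuffer {
  WofExternalInfo Wof;
  FileProviderExternalInfoV1 FileProvider;
};

static_assert(sizeof(WofFileBuffer) == 20, "5 DWORDs, no padding");
static_assert(offsetof(WofFileBuffer, FileProvider) == 8,
              "provider record follows the 8-byte header");

WofFileBuffer make_buffer(std::uint32_t algorithm)
{
  WofFileBuffer b{};
  b.Wof.Version = 1;
  b.Wof.Provider = 2;
  b.FileProvider.Version = 1;
  b.FileProvider.Algorithm = algorithm;
  b.FileProvider.Flags = 0;
  return b;
}

// Hypothetical mapping of the DeviceIoControl() outcome to the
// 1 / 0 / -1 convention documented in compactos.h.
constexpr std::uint32_t kErrorCompressionNotBeneficial = 344;

int classify_result(bool ioctl_ok, std::uint32_t last_error)
{
  if (ioctl_ok)
    return 1;   // compressed
  if (last_error == kErrorCompressionNotBeneficial)
    return 0;   // left uncompressed, not an error
  return -1;    // real error: set_mtime() then disables Compact OS
}
```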


* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-13 14:42     ` Christian Franke
  2021-05-13 14:45       ` Christian Franke
@ 2021-05-17 10:17       ` Corinna Vinschen
  1 sibling, 0 replies; 14+ messages in thread
From: Corinna Vinschen @ 2021-05-17 10:17 UTC (permalink / raw)
  To: cygwin-apps

On May 13 16:42, Christian Franke wrote:
> Corinna Vinschen via Cygwin-apps wrote:
> > When running a shell script, certain executables (especially coreutils,
> > gawk, sed, grep, find) are not so very infrequently accessed.  Is this
> > compression really feasible for these binaries?  Did you compare shell
> > script performance with non-compressed, XPRESS16K and LZX compressed
> > /bin dir?
> 
> Good point. I have now tested a ./configure script run after a reboot:
> there was no significant difference between /bin/*.exe (only) uncompressed,
> NTFS-, XPRESS16K- or LZX-compressed. The run always took around 23s.
> 
> Here is a read speed test with fast and slow storage and a 10+-year-old
> i7-2600K (4C/8T). The 256MiB test file was generated by concatenating
> various EXE files. All file accesses were the first after reboot. AV
> (Defender) was turned off:
> 
> 
>  Compression MiB      T1     T2   T3,T4
>  ======================================
>  None        256   0.69s  10.1s  <0.02s
>  NTFS        159   1.03s   8.1s  <0.02s
>  XPRESS4K    138   -
>  XPRESS8K    128   -
>  XPRESS16K   123   0.64s   5.4s  <0.02s
>  LZX          97   0.79s   4.8s  <0.02s
> 
> T1,T2: Read whole file: time dd if=FILE bs=FILESIZE of=/dev/null
> T3,T4: Read last byte: time dd if=FILE bs=1 skip=FILESIZE-1 of=/dev/null
> 
> T1,T3: SATA SSD, raw read speed with dd bs=1M: ~520MB/s
> T2,T4: USB3 flash drive via USB2, raw read speed: ~27MB/s
> 
> 
> As expected, compression helps to improve 'virtual' read speed on slow
> storage. Otherwise, it depends on storage speed, CPU speed, system load, ...
> Unexpectedly (for me), even LZX seems to be suitable for the random reads
> done when EXE files are preloaded or paged in.
> 
> If the files were already cached, all read times were similar: ~0.135s for
> the whole file.
> 
> For more flexibility, I will provide a new version of the patch with
> '--compact-os ALGORITHM' option.

Great, thanks!


Corinna

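The T2 column of the quoted table can be turned into rates. Dividing the original 256 MiB by the measured time gives the effective rate at which file content arrives. Dividing the on-disk size by the same time gives the rate at which the drive actually delivers data. A small sketch of that arithmetic follows, using only the numbers from the table. The struct and function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// One row of the T2 (USB3 flash drive via USB2) measurements above.
struct Row {
  const char* name;
  double size_mib;  // size on disk after compression
  double t2_s;      // time to read the whole file (first access)
};

constexpr double kOriginalMiB = 256.0;

// Rate at which the reader receives uncompressed file content.
double effective_mib_per_s(const Row& r) { return kOriginalMiB / r.t2_s; }

// Rate at which compressed data comes off the device.
double device_mib_per_s(const Row& r) { return r.size_mib / r.t2_s; }

// Fraction of the original size still occupied on disk.
double space_ratio(const Row& r) { return r.size_mib / kOriginalMiB; }

const Row kRows[] = {
  {"None", 256, 10.1},
  {"NTFS", 159, 8.1},
  {"XPRESS16K", 123, 5.4},
  {"LZX", 97, 4.8},
};
```

With these figures, LZX yields about 53 MiB/s of file content while the drive supplies about 20 MiB/s, compared with about 25 MiB/s for the uncompressed file. This is consistent with the slow drive, not decompression, being the bottleneck in T2.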

* Re: [PATCH setup] Add new option '--compact-os'
  2021-05-14  7:55         ` Christian Franke
@ 2021-07-18 13:44           ` Jon Turney
  0 siblings, 0 replies; 14+ messages in thread
From: Jon Turney @ 2021-07-18 13:44 UTC (permalink / raw)
  To: cygwin-apps, Christian Franke

On 14/05/2021 08:55, Christian Franke wrote:
> Christian Franke wrote:
>> Jon Turney wrote:
>>> On 12/05/2021 18:50, Christian Franke wrote:
>>>> Jon Turney wrote:
>>>>> On 08/05/2021 21:03, Christian Franke wrote:
>>>>> ...
>>>>>> +#include "compactos.h"
>>>>>> +
>>>>>> +#ifndef FSCTL_SET_EXTERNAL_BACKING
>>>>>
>>>>> There should be a comment here saying "not yet provided by w32api" 
>>>>> or similar.
>>>>
>>>> ... or we wait for a release of w32api headers with the patch 
>>>> mentioned above :-)
>>>
>>> No, I think this way is better, since I build the setup releases on 
>>> Fedora, and so don't have any control over when the w32api package 
>>> I'm building against gets updated
>>>
>>> (and furthermore it's an old Fedora at the moment, since the x86 
>>> MinGW toolchain in recent Fedora isn't built with SJLJ exception 
>>> handling...)
>>>
>>
>> I see. BTW: Mingw-w64 upstream pushed my patch yesterday.
>>
>> Attached is a new patch for setup which also allows to select the 
>> compression algorithm.
>>
> 
> Sorry - I missed a possible segfault in the check for cygwin1.dll.
> 
> Fixed version attached.

Sorry, I'd thought this was in the "waiting for an updated version of 
the patch" state, but you'd done that.

Now applied. Thanks.


end of thread, other threads:[~2021-07-18 13:45 UTC | newest]

Thread overview: 14+ messages
2021-05-08 20:03 [PATCH setup] Add new option '--compact-os' Christian Franke
2021-05-12 15:14 ` Jon Turney
2021-05-12 17:50   ` Christian Franke
2021-05-12 18:35     ` ASSI
2021-05-12 18:48       ` Achim Gratz
2021-05-13 15:09         ` Christian Franke
2021-05-13 14:55     ` Jon Turney
2021-05-14  7:27       ` Christian Franke
2021-05-14  7:55         ` Christian Franke
2021-07-18 13:44           ` Jon Turney
2021-05-12 18:04   ` Corinna Vinschen
2021-05-13 14:42     ` Christian Franke
2021-05-13 14:45       ` Christian Franke
2021-05-17 10:17       ` Corinna Vinschen
