* Q: Are bzip2 archives of identical inputs also guaranteed identical?
@ 2024-03-12 1:58 Jim DeLaHunt
From: Jim DeLaHunt @ 2024-03-12 1:58 UTC
To: bzip2-devel
Hello, bzip2 supporters:
Many thanks for your work to develop bzip2 and make it freely available
to the world. I am using it, and it works well for me.
I have a question that I could not find answered in the
documentation [1].
If I have two input files, F1 and F2, and I compress them with bzip2 on
different machines at different times, maybe with different versions of
bzip2, or with different implementations of the bzip2 algorithm, are the
resulting archives F1.bz2 and F2.bz2 guaranteed to be bit-for-bit
identical if and only if the inputs F1 and F2 are bit-for-bit
identical? Or are they guaranteed to be different? Or is this property
undefined?
Reasons why they could be guaranteed identical:
* The algorithm is 100% deterministic in the output it generates
* The test suite of the implementation tests this property
Reasons why they could be guaranteed to be different:
* The algorithm calls for putting a date stamp or nonce value in the
output.
* The algorithm calls for putting the name or version of the
compression tool used into the output.
* The compression algorithm is not deterministic.
* The uncompression algorithm is not deterministic; the same archive
could generate different uncompressed output depending on
circumstances. (This would surprise me, but I suppose it is
logically possible.)
Reasons why the property could be undefined:
* No-one specified this property or tested it.
* There are known cases where the property is true, and known cases
where the property is not true, and you never can tell which case a
user will find themselves in.
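One concrete way two archives of the same input can differ is the compression level. A minimal sketch, assuming Python's standard bz2 module as the compressor (not the bzip2 command-line tool itself), and with made-up sample data: the level selects the block size, and the block-size digit is recorded in the fourth byte of the stream header, so the archives differ even though both decompress to the same bytes. It says nothing about whether two runs at the same level on different versions or implementations would match.

```python
import bz2

# Hypothetical sample input; any bytes would do.
data = b"example input\n" * 1000

# Same input, two different compression levels (block sizes).
small_blocks = bz2.compress(data, compresslevel=1)
large_blocks = bz2.compress(data, compresslevel=9)

# The stream header is "BZh" followed by the block-size digit.
print(small_blocks[:4], large_blocks[:4])   # b'BZh1' b'BZh9'

# The archives are not bit-for-bit identical...
print(small_blocks == large_blocks)         # False

# ...yet both decompress to the identical input.
print(bz2.decompress(small_blocks) == bz2.decompress(large_blocks) == data)  # True
```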
The motivation for this question:
I was cleaning up a file server. I had just finished compressing a very
large file F1 to an archive F1.bz2, and had just irretrievably deleted
F1. I came across another large file F2, with the same byte count as F1.
I want to know if F2 is identical to F1. I could uncompress F1.bz2 to
recreate F1, then diff F1 and F2. Or, if the archives are guaranteed to
be the same if and only if inputs are the same, then I could compress F2
to archive F2.bz2, and diff F1.bz2 with F2.bz2.
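The first option does not need F1 to be recreated on disk. A minimal sketch, assuming Python's standard bz2 module and enough time to stream both files once; the function names and the paths in the usage note are illustrative only. It decompresses F1.bz2 chunk by chunk and compares each chunk with F2, so the answer does not depend on any guarantee about the archives themselves.

```python
import bz2


def read_exactly(stream, n):
    """Read n bytes from stream, or fewer only at end of file."""
    parts = []
    remaining = n
    while remaining > 0:
        piece = stream.read(remaining)
        if not piece:
            break
        parts.append(piece)
        remaining -= len(piece)
    return b"".join(parts)


def same_content(archive_path, plain_path, chunk_size=1 << 20):
    """Return True if the decompressed archive equals the plain file."""
    with bz2.open(archive_path, "rb") as archive, open(plain_path, "rb") as plain:
        while True:
            a = read_exactly(archive, chunk_size)
            b = read_exactly(plain, chunk_size)
            if a != b:
                return False   # contents differ, or one file is longer
            if not a:
                return True    # both streams ended at the same point
```

For example, same_content("F1.bz2", "F2") would answer the question for the two files described above while holding only about two chunks in memory at a time.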
Best regards,
—Jim DeLaHunt, Vancouver, Canada
[1] bzip2 documentation <https://sourceware.org/bzip2/manual/manual.html>
--
. --Jim DeLaHunt,jdlh@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, B.C., Canada