Date: Mon, 11 Mar 2024 18:58:20 -0700
From: Jim DeLaHunt
To: bzip2-devel@sourceware.org
Subject: Q: Are bzip2 archives of identical inputs also guaranteed identical?

Hello, bzip2 supporters:

Many thanks for your work to develop BZip2 and make it freely available to the world. I am using it, and it works well for me.

I have a question for which I could not find an answer in the documentation[1].

If I have two input files, F1 and F2, and I compress them with bzip2 on different machines at different times, maybe with different versions of bzip2 or with different implementations of the bzip2 algorithm, are the resulting archives F1.bz2 and F2.bz2 guaranteed to be bit-for-bit identical if and only if the inputs F1 and F2 are bit-for-bit identical?
Or are they guaranteed to be different? Or is this property undefined?

Reasons why they could be guaranteed identical:
* The algorithm is 100% deterministic in the output it generates.
* The test suite of the implementation tests this property.

Reasons why they could be guaranteed to be different:
* The algorithm calls for putting a date stamp or nonce value in the output.
* The algorithm calls for putting the name or version of the compression tool used into the output.
* The compression algorithm is not deterministic.
* The uncompression algorithm is not deterministic; the same archive could generate different uncompressed output depending on circumstances. (This would surprise me, but I suppose it is logically possible.)

Reasons why the property could be undefined:
* No one specified this property or tested it.
* There are known cases where the property is true and known cases where it is not, and you can never tell which case a user will find themselves in.

The motivation for this question: I was cleaning up a file server. I had just finished compressing a very large file F1 to an archive F1.bz2, and had irretrievably deleted F1. I then came across another large file F2 with the same byte count as F1. I want to know whether F2 is identical to F1. I could uncompress F1.bz2 to recreate F1, then diff F1 and F2. Or, if the archives are guaranteed to be the same if and only if the inputs are the same, I could compress F2 to an archive F2.bz2 and diff F1.bz2 with F2.bz2.

Best regards,
     Jim DeLaHunt, Vancouver, Canada

[1] bzip2 documentation -- .

--
Jim DeLaHunt, jdlh@jdlh.com    http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, B.C., Canada
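A minimal sketch of the first option in the motivation above (uncompress F1.bz2, then compare it with F2), assuming Python 3 and its standard-library bz2 module; the function name same_as_decompressed and the file names are hypothetical. It decompresses the archive as a stream and compares it with F2 chunk by chunk, so F1 never has to be recreated on disk, and its answer does not depend on whether archive equality is guaranteed. The last lines only probe one library on one machine: identical input at identical settings should give identical archives there, while changing the block-size level (recorded in the fourth header byte, "BZh1" versus "BZh9") changes the archive even though both decompress to the same bytes. That is one case where identical inputs need not give identical archives; it says nothing about other versions or implementations.

```python
import bz2
import os
import tempfile

CHUNK = 1 << 20  # compare 1 MiB at a time; neither file is held in memory whole


def same_as_decompressed(bz2_path: str, plain_path: str) -> bool:
    """Return True if bz2_path decompresses to exactly the bytes in plain_path.

    The archive is decompressed as a stream, so the original file is never
    written back to disk.
    """
    with bz2.open(bz2_path, "rb") as packed, open(plain_path, "rb") as plain:
        while True:
            # Both objects are buffered readers over non-interactive streams,
            # so read(CHUNK) returns a full chunk until the last one and the
            # two streams stay aligned.
            a = packed.read(CHUNK)
            b = plain.read(CHUNK)
            if a != b:
                return False
            if not a:  # both streams ended at the same offset
                return True


if __name__ == "__main__":
    data = b"example payload " * 100_000  # about 1.5 MiB, so it spans two chunks
    with tempfile.TemporaryDirectory() as d:
        f1_bz2 = os.path.join(d, "F1.bz2")  # hypothetical archive name
        f2 = os.path.join(d, "F2")          # hypothetical second file
        with bz2.open(f1_bz2, "wb") as out:
            out.write(data)
        with open(f2, "wb") as out:
            out.write(data)
        print(same_as_decompressed(f1_bz2, f2))  # True

    # Same input, same settings, same library: identical archives here.
    print(bz2.compress(data) == bz2.compress(data))  # True
    # Same input, different block-size level: the archives differ, because the
    # fourth header byte records the level, yet both decompress to `data`.
    a1 = bz2.compress(data, compresslevel=1)
    a9 = bz2.compress(data, compresslevel=9)
    print(a1[:4], a9[:4], a1 == a9)  # b'BZh1' b'BZh9' False
```

Comparing the decompressed stream with F2 is the conservative choice: it costs one decompression of F1.bz2 but relies only on bzip2 being lossless, not on any property of the compressed bytes.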