Date: Thu, 2 Dec 2021 10:16:30 -0500
From: "Frank Ch. Eigler"
To: Florian Weimer
Cc: "Frank Ch. Eigler via Elfutils-devel", Mark Wielaard, Luca Boccassi
Subject: Re: [PATCH v2] libebl: recognize FDO Packaging Metadata ELF note
Message-ID: <20211202151630.GA9174@redhat.com>
In-Reply-To: <87czmhbnbd.fsf@oldenburg.str.redhat.com>

Hi -

> JSON has been targeted at the Windows/Java UTF-16 world, there is always
> going to be a mismatch if you try to represent it in UTF-8 or anything
> that doesn't have surrogate pairs.

RFC 8259 section 8.1 mandates UTF-8 encoding for JSON text exchanged
between systems that are not part of a closed ecosystem, which is our
situation.

> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
> > codes could visually confuse a human reader whereas escaped \uXXXX
> > wouldn't. If we forbid \uXXXX unilaterally, we literally become
> > incompatible with JSON (RFC8259 7. String. "Any character may be
> > escaped."), and for what?
>
> RFC 8259 says this:
>
>   However, the ABNF in this specification allows member names and
>   string values to contain bit sequences that cannot encode Unicode
>   characters; for example, "\uDEAD" (a single unpaired UTF-16
>   surrogate). Instances of this have been observed, for example, when
>   a library truncates a UTF-16 string without checking whether the
>   truncation split a surrogate pair. The behavior of software that
>   receives JSON texts containing such values is unpredictable; for
>   example, implementations might return different values for the length
>   of a string value or even suffer fatal runtime exceptions.
>
> A UTF-8 environment has to enforce *some* additional constraints
> compared to the official JSON syntax.

I'm sorry, I don't see how. If a JSON string were to include the
suspect "\uDEAD", then under our hypothetical "no escapes!" rule a
producer could simply re-encode it as the UTF-8 octets 0xED 0xBA 0xAD.
ISTM we're no better off.

- FChE
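The 0xED 0xBA 0xAD arithmetic can be checked with a short Python sketch. It assumes Python 3.8+ (for `bytes.hex` with a separator), and the helper names are hypothetical illustrations, not anything from elfutils. It parses the escaped form with the standard `json` module (which, in CPython, keeps an unpaired `\uDEAD` as a lone code point), re-encodes that value without escapes the way a "no escapes!" producer might, and shows that both forms carry the same code point U+DEAD. Python's default strict UTF-8 decoder rejects the raw octets, while the `surrogatepass` error handler lets them through.

```python
import json


def decode_escaped(json_text):
    """Parse a JSON string literal. CPython's json module keeps an
    unpaired surrogate escape such as \\uDEAD as a lone code point."""
    return json.loads(json_text)


def decode_raw_lenient(data):
    """Decode octets without rejecting encoded surrogates
    ('surrogatepass'), mimicking a consumer that does no validation."""
    return data.decode("utf-8", "surrogatepass")


def is_strict_utf8(data):
    """True if data is well-formed UTF-8. Python's default UTF-8 codec
    rejects encoded surrogates (RFC 3629 excludes U+D800..U+DFFF)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_surrogate(text):
    """True if the decoded text contains any surrogate code point."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# Escaped form, as it might appear in a JSON payload.
escaped = decode_escaped('"\\uDEAD"')

# Same value without escapes. U+DEAD = 1101 111010 101101 (4+6+6 bits)
# -> 1110_1101 10_111010 10_101101 = ED BA AD.
raw = escaped.encode("utf-8", "surrogatepass")

print(raw.hex(" "))                        # ed ba ad
print(decode_raw_lenient(raw) == escaped)  # True: same code point U+DEAD
print(has_surrogate(escaped))              # True
print(is_strict_utf8(raw))                 # False
```

Both representations decode to the same lone surrogate, so a check on the decoded code points (like the hypothetical `has_surrogate`) flags either one; a strict UTF-8 decoder, by contrast, only sees the raw-octet form.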