Hi!

On Wed, Jul 12, 2023 at 09:44:26PM +0200, Bruno Haible wrote:
> Regarding the mapping of the bytes 0x80..0xFF:
>
> > By strategically picking c= we land at the same point of the
> > Unicode Low Surrogate Area at DC00-DCFF, described as
> > > Isolated surrogate code points have no interpretation;
> > > consequently, no character code charts or names lists
> > > are provided for this range.
> > as the Python UTF-8 errors=surrogateescape encoding.
>
> musl libc maps the bytes 0x80..0xFF to U+DF80..U+DFFF. [1][2]
>
> I think it is more useful to avoid an inconsistency between glibc and
> musl libc, than to be consistent with what a particular user-space
> program (Python) does.
>
> How about mapping the bytes 0x80..0xFF to U+DF80..U+DFFF, like musl libc
> does?

That's what I had done originally (and citing the same exact reasons!),
but changed it in v10
  https://sourceware.org/pipermail/libc-alpha/2023-April/147652.html
because Florian likes it better.

He forwarded it to musl@
  https://www.openwall.com/lists/musl/2022/11/10/1
in v6
  https://sourceware.org/pipermail/libc-alpha/2022-December/143690.html

In short: python uses the DCxx range, musl put it at DFxx for no
particular reason but decided to not move it because that would
imply some sort of stability or semantic meaning.

I personally like DFxx more but don't really care, so
Reviewer's Privilege of Mostly-Arbitrary Design Choice.

I can change it back for the same reason, but I'd rather do it, uh.
Once. Don't wanna be ping-ponging this.
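
For anyone skimming the thread, here is a rough sketch of the two
candidate mappings side by side. It is illustrative only: the helper
names (escape_byte, unescape_code_point) and the two *_BASE constants
are made up for this mail and are not glibc or musl interfaces. The
DFxx base is taken from the musl description quoted above; the DCxx
variant is cross-checked against Python's own surrogateescape handler.

```python
# Hypothetical helpers for illustration; not glibc or musl code.

PYTHON_BASE = 0xDC00  # Python surrogateescape: byte b -> U+DC00 + b
MUSL_BASE = 0xDF00    # musl, as described in this thread: byte b -> U+DF00 + b


def escape_byte(b: int, base: int) -> int:
    """Map a byte in 0x80..0xFF to a lone low-surrogate code point."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF are remapped")
    return base + b


def unescape_code_point(cp: int, base: int) -> int:
    """Inverse of escape_byte: recover the original byte."""
    if not base + 0x80 <= cp <= base + 0xFF:
        raise ValueError("code point is outside the escape range")
    return cp - base


# Cross-check the DCxx variant against Python itself, one byte at a time.
python_cps = [
    ord(bytes([b]).decode("utf-8", errors="surrogateescape"))
    for b in range(0x80, 0x100)
]

if __name__ == "__main__":
    for b in (0x80, 0xFF):
        print(
            f"0x{b:02X} -> U+{escape_byte(b, PYTHON_BASE):04X} (DCxx), "
            f"U+{escape_byte(b, MUSL_BASE):04X} (DFxx)"
        )
```

Either way the transform is the same constant offset and both ranges sit
inside the low-surrogate block, so the only difference is which base the
two libcs (and Python) end up agreeing on.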