From mboxrd@z Thu Jan 1 00:00:00 1970
From: Florian Weimer
To: наб
Cc: libc-alpha@sourceware.org, Victor Stinner
Subject: Re: [PATCH v7] POSIX locale covers every byte [BZ# 29511]
References:
 <969aa82c8d5904c1d2040bba87abe2f17a0dc647.1667409408.git.nabijaczleweli@nabijaczleweli.xyz>
 <874jv8dxat.fsf@oldenburg.str.redhat.com>
 <20221109161415.eyqgyrp2jlwzfdmb@tarta.nabijaczleweli.xyz>
Date: Thu, 10 Nov 2022 10:52:10 +0100
In-Reply-To: <20221109161415.eyqgyrp2jlwzfdmb@tarta.nabijaczleweli.xyz> ("наб"'s message of "Wed, 9 Nov 2022 17:14:15 +0100")
Message-ID: <87tu37uofp.fsf@oldenburg.str.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

* наб:

>> Not sure what is more important here, musl compatibility or Python
>> compatibility.  Cc:ing Victor in case he has comments.  I should probably
>> ask on the musl list as well how this divergence came to pass.
>
> I went for musl because (a) it's a libc not some random programming
> language, (b) putting the end of our domain at the end of the
> surrogates is more aesthetically and ideologically pleasing, and (c)
> there's marginal value of having both musl and glibc produce the same
> characters if you like save them as integers for some reason.
>
> But the choice of any range therein is pretty much editorial, I think.

Let's wait and see what the musl folks say.

>> This change definitely needs a NEWS entry.
>
> Something like this?
> Deprecated and removed features, and other changes affecting compatibility:
> * The default/"POSIX"/"C" locale's character set is now "POSIX",
>   instead of "ANSI_X3.4-1968"; this is a new fully-reversible
>   8-bit transparent encoding for compatibility with Issue 7 TC 2,

“POSIX Issue 7 TC 2”

>   identity-mapping bytes in the ASCII [0, 0x7F] range,
>   and mapping [0x80, 0xFF] bytes to [, ].

It should go into the major new features section, I think.  I would also
say that POSIX no longer allows using UTF-8 for the C/POSIX locale
because the obvious question will be “why this custom encoding and not
UTF-8?”.  This new POSIX requirement is still a major disappointment to
me.

No need to repost for now.

>> > diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c
>> > index 44844e71c3..e66242b58f 100644
>> > --- a/stdio-common/tst-printf-bz25691.c
>> > +++ b/stdio-common/tst-printf-bz25691.c
>> > @@ -30,6 +30,8 @@
>> >  static int
>> >  do_test (void)
>> >  {
>> > +  setlocale(LC_CTYPE, "C.UTF-8");
>> > +
>> >    mtrace ();
>> >
>> >    /* For 's' conversion specifier with 'l' modifier the array must be
>>
>> What's the rationale for this change?  If it is really required, you
>> must also update stdio-common/Makefile with a new dependency on
>> $(gen-locales).
>
> The test depends on the locale having a hole at 0xFF, cf. ll. 93-100:
>   /* Same test, but with an invalid multibyte sequence.  */
>   mbs[mbssize - 2] = 0xff;
>
>   ret = swprintf (result, resultsize, L"%.65537s", mbs);
>   TEST_COMPARE (ret, -1);
>
>   ret = swprintf (result, resultsize, L"%1$.65537s", mbs);
>   TEST_COMPARE (ret, -1);
> And this is the simplest way to ensure that, I think.
>
> Dependency added.

Right, makes sense.

Thanks,
Florian
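
For readers following the range discussion, here is a minimal sketch of the kind of fully reversible 8-bit mapping the proposed NEWS entry describes: bytes in [0, 0x7F] map to themselves, and bytes in [0x80, 0xFF] map onto a contiguous 128-code-point block. The endpoints of that block are not legible in the quoted text above, so HIGH_BASE below is an assumption: 0xDF80 is the convention musl uses (the block at the end of the surrogates), whereas Python's surrogateescape handler uses U+DC80..U+DCFF. The function names are hypothetical and this is not the glibc implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: start of the 128-code-point block that receives bytes
   0x80..0xFF.  0xDF80 follows musl's convention; with 0xDC80 the same
   code would match Python's surrogateescape instead.  */
#define HIGH_BASE 0xDF80u

/* Hypothetical helper: decode one byte.  Every byte value is valid.  */
static uint32_t
byte_to_wide (unsigned char b)
{
  return b < 0x80 ? (uint32_t) b : HIGH_BASE + (b - 0x80u);
}

/* Hypothetical helper: encode one code point.  Returns false for code
   points outside the two ranges the mapping covers.  */
static bool
wide_to_byte (uint32_t wc, unsigned char *out)
{
  if (wc < 0x80)
    {
      *out = (unsigned char) wc;
      return true;
    }
  /* Unsigned subtraction wraps for wc < HIGH_BASE, so a single
     comparison checks both ends of the block.  */
  if (wc - HIGH_BASE < 0x80u)
    {
      *out = (unsigned char) (0x80u + (wc - HIGH_BASE));
      return true;
    }
  return false;
}

/* Check the "fully reversible" property over all 256 byte values.  */
static void
check_round_trip (void)
{
  for (unsigned int i = 0; i < 256; i++)
    {
      unsigned char back;
      assert (wide_to_byte (byte_to_wide ((unsigned char) i), &back));
      assert (back == i);
    }
}
```

Calling check_round_trip () from a test's main exercises every byte. Because no byte is rejected, a locale with this mapping has no "hole" at 0xFF, which is consistent with the quoted test switching to C.UTF-8 to keep its invalid-sequence case.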