public inbox for libc-alpha@sourceware.org
From: Florian Weimer <fweimer@redhat.com>
To: наб <nabijaczleweli@nabijaczleweli.xyz>
Cc: libc-alpha@sourceware.org,  Victor Stinner <vstinner@redhat.com>
Subject: Re: [PATCH v7] POSIX locale covers every byte [BZ# 29511]
Date: Thu, 10 Nov 2022 10:52:10 +0100
Message-ID: <87tu37uofp.fsf@oldenburg.str.redhat.com>
In-Reply-To: <20221109161415.eyqgyrp2jlwzfdmb@tarta.nabijaczleweli.xyz> ("наб"'s message of "Wed, 9 Nov 2022 17:14:15 +0100")

* наб:

>> Not sure what is more important here, musl compatibility or Python
>> compatibility.  Cc:ing Victor in case he has comments.  I should probably
>> also ask on the musl list how this divergence came to pass.

> I went for musl because (a) it's a libc, not some random programming
> language, (b) putting the end of our domain at the end of the
> surrogates is more aesthetically and ideologically pleasing, and (c)
> there's marginal value in having both musl and glibc produce the same
> characters if you, say, save them as integers for some reason.
> But the choice of any range therein is pretty much editorial, I think.

Let's wait and see what the musl folks say.

>> This change definitely needs a NEWS entry.
> Something like this?
>   Deprecated and removed features, and other changes affecting compatibility:
>   * The default/"POSIX"/"C" locale's character set is now "POSIX",
>     instead of "ANSI_X3.4-1968"; this is a new fully-reversible
>     8-bit transparent encoding for compatibility with Issue 7 TC 2,

“POSIX Issue 7 TC 2”

>     identity-mapping bytes in the ASCII [0, 0x7F] range,
>     and mapping [0x80, 0xFF] bytes to [<U+DF80>, <U+DFFF>].

It should go into the major new features section, I think.

The entry should also say that POSIX no longer allows using UTF-8 for the
C/POSIX locale, because the obvious question will be “why this custom
encoding and not UTF-8?”.  This new POSIX requirement is still a major
disappointment to me.
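
For readers of the draft NEWS entry quoted above, the byte mapping it
describes can be sketched roughly as follows.  This is only an
illustration of the stated ranges, not the code from the patch; the
names posix_byte_to_wc, posix_wc_to_byte and POSIX_HIGH_BASE are made
up for this example and are not glibc interfaces.

```c
#include <stdint.h>

/* Hypothetical constant: high bytes land in the U+DFxx block, so
   0x80 becomes U+DF80 and 0xFF becomes U+DFFF.  */
#define POSIX_HIGH_BASE 0xDF00u

/* Decode one byte: [0, 0x7F] maps to itself, [0x80, 0xFF] maps to
   [U+DF80, U+DFFF].  Every byte has a result, so there are no holes.  */
static uint32_t
posix_byte_to_wc (unsigned char b)
{
  return b < 0x80 ? (uint32_t) b : POSIX_HIGH_BASE + b;
}

/* Encode one code point back to a byte.  Returns -1 for code points
   that have no single-byte representation in this mapping.  */
static int
posix_wc_to_byte (uint32_t wc)
{
  if (wc < 0x80)
    return (int) wc;
  if (wc >= 0xDF80 && wc <= 0xDFFF)
    return (int) (wc - POSIX_HIGH_BASE);
  return -1;
}
```

Because each of the 256 byte values maps to a distinct code point and
back, a byte string survives a round trip through wide characters
unchanged, which is what "fully-reversible" refers to in the entry.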

No need to repost for now.

>> > diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c
>> > index 44844e71c3..e66242b58f 100644
>> > --- a/stdio-common/tst-printf-bz25691.c
>> > +++ b/stdio-common/tst-printf-bz25691.c
>> > @@ -30,6 +30,8 @@
>> >  static int
>> >  do_test (void)
>> >  {
>> > +  setlocale(LC_CTYPE, "C.UTF-8");
>> > +
>> >    mtrace ();
>> >  
>> >    /* For 's' conversion specifier with 'l' modifier the array must be
>> 
>> What's the rationale for this change?  If it is really required, you
>> must also update stdio-common/Makefile with a new dependency on
>> $(gen-locales).
> The test depends on the locale having a hole at 0xFF, cf. ll. 93-100:
>     /* Same test, but with an invalid multibyte sequence.  */
>     mbs[mbssize - 2] = 0xff;
>
>     ret = swprintf (result, resultsize, L"%.65537s", mbs);
>     TEST_COMPARE (ret, -1);
>
>     ret = swprintf (result, resultsize, L"%1$.65537s", mbs);
>     TEST_COMPARE (ret, -1);
> And this is the simplest way to ensure that, I think.
>
> Dependency added.

Right, makes sense.
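
As a rough illustration of why the test needs a locale with a hole at
0xFF: in a UTF-8 locale the lone byte 0xFF is never a valid multibyte
sequence, so mbrtowc is expected to fail with EILSEQ.  The sketch below
uses a made-up helper name (byte_is_invalid_in) and assumes the
"C.UTF-8" locale is installed; it reports -1 when the locale cannot be
selected rather than guessing.

```c
#include <errno.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of the test suite.  Returns 1 if the
   single byte B is an invalid multibyte sequence in LOCALE_NAME, 0 if
   mbrtowc did not report EILSEQ, and -1 if the locale is unavailable.  */
static int
byte_is_invalid_in (const char *locale_name, unsigned char b)
{
  /* Remember the current LC_CTYPE so it can be restored afterwards.  */
  char *saved = NULL;
  const char *cur = setlocale (LC_CTYPE, NULL);
  if (cur != NULL)
    {
      size_t len = strlen (cur) + 1;
      saved = malloc (len);
      if (saved != NULL)
        memcpy (saved, cur, len);
    }

  if (setlocale (LC_CTYPE, locale_name) == NULL)
    {
      free (saved);
      return -1;
    }

  /* Convert the single byte starting from the initial shift state.  */
  mbstate_t st;
  memset (&st, 0, sizeof st);
  wchar_t wc;
  char in = (char) b;
  errno = 0;
  size_t r = mbrtowc (&wc, &in, 1, &st);
  int invalid = (r == (size_t) -1 && errno == EILSEQ);

  if (saved != NULL)
    {
      setlocale (LC_CTYPE, saved);
      free (saved);
    }
  return invalid;
}
```

With the patch applied, the same probe in the "C" locale would be
expected to succeed for 0xFF, which is why the test pins its locale
explicitly instead of relying on the default.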

Thanks,
Florian

