public inbox for libc-alpha@sourceware.org
From: Carlos O'Donell <carlos@redhat.com>
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
Subject: Re: [PATCH 0/5] Assume UTF-8 encoding for localedef input files
Date: Mon, 4 Jul 2022 15:54:08 -0400
Message-ID: <706a1d98-1d50-ccfd-47ac-4b402fe19454@redhat.com>
In-Reply-To: <cover.1652994079.git.fweimer@redhat.com>

On 5/19/22 17:06, Florian Weimer via Libc-alpha wrote:
> This is a backwards-compatible change because of two localedef bugs that
> cause bytes outside the ASCII range to produce unpredictable results:
> 
>   If char is signed, conversion from the assumed ISO-8859-1 input format
>   to a UCS-4 codepoint does not produce the correct result.
> 
>   If the output character set is not overlapping ISO-8859-1 in the
>   characters used in the locale, the required character set conversion
>   is not applied.
> 
> This is why I think we can switch to UTF-8 without impacting backwards
> compatibility, and there is no need for an option to restore the old
> behavior.

I agree. Parsing of locale files is an area where we can, in some cases, require
developers and users to adjust their sources, though we would prefer to stay
backwards compatible. In this case non-ASCII bytes never produced predictable
results, so there is no working behavior to preserve.
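
To make the first quoted bug concrete, here is a minimal sketch of how a byte
above 0x7F goes wrong when it is widened through a signed char. This is not the
code from lr_getc; both function names are hypothetical, and signed char stands
in for plain char on ABIs where char is signed (x86, for example).

```c
#include <stdint.h>

/* Hypothetical helper, not taken from linereader.c.  Widens one
   ISO-8859-1 byte directly to a UCS-4 code point.  The byte 0xE4
   is -28 as a signed char, and converting -28 to uint32_t wraps
   to 0xFFFFFFE4 instead of U+00E4.  */
uint32_t
latin1_to_ucs4_buggy (signed char c)
{
  return (uint32_t) c;
}

/* Converting to unsigned char first keeps the value in 0..255,
   so 0xE4 maps to U+00E4 regardless of the signedness of char.  */
uint32_t
latin1_to_ucs4_fixed (signed char c)
{
  return (uint32_t) (unsigned char) c;
}
```

ASCII bytes are unaffected by either version, which is consistent with the
observation that only bytes outside the ASCII range misbehaved.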

Thank you for working on this to make locales easier to use.

I particularly appreciate the example conversion of de_DE.

Overall the series looks good, and we should commit it ahead of glibc 2.36 so that
any new strings can be translated by the Translation Project (TP). In particular,
this series adds some error messages for the use of UTF-8 in the locale sources.
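
As an illustration of the kind of input a UTF-8 reader has to diagnose, here is
a minimal decoder sketch. It is not the code from the series: decode_utf8 and
its interface are assumptions made for this example, and the actual checks and
messages in linereader.c may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder, not the code from the series.  Decodes one
   UTF-8 sequence from S (N bytes available) into *CP.  Returns the
   number of bytes consumed, or 0 if the input is truncated or
   malformed, which is where a caller would report an error.  */
size_t
decode_utf8 (const unsigned char *s, size_t n, uint32_t *cp)
{
  if (n == 0)
    return 0;

  uint32_t c = s[0];
  uint32_t min;   /* Smallest code point allowed for this length.  */
  size_t len;

  if (c < 0x80)
    {
      *cp = c;    /* Plain ASCII.  */
      return 1;
    }
  else if ((c & 0xE0) == 0xC0)
    { len = 2; c &= 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; c &= 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; c &= 0x07; min = 0x10000; }
  else
    return 0;     /* Stray continuation byte or invalid lead byte.  */

  if (n < len)
    return 0;     /* Truncated sequence.  */

  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0; /* Not a continuation byte.  */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return len;
}
```

With this sketch, the two bytes C3 A4 decode to U+00E4, while a lone E4 byte
(the ISO-8859-1 encoding of the same letter) is rejected as a truncated
sequence rather than silently misinterpreted.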

Again, I really appreciate that this makes it easier for natural language speakers
to write, adjust, and review locale sources. In cases where disambiguation is required
we still have the capacity to write it differently if we need to. This continues the
early work to convert from U-codes to ASCII.

As in the last discussion of this topic, supporting the compilation of locale
sources on a system that lacks UTF-8 is no longer a requirement that we should
impose on the library.
 
> Tested on i686-linux-gnu and x86_64-linux-gnu.
> 
> Thanks,
> Florian
> 
> Florian Weimer (5):
>   locale: Turn ADDC and ADDS into functions in linereader.c
>   locale: Fix signed char bug in lr_getc
>   locale: Introduce translate_unicode_codepoint into linereader.c
>   locale: localdef input files are now encoded in UTF-8
>   de_DE: Convert to UTF-8
> 
>  NEWS                         |   4 +
>  locale/programs/linereader.c | 504 ++++++++++++++++++++++-------------
>  locale/programs/linereader.h |   2 +-
>  localedata/locales/de_DE     |  32 +--
>  4 files changed, 338 insertions(+), 204 deletions(-)
> 
> 
> base-commit: 2d5ec6692f5746ccb11db60976a6481ef8e9d74f


-- 
Cheers,
Carlos.


Thread overview: 15+ messages
2022-05-19 21:06 Florian Weimer
2022-05-19 21:06 ` [PATCH 1/5] locale: Turn ADDC and ADDS into functions in linereader.c Florian Weimer
2022-07-04 19:54   ` Carlos O'Donell
2022-05-19 21:06 ` [PATCH 2/5] locale: Fix signed char bug in lr_getc Florian Weimer
2022-07-04 19:54   ` Carlos O'Donell
2022-05-19 21:06 ` [PATCH 3/5] locale: Introduce translate_unicode_codepoint into linereader.c Florian Weimer
2022-07-04 19:54   ` Carlos O'Donell
2022-05-19 21:06 ` [PATCH 4/5] locale: localdef input files are now encoded in UTF-8 Florian Weimer
2022-07-04 19:54   ` Carlos O'Donell
2022-05-19 21:06 ` [PATCH 5/5] de_DE: Convert to UTF-8 Florian Weimer
2022-07-04 19:54   ` Carlos O'Donell
2022-07-05  9:27   ` Andreas Schwab
2022-07-05  9:55     ` Florian Weimer
2022-07-05 10:38       ` Andreas Schwab
2022-07-04 19:54 ` Carlos O'Donell [this message]
