public inbox for libc-alpha@sourceware.org
* [PATCH v4 0/4] Add new C.UTF-8 locale (Bug 17318)
From: Carlos O'Donell @ 2021-04-28 13:00 UTC
  To: libc-alpha, fweimer

To make the C.UTF-8 locale easier to implement, several steps
should be taken before the locale is added:

1) Implement wide ellipsis range handling for UTF-8 to simplify
   the LC_COLLATE description in the locale.
2) Update the UTF-8 charmap processing to include all code points
   (excluding surrogates) and make use of the wide ellipsis ranges.
3) Regenerate the UTF-8 character map with the new characters
   for full code point coverage.
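
As a rough illustration of what "all code points (excluding
surrogates)" covers, here is a minimal Python sketch. It is not taken
from utf8_gen.py; the helper names (is_surrogate, utf8_escapes,
scalar_ranges, count_scalars) are hypothetical, and the '/xNN' byte
notation is only assumed to resemble the charmap style.

```python
# Illustrative only: these helpers are hypothetical and are not the
# code in localedata/unicode-gen/utf8_gen.py.

SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp):
    """Return True if cp lies in the UTF-16 surrogate block."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def utf8_escapes(cp):
    """Return the UTF-8 bytes of cp written as '/xNN' escapes."""
    if is_surrogate(cp):
        raise ValueError("surrogates have no UTF-8 encoding")
    return "".join("/x%02x" % b for b in chr(cp).encode("utf-8"))


def scalar_ranges():
    """Return the two inclusive ranges left after removing surrogates."""
    return [(0x0000, SURROGATE_FIRST - 1),
            (SURROGATE_LAST + 1, MAX_CODE_POINT)]


def count_scalars():
    """Count the code points covered by scalar_ranges()."""
    return sum(last - first + 1 for first, last in scalar_ranges())


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10FFFF):
        print("U+%04X %s" % (cp, utf8_escapes(cp)))
    print(count_scalars())
```

The two ranges together cover 1,112,064 code points, which is why
describing them with ranges rather than one entry per character
matters for the size of the charmap.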

The new C.UTF-8 locale is not added to SUPPORTED because it is
28 MiB in size, which is due to the weights array in LC_COLLATE
covering the full set of code points. Before we can make C.UTF-8
supported we must simplify the weights processing to use strcmp
and remove the weights array from the binary data. To some extent
this is a reference implementation against which we can test a newer
version or a builtin version that has the size and performance
we expect.
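
The reason a strcmp-style comparison could stand in for a weights
table can be sketched as follows, assuming the intended collation is
plain code point order: for valid UTF-8, comparing the encoded bytes
as unsigned values gives the same order as comparing code points.
The Python below only demonstrates that property; the function names
are hypothetical and nothing here is glibc code.

```python
# Illustrative only: shows that byte-wise order of UTF-8 strings
# matches code point order. Not part of the patch series.


def bytewise_key(text):
    """Sort key mimicking strcmp: compare the raw UTF-8 bytes."""
    return text.encode("utf-8")


def same_order(strings):
    """Return True if byte order and code point order agree."""
    by_code_point = sorted(strings)              # str compares by code point
    by_utf8_bytes = sorted(strings, key=bytewise_key)
    return by_code_point == by_utf8_bytes


if __name__ == "__main__":
    samples = ["z", "a", "\u00e9", "\u20ac", "\U0001F600",
               "ab", "", "\uffff", "\U00010000"]
    print(same_order(samples))
```

The pair U+FFFF and U+10000 is included on purpose: it is a case
where UTF-16 code unit order would disagree with code point order,
while UTF-8 byte order does not.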

Carlos O'Donell (4):
  Add support for processing wide ellipsis ranges in UTF-8.
  Update UTF-8 charmap processing.
  Regenerate localedata files.
  Add generic C.UTF-8 locale (Bug 17318)

 locale/programs/charmap.c              |  174 +-
 localedata/C.UTF-8.in                  |  156 +
 localedata/Makefile                    |    2 +
 localedata/charmaps/UTF-8              | 4396 ++++--------------------
 localedata/locales/C                   |  188 +
 localedata/locales/i18n_ctype          |    2 +-
 localedata/locales/tr_TR               |    2 +-
 localedata/locales/translit_circle     |    2 +-
 localedata/locales/translit_cjk_compat |    2 +-
 localedata/locales/translit_combining  |    2 +-
 localedata/locales/translit_compat     |    2 +-
 localedata/locales/translit_font       |    2 +-
 localedata/locales/translit_fraction   |    2 +-
 localedata/unicode-gen/utf8_gen.py     |  133 +-
 14 files changed, 1288 insertions(+), 3777 deletions(-)
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C

-- 
2.26.3



Thread overview: 16+ messages
2021-04-28 13:00 [PATCH v4 0/4] Add new C.UTF-8 locale (Bug 17318) Carlos O'Donell
2021-04-28 13:00 ` [PATCH v4 1/4] Add support for processing wide ellipsis ranges in UTF-8 Carlos O'Donell
2021-04-29 14:11   ` Florian Weimer
2021-04-28 13:00 ` [PATCH v4 2/4] Update UTF-8 charmap processing Carlos O'Donell
2021-04-29 14:07   ` Florian Weimer
2021-04-29 21:02     ` Carlos O'Donell
2021-04-30  4:18       ` Florian Weimer
2021-05-02 19:18         ` Carlos O'Donell
2021-04-28 13:00 ` [PATCH v4 3/4] Regenerate localedata files Carlos O'Donell
2021-04-29 21:03   ` Carlos O'Donell
2021-04-28 13:00 ` [PATCH v4 4/4] Add generic C.UTF-8 locale (Bug 17318) Carlos O'Donell
2021-04-29 14:13   ` Florian Weimer
2021-04-29 20:05     ` Carlos O'Donell
2021-04-30 17:59       ` Carlos O'Donell
2021-04-30 18:20         ` Florian Weimer
2021-05-02 19:18           ` Carlos O'Donell
