public inbox for libc-help@sourceware.org
* [idea] Update ISO 14651 file in locales to the latest standard version
@ 2021-10-10 15:53 Alexander Bantyev
  2021-10-10 16:07 ` Florian Weimer
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander Bantyev @ 2021-10-10 15:53 UTC (permalink / raw)
  To: libc-help

The file localedata/locales/iso14651_t1_common is, as far as I can tell,
supposed to be taken from <https://standards.iso.org/iso-iec/14651>.
However, the version in the glibc repository is quite old (from 2016, I
think) and is missing some new Unicode codepoints. There have been new
editions of the standard, the newest being edition 6 from 2020:
<https://standards.iso.org/iso-iec/14651/ed-6/en/ISO14651_2020_TABLE1_en.txt>

Perhaps the file in the glibc repository can be updated to match the
latest standard?
--
Александр Бантьев /Alexander Bantyev/ aka balsoft

Nix DevOPS/SRE at serokell.io

<balsoft@balsoft.ru>
<alexander.bantyev@serokell.io>

matrix://@balsoft:balsoft.ru
(https://matrix.to/#/@balsoft:balsoft.ru)
https://t.me/balsoft
https://github.com/balsoft



* Re: [idea] Update ISO 14651 file in locales to the latest standard version
  2021-10-10 15:53 [idea] Update ISO 14651 file in locales to the latest standard version Alexander Bantyev
@ 2021-10-10 16:07 ` Florian Weimer
  2021-11-02 16:52   ` Carlos O'Donell
  0 siblings, 1 reply; 3+ messages in thread
From: Florian Weimer @ 2021-10-10 16:07 UTC (permalink / raw)
  To: Alexander Bantyev; +Cc: libc-help, mfabian, carlos

* Alexander Bantyev:

> The file localedata/locales/iso14651_t1_common is, as far as I can tell,
> supposed to be taken from <https://standards.iso.org/iso-iec/14651>.
> However, the version in the glibc repository is quite old (from 2016, I
> think) and is missing some new Unicode codepoints. There have been new
> editions of the standard, the newest being edition 6 from 2020:
> <https://standards.iso.org/iso-iec/14651/ed-6/en/ISO14651_2020_TABLE1_en.txt>
>
> Perhaps the file in the glibc repository can be updated to match the
> latest standard?

I think it's scary to update this file because it alters the result of
bracket patterns in regular expressions.  The file is no longer fully
automatically generated, I think.  Implementing rational ranges where
it counts in glibc would be one way forward here.

Cc:ing Mike and Carlos, who have more details.


* Re: [idea] Update ISO 14651 file in locales to the latest standard version
  2021-10-10 16:07 ` Florian Weimer
@ 2021-11-02 16:52   ` Carlos O'Donell
  0 siblings, 0 replies; 3+ messages in thread
From: Carlos O'Donell @ 2021-11-02 16:52 UTC (permalink / raw)
  To: Florian Weimer, Alexander Bantyev; +Cc: libc-help, mfabian

On 10/10/21 12:07, Florian Weimer wrote:
> * Alexander Bantyev:
> 
>> The file localedata/locales/iso14651_t1_common is, as far as I can tell,
>> supposed to be taken from <https://standards.iso.org/iso-iec/14651>.
>> However, the version in the glibc repository is quite old (from 2016, I
>> think) and is missing some new Unicode codepoints. There have been new
>> editions of the standard, the newest being edition 6 from 2020:
>> <https://standards.iso.org/iso-iec/14651/ed-6/en/ISO14651_2020_TABLE1_en.txt>
>>
>> Perhaps the file in the glibc repository can be updated to match the
>> latest standard?
> 
> I think it's scary to update this file because it alters the result of
> bracket patterns in regular expressions.  The file is no longer fully
> automatically generated, I think.  Implementing rational ranges where
> it counts in glibc would be one way forward here.
> 
> Cc:ing Mike and Carlos, who have more details.

(1) Where does glibc's ISO 14651 data come from?

We use ISO 14651 in glibc for collation weights.

We do not use ISO 14651 in glibc for collation element ordering (CEO).
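
To illustrate where collation weights show up for a program, here is a
minimal sketch using strcoll(), which compares strings according to the
LC_COLLATE data of the selected locale. The helper name collate_sign is
hypothetical and only for illustration; whether a given locale such as
en_US.UTF-8 is installed on your system is an assumption you need to check.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): reports how `a` sorts relative
   to `b` under the LC_COLLATE rules of `locale_name`.
   Returns -1, 0 or 1, or 2 if the locale could not be selected
   (for example because it is not installed).  */
int collate_sign(const char *locale_name, const char *a, const char *b)
{
    assert(locale_name != NULL && a != NULL && b != NULL);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;

    /* strcoll() consults the collation data of the current locale;
       in the "C" locale it orders strings the same way strcmp() does.  */
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

In the "C" locale, collate_sign("C", "B", "a") returns -1 because the
comparison is by byte value. In a locale built on the ISO 14651 table the
result for the same pair may differ, since it is then derived from the
collation weights rather than from code point order.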

(2) Is glibc's ISO 14651 data updated in an automated fashion?

No. Importing new ISO 14651 data is a manual and difficult process: every
existing locale's collation tailorings must be reviewed and harmonized with
the updates from ISO 14651.

(3) What about regexp ranges?

Regular expression ranges rely on "collation element ordering" (not weights),
so after importing ISO 14651 updates we must update the element orders to
retain rational ranges, i.e. ranges that match what English speakers expect
from [a-z], [A-Z], and [0-9].
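
As a rough way to check what a bracket range matches in a given locale,
here is a minimal sketch using the POSIX regcomp()/regexec() interface. The
helper name matches_in_locale is hypothetical and only for illustration; the
results for locales other than "C" depend on the locale data installed on
the system, so none are claimed here.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of glibc): returns 1 if `text` matches the
   POSIX extended regular expression `pattern` under `locale_name`,
   0 if it does not match, and -1 if the locale cannot be selected or the
   pattern fails to compile.  */
int matches_in_locale(const char *locale_name, const char *pattern,
                      const char *text)
{
    regex_t re;

    assert(locale_name != NULL && pattern != NULL && text != NULL);

    /* Bracket ranges are interpreted using the locale that is active
       when the pattern is compiled and executed.  */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

In the "C" locale, matches_in_locale("C", "^[a-z]$", "m") returns 1 and
matches_in_locale("C", "^[a-z]$", "M") returns 0. Running the same checks
with another installed locale name is one way to see whether its ranges
stay rational after a collation data update.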

(4) When was the ISO 14651 data last updated for glibc?

In 2018 we updated to ISO 14651 4th Edition which was harmonized with Unicode 9.0.0.

We have not updated to 5th or 6th Edition yet.

I've filed the following bug to track this:
Bug 28528 - Update to ISO 14651 6th Edition 2020.
https://sourceware.org/bugzilla/show_bug.cgi?id=28528

Hopefully this answers your questions.

-- 
Cheers,
Carlos.

