public inbox for libc-locales@sourceware.org
From: "joseph at codesourcery dot com" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/14095] Review / update collation data from Unicode / ISO 14651
Date: Tue, 30 Jun 2015 13:40:00 -0000
Message-ID: <bug-14095-716-uY20IrkHNG@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-14095-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=14095

--- Comment #2 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
The people involved in getting the collation data to its present state are 
mostly no longer involved in glibc development, so if you want an 
authoritative answer you'll need to do a lot of work tracking them down.  
My hypothesis would be that each person submitting a change generally had 
their own itch to scratch: supporting collation for their own language 
better, with no interest in a more general update to a newer version of 
ISO 14651 (if a newer version even existed at that time), or with 
insufficient time, expertise or resources to get involved in their 
national standards committees parallel to JTC1/SC2/WG2 (if ISO 14651 did 
not support their language then).  Each person accepting such a change 
presumably decided that it was better to have the incremental improvement 
than to have no collation support for that language for the indefinite 
future, until someone appeared to contribute a more thorough update.

We don't, however, need to know people's motivations for making 
incremental changes rather than larger bulk updates.  The questions that 
are actually relevant for updating the data now are more along the lines 
of: for the original addition of the ISO 14651 data, what differences are 
there from the relevant version of ISO 14651?  Do those differences relate 
to conceptual differences between the POSIX collation model and the ISO 
14651 collation model, or do they reflect different choices for how to 
collate particular characters?  If they reflect different choices, do we 
still agree that those choices are appropriate for the contexts in which 
glibc locales are used, or, with hindsight, would the ISO 14651 choices 
now be better?  Where a change was made subsequently affecting existing 
characters, is the change still at variance with current ISO 14651, and do 
we think there is still a good reason for such a difference?  Where 
collation support for new characters was added, how does that support 
compare to the support, if any, for those characters in current ISO 14651, 
and are there any differences we think are deliberate and should be 
preserved?  Do any differences reflect cases where e.g. different national 
standards specify different collation for the same characters (or 
collation differs by context), and so individual locales may need to 
override the generic international version?
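One mechanical starting point for the first of those questions is to 
compare the relative order of collating symbols in two sources.  The 
sketch below is a simplification under stated assumptions: it treats each 
source as a single flat list of symbols in primary order (real ISO 14651 
and glibc data are multi-level, so this would only surface primary-order 
differences), and the function name compare_orders and the sample data are 
hypothetical, not parsed from actual iso14651_t1 or locale files.

```python
from typing import List, Tuple


def compare_orders(glibc_order: List[str], iso_order: List[str]):
    """Compare two flat collation orders (hypothetical, simplified input).

    Returns (inversions, only_glibc, only_iso): inversions lists pairs
    (a, b) that glibc_order places a before b while iso_order places b
    before a; the other two list symbols present in only one source.
    """
    iso_pos = {sym: i for i, sym in enumerate(iso_order)}
    glibc_set = set(glibc_order)
    # Symbols present in both sources, kept in glibc order.
    common = [s for s in glibc_order if s in iso_pos]
    inversions: List[Tuple[str, str]] = []
    for i, a in enumerate(common):
        for b in common[i + 1:]:
            # glibc puts a before b; flag the pair if ISO disagrees.
            if iso_pos[a] > iso_pos[b]:
                inversions.append((a, b))
    only_glibc = [s for s in glibc_order if s not in iso_pos]
    only_iso = [s for s in iso_order if s not in glibc_set]
    return inversions, only_glibc, only_iso


if __name__ == "__main__":
    # Hypothetical sample: a-diaeresis placed differently in each source.
    glibc = ["<U0061>", "<U00E4>", "<U0062>", "<U0063>"]
    iso = ["<U0061>", "<U0062>", "<U00E4>", "<U0063>", "<U0064>"]
    inv, g_only, i_only = compare_orders(glibc, iso)
    print(inv)     # [('<U00E4>', '<U0062>')]
    print(g_only)  # []
    print(i_only)  # ['<U0064>']
```

Each reported inversion is a candidate for the questions above: it may be 
a deliberate choice (possibly one that belongs in a per-locale override) 
or a historical divergence worth reconsidering against current ISO 14651.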

Yes, analysing the history of the current collation data well enough to 
give justified answers to those questions, with recommendations for how to 
use data from current ISO 14651, involves a lot of detailed, careful work.  
Given the responsibility to users to avoid 
regressions, we need to understand what changes would be involved in such 
an update, and satisfy ourselves that they are good changes rather than 
regressions, as part of making such an update.  Contributors willing to 
help with that careful analysis are welcome.



Thread overview: 24+ messages
2012-05-10 21:06 [Bug localedata/14095] New: " jsm28 at gcc dot gnu.org
2013-11-26 17:09 ` [Bug localedata/14095] " myllynen at redhat dot com
2014-02-18 10:12 ` pravin.d.s at gmail dot com
2014-06-25 12:08 ` fweimer at redhat dot com
2014-10-10 21:50 ` maiku.fabian at gmail dot com
2015-06-30  5:09 ` pabs3 at bonedaddy dot net
2015-06-30 13:40 ` joseph at codesourcery dot com [this message]
2015-06-30 15:29   ` Keld Simonsen
2015-06-30 13:45 ` carlos at redhat dot com
2015-06-30 15:30 ` keld at keldix dot com
2015-06-30 21:03 ` joseph at codesourcery dot com
2015-07-01  7:57   ` Keld Simonsen
2015-07-01  8:02 ` keld at keldix dot com
2016-02-19 10:46 ` vapier at gentoo dot org
2016-02-19 17:18 ` joseph at codesourcery dot com
2017-12-14 16:53 ` maiku.fabian at gmail dot com
2017-12-14 16:58 ` maiku.fabian at gmail dot com
2018-01-25 10:41 ` maiku.fabian at gmail dot com
2018-02-27 16:55 ` cvs-commit at gcc dot gnu.org
2018-02-28 14:11 ` maiku.fabian at gmail dot com
2018-03-02 12:59 ` cvs-commit at gcc dot gnu.org
2018-03-31 12:32 ` jeremip11 at gmail dot com
2018-07-09 12:01 ` fweimer at redhat dot com
2018-08-01  5:58 ` cvs-commit at gcc dot gnu.org
