public inbox for libc-locales@sourceware.org
* [Bug localedata/15537] New: Invalid collation for Latvian diacritical letters
@ 2013-05-26 14:24 alex at gorka dot lv
  2014-06-13 18:03 ` [Bug localedata/15537] " fweimer at redhat dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: alex at gorka dot lv @ 2013-05-26 14:24 UTC (permalink / raw)
  To: libc-locales

http://sourceware.org/bugzilla/show_bug.cgi?id=15537

            Bug ID: 15537
           Summary: Invalid collation for Latvian diacritical letters
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: alex at gorka dot lv
                CC: libc-locales at sourceware dot org

The Latvian locale for Latvia has the wrong collation order for the Latvian
vowels A WITH MACRON (U+0100, U+0101), E WITH MACRON (U+0112, U+0113), I WITH
MACRON (U+012A, U+012B), O WITH MACRON (U+014C, U+014D), and U WITH MACRON
(U+016A, U+016B).  The first (primary) weight of each of these letters should
equal that of its base letter (A, E, I, O, and U, respectively), and only the
second (secondary) weight should be heavier.  In other words, a letter with a
macron sorts after the same letter without a macron only when the strings are
otherwise equal: "āb" sorts after "ab", but "āa" sorts before "ab".

Note that the diacritical consonants (C WITH CARON, G WITH CEDILLA, K WITH
CEDILLA, L WITH CEDILLA, N WITH CEDILLA, S WITH CARON, and Z WITH CARON) are
always sorted after their base letters as separate letters; for these letters
the first weight must be different, and the current version of the Latvian
locale already gets this right.
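A minimal Python sketch of the two behaviours described above. The weight table is hypothetical and covers only a handful of letters (it is not taken from glibc or ISO 14651); each letter gets a (primary, secondary) pair, and strings are compared on all primary weights first, with secondary weights only breaking ties. Here ā shares the primary weight of a, while č gets its own primary weight after c.

```python
# Hypothetical (primary, secondary) weights for a few letters; NOT the real
# glibc/ISO 14651 tables.  Secondary weight 1 marks a macron.
WEIGHTS = {
    "a": (1, 0), "ā": (1, 1),   # macron vowel: same primary weight as "a"
    "b": (2, 0),
    "c": (3, 0), "č": (4, 0),   # caron consonant: its own primary weight
    "d": (5, 0),
}


def sort_key(word):
    """Compare every primary weight first; secondary weights only break ties."""
    primary = tuple(WEIGHTS[ch][0] for ch in word)
    secondary = tuple(WEIGHTS[ch][1] for ch in word)
    return (primary, secondary)


if __name__ == "__main__":
    words = ["ab", "āb", "āa", "aa", "da", "ča", "cd"]
    print(sorted(words, key=sort_key))
```

With this table the words sort as aa, āa, ab, āb, cd, ča, da: "āa" precedes "ab" because the macron only matters at the secondary level, while "ča" follows "cd" because č is treated as a separate letter.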

Besides, the current version of the Latvian locale contains the letter R WITH
CEDILLA (U+0156, U+0157), which is currently sorted separately from R with other
diacritical marks.  This letter is not used in contemporary Latvian writing in
Latvia (it was used in the first half of the 20th century, and is still used by
some Latvian communities outside Latvia), so the sorting rules for this letter
are not obvious.  I think it would be better to make the first weight of R WITH
CEDILLA equal to that of R, because most current Latvian language users cannot
say when to use R with cedilla instead of R.
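To illustrate what this proposal would change, here is a small Python sketch with two hypothetical weight tables that differ only in the entry for ŗ: one gives it its own primary weight (the current behaviour described above), the other makes it a secondary variant of r (the proposal). The letters and numbers are illustrative assumptions, not data from the locale.

```python
# Two hypothetical (primary, secondary) tables; only the entry for "ŗ" differs.
PRIMARY_DISTINCT = {"a": (1, 0), "r": (2, 0), "ŗ": (3, 0), "z": (4, 0)}
SECONDARY_VARIANT = {"a": (1, 0), "r": (2, 0), "ŗ": (2, 1), "z": (4, 0)}


def make_key(table):
    """Build a two-level sort key: all primary weights, then all secondary ones."""
    def key(word):
        return (tuple(table[ch][0] for ch in word),
                tuple(table[ch][1] for ch in word))
    return key


if __name__ == "__main__":
    words = ["rz", "ŗa", "ra"]
    print(sorted(words, key=make_key(PRIMARY_DISTINCT)))   # ŗ as a separate letter
    print(sorted(words, key=make_key(SECONDARY_VARIANT)))  # ŗ as a variant of r
```

With a separate primary weight, "ŗa" sorts after every word starting with r (here after "rz"); as a secondary variant of r it lands between "ra" and "rz".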

Finally, the current version of the Latvian locale sorts uppercase letters
before lowercase letters, which is inconsistent with the ISO 14651 rules used
by many glibc locales; some users have complained about this too.
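In ISO 14651 style collation, case is normally a tertiary difference: strings are first compared ignoring case, and case only decides between strings that are otherwise equal. The Python sketch below rests on simplifying assumptions (ASCII letters only, no secondary level, str.lower() standing in for real primary weights) and shows the two possible tertiary orders.

```python
def case_key(word, lower_first=True):
    """Two-level key: case-blind comparison first, then case as a tie-breaker."""
    primary = word.lower()  # stand-in for primary weights (ASCII letters only)
    if lower_first:
        tertiary = tuple(ch.isupper() for ch in word)  # False (lowercase) sorts first
    else:
        tertiary = tuple(ch.islower() for ch in word)  # False (uppercase) sorts first
    return (primary, tertiary)


if __name__ == "__main__":
    words = ["Ab", "a", "ab", "A", "B", "b"]
    print(sorted(words, key=case_key))                                  # lowercase first
    print(sorted(words, key=lambda w: case_key(w, lower_first=False)))  # uppercase first
```

The first order (a, A, ab, Ab, b, B) is what the report asks for; the second mirrors the uppercase-first pairs (A/a, Aa/aa, Ab/ab) visible in the "without my patch" listing later in this thread.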

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug localedata/15537] Invalid collation for Latvian diacritical letters
From: fweimer at redhat dot com @ 2014-06-13 18:03 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

* [Bug localedata/15537] lt_LT: invalid collation for Latvian diacritical letters
From: vapier at gentoo dot org @ 2016-04-22  4:46 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Invalid collation for       |lt_LT: invalid collation
                   |Latvian diacritical letters |for Latvian diacritical
                   |                            |letters

* [Bug localedata/15537] lt_LT: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-10-21  8:25 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: alex at gorka dot lv @ 2017-10-21  8:37 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

alexander smishlajev <alex at gorka dot lv> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|lt_LT: invalid collation    |lv_LV: invalid collation
                   |for Latvian diacritical     |for Latvian diacritical
                   |letters                     |letters

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-10-30  7:52 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

--- Comment #1 from Mike FABIAN <maiku.fabian at gmail dot com> ---
The CLDR collation rules for Latvian look like this:

http://unicode.org/cldr/trac/browser/trunk/common/collation/lv.xml

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-20  8:49 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

--- Comment #2 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 10623
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10623&action=edit
0001-lv_LV-locale-fix-collation-BZ-15537.patch

Order without my patch:

$ LC_ALL=lv_LV.UTF-8 ls
Ʒ  a   Aa  æ   Āb  c  D  i   Y   yb  Īb  ĵa  L  ņ   ra  Ŗa  Sa  š   T   Zb  ža
ʒ  ʒa  aa  Ā   āb  Ç  Ģ  Ia  y   Ī   īb  Ĵb  Ļ  O   Rb  ŗa  sa  Ša  Z   zb  Žb
ȥ  Ʒa  Ab  ā   ʒb  ç  ģ  ia  Ya  ī   Ĵ   ĵb  ļ  Ø   rb  Ŗb  Sb  ša  z   Ž   žb
Ȥ  Å   ab  Āa  Ʒb  Č  H  Ib  ya  Īa  ĵ   Ķ   M  ø   Ŗ   ŗb  sb  Šb  Za  ž
A  å   Æ   āa  C   č  I  ib  Yb  īa  Ĵa  ķ   Ņ  Ra  ŗ   S   Š   šb  za  Ža
$

Order with my patch:

bash-4.4# LC_ALL=lv_LV.UTF-8 ls 
a  Ā   ab  Æ  č  H  y   Īa  īb  Ĵ   ķ  M  Ø   ŗ   Ŗb  Sb  šb  Z   zb  Ža  ʒa
A  aa  Ab  c  Č  i  Y   ya  Īb  ĵa  Ķ  ņ  ra  Ŗ   S   š   Šb  ȥ   Zb  žb  Ʒa
å  Aa  āb  C  D  I  ia  Ya  yb  Ĵa  L  Ņ  Ra  ŗa  sa  Š   t   Ȥ   ž   Žb  ʒb
Å  āa  Āb  ç  ģ  ī  Ia  ib  Yb  ĵb  ļ  O  rb  Ŗa  Sa  ša  T   za  Ž   ʒ   Ʒb
ā  Āa  æ   Ç  Ģ  Ī  īa  Ib  ĵ   Ĵb  Ļ  ø  Rb  ŗb  sb  Ša  z   Za  ža  Ʒ
bash-4.4#
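The same check can be scripted instead of reading ls output by eye. This Python sketch uses the standard locale module (locale.setlocale and locale.strxfrm, which rely on the C library's collation, so sorting by strxfrm keys matches comparing with strcoll). It assumes the lv_LV.UTF-8 locale has been generated on the machine; the helper name sort_in_locale is hypothetical and returns None when the locale cannot be selected.

```python
import locale


def sort_in_locale(names, loc="lv_LV.UTF-8"):
    """Sort names by the C library's LC_COLLATE rules for `loc`.

    Returns None if `loc` cannot be selected (e.g. it was never generated).
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        return None
    try:
        # strxfrm keys compare the way strcoll() compares the original strings.
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the previous setting


if __name__ == "__main__":
    names = ["ab", "Āa", "āa", "Ab", "aa", "ā", "A", "a"]
    result = sort_in_locale(names)
    if result is None:
        print("lv_LV.UTF-8 is not available on this system")
    else:
        print(" ".join(result))
```

With a glibc that contains the fix, the result should follow the same relative order as the patched listing above (a A ā aa āa Āa ab Ab); with an older glibc it should instead resemble the unpatched listing.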

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-20  8:50 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |digitalfreak@lingonborough.
                   |                            |com

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-20  8:57 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

--- Comment #3 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to alexander smishlajev from comment #0)

> Besides, current version of Latvian locale contains letter R WITH CEDILLA
> (U0156, U0157), which is now sorted separately from letter R with other
> diacritical marks.  This letter is not currently used for Latvian writing in
> Latvia (it was used in the first half of the 20th century, and is still used
> by some Latvian communities outside Latvia), so the sorting rules for this
> letter are not obvious.  I think that it would be better to make the first
> weight for letter R WITH CEDILLA equal to R because most of current Latvian
> language users cannot say when to use R with cedilla instead of R.

My patch fixes the problems you report, *except* the problem you
report about R WITH CEDILLA.

I fixed it by throwing away all the existing rules in LC_COLLATE in the
lv_LV locale and using

copy "iso14651_t1"

instead to include the default sort order.

Then, on top of the default sort order I implemented the same
rules as in

http://unicode.org/cldr/trac/browser/trunk/common/collation/lv.xml

This collation data from CLDR gives R WITH CEDILLA a different primary
weight from R, i.e. it continues to sort it the same way as the current
lv_LV locale in glibc does.
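The approach described above (start from a default table, then apply language-specific rules on top of it) can be sketched in Python as follows. This is a simplified assumption covering primary weights only: DEFAULT_ORDER is a tiny hypothetical stand-in for iso14651_t1, tailor() places a letter immediately after an anchor letter (roughly what an "&c < č" rule expresses in CLDR rule syntax), and LV_RULES is an illustrative list loosely modelled on the letters discussed in this thread, not copied from lv.xml. The real LC_COLLATE and CLDR rules also handle secondary and tertiary levels.

```python
# Hypothetical, heavily reduced default primary order (stand-in for iso14651_t1).
DEFAULT_ORDER = ["a", "b", "c", "d", "r", "s", "z"]


def tailor(order, anchor, letter):
    """Return a copy of `order` with `letter` placed right after `anchor`."""
    new_order = [ch for ch in order if ch != letter]
    new_order.insert(new_order.index(anchor) + 1, letter)
    return new_order


def build_weights(rules, base=DEFAULT_ORDER):
    """Apply (anchor, letter) rules on top of the base order; map letter -> weight."""
    order = list(base)
    for anchor, letter in rules:
        order = tailor(order, anchor, letter)
    return {ch: weight for weight, ch in enumerate(order)}


# Illustrative tailoring: each of these letters gets its own primary weight.
LV_RULES = [("c", "č"), ("r", "ŗ"), ("s", "š"), ("z", "ž")]

if __name__ == "__main__":
    weights = build_weights(LV_RULES)
    words = ["ša", "sz", "ŗa", "rz"]
    print(sorted(words, key=lambda w: [weights[ch] for ch in w]))
```

Because ŗ is tailored here as a separate letter, "ŗa" sorts after "rz", which corresponds to the primary-distinct treatment of R WITH CEDILLA described in the paragraph above.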

I don’t want to deviate from the CLDR collation data for no good reason,
so if this is really wrong it would be good to report a bug
against CLDR. But I guess it is correct because it cites
a Latvian dictionary as a reference.

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: cvs-commit at gcc dot gnu.org @ 2017-11-22  5:05 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

--- Comment #4 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  4b7af5fca7db9fe1f4c078c57f20a08e2a1e2404 (commit)
      from  922bb78c0c074aaeaa9f0312195b717674ed7430 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4b7af5fca7db9fe1f4c078c57f20a08e2a1e2404

commit 4b7af5fca7db9fe1f4c078c57f20a08e2a1e2404
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Fri Nov 17 10:54:52 2017 +0100

    lv_LV locale: fix collation [BZ #15537]

        [BZ #15537]
        * localedata/locales/lv_LV (LC_COLLATE): Fix collation by
        using “copy "iso14651_t1"” and then implementing the
        collation rules for lv from CLDR on top of that.
        * Makefile: Add lv_LV.UTF-8 to test-input and to the list
        of locales to be built for testing.
        * lv_LV.UTF-8.in: New file with test data to test the Latvian
        sorting.

    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                 |   11 +
 localedata/Makefile       |    4 +-
 localedata/locales/lv_LV  | 2107 +-------------------------------------------
 localedata/lv_LV.UTF-8.in |  105 +++
 4 files changed, 166 insertions(+), 2061 deletions(-)
 create mode 100644 localedata/lv_LV.UTF-8.in

* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-22  5:18 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=15537

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |2.27

--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed in glibc master.
