* [Bug localedata/15537] New: Invalid collation for Latvian diacritical letters
@ 2013-05-26 14:24 alex at gorka dot lv
2014-06-13 18:03 ` [Bug localedata/15537] " fweimer at redhat dot com
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: alex at gorka dot lv @ 2013-05-26 14:24 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=15537
Bug ID: 15537
Summary: Invalid collation for Latvian diacritical letters
Product: glibc
Version: 2.18
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: alex at gorka dot lv
CC: libc-locales at sourceware dot org
The Latvian locale for Latvia has the wrong collation order for the Latvian
vowels with macrons: A MACRON (U0100, U0101), E MACRON (U0112, U0113), I MACRON
(U012A, U012B), O MACRON (U014C, U014D), and U MACRON (U016A, U016B). The first
weight specifier for these letters should equal that of the base letter (A, E,
I, O, and U, respectively), and only the second weight specifier should be
heavier. In other words, a letter with a macron sorts after the same letter
without a macron only when the strings are otherwise equal.
Note that the diacritical consonants (C CARON, G CEDILLA, K CEDILLA, L CEDILLA,
N CEDILLA, S CARON, and Z CARON) are always sorted after their base letters; for
these letters the first weight specifier must differ, and the current version
of the Latvian locale already does this correctly.
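
The two rules above can be illustrated with a small two-level sort key. This is only a sketch: the alphabet string, the numeric weights, and the helper names `weights` and `sort_key` are assumptions made for illustration, not the contents of the glibc locale file. It handles only the lowercase letters listed in `PRIMARY_ORDER` and their macron forms.

```python
import unicodedata

# Assumed primary order for illustration: consonants with a caron or cedilla
# are separate letters placed directly after their base letters.
PRIMARY_ORDER = "abcčdefgģhijkķlļmnņoprsštuvzž"
PRIMARY = {ch: i for i, ch in enumerate(PRIMARY_ORDER)}
MACRON = "\u0304"  # COMBINING MACRON


def weights(ch):
    """Return (primary, secondary) weights for one lowercase letter."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) == 2 and decomposed[1] == MACRON:
        # Macron vowel: same primary weight as the base vowel,
        # but a heavier secondary weight.
        return PRIMARY[decomposed[0]], 1
    # Everything else, including č, ģ, ķ, ļ, ņ, š, ž, has its own primary weight.
    return PRIMARY[ch], 0


def sort_key(word):
    """Compare all primary weights first, then all secondary weights."""
    pairs = [weights(ch) for ch in word.lower()]
    return [p for p, _ in pairs], [s for _, s in pairs]


words = ["ab", "āa", "aa", "b", "ča", "cz", "d"]
print(sorted(words, key=sort_key))
```

In this sketch "āa" lands between "aa" and "ab" because the macron only matters once the primary weights tie, while "ča" lands after "cz" because č has a primary weight of its own.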
Besides, current version of Latvian locale contains letter R WITH CEDILLA
(U0156, U0157), which is now sorted separately from letter R with other
diacritical marks. This letter is not currently used for Latvian writing in
Latvia (it was used in the first half of the 20th century, and is still used by
some Latvian communities outside Latvia), so the sorting rules for this letter
are not obvious. I think that it would be better to make the first weight for
letter R WITH CEDILLA equal to R because most of current Latvian language users
cannot say when to use R with cedilla instead of R.
Finally, the current version of the Latvian locale sorts capital letters before
small letters, which is not consistent with the ISO 14651 rules used by many
glibc locales; some users complain about that too.
--
You are receiving this mail because:
You are on the CC list for the bug.
* [Bug localedata/15537] Invalid collation for Latvian diacritical letters
From: fweimer at redhat dot com @ 2014-06-13 18:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |security-
* [Bug localedata/15537] lt_LT: invalid collation for Latvian diacritical letters
From: vapier at gentoo dot org @ 2016-04-22 4:46 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
Mike Frysinger <vapier at gentoo dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Invalid collation for Latvian diacritical letters |lt_LT: invalid collation for Latvian diacritical letters
* [Bug localedata/15537] lt_LT: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-10-21 8:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |maiku.fabian at gmail dot com
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: alex at gorka dot lv @ 2017-10-21 8:37 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
alexander smishlajev <alex at gorka dot lv> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|lt_LT: invalid collation for Latvian diacritical letters |lv_LV: invalid collation for Latvian diacritical letters
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-10-30 7:52 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
--- Comment #1 from Mike FABIAN <maiku.fabian at gmail dot com> ---
The CLDR collation rules for Latvian look like this:
http://unicode.org/cldr/trac/browser/trunk/common/collation/lv.xml
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-20 8:49 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
--- Comment #2 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 10623
--> https://sourceware.org/bugzilla/attachment.cgi?id=10623&action=edit
0001-lv_LV-locale-fix-collation-BZ-15537.patch
Order without my patch:
$ LC_ALL=lv_LV.UTF-8 ls
Ʒ a Aa æ Āb c D i Y yb Īb ĵa L ņ ra Ŗa Sa š T Zb ža
ʒ ʒa aa Ā āb Ç Ģ Ia y Ī īb Ĵb Ļ O Rb ŗa sa Ša Z zb Žb
ȥ Ʒa Ab ā ʒb ç ģ ia Ya ī Ĵ ĵb ļ Ø rb Ŗb Sb ša z Ž žb
Ȥ Å ab Āa Ʒb Č H Ib ya Īa ĵ Ķ M ø Ŗ ŗb sb Šb Za ž
A å Æ āa C č I ib Yb īa Ĵa ķ Ņ Ra ŗ S Š šb za Ža
$
Order with my patch:
bash-4.4# LC_ALL=lv_LV.UTF-8 ls
a Ā ab Æ č H y Īa īb Ĵ ķ M Ø ŗ Ŗb Sb šb Z zb Ža ʒa
A aa Ab c Č i Y ya Īb ĵa Ķ ņ ra Ŗ S š Šb ȥ Zb žb Ʒa
å Aa āb C D I ia Ya yb Ĵa L Ņ Ra ŗa sa Š t Ȥ ž Žb ʒb
Å āa Āb ç ģ ī Ia ib Yb ĵb ļ O rb Ŗa Sa ša T za Ž ʒ Ʒb
ā Āa æ Ç Ģ Ī īa Ib ĵ Ĵb Ļ ø Rb ŗb sb Ša z Za ža Ʒ
bash-4.4#
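
A comparison like the two listings above can also be reproduced without `ls`. The following is a minimal sketch, assuming the `lv_LV.UTF-8` locale has been generated on the system; the helper name `sort_with_locale` is made up for this example, and it returns `None` when the requested locale is not installed.

```python
import locale


def sort_with_locale(names, locale_name="lv_LV.UTF-8"):
    """Sort names by the LC_COLLATE rules of locale_name.

    Returns None if the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm produces keys that order the same way strcoll compares.
        return sorted(names, key=locale.strxfrm)
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_with_locale(["Āb", "ab", "Ab", "āb", "b"]))
```

Running it against a glibc build with and without the patch should show the macron vowels moving next to their base vowels, as in the `ls` output.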
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-20 8:50 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |digitalfreak@lingonborough.com
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-20 8:57 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
--- Comment #3 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to alexander smishlajev from comment #0)
> Besides, current version of Latvian locale contains letter R WITH CEDILLA
> (U0156, U0157), which is now sorted separately from letter R with other
> diacritical marks. This letter is not currently used for Latvian writing in
> Latvia (it was used in the first half of the 20th century, and is still used
> by some Latvian communities outside Latvia), so the sorting rules for this
> letter are not obvious. I think that it would be better to make the first
> weight for letter R WITH CEDILLA equal to R because most of current Latvian
> language users cannot say when to use R with cedilla instead of R.
My patch fixes the problems you report, *except* the one about
R WITH CEDILLA.
I fixed them by throwing away all the existing rules in LC_COLLATE in the
lv_LV locale and using
copy "iso14651_t1"
instead to include the default sort order.
Then, on top of the default sort order I implemented the same
rules as in
http://unicode.org/cldr/trac/browser/trunk/common/collation/lv.xml
This collation data from CLDR treats the R WITH CEDILLA as primary different
from R, i.e. it continues to sort it the same way as the current
lv_LV locale in glibc does.
I don’t want to deviate from the CLDR collation data for no good reason,
so if this is really wrong it would be good to report a bug
against CLDR. But I guess it is correct because it cites
a Latvian dictionary as a reference.
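
To make the difference between the two treatments of R WITH CEDILLA concrete, here is a small sketch. The weight values and the helper name `make_key` are hypothetical and chosen only for illustration; they are not the weights used by glibc or CLDR.

```python
def make_key(r_cedilla_is_primary):
    """Build a two-level sort key; the weights are hypothetical."""
    table = {"a": (1, 0), "b": (2, 0), "r": (10, 0), "s": (12, 0)}
    if r_cedilla_is_primary:
        # CLDR-like treatment: ŗ is a separate letter between r and s.
        table["ŗ"] = (11, 0)
    else:
        # Reporter's suggestion: ŗ shares r's primary weight and differs
        # only at the secondary level.
        table["ŗ"] = (10, 1)

    def key(word):
        pairs = [table[ch] for ch in word]
        return [p for p, _ in pairs], [s for _, s in pairs]

    return key


words = ["ŗa", "rb", "ra", "sa"]
print(sorted(words, key=make_key(True)))   # ŗ as a primary difference
print(sorted(words, key=make_key(False)))  # ŗ as a secondary difference
```

With the primary difference every word starting with ŗ sorts after every word starting with r; with the secondary difference "ŗa" falls between "ra" and "rb".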
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: cvs-commit at gcc dot gnu.org @ 2017-11-22 5:05 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
--- Comment #4 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via 4b7af5fca7db9fe1f4c078c57f20a08e2a1e2404 (commit)
from 922bb78c0c074aaeaa9f0312195b717674ed7430 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4b7af5fca7db9fe1f4c078c57f20a08e2a1e2404
commit 4b7af5fca7db9fe1f4c078c57f20a08e2a1e2404
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Nov 17 10:54:52 2017 +0100
lv_LV locale: fix collation [BZ #15537]
[BZ #15537]
* localedata/locales/lv_LV (LC_COLLATE): Fix collation by
using “copy "iso14651_t1"” and then implementing the
collation rules for lv from CLDR on top of that.
* Makefile: Add lv_LV.UTF-8 to test-input and to the list
of locales to be built for testing.
* lv_LV.UTF-8.in: New file with test data to test the Latvian
sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 11 +
localedata/Makefile | 4 +-
localedata/locales/lv_LV | 2107 +-------------------------------------------
localedata/lv_LV.UTF-8.in | 105 +++
4 files changed, 166 insertions(+), 2061 deletions(-)
create mode 100644 localedata/lv_LV.UTF-8.in
* [Bug localedata/15537] lv_LV: invalid collation for Latvian diacritical letters
From: maiku.fabian at gmail dot com @ 2017-11-22 5:18 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=15537
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |2.27
--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed in glibc master.