public inbox for libc-locales@sourceware.org
* [PATCH][BZ 17293] Fix sorting order for Ukrainian locale
@ 2015-01-02  4:22 Andriy Rysin
From: Andriy Rysin @ 2015-01-02  4:22 UTC
  To: libc-locales

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]

The sorting order for several characters is wrong in the uk_UA locale.
This patch fixes two problems:
1) the soft sign position (the soft sign has its own position in the
alphabet and should not be ignored)
2) UKR-IE should follow CYR-IE (they are separate letters and have
their own positions)
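
To illustrate the intended order, here is a minimal Python sketch. It
does not use glibc or the uk_UA locale at all; it only emulates the
primary (letter-level) order with an explicit alphabet string. The names
ALPHABET, RANK and primary_key are made up for this illustration, and
case and punctuation tie-breaking are deliberately left out.

```python
# Ukrainian alphabet in dictionary order (33 letters). The soft sign "ь"
# has its own slot between "щ" and "ю", and "є" directly follows "е".
ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def primary_key(word):
    """Map a word to a list of alphabet positions.

    Characters outside the alphabet (hyphen, apostrophe, digits) are
    skipped, and case is folded, so this is a primary-level key only.
    """
    return [RANK[ch] for ch in word.lower() if ch in RANK]


words = ["місяць", "євгеніка", "місяцевий", "епатаж"]
print(sorted(words, key=primary_key))
# ['епатаж', 'євгеніка', 'місяцевий', 'місяць']
```

With the soft sign ignored, "місяць" would compare as a prefix of
"місяцевий" and sort first; with its own position it sorts after,
because "е" precedes "ь". Likewise "євгеніка" lands after every word
starting with "е" instead of being mixed in with them.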

The patch also adds collation order tests (localedata/uk_UA.in).
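
For anyone who wants to check a sorted word list by hand, the idea
behind such a test can be sketched in Python as below. This is not the
glibc test harness, only an approximation of the idea: the input is
expected to be already sorted, and any adjacent pair that compares the
wrong way round is a failure. The helper names first_misordered and
locale_key are hypothetical, and the sketch assumes the uk_UA.UTF-8
locale may or may not be installed on the machine running it.

```python
import locale


def first_misordered(lines, key):
    """Return the first adjacent pair (a, b) with key(a) > key(b), or None."""
    for a, b in zip(lines, lines[1:]):
        if key(a) > key(b):
            return (a, b)
    return None


def locale_key(name="uk_UA.UTF-8"):
    """Return locale.strxfrm bound to the named locale, or None if missing."""
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    return locale.strxfrm


# A few lines taken from the test data, in the order the patch expects.
sample = ["ельбор", "елюент", "епатаж", "євгеніка", "місяцевий", "місяць"]

key = locale_key()
if key is None:
    print("uk_UA.UTF-8 is not installed; skipping the locale check")
else:
    print("first misordered pair:", first_misordered(sample, key))
```

On a system whose uk_UA locale sorts the sample as listed, the last
line reports None; otherwise it shows the first pair that is out of
order.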

Unfortunately, there is no publicly available official collation
standard for the Ukrainian language, but this new order is confirmed
to be used in official documents and dictionaries in Ukrainian.
Some links:
http://spelling.ulif.org.ua/peredmova.htm - Official spelling rules
for Ukrainian (the alphabet is listed there, and the only note is about
the apostrophe, which should not affect the sorting)
http://lcorp.ulif.org.ua/dictua/ - Ukrainian dictionaries from the
National Academy of Sciences use a sorting order that matches the one
provided by the patch

Also, with this patch the order for the soft sign and UKR-IE/CYR-IE
matches the order in ICU, which follows the Unicode standard.

Thanks
Andriy

[-- Attachment #2: fix_UKR-IE_order_in_uk_UA.patch --]
[-- Type: text/x-patch, Size: 8350 bytes --]

From e2cdfa3b916a2dbac80184ed7918aaebb88d57e7 Mon Sep 17 00:00:00 2001
From: Andriy Rysin <arysin@gmail.com>
Date: Thu, 1 Jan 2015 22:16:10 -0500
Subject: [PATCH] Fix sorting order for Ukrainian locale:   soft sign has its
 own position and UKR-IE should follow CYR-IE;   added collation tests for
 Ukrainian locale symbols

---
 localedata/ChangeLog     |   5 ++
 localedata/Makefile      |   4 +-
 localedata/locales/uk_UA | 116 ++++++++++++++++++++++++-----------------------
 localedata/uk_UA.in      |  56 +++++++++++++++++++++++
 4 files changed, 122 insertions(+), 59 deletions(-)
 create mode 100644 localedata/uk_UA.in

diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 1636e52..147df93 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-01  Andriy Rysin  <arysin@gmail.com>
+
+	[BZ #17293]
+	* uk_UA: Fix sorting order for Ukrainian locale
+
 2014-12-01  Pravin Satpute <psatpute@redhat.com>
 
 	[BZ #16857]
diff --git a/localedata/Makefile b/localedata/Makefile
index 0826b36..5f8ca7f 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
 	     tst-ctype tst-langinfo tst-langinfo-static tst-numeric
 test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
 	      hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
-	      si_LK.UTF-8
+	      si_LK.UTF-8 uk_UA.UTF-8
 test-input-data = $(addsuffix .in, $(basename $(test-input)))
 test-output := $(foreach s, .out .xout, \
 			 $(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
 	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
 	   nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
 	   zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-	   tr_TR.ISO-8859-9 en_GB.UTF-8
+	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
 LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
 CHARMAPS := $(shell echo "$(LOCALES)" | \
 		    sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
diff --git a/localedata/locales/uk_UA b/localedata/locales/uk_UA
index d9194b8..2bd30eb 100644
--- a/localedata/locales/uk_UA
+++ b/localedata/locales/uk_UA
@@ -349,61 +349,63 @@ collating-symbol <UKR-GHE>
 % Soft sign '<U044C>' may follow only this set of nine characters [<U0432><U0434><U0437><U043B><U043D><U0440><U0441><U0442><U0446>].
 % It only softens pronunciation of these characters so it's should not impact
 % sorting.
-
-
-collating-symbol <V+SS>
-collating-element <V-SS> from "<U0412><U042C>"
-collating-element <V-ss> from "<U0412><U044C>"
-collating-element <v-SS> from "<U0432><U042C>"
-collating-element <v-ss> from "<U0432><U044C>"
-
-collating-symbol <D+SS>
-collating-element <D-SS> from "<U0414><U042C>"
-collating-element <D-ss> from "<U0414><U044C>"
-collating-element <d-SS> from "<U0434><U042C>"
-collating-element <d-ss> from "<U0434><U044C>"
-
-collating-symbol <Z+SS>
-collating-element <Z-SS> from "<U0417><U042C>"
-collating-element <Z-ss> from "<U0417><U044C>"
-collating-element <z-SS> from "<U0437><U042C>"
-collating-element <z-ss> from "<U0437><U044C>"
-
-collating-symbol <L+SS>
-collating-element <L-SS> from "<U041B><U042C>"
-collating-element <L-ss> from "<U041B><U044C>"
-collating-element <l-SS> from "<U043B><U042C>"
-collating-element <l-ss> from "<U043B><U044C>"
-
-collating-symbol <N+SS>
-collating-element <N-SS> from "<U041D><U042C>"
-collating-element <N-ss> from "<U041D><U044C>"
-collating-element <n-SS> from "<U043D><U042C>"
-collating-element <n-ss> from "<U043D><U044C>"
-
-collating-symbol <R+SS>
-collating-element <R-SS> from "<U0420><U042C>"
-collating-element <R-ss> from "<U0420><U044C>"
-collating-element <r-SS> from "<U0440><U042C>"
-collating-element <r-ss> from "<U0440><U044C>"
-
-collating-symbol <S+SS>
-collating-element <S-SS> from "<U0421><U042C>"
-collating-element <S-ss> from "<U0421><U044C>"
-collating-element <s-SS> from "<U0441><U042C>"
-collating-element <s-ss> from "<U0441><U044C>"
-
-collating-symbol <T+SS>
-collating-element <T-SS> from "<U0422><U042C>"
-collating-element <T-ss> from "<U0422><U044C>"
-collating-element <t-SS> from "<U0442><U042C>"
-collating-element <t-ss> from "<U0442><U044C>"
-
-collating-symbol <TSE+SS>
-collating-element <TS-SS> from "<U0426><U042C>"
-collating-element <TS-ss> from "<U0426><U044C>"
-collating-element <ts-SS> from "<U0446><U042C>"
-collating-element <ts-ss> from "<U0446><U044C>"
+%
+% Note: in the official alphabet the soft sign is a letter and has a hard position in the order
+
+
+%collating-symbol <V+SS>
+%collating-element <V-SS> from "<U0412><U042C>"
+%collating-element <V-ss> from "<U0412><U044C>"
+%collating-element <v-SS> from "<U0432><U042C>"
+%collating-element <v-ss> from "<U0432><U044C>"
+%
+%collating-symbol <D+SS>
+%collating-element <D-SS> from "<U0414><U042C>"
+%collating-element <D-ss> from "<U0414><U044C>"
+%collating-element <d-SS> from "<U0434><U042C>"
+%collating-element <d-ss> from "<U0434><U044C>"
+%
+%collating-symbol <Z+SS>
+%collating-element <Z-SS> from "<U0417><U042C>"
+%collating-element <Z-ss> from "<U0417><U044C>"
+%collating-element <z-SS> from "<U0437><U042C>"
+%collating-element <z-ss> from "<U0437><U044C>"
+%
+%collating-symbol <L+SS>
+%collating-element <L-SS> from "<U041B><U042C>"
+%collating-element <L-ss> from "<U041B><U044C>"
+%collating-element <l-SS> from "<U043B><U042C>"
+%collating-element <l-ss> from "<U043B><U044C>"
+%
+%collating-symbol <N+SS>
+%collating-element <N-SS> from "<U041D><U042C>"
+%collating-element <N-ss> from "<U041D><U044C>"
+%collating-element <n-SS> from "<U043D><U042C>"
+%collating-element <n-ss> from "<U043D><U044C>"
+%
+%collating-symbol <R+SS>
+%collating-element <R-SS> from "<U0420><U042C>"
+%collating-element <R-ss> from "<U0420><U044C>"
+%collating-element <r-SS> from "<U0440><U042C>"
+%collating-element <r-ss> from "<U0440><U044C>"
+%
+%collating-symbol <S+SS>
+%collating-element <S-SS> from "<U0421><U042C>"
+%collating-element <S-ss> from "<U0421><U044C>"
+%collating-element <s-SS> from "<U0441><U042C>"
+%collating-element <s-ss> from "<U0441><U044C>"
+%
+%collating-symbol <T+SS>
+%collating-element <T-SS> from "<U0422><U042C>"
+%collating-element <T-ss> from "<U0422><U044C>"
+%collating-element <t-SS> from "<U0442><U042C>"
+%collating-element <t-ss> from "<U0442><U044C>"
+%
+%collating-symbol <TSE+SS>
+%collating-element <TS-SS> from "<U0426><U042C>"
+%collating-element <TS-ss> from "<U0426><U044C>"
+%collating-element <ts-SS> from "<U0446><U042C>"
+%collating-element <ts-ss> from "<U0446><U044C>"
 
 
 collating-symbol <CAP-MIN>
@@ -489,11 +491,11 @@ reorder-after <U0434>
 <U0455> "<U003C><U0043><U0059><U0052><U002D><U0044><U0045><U003E><U003C><U0043><U0059><U0052><U002D><U005A><U0045><U003E>";"<U003C><U004C><U0049><U0047><U003E><U003C><U004C><U0049><U0047><U003E>";"<U003C><U004D><U0049><U004E><U003E><U003C><U004D><U0049><U004E><U003E>";IGNORE % CYR-DZE
 
 reorder-after <U0435>
-<U0454> <CYR-IE>;<UKR-IE>;<MIN>;IGNORE
+%<U0454> <CYR-IE>;<UKR-IE>;<MIN>;IGNORE
 <U0451> <CYR-IE>;<CYR-IO>;<MIN>;IGNORE
 <U044D> <CYR-IE>;<CYR-E>;<MIN>;IGNORE
 reorder-after <U0415>
-<U0404> <CYR-IE>;<UKR-IE>;<CAP>;IGNORE
+%<U0404> <CYR-IE>;<UKR-IE>;<CAP>;IGNORE
 <U0401> <CYR-IE>;<CYR-IO>;<CAP>;IGNORE
 <U042D> <CYR-IE>;<CYR-E>;<CAP>;IGNORE
 
diff --git a/localedata/uk_UA.in b/localedata/uk_UA.in
new file mode 100644
index 0000000..ff4d284
--- /dev/null
+++ b/localedata/uk_UA.in
@@ -0,0 +1,56 @@
+01010
+Абажур
+абажур
+абажур-10
+брама
+вермішель
+грати
+Граття
+граття
+ґрати
+ебонітовий
+експорт
+експосол
+екс-посол
+експоцентр
+експрацівник
+екс-працівник
+еластичність
+електрика
+ельбор
+елюент
+епатаж
+євгеніка
+Європа
+єдність
+Жмих
+жмих
+зоря
+и
+і
+ї
+й
+Карпати
+криниця
+лебідь
+місяцевий
+місяць
+наразі
+обапіл
+об'їзд
+об’їзд
+обʼїзд
+образ
+опір
+право
+сонце
+тарган
+упродовж
+фантастика
+центр
+чухатися
+ш
+щ
+ь
+ю
+я
-- 
2.1.0

