From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26144 invoked by alias); 2 Mar 2018 12:59:14 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: libc-locales-owner@sourceware.org
Received: (qmail 25830 invoked by uid 55); 2 Mar 2018 12:59:04 -0000
From: "cvs-commit at gcc dot gnu.org"
To: libc-locales@sourceware.org
Subject: [Bug localedata/14095] Review / update collation data from Unicode / ISO 14651
Date: Fri, 02 Mar 2018 12:59:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.15
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: maiku.fabian at gmail dot com
X-Bugzilla-Target-Milestone: 2.28
X-Bugzilla-Flags: security-
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2018-q1/txt/msg00110.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=14095

--- Comment #12 from cvs-commit at gcc dot gnu.org ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, mfabian/collation-update-2.27 has been created
        at  9589174d076327deb7ed816d16b89b0e7470abd6 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9589174d076327deb7ed816d16b89b0e7470abd6

commit 9589174d076327deb7ed816d16b89b0e7470abd6
Author: Mike FABIAN
Date:   Thu Dec 21 18:56:52 2017 +0100

    Remove the lines from cmn_TW.UTF-8.in which cannot work at the moment.

    See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898

    These lines don’t yet work because of a glibc bug, not because of
    problems in the locale data. No matter what sorting rules one uses,
    these characters cannot be sorted at all at the moment.

    As soon as that bug is fixed, these lines should be added back
    to the test file.

        * localedata/cmn_TW.UTF-8.in: Remove the lines which cannot be sorted
        correctly at the moment because of a bug.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e289a7d4c7f2abf09e4a4877b8cadcded7440e55

commit e289a7d4c7f2abf09e4a4877b8cadcded7440e55
Author: Mike FABIAN
Date:   Mon Dec 11 18:26:22 2017 +0100

    Adapt collation in several locales to the new iso14651_t1_common file

    [BZ #22550] - es_ES locale (and other es_* locales):
                  collation should treat ñ as a primary different character,
                  sync the collation for Spanish with CLDR
    [BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)

        * localedata/Makefile: Add new test files.
        * localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
        * localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation order.
        * localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
        * localedata/am_ET.UTF-8.in: New test file.
        * localedata/az_AZ.UTF-8.in: Likewise.
        * localedata/be_BY.UTF-8.in: Likewise.
        * localedata/ber_DZ.UTF-8.in: Likewise.
        * localedata/ber_MA.UTF-8.in: Likewise.
        * localedata/bg_BG.UTF-8.in: Likewise.
        * localedata/br_FR.UTF-8.in: Likewise.
        * localedata/cmn_TW.UTF-8.in: Likewise.
        * localedata/crh_UA.UTF-8.in: Likewise.
        * localedata/csb_PL.UTF-8.in: Likewise.
        * localedata/cv_RU.UTF-8.in: Likewise.
        * localedata/cy_GB.UTF-8.in: Likewise.
        * localedata/dz_BT.UTF-8.in: Likewise.
        * localedata/eo.UTF-8.in: Likewise.
        * localedata/es_ES.UTF-8.in: Likewise.
        * localedata/fa_IR.UTF-8.in: Likewise.
        * localedata/fi_FI.UTF-8.in: Likewise.
        * localedata/fil_PH.UTF-8.in: Likewise.
        * localedata/fur_IT.UTF-8.in: Likewise.
        * localedata/gez_ER.UTF-8@abegede.in: Likewise.
        * localedata/ha_NG.UTF-8.in: Likewise.
        * localedata/ig_NG.UTF-8.in: Likewise.
        * localedata/ik_CA.UTF-8.in: Likewise.
        * localedata/kk_KZ.UTF-8.in: Likewise.
        * localedata/ku_TR.UTF-8.in: Likewise.
        * localedata/ky_KG.UTF-8.in: Likewise.
        * localedata/ln_CD.UTF-8.in: Likewise.
        * localedata/mi_NZ.UTF-8.in: Likewise.
        * localedata/ml_IN.UTF-8.in: Likewise.
        * localedata/mn_MN.UTF-8.in: Likewise.
        * localedata/mr_IN.UTF-8.in: Likewise.
        * localedata/mt_MT.UTF-8.in: Likewise.
        * localedata/nb_NO.UTF-8.in: Likewise.
        * localedata/om_KE.UTF-8.in: Likewise.
        * localedata/os_RU.UTF-8.in: Likewise.
        * localedata/ps_AF.UTF-8.in: Likewise.
        * localedata/ro_RO.UTF-8.in: Likewise.
        * localedata/ru_RU.UTF-8.in: Likewise.
        * localedata/sc_IT.UTF-8.in: Likewise.
        * localedata/se_NO.UTF-8.in: Likewise.
        * localedata/sq_AL.UTF-8.in: Likewise.
        * localedata/sv_SE.UTF-8.in: Likewise.
        * localedata/szl_PL.UTF-8.in: Likewise.
        * localedata/tg_TJ.UTF-8.in: Likewise.
        * localedata/tk_TM.UTF-8.in: Likewise.
        * localedata/tt_RU.UTF-8.in: Likewise.
        * localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
        * localedata/ug_CN.UTF-8.in: Likewise.
        * localedata/uz_UZ.UTF-8.in: Likewise.
        * localedata/vi_VN.UTF-8.in: Likewise.
        * localedata/yi_US.UTF-8.in: Likewise.
        * localedata/yo_NG.UTF-8.in: Likewise.
        * localedata/zh_CN.UTF-8.in: Likewise.
        * localedata/locales/am_ET: Adapt collation rules to new
        iso14651_t1_common file and fix bugs in the collation.
        * localedata/locales/az_AZ: Likewise.
        * localedata/locales/be_BY: Likewise.
        * localedata/locales/ber_DZ: Likewise.
        * localedata/locales/ber_MA: Likewise.
        * localedata/locales/bg_BG: Likewise.
        * localedata/locales/br_FR: Likewise.
        * localedata/locales/br_FR@euro: Likewise.
        * localedata/locales/ca_ES: Likewise.
        * localedata/locales/cns11643_stroke: Likewise.
        * localedata/locales/crh_UA: Likewise.
        * localedata/locales/cs_CZ: Likewise.
        * localedata/locales/csb_PL: Likewise.
        * localedata/locales/cv_RU: Likewise.
        * localedata/locales/cy_GB: Likewise.
        * localedata/locales/da_DK: Likewise.
        * localedata/locales/dz_BT: Likewise.
        * localedata/locales/en_CA: Likewise.
        * localedata/locales/eo: Likewise.
        * localedata/locales/es_CU: Likewise.
        * localedata/locales/es_EC: Likewise.
        * localedata/locales/es_ES: Likewise.
        * localedata/locales/es_US: Likewise.
        * localedata/locales/et_EE: Likewise.
        * localedata/locales/fa_IR: Likewise.
        * localedata/locales/fi_FI: Likewise.
        * localedata/locales/fil_PH: Likewise.
        * localedata/locales/fur_IT: Likewise.
        * localedata/locales/gez_ER@abegede: Likewise.
        * localedata/locales/ha_NG: Likewise.
        * localedata/locales/hr_HR: Likewise.
        * localedata/locales/hsb_DE: Likewise.
        * localedata/locales/hu_HU: Likewise.
        * localedata/locales/ig_NG: Likewise.
        * localedata/locales/ik_CA: Likewise.
        * localedata/locales/is_IS: Likewise.
        * localedata/locales/iso14651_t1_pinyin: Likewise.
        * localedata/locales/kk_KZ: Likewise.
        * localedata/locales/ku_TR: Likewise.
        * localedata/locales/ky_KG: Likewise.
        * localedata/locales/ln_CD: Likewise.
        * localedata/locales/lt_LT: Likewise.
        * localedata/locales/lv_LV: Likewise.
        * localedata/locales/mi_NZ: Likewise.
        * localedata/locales/ml_IN: Likewise.
        * localedata/locales/mn_MN: Likewise.
        * localedata/locales/mr_IN: Likewise.
        * localedata/locales/mt_MT: Likewise.
        * localedata/locales/nb_NO: Likewise.
        * localedata/locales/om_KE: Likewise.
        * localedata/locales/os_RU: Likewise.
        * localedata/locales/pl_PL: Likewise.
        * localedata/locales/ps_AF: Likewise.
        * localedata/locales/ro_RO: Likewise.
        * localedata/locales/ru_RU: Likewise.
        * localedata/locales/ru_UA: Likewise.
        * localedata/locales/sc_IT: Likewise.
        * localedata/locales/se_NO: Likewise.
        * localedata/locales/si_LK: Likewise.
        * localedata/locales/sq_AL: Likewise.
        * localedata/locales/sv_FI: Likewise.
        * localedata/locales/sv_FI@euro: Likewise.
        * localedata/locales/sv_SE: Likewise.
        * localedata/locales/szl_PL: Likewise.
        * localedata/locales/tg_TJ: Likewise.
        * localedata/locales/ti_ER: Likewise.
        * localedata/locales/tk_TM: Likewise.
        * localedata/locales/tl_PH: Likewise.
        * localedata/locales/tr_TR: Likewise.
        * localedata/locales/tt_RU: Likewise.
        * localedata/locales/tt_RU@iqtelif: Likewise.
        * localedata/locales/ug_CN: Likewise.
        * localedata/locales/uk_UA: Likewise.
        * localedata/locales/uz_UZ: Likewise.
        * localedata/locales/uz_UZ@cyrillic: Likewise.
        * localedata/locales/vi_VN: Likewise.
        * localedata/locales/yi_US: Likewise.
        * localedata/locales/yo_NG: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=242596394db9dad6147bb2b7bcb53d8a7610e1d0

commit 242596394db9dad6147bb2b7bcb53d8a7610e1d0
Author: Mike FABIAN
Date:   Mon Jan 1 15:33:50 2018 +0100

    Improve gen-locales.mk and gen-locale.sh to make test files with @ options work

    With out this, adding collation test files like
    localedata/gez_ER.UTF-8@abegede.in does not work for locales which
    contain @ modifiers.

        * gen-locales.mk: Make test files which contain @ modifiers in
        their name work.
        * localedata/gen-locale.sh: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cc5351f2c0502826f8b4143f3646d44e334ff7b8

commit cc5351f2c0502826f8b4143f3646d44e334ff7b8
Author: Mike FABIAN
Date:   Tue Jan 23 17:29:36 2018 +0100

    Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common file.

    See:

    http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html

    > A range expression represents the set of collating elements that fall
    > between two elements in the current collation sequence,
    > inclusively.
    > It is expressed as the starting point and the ending
    > point separated by a hyphen (-).
    >
    > Range expressions must not be used in portable applications because
    > their behaviour is dependent on the collating sequence. Ranges will be
    > treated according to the current collating sequence, and include such
    > characters that fall within the range based on that collating
    > sequence, regardless of character values. This, however, means that
    > the interpretation will differ depending on collating sequence. If,
    > for instance, one collating sequence defines ä as a variant of a,
    > while another defines it as a letter following z, then the expression
    > [ä-z] is valid in the first language and invalid in the second.

    Therefore, using [a-z] does not make much sense except in the C/POSIX locale.
    The new iso14651_t1_common lists upper case and lower case Latin characters
    in a different order than the old one which causes surprising results
    for example in the de_DE locale: [a-z] now includes A because A comes
    after a in iso14651_t1_common but does not include Z because that comes
    after z in iso14651_t1_common.

        * posix/tst-fnmatch.input: Fix results for range expressions
        for non C locales.
        * posix/tst-regexloc.c: Do not use a range expression for
        de_DE.ISO-8859-1 locale.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ffa8106c727607fb365f2b93649fe3ea182dffe4

commit ffa8106c727607fb365f2b93649fe3ea182dffe4
Author: Mike FABIAN
Date:   Fri Dec 15 07:19:45 2017 +0100

    Fix posix/bug-regex5.c test case, adapt to iso14651_t1_common upate

    This test case tests how many collating elements are defined in
    da_DK.ISO-8859-1 locale.
    The da_DK locale source defines 4:

    collating-element from ""
    collating-element from ""
    collating-element from ""
    collating-element from ""

    The new iso14651_t1_common file defines more collating elements, two
    of them are in the ISO-8859-1 range:

    collating-element from "" % decomposition of LATIN CAPITAL LETTER L WITH MIDDLE DOT
    collating-element from "" % decomposition of LATIN SMALL LETTER L WITH MIDDLE DOT

    So the total count is now 6 instead of 4.

        * posix/bug-regex5.c: Fix test case because with the new
        iso14651_t1_common file, the da_DK locale now has 6 collating
        elements in the ISO-8859-1 range instead of 4 with the old
        iso14651_t1_common file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=61e613fb97aa619ae4fabac3f106d5fffe15eacb

commit 61e613fb97aa619ae4fabac3f106d5fffe15eacb
Author: Mike FABIAN
Date:   Wed Dec 13 14:39:54 2017 +0100

    Collation order of @-. and space has changed in new iso14651_t1_common file, adapt test files

        * localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
        downloaded from ISO, the collation order of @-. and space has changed.
        Therefore, this test file needed to be adapted.
        * localedata/fr_CA.UTF-8.in: Likewise.
        * localedata/fr_FR.UTF-8.in: Likewise.
        * localedata/uk_UA.UTF-8.in: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=059454de60bdb1be9979ee09596c1e9a7e9e6c8b

commit 059454de60bdb1be9979ee09596c1e9a7e9e6c8b
Author: Mike FABIAN
Date:   Tue Dec 12 14:39:34 2017 +0100

    Collation order of ȥ has changed in new iso14651_t1_common file, adapt test files

        * localedata/cs_CZ.UTF-8.in: adapt this test file to the collation
        order of ȥ in the new iso14651_t1_common file.
        * localedata/pl_PL.UTF-8.in: Likewise.
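
Editorial illustration (not part of the commit log): the range-expression
behaviour described in commit cc5351f2 above can be sketched with a toy
model. The sequence "a A b B ... z Z" used here is an assumption chosen
only to mirror the commit message (A sorts after a, Z sorts after z); it
is not the real iso14651_t1_common data, and the function names are
hypothetical, not glibc APIs.

```python
import string


def toy_collation_sequence():
    """Hypothetical sequence a A b B ... z Z; NOT the real iso14651_t1_common data."""
    sequence = []
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        sequence.extend([lower, upper])
    return sequence


def in_collation_range(ch, start, end, sequence):
    """Return True if ch lies between start and end, inclusive, in sequence."""
    position = {c: i for i, c in enumerate(sequence)}
    if ch not in position:
        return False
    return position[start] <= position[ch] <= position[end]


def in_codepoint_range(ch, start, end):
    """Range by raw code point, as in the C/POSIX locale for ASCII."""
    return ord(start) <= ord(ch) <= ord(end)


if __name__ == "__main__":
    seq = toy_collation_sequence()
    for ch in "aAzZ":
        # Columns: character, member of [a-z] by toy collation, by code point
        print(ch, in_collation_range(ch, "a", "z", seq), in_codepoint_range(ch, "a", "z"))
```

Under this toy sequence, [a-z] contains A but not Z, whereas the
code-point comparison contains neither; that difference is why the test
cases had to stop relying on range expressions outside the C locale.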

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1f4df3bb2ac69f2e1947c2953379a7f19b5f0c35

commit 1f4df3bb2ac69f2e1947c2953379a7f19b5f0c35
Author: Mike FABIAN
Date:   Tue Jan 30 15:45:05 2018 +0100

    Add sections for various scripts to the iso14651_t1_common file

        * localedata/locales/iso14651_t1_common: Add sections for various
        scripts to the iso14651_t1_common file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a93fecdcece3e2178834f4b4868b2309b0158753

commit a93fecdcece3e2178834f4b4868b2309b0158753
Author: Mike FABIAN
Date:   Wed Jan 31 06:18:47 2018 +0100

    iso14651_t1_common: make the fourth level the codepoint for characters which are ignorable on all 4 levels

    Entries for characters which have “IGNORE” on all 4 levels like:

    IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)

    are changed into:

    IGNORE;IGNORE;IGNORE; % START OF HEADING (in ISO 6429)

    i.e. putting the code point of the character into the fourth level
    instead of “IGNORE”. Without that change, all such characters
    would compare equal which would make a wcscoll test case fail.

    It is better to have a clearly defined sort order even for characters
    like this so it is good to use the code point as a tie-break.

        * localedata/locales/iso14651_t1_common: Use the code point
        of a character in the fourth collation level instead of IGNORE
        for all entries which have IGNORE on all 4 levels.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3e7089bf28ed1fd77e644bb3ce7405aff7847e61

commit 3e7089bf28ed1fd77e644bb3ce7405aff7847e61
Author: Mike FABIAN
Date:   Mon Dec 11 20:00:24 2017 +0100

    Add convenience symbols like , to iso14651_t1_common

        * localedata/locales/iso14651_t1_common: Add some convenient
        collation symbols like , to make tailoring easier using rules
        similar to those in CLDR.
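
Editorial illustration (not part of the commit log): the fourth-level
tie-break described in commit a93fecdc above can be sketched by modelling
a collation key as a tuple of four levels. This is a simplified
assumption, not how glibc builds its collation tables; IGNORE is modelled
as an empty tuple, and both key functions are hypothetical names.

```python
# A level that contributes nothing to the key (stand-in for IGNORE).
IGNORE = ()


def key_all_ignore(ch):
    """Key for a character that is ignorable on all four levels (old behaviour)."""
    return (IGNORE, IGNORE, IGNORE, IGNORE)


def key_with_codepoint(ch):
    """Same key, but with the code point as the fourth level (new behaviour)."""
    return (IGNORE, IGNORE, IGNORE, (ord(ch),))


if __name__ == "__main__":
    soh = "\u0001"  # START OF HEADING
    stx = "\u0002"  # START OF TEXT

    # Old behaviour: the two keys are identical, so the characters compare equal.
    print(key_all_ignore(soh) == key_all_ignore(stx))

    # New behaviour: the code point breaks the tie and gives a defined order.
    print(key_with_codepoint(soh) < key_with_codepoint(stx))
```

With identical keys, a sort cannot distinguish the two control
characters; once the code point occupies the fourth level the order is
fully determined, which matches the rationale the commit message gives
for the wcscoll test case.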

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=50a54ba443575e69ffb03aa67d53ccf8b66a4fbd

commit 50a54ba443575e69ffb03aa67d53ccf8b66a4fbd
Author: Mike FABIAN
Date:   Tue Jan 30 18:24:47 2018 +0100

    Fixing syntax errors after updating the iso14651_t1_common file

        * localedata/locales/iso14651_t1_common: The new version of this file
        downloaded from ISO contained several syntax errors which are fixed
        by this patch.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=661ab21c7521ba8e6e8bc7dad897b6cf162e0cd0

commit 661ab21c7521ba8e6e8bc7dad897b6cf162e0cd0
Author: Mike FABIAN
Date:   Tue Jan 30 18:07:39 2018 +0100

    iso14651_t1_common: →

        * localedata/locales/iso14651_t1_common: replace all with
        because glibc understands only 4 digit or 8 digit

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06061c30d615b2862ac360f11384092c92022ea7

commit 06061c30d615b2862ac360f11384092c92022ea7
Author: Mike FABIAN
Date:   Tue Jan 30 18:04:31 2018 +0100

    Necessary changes after updating the iso14651_t1_common file

        * localedata/locales/iso14651_t1_common: Necessary changes
        to make the file downloaded from ISO usable by glibc.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bc1d41044c0cf9f0214acdbfd79b6cd11fd1e8c1

commit bc1d41044c0cf9f0214acdbfd79b6cd11fd1e8c1
Author: Mike FABIAN
Date:   Tue Jan 30 17:59:00 2018 +0100

    Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]

    [BZ #14095] - Review / update collation data from Unicode / ISO 14651

    File downloaded from:
    http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt

    Updating this file alone is not enough, there are problems in the new
    file which need to be fixed and the collation rules for many locales
    need to be adapted. This is done by the following patches.

    This update also fixes the problem that many characters are treated
    as identical when sorting because they were not yet in the old
    iso14651_t1_common file, see:

    https://bugzilla.redhat.com/show_bug.cgi?id=1336308
    - Infinite (∞) and empty set (∅) are treated as if they were the same
      character by sort and uniq

        [BZ #14095]
        * localedata/locales/iso14651_t1_common: Update file to
        latest version from ISO (ISO14651_2016_TABLE1_en.txt).

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=16e349c550942d274d3193ccedaa88855e3ac690

commit 16e349c550942d274d3193ccedaa88855e3ac690
Author: Mike FABIAN
Date:   Fri Mar 2 11:29:24 2018 +0100

    Remove --quiet argument when installing locales

    Using this argument hides problems. I would like to see when
    something fails.

        * localedata/Makefile: Remove --quiet argument when
        installing locales

-----------------------------------------------------------------------

-- 
You are receiving this mail because:
You are on the CC list for the bug.