From: "maiku.fabian at gmail dot com"
To: libc-locales@sourceware.org
Subject: [Bug localedata/23774] lv_LV collates Y/y incorrectly
Date: Mon, 15 Oct 2018 12:11:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=23774

--- Comment #1 from Mike FABIAN ---
(In reply to Danko Alexeyev from comment #0)
> Commit 159738548130d5ac4fe6178977e940ed5f8cfdc4 introduced this change in
> the lv_LV locale:
>
> - ;;;IGNORE % y
> - ;;;IGNORE % Y
> + ;;;IGNORE % y
> + ;;;IGNORE % Y
>
> I don't know what "PCL" meant and whether "Y" was supposed to be "BASE" in
> the first place, but "LOWLINE" certainly looks like a bug.

PCL was an old collation symbol that was used in an older version of the
glibc/localedata/locales/iso14651_t1_common file. It was a second-level
collation symbol.
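To illustrate what a second-level collation symbol does, and why "BASE" would not work for y, here is a toy Python model. It is a hypothetical sketch and not glibc's implementation. The weight tables, the small subset of second-level symbols, and the assumption that y/Y share the primary weight of i/I (differing only at the second level) are illustrative choices. Words are compared on all primary weights first, then all secondary weights, then all tertiary weights.

```python
# Toy model of multi-level collation weights, loosely inspired by glibc's
# LC_COLLATE tables. Hypothetical and simplified: only three levels, only
# ASCII letters, and a tiny illustrative subset of second-level symbols.

SECONDARY = ["BASE", "LOWLINE", "ACUTE"]  # BASE = unmodified base letter
TERTIARY = ["MIN", "CAP"]                 # lowercase sorts before uppercase


def build_table(y_secondary):
    """Map each letter to (primary, secondary, tertiary) weights.

    Assumption: y/Y share the primary weight of i/I and differ from it
    only at the second level, via the symbol named by y_secondary.
    """
    table = {}
    for ch in "abcdefghijklmnopqrstuvwxz":  # every letter except y
        table[ch] = (ch, SECONDARY.index("BASE"), TERTIARY.index("MIN"))
        table[ch.upper()] = (ch, SECONDARY.index("BASE"), TERTIARY.index("CAP"))
    sec = SECONDARY.index(y_secondary)
    table["y"] = ("i", sec, TERTIARY.index("MIN"))
    table["Y"] = ("i", sec, TERTIARY.index("CAP"))
    return table


def sort_key(word, table):
    """Compare all primary weights first, then secondary, then tertiary."""
    weights = [table[ch] for ch in word]
    return tuple(tuple(w[level] for w in weights) for level in range(3))


def collate(words, table):
    """Sort words by their multi-level keys (stable for exact ties)."""
    return sorted(words, key=lambda w: sort_key(w, table))


if __name__ == "__main__":
    letters = ["j", "Y", "y", "I", "i", "h"]
    for symbol in ("LOWLINE", "ACUTE", "BASE"):
        t = build_table(symbol)
        print(symbol, collate(letters, t),
              "y == i:", sort_key("y", t) == sort_key("i", t))
```

In this model, LOWLINE and ACUTE give the same order (h, i, I, y, Y, j): y sorts right after i and before j, and a word like "ia" sorts after "y" because the primary level is compared first. With BASE, y and i get identical weights at every level, so the model can no longer tell them apart.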
To get the right sort order, replacing it with any existing secondary
collation symbol except "BASE" works fine here. The current
glibc/localedata/locales/iso14651_t1_common contains:

% Second-level collating symbols
collating-symbol
collating-symbol  % COMBINING LOW LINE
collating-symbol  % COMBINING COMMA ABOVE
collating-symbol  % COMBINING REVERSED COMMA ABOVE
collating-symbol  % COMBINING ACUTE ACCENT
...

BASE means base letter; all the following collating symbols can be used
to indicate secondary differences from base letters. As there is nothing
particularly appropriate for the difference between i and y, it doesn’t
really matter which one is used, so I chose the first one, LOWLINE.

> Letter Y is not present in the Latvian alphabet, however it is present in
> Latgalian and is located after I, which is what the CLDR rule seems to
> suggest:
>
> &I<
>
> I found this by accident while investigating the result of this command on
> my system (with LANG being lv_LV.UTF-8)
>
> $ echo abcxyz | grep -Eo '[a-z]+'
> abcx
> z
>
> I'm sorry if I misunderstood something as I've never worked with either
> glibc or CLDR locales directly before.

This fails for other reasons, not because of the use of LOWLINE.

-- 
You are receiving this mail because:
You are on the CC list for the bug.