From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 11920 invoked by alias); 19 Jun 2014 10:28:21 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 11807 invoked by uid 48); 19 Jun 2014 10:28:07 -0000
From: "pravin.d.s at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 6.3
Date: Thu, 19 Jun 2014 10:28:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.15
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: pravin.d.s at gmail dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: pravin.d.s at gmail dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-06/txt/msg01266.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #9 from Pravin S ---
(In reply to Rich Felker from comment #1)
> One of the major "local hacks" can be fixed, fixing many other problems at
> the same time, by switching to using the Unicode "Alphabetic" property (from
> DerivedCoreProperties.txt) instead of just categories L* for class alpha.
> Right now there are many languages whose letters are considered
> non-alphabetic by glibc because they're in category Mn or Mc or even Cf.
> There are "local hacks" to fix this for maybe one or two languages, but
> using the right Unicode property would fix it for all languages.

I was almost done with this, but while updating I found that around 248
characters in the ALPHA group of the present i18n CTYPE were added after the
gen-unicode-ctype.c processing (Unicode 5.1:
https://github.com/pravins/glibc-i18n/blob/master/unicode5-1/Report ). I am
facing the same issue while upgrading to Unicode 6.3 (246 characters:
https://github.com/pravins/glibc-i18n/blob/master/Report ).

While reading http://www.unicode.org/reports/tr44/#Property_List_Table I
noticed that it says:

"Implementations should simply use the derived properties, and should not try
to rederive them from lists of simple properties and collections of rules,
because of the chances for error and divergence when doing so."

I agree with Rich: we should take what is available from
DerivedCoreProperties.txt rather than processing raw UnicodeData.txt.

I am writing a script to process the groups from DerivedCoreProperties.txt.
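
A rough sketch of what such a script might look like, in Python. This is only
an illustration, not the actual script and not part of gen-unicode-ctype.c.
The function names (parse_property_ranges, outside_letter_categories) and the
SAMPLE text are made up for this example; SAMPLE just imitates the
"range ; property # comment" line layout of DerivedCoreProperties.txt. The
general categories come from Python's unicodedata module, so they reflect
whatever Unicode version that Python build ships, not necessarily 6.3.

```python
import unicodedata

# A few lines in the layout of DerivedCoreProperties.txt (not the full file).
SAMPLE = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
0345          ; Alphabetic # Mn       COMBINING GREEK YPOGEGRAMMENI
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
"""


def parse_property_ranges(lines, wanted):
    """Return (start, end) code point ranges listed for the property `wanted`."""
    ranges = []
    for line in lines:
        # Everything after '#' is a comment; skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != wanted:
            continue
        # The first field is either "XXXX" or "XXXX..YYYY" (hexadecimal).
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end))
    return ranges


def outside_letter_categories(ranges):
    """Code points in `ranges` whose General_Category is not one of L*."""
    return [
        cp
        for start, end in ranges
        for cp in range(start, end + 1)
        if not unicodedata.category(chr(cp)).startswith("L")
    ]


alpha = parse_property_ranges(SAMPLE.splitlines(), "Alphabetic")
for cp in outside_letter_categories(alpha):
    # These are the characters a pure "category L*" rule would miss.
    print("U+%04X %s %s" % (cp, unicodedata.category(chr(cp)),
                            unicodedata.name(chr(cp), "<unnamed>")))
```

With the real file, the lines would be read from a local copy of
DerivedCoreProperties.txt instead of SAMPLE. The second function lists the
Alphabetic characters that fall outside categories L*, which is the kind of
difference that currently has to be patched in by hand.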