From: "pravin.d.s at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Date: Fri, 04 Jul 2014 09:13:00 -0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.15
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: pravin.d.s at gmail dot com
X-Bugzilla-Changed-Fields: attachments.created
X-Bugzilla-URL: http://sourceware.org/bugzilla/

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #13 from Pravin S ---
Created attachment 7679
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7679&action=edit
Patch to update UTF-8 CHARMAP to Unicode 7.0

I have worked on updating the UTF-8 file to Unicode 7.0. The following points are important before reviewing this patch.

1. The present patch is only for CHARMAP; a patch updating WIDTH will be available soon.
2. utf8-gen.py: new script to generate the UTF-8 file.
3. The patch was created ignoring whitespace changes (-w).
4. '''
   Where the UnicodeData.txt file gives characters as a range
   Example:
   3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
   4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
   the UTF-8 file lists the range in blocks, each ending 0x3F after its start, between the First and Last Unicode characters.
   Example:
   <U3400>..<U343F> /xe3/x90/x80
   .
   .
   <U4D80>..<U4DB5> /xe4/xb6/x80

   Note: No idea why the Hangul syllables (AC00 to D7A3) were not expanded in the Unicode 5.0 UTF-8 file. For consistency, we expand Hangul as well.
   '''
5. In some cases the names differ from those in UnicodeData.txt.
   '''
   Some characters have <control> as a name, so the "Unicode 1.0 Name" is used instead.
   Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as a name and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt.
   We can write code to take their alternate names from NameAliases.txt.
   '''
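
The range handling in point 4 and the name fallback in point 5 can be sketched roughly as follows. This is not the code from utf8-gen.py, only an illustration of the idea. The function names (utf8_bytes, ucs_symbol, expand_range, pick_name) are hypothetical, the 8-digit symbol form for code points above U+FFFF is an assumption, and the range name passed in is just an example value.

```python
def utf8_bytes(cp):
    """Return the UTF-8 encoding of a code point as /xNN escapes."""
    return "".join("/x%02x" % b for b in chr(cp).encode("utf-8"))


def ucs_symbol(cp):
    """Format a code point as a charmap symbol.

    Assumption: 4 hex digits inside the BMP, 8 hex digits above it.
    """
    return "<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp


def expand_range(first, last, name):
    """Split [first, last] into lines that each end 0x3F after an
    aligned start (or at `last`, whichever comes first)."""
    lines = []
    start = first
    while start <= last:
        # start | 0x3F is the last code point sharing the same leading
        # UTF-8 bytes as start.
        end = min(start | 0x3F, last)
        lines.append("%s..%s %s %s" % (
            ucs_symbol(start), ucs_symbol(end), utf8_bytes(start), name))
        start = end + 1
    return lines


def pick_name(fields):
    """Choose a name from one UnicodeData.txt record split on ';'.

    Falls back to the Unicode 1.0 name (index 10) when the name is
    <control>. If that is empty too, <control> is returned unchanged;
    a lookup in NameAliases.txt would go here (not implemented).
    """
    name = fields[1]
    if name == "<control>" and fields[10]:
        return fields[10]
    return name


if __name__ == "__main__":
    # Hypothetical usage: the CJK Extension A range from point 4.
    for line in expand_range(0x3400, 0x4DB5, "<CJK Ideograph Extension A>")[:2]:
        print(line)
    print(pick_name("0000;<control>;Cc;0;BN;;;;;N;NULL;;;;".split(";")))
```

Using start | 0x3F rather than start + 0x3F keeps each line on a 0x40 boundary even when a range does not begin on one. Both Extension A (3400) and Hangul (AC00) start aligned, so the two approaches give the same output for the ranges discussed here.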