public inbox for glibc-bugs@sourceware.org
From: "maiku.fabian at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17588] Update UTF-8 charmap and width to Unicode 7.0.0
Date: Wed, 03 Dec 2014 07:17:00 -0000
Message-ID: <bug-17588-131-DkqnHQQZQ1@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-17588-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=17588

--- Comment #9 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I built glibc with the patch from comment#8. It produces some FAILs in
‘make check’:

    FAIL: localedata/cs_CZ.UTF-8/LC_CTYPE
    ... similar FAILs ...

Shortly after starting ‘make check’ one sees:

    ./charmaps/UTF-8:42734: unknown character `U00009FCD'
    ... similar messages ...

All of the above problems are caused by ranges of reserved code points
which are listed in EastAsianWidth.txt like this:

    9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>

These code points are not in UnicodeData.txt. Therefore they are not
generated into the CHARMAP section of glibc’s UTF-8 file, and the
problems above occur when they are nevertheless generated into the
WIDTH section of that file.

This can be fixed by not generating reserved code points into the WIDTH
section, i.e. by ignoring the reserved code points mentioned in
EastAsianWidth.txt.

Patch for utf8-gen.py:

diff --git a/utf8-gen.py b/utf8-gen.py
index 57875b6..20b68bb 100755
--- a/utf8-gen.py
+++ b/utf8-gen.py
@@ -218,6 +218,8 @@ if __name__ == "__main__":
     write_comments(outfile, 1)
     elines = []
     for line in easta_file.readlines():
+        if re.match(r'.*<reserved-.+>\.\.<reserved-.+>.*', line):
+            continue
         if re.match(r'^[^;]*;[WF]', line):
             elines.append(line.strip())
     process_width(outfile, flines, elines)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
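
The filtering step from comment #9 can be tried in isolation. The following is only a minimal sketch, not the code of utf8-gen.py: the function name `select_width_lines` and the sample data are hypothetical, and only the `9FCD..9FFF` line is taken from the message; the other sample lines are illustrative entries in the same format.

```python
import re

# Sample input in the EastAsianWidth.txt line format. Only the 9FCD..9FFF
# line is quoted from the bug comment; the rest are illustrative.
SAMPLE = """\
3400..4DB5;W     # Lo  [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>
FF01..FF03;F     # Po     [3] FULLWIDTH EXCLAMATION MARK..FULLWIDTH NUMBER SIGN
0041..005A;Na    # Lu    [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

# The same two regular expressions that appear in the patch.
RESERVED_RANGE = re.compile(r'.*<reserved-.+>\.\.<reserved-.+>.*')
WIDE_OR_FULL = re.compile(r'^[^;]*;[WF]')


def select_width_lines(lines):
    """Keep wide (W) and fullwidth (F) entries, skipping reserved ranges.

    Hypothetical helper: it mirrors the loop in the patch, but is not part
    of utf8-gen.py itself.
    """
    selected = []
    for line in lines:
        if RESERVED_RANGE.match(line):
            # Reserved code points have no CHARMAP entry, so they must not
            # appear in the WIDTH section either.
            continue
        if WIDE_OR_FULL.match(line):
            selected.append(line.strip())
    return selected


elines = select_width_lines(SAMPLE.splitlines())
for entry in elines:
    print(entry)
```

With this sample, the reserved range and the narrow (Na) range are dropped, and only the W and F entries remain in `elines`.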
From: "maiku.fabian at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Date: Wed, 03 Dec 2014 09:59:00 -0000
Message-ID: <bug-14094-131-MOycuzHapv@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-14094-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #34 from Mike FABIAN <maiku.fabian at gmail dot com> ---
When I generate a new glibc/localedata/locales/i18n file using
gen-unicode-ctype.py from comment#33, build glibc with it, and then run
the tests with ‘make check’, I get one failure:

    FAIL: localedata/tst-ctype

Looking into why it fails, I find in ./localedata/tst-ctype.out:

    Locale-specific tests for `lower'
    islower('ª' = '\xaa') is true
    islower('º' = '\xba') is true
    Locale-specific tests for `lower'
    ...
    2 errors for `de_DE.ISO-8859-1' locale

The new ‘lower’ character class generated by gen-unicode-ctype.py
contains U+00AA ª FEMININE ORDINAL INDICATOR and U+00BA º MASCULINE
ORDINAL INDICATOR. The test tst-ctype run by ‘make check’ wants them
*not* to be lower case.

DerivedCoreProperties.txt lists both as lower case, though:

    00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
    00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR

That’s why gen-unicode-ctype.py adds them to the ‘lower’ character
class: it adds every character marked as ‘Lowercase’ in
DerivedCoreProperties.txt to the character class ‘lower’.

I wonder what needs to be done here. Is the test in glibc wrong?
If so, it could be fixed by a patch like this:

$ git show | iconv -f iso-8859-1 -t utf-8
commit 25c913674386011a44b6270579a894b2e8200d25
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Wed Dec 3 10:05:42 2014 +0100

    Fix test case localedata/tst-ctype-de_DE.ISO-8859-1.in

    DerivedCoreProperties.txt from Unicode 7.0.0 lists the characters
    U+00AA (ª) and U+00BA (º) as lower case:

    00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
    00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR

diff --git a/localedata/tst-ctype-de_DE.ISO-8859-1.in b/localedata/tst-ctype-de_DE.ISO-8859-1.in
index f71d76c..e124a52 100644
--- a/localedata/tst-ctype-de_DE.ISO-8859-1.in
+++ b/localedata/tst-ctype-de_DE.ISO-8859-1.in
@@ -1,5 +1,5 @@
 lower  ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
-      000000000000000000000100000000000000000000000000
+      000000000010000000000100001000000000000000000000
 lower ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
       000000000000000111111111111111111111111011111111
 upper  ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ

-- 
You are receiving this mail because:
You are on the CC list for the bug.
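
To see how the expected 0/1 row in the test file relates to the Lowercase property, here is a minimal sketch. It is not gen-unicode-ctype.py or tst-ctype: the helper names `lowercase_code_points` and `expected_row` are hypothetical, and the 00B5 line is an assumption added for illustration (MICRO SIGN, 0xB5, is the one position already set to 1 in the old row for the range 0xA0..0xCF).

```python
# Lines in the DerivedCoreProperties.txt format quoted in the message.
# The 00B5 line is an illustrative assumption; the other two are quoted.
DERIVED = """\
00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
00B5          ; Lowercase # L&       MICRO SIGN
00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR
"""


def lowercase_code_points(text):
    """Collect all code points (single values or ranges) marked Lowercase."""
    result = set()
    for line in text.splitlines():
        data = line.split('#', 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(';')]
        if len(fields) < 2 or fields[1] != 'Lowercase':
            continue
        start, _, end = fields[0].partition('..')
        first = int(start, 16)
        last = int(end, 16) if end else first
        result.update(range(first, last + 1))
    return result


def expected_row(first, last, members):
    """Build one row of 0/1 flags, one digit per byte value in first..last."""
    return ''.join('1' if cp in members else '0'
                   for cp in range(first, last + 1))


lower = lowercase_code_points(DERIVED)
# Row for bytes 0xA0..0xCF with the two ordinal indicators treated as lower.
new_row = expected_row(0xA0, 0xCF, lower)
# The same row without them, i.e. what the test expected before the patch.
old_row = expected_row(0xA0, 0xCF, lower - {0xAA, 0xBA})
print(old_row)
print(new_row)
```

The two rows differ only at offsets 10 and 26 from 0xA0, which are the positions of U+00AA and U+00BA; this matches the two digits changed by the patch.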
From: "pravin.d.s at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17588] Update UTF-8 charmap and width to Unicode 7.0.0
Date: Wed, 03 Dec 2014 11:49:00 -0000
Message-ID: <bug-17588-131-DUHIElb1jH@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-17588-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=17588

Pravin S <pravin.d.s at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7980|0                           |1
        is obsolete|                            |

--- Comment #10 from Pravin S <pravin.d.s at gmail dot com> ---
Created attachment 7987
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7987&action=edit
Patch to update UTF-8 CHARMAP and WIDTH to Unicode 7.0

2014-12-01  Pravin Satpute  <psatpute@redhat.com>

        [BZ #17588 #13064]
        * charmaps/UTF-8: Updated UTF-8 CHARMAP and WIDTH to Unicode 7.0.0.
        * localedata/utf8-gen.py: New script for generating UTF-8 CHARMAP
        from latest UnicodeData.txt.
        * localedata/utf8-compatibility.py: New script for testing backward
        compatibility of newly generated UTF-8 file.

        Reviewed and improved by Mike FABIAN <mfabian@redhat.com>

------------------------------------------------------------------------------
Yes, I was also able to reproduce the same issues while building glibc
with the patch. This patch fixes those issues.

-- 
You are receiving this mail because:
You are on the CC list for the bug.