From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 5744 invoked by alias); 6 Aug 2011 16:31:30 -0000
Received: (qmail 5737 invoked by uid 22791); 6 Aug 2011 16:31:29 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO sourceware.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 06 Aug 2011 16:31:16 +0000
From: "bruno at clisp dot org"
To: glibc-bugs@sources.redhat.com
Subject: [Bug localedata/13061] New: iconv mapping of 0xA8 0xEC in CP1258 is non-canonical
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bruno at clisp dot org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: libc-locales at sources dot redhat.com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Sat, 06 Aug 2011 16:31:00 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-owner@sourceware.org
X-SW-Source: 2011-08/txt/msg00006.txt.bz2

http://sourceware.org/bugzilla/show_bug.cgi?id=13061

           Summary: iconv mapping of 0xA8 0xEC in CP1258 is non-canonical
           Product: glibc
           Version: 2.14
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: libc-locales@sources.redhat.com
        ReportedBy: bruno@clisp.org


Bug 12777 was fixed to map U+0385 (like U+1FEE) to 0xA8 0xEC. Good.

But at the same time, in the reverse direction, 0xA8 0xEC ought to map to
U+0385, not to U+1FEE. Why?

1) http://www.unicode.org/charts/PDF/U1F00.pdf states that the decomposition
   of U+1FEE is U+0385.
   That is, U+0385 is a "simpler" Unicode character than U+1FEE, although
   both look very similar (cf. http://www.unicode.org/charts/PDF/U1F00.pdf
   and http://www.unicode.org/charts/PDF/U0370.pdf).

2) According to http://www.unicode.org/versions/Unicode6.0.0/ch07.pdf, the
   block U+0370..U+03FF is more for modern Greek, whereas the block
   U+1F00..U+1FFF is mostly for ancient Greek. CP1258 is a code page for
   present-day text (Vietnamese), not for ancient Greek, so the character
   from the modern block is the natural target.

To reproduce:

$ printf '\xA8\xEC' | iconv -f CP1258 -t UCS-4LE | od -t x4
0000000 00001fee
0000004

Should be:

$ printf '\xA8\xEC' | iconv -f CP1258 -t UCS-4LE | od -t x4
0000000 00000385
0000004

Attached is a probable fix (untested).
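
For checking a given installation, the reproducer above can be wrapped in a
small script that names the result instead of leaving the hex value to be
read by eye. This is only a sketch: the function names classify and
decode_a8ec are made up for illustration, and it assumes an iconv that knows
both CP1258 and UCS-4BE. It uses octal escapes and byte-wise od output so
that it does not depend on the printf implementation or on host endianness.

```shell
#!/bin/sh
# Report which Unicode code point the local iconv produces for the
# CP1258 byte sequence 0xA8 0xEC.

# classify HEX: interpret one 32-bit code point given as 8 hex digits.
# (classify and decode_a8ec are illustrative names, not part of glibc.)
classify() {
    case "$1" in
        00000385) echo "U+0385: canonical mapping" ;;
        00001fee) echo "U+1FEE: non-canonical mapping" ;;
        *)        echo "unexpected result: '$1'" ;;
    esac
}

# decode_a8ec: convert the two bytes and print the result as hex digits.
# Octal escapes (\250 = 0xA8, \354 = 0xEC) are used because not every
# printf understands \x. UCS-4BE plus byte-wise od output avoids
# depending on the byte order of the host.
decode_a8ec() {
    printf '\250\354' | iconv -f CP1258 -t UCS-4BE 2>/dev/null |
        od -A n -t x1 | tr -d ' \n'
}

classify "$(decode_a8ec)"
```

An iconv that does not compose the two bytes into a single character, or
that lacks CP1258 altogether, ends up in the "unexpected result" branch.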