From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 13950 invoked by alias); 23 Apr 2012 04:22:51 -0000
Received: (qmail 17956 invoked by uid 22791); 23 Apr 2012 01:38:16 -0000
X-SWARE-Spam-Status: No, hits=-2.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
From: "bugdal at aerifal dot cx"
To: libc-locales@sources.redhat.com
Subject: [Bug localedata/14010] New: Serious omissions in alphabetic character class
Date: Mon, 23 Apr 2012 04:22:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bugdal at aerifal dot cx
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-locales-owner@sourceware.org
X-SW-Source: 2012-q2/txt/msg00109.txt.bz2

http://sourceware.org/bugzilla/show_bug.cgi?id=14010

             Bug #: 14010
           Summary: Serious omissions in alphabetic character class
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: unassigned@sourceware.org
        ReportedBy: bugdal@aerifal.cx
                CC: libc-locales@sources.redhat.com
    Classification: Unclassified

The localedata generation code defines is_alpha based on Unicode categories L*, plus Nl, Nd, and a moderate number of special cases mostly to fix Thai language support (to fix is_alpha returning false for letters in category Mn). However Thai is not the only language affected; any language that uses non-spacing letters is broken by glibc's deficient is_alpha definition.
As a particular example, glibc considers all of the Tibetan subjoined letters non-alphabetic (and thus punctuation).

Unicode addresses this issue by defining the Other_Alphabetic property in PropList.txt and the Alphabetic derived property in DerivedCoreProperties.txt. The latter consists of Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic. This subsumes all of the special-case hacks for Thai in glibc's gen-unicode-ctype.c and fixes the issue (at least approximately) for all other languages/scripts at the same time.

glibc's localedata should adopt the definition of Alphabetic from Unicode's DerivedCoreProperties.txt (and still add Nd and the special cases from So).
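
To make the proposed rule concrete, here is a rough sketch in C. It is not the code in gen-unicode-ctype.c: the enum, the struct, the function names and the small sample table are all hypothetical, made up for illustration. The category-only function is a simplified stand-in for the current rule (L*, Nl, Nd) that leaves out the Thai special cases, and the proposed function leaves out the special cases from So. The Other_Alphabetic flags in the sample table are meant to mirror PropList.txt and should be checked against the actual data file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of Unicode general categories (not glibc's types). */
enum gen_cat {
    GC_Lu, GC_Ll, GC_Lt, GC_Lm, GC_Lo,  /* letters */
    GC_Mn, GC_Mc,                       /* non-spacing / spacing marks */
    GC_Nd, GC_Nl,                       /* decimal digits / letter numbers */
    GC_So, GC_OTHER
};

/* Hypothetical per-character record: the general category from
   UnicodeData.txt plus the Other_Alphabetic flag from PropList.txt. */
struct char_props {
    uint32_t code;
    enum gen_cat gc;
    bool other_alphabetic;
};

static bool is_letter_category(enum gen_cat gc)
{
    return gc == GC_Lu || gc == GC_Ll || gc == GC_Lt
        || gc == GC_Lm || gc == GC_Lo;
}

/* Simplified stand-in for the category-only rule: L* + Nl + Nd.
   The Thai special cases mentioned in the report are omitted. */
bool is_alpha_category_only(const struct char_props *p)
{
    return is_letter_category(p->gc) || p->gc == GC_Nl || p->gc == GC_Nd;
}

/* Proposed rule: Unicode Alphabetic (Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic),
   still adding Nd. The special cases from So are omitted in this sketch. */
bool is_alpha_proposed(const struct char_props *p)
{
    bool unicode_alphabetic = is_letter_category(p->gc)
                           || p->gc == GC_Nl
                           || p->other_alphabetic;
    return unicode_alphabetic || p->gc == GC_Nd;
}

/* Illustrative sample data only; verify against the Unicode data files. */
const struct char_props sample_chars[] = {
    { 0x0041, GC_Lu, false },  /* LATIN CAPITAL LETTER A */
    { 0x0F90, GC_Mn, true  },  /* TIBETAN SUBJOINED LETTER KA */
    { 0x0E31, GC_Mn, true  },  /* THAI CHARACTER MAI HAN-AKAT */
    { 0x0301, GC_Mn, false },  /* COMBINING ACUTE ACCENT */
    { 0x0660, GC_Nd, false },  /* ARABIC-INDIC DIGIT ZERO */
    { 0x2160, GC_Nl, false },  /* ROMAN NUMERAL ONE */
};
```

With this table, the category-only rule rejects the Tibetan and Thai Mn entries while the proposed rule accepts them, and a plain combining accent such as U+0301 stays non-alphabetic under both.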