public inbox for libc-locales@sourceware.org
From: "bugdal at aerifal dot cx" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sources.redhat.com
Subject: [Bug localedata/14010] New: Serious omissions in alphabetic character class
Date: Mon, 23 Apr 2012 04:22:00 -0000
Message-ID: <bug-14010-716@http.sourceware.org/bugzilla/>

http://sourceware.org/bugzilla/show_bug.cgi?id=14010

             Bug #: 14010
           Summary: Serious omissions in alphabetic character class
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: unassigned@sourceware.org
        ReportedBy: bugdal@aerifal.cx
                CC: libc-locales@sources.redhat.com
    Classification: Unclassified


The localedata generation code defines is_alpha based on Unicode categories L*,
plus Nl, Nd, and a moderate number of special cases, mostly to fix Thai language
support (is_alpha would otherwise return false for Thai letters in category Mn).
However, Thai is not the only language affected; any language that uses
non-spacing letters is broken by glibc's deficient is_alpha definition. As a
particular example, all of the Tibetan subjoined letters are considered
non-alphabetic (and thus punctuation) by glibc.
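
To illustrate the problem, here is a minimal Python sketch of a category-only
rule (L*, Nl, Nd). It is an approximation for discussion, not glibc's actual
code: the helper name is_alpha_by_category is hypothetical, and the Thai
special cases are deliberately left out.

```python
import unicodedata

def is_alpha_by_category(ch: str) -> bool:
    """Category-only approximation: L*, Nl, Nd, with no special cases."""
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat in ("Nl", "Nd")

# Compare a Latin letter, a Tibetan base letter, and a Tibetan subjoined letter.
for ch in ("a", "\u0f40", "\u0f90"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)} "
          f"alpha={is_alpha_by_category(ch)}")
```

U+0F90 TIBETAN SUBJOINED LETTER KA is in category Mn, so a rule of this shape
rejects it even though it is a letter.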

Unicode addresses this issue by defining the Other_Alphabetic property in
PropList.txt and the Alphabetic derived property in DerivedCoreProperties.txt,
the latter of which consists of Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic. This
subsumes all special-case hacks for Thai in glibc's gen-unicode-ctype.c and
fixes the issue (at least approximately) for all other languages/scripts at the
same time.

glibc's localedata should adopt the definition of Alphabetic from Unicode's 
DerivedCoreProperties.txt (and still add Nd and the special cases from So).
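
As a rough sketch of what this could look like, the Python below parses lines
in the "range ; property" format of DerivedCoreProperties.txt and defines
is_alpha as Alphabetic plus Nd. The SAMPLE text is hand-written in that format
for illustration and is not a verbatim excerpt of the file; the function names
are hypothetical, and the So special cases are omitted.

```python
import unicodedata

# Hand-written sample in the DerivedCoreProperties.txt line format
# (illustrative only; a real run would read the full file).
SAMPLE = """\
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0E31          ; Alphabetic # Mn       THAI CHARACTER MAI HAN-AKAT
0F90..0F97    ; Alphabetic # Mn   [8] TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOINED LETTER JA
"""

def parse_alphabetic(text: str) -> set[int]:
    """Collect every code point whose listed property is Alphabetic."""
    result: set[int] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        code_range, prop = (field.strip() for field in line.split(";", 1))
        if prop != "Alphabetic":
            continue
        first, _, last = code_range.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        result.update(range(lo, hi + 1))
    return result

def is_alpha_proposed(cp: int, alphabetic: set[int]) -> bool:
    """Proposed rule: Unicode Alphabetic, plus Nd (So special cases omitted)."""
    return cp in alphabetic or unicodedata.category(chr(cp)) == "Nd"

ALPHABETIC = parse_alphabetic(SAMPLE)
print(is_alpha_proposed(0x0F90, ALPHABETIC))  # Tibetan subjoined letter KA
print(is_alpha_proposed(0x0E31, ALPHABETIC))  # Thai, no special case needed
```

With this definition the Thai and Tibetan non-spacing letters are covered by
the same table lookup, without per-script exceptions.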



Thread overview: 11+ messages
2012-04-23  4:22 bugdal at aerifal dot cx [this message]
2012-09-21 23:46 ` [Bug localedata/14010] " bugdal at aerifal dot cx
2012-09-23 22:39 ` joseph at codesourcery dot com
2013-10-25 20:32 ` myllynen at redhat dot com
2013-10-25 20:34 ` joseph at codesourcery dot com
2013-10-25 20:34 ` bugdal at aerifal dot cx
2013-10-25 20:34 ` bugdal at aerifal dot cx
2014-06-25 12:09 ` fweimer at redhat dot com
2014-12-04 10:34 ` maiku.fabian at gmail dot com
2020-04-15 13:51 ` meave390 at gmail dot com
2020-04-15 14:02 ` meave390 at gmail dot com
