From: "rob.ross at ymail dot com" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/24658] New: wcwidth inconsistencies with Unicode 12.1
Date: Mon, 10 Jun 2019 16:49:00 -0000
Message-ID: <bug-24658-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=24658

            Bug ID: 24658
           Summary: wcwidth inconsistencies with Unicode 12.1
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: rob.ross at ymail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

For "en_US.utf8", the 2019-06-10 trunk closely follows the Unicode standard
except for U+3248 to U+324F (circled numbers with Ambiguous [A] width) and
U+4DC0 to U+4DFF (Yijing hexagram symbols with Neutral [N] width), where
wcwidth returns 2 instead of 1.  Those deviations were intentionally added to
"localedata/unicode-gen/utf8_gen.py" starting at line 262.  The rationale
starting at line 263 refers to
<http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0023.html>, which
applies only to the first range and depends on the definition of "context".
The interpretation that glibc is a context, regardless of locale, is likely
not what was intended.  In particular, UAX 11
(<http://www.unicode.org/reports/tr11/tr11-36.html>) makes it clear that the
"EastAsianWidth.txt" context is either "East Asian" or "non-East Asian".  It
also states that "narrow characters include N, Na, H, and A (when not in East
Asian context)."
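
To make the UAX 11 rule quoted above concrete, here is a minimal Python
sketch of how an East_Asian_Width class would map to a column count in each
context.  The helper name columns_for and the REPORTED_RANGES table are
hypothetical and are not taken from utf8_gen.py; the classes listed are the
ones stated in this report.  The sketch ignores combining and control
characters, which wcwidth handles separately.

```python
# Width classes as described in UAX #11: W and F are always wide, A is wide
# only in an East Asian context, and N, Na, and H are narrow.
WIDE_ALWAYS = {"W", "F"}
NARROW_ALWAYS = {"N", "Na", "H"}


def columns_for(eaw_class, east_asian_context):
    """Return 1 or 2 columns for an East_Asian_Width class (hypothetical helper)."""
    if eaw_class in WIDE_ALWAYS:
        return 2
    if eaw_class in NARROW_ALWAYS:
        return 1
    if eaw_class == "A":
        return 2 if east_asian_context else 1
    raise ValueError(f"unknown East_Asian_Width class: {eaw_class!r}")


# The two ranges from this report, with the classes stated in the report.
REPORTED_RANGES = [
    (0x3248, 0x324F, "A"),  # circled numbers
    (0x4DC0, 0x4DFF, "N"),  # Yijing hexagram symbols
]

for start, end, eaw in REPORTED_RANGES:
    expected = columns_for(eaw, east_asian_context=False)
    print(f"U+{start:04X}..U+{end:04X} [{eaw}]: expected width {expected}")
```

Under this reading, both ranges come out at width 1 in a non-East Asian
context such as "en_US.utf8", which is the value this report argues wcwidth
should return.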

This bug relates to item 5 of bug 21750
(<https://sourceware.org/bugzilla/show_bug.cgi?id=21750>).  Part of the
rationale there for forcing a width of 2 was based on xterm's implementation,
but xterm defaults to using wcwidth (unless you set mkWidth), so that argument
is not very convincing.  Another rationale was "glyphs for these characters
are quadratic in most fonts", which is a good point, but lots of characters
have this problem.  Should there be wcwidth bugs for those characters?  Why
should some ranges receive special treatment?  The last rationale related to
application compatibility.  Changing widths to better track the Unicode
database will break old versions of applications, but programs are
increasingly tracking that database themselves, so the problem will resolve
itself.  A concrete example is vim, which needs its own table in order to
function consistently on platforms without wcwidth.  Egmont Koblinger provided
good rationales for a width of 1, and I don't see why they were discounted.
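
As an illustration of what "its own table" can look like on the application
side, here is a minimal Python sketch of a sorted range table searched with
bisect.  It is not vim's actual table or code; the names WIDTH_TABLE and
table_width are hypothetical, and the only entries are the two ranges
discussed in this report, with the width the report argues for.

```python
import bisect

# Hypothetical application-side table: sorted, non-overlapping
# (first, last, width) entries.  Only the two ranges discussed in this
# report are listed, with the width the report argues for.
WIDTH_TABLE = [
    (0x3248, 0x324F, 1),  # circled numbers
    (0x4DC0, 0x4DFF, 1),  # Yijing hexagram symbols
]
_STARTS = [first for first, _, _ in WIDTH_TABLE]


def table_width(code_point):
    """Return the table's width for code_point, or None if it is not listed."""
    # Find the last entry whose first code point is <= code_point.
    i = bisect.bisect_right(_STARTS, code_point) - 1
    if i >= 0:
        first, last, width = WIDTH_TABLE[i]
        if first <= code_point <= last:
            return width
    return None
```

A caller would fall back to the platform's wcwidth, where one exists, when
table_width returns None.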
