From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-6753-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 90871 invoked by alias); 10 Jun 2019 16:49:08 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 87082 invoked by uid 48); 10 Jun 2019 16:49:05 -0000
From: "rob.ross at ymail dot com" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/24658] New: wcwidth inconsistencies with Unicode
 12.1
Date: Mon, 10 Jun 2019 16:49:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rob.ross at ymail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter cc target_milestone
Message-ID: <bug-24658-716@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2019-q2/txt/msg00089.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=24658

            Bug ID: 24658
           Summary: wcwidth inconsistencies with Unicode 12.1
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: rob.ross at ymail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

For "en_US.utf8", the 2019-06-10 trunk closely follows the Unicode standard
except for two ranges where wcwidth returns 2 instead of 1: U+3248 to U+324F
(circled numbers, Ambiguous [A] width) and U+4DC0 to U+4DFF (Yijing hexagram
symbols, Neutral [N] width).  Those deviations were intentionally added to
"localedata/unicode-gen/utf8_gen.py" starting at line 262.  The rationale
starting at line 263 refers to
<http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0023.html>, which only
applies to the first range and depends on how "context" is defined.  Treating
glibc itself as the context, regardless of locale, is likely not what was
intended.  In particular, UAX 11
(<http://www.unicode.org/reports/tr11/tr11-36.html>) makes it clear that the
context for "EastAsianWidth.txt" is either "East Asian" or "non-East Asian".
It also states that "narrow characters include N, Na, H, and A (when not in
East Asian context)."
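To make the UAX 11 rule concrete, here is a minimal Python sketch (Python being
the language of utf8_gen.py) that maps an East_Asian_Width class to a column
width under the reading given above.  The names `uax11_width` and
`DISPUTED_RANGES` are hypothetical illustrations, not part of glibc.  The
classes are hard-coded from the values cited above rather than read from
Python's `unicodedata`, which may track a different Unicode version.  The
"glibc=2" column only restates the trunk behaviour described in this report.

```python
# Column widths per UAX 11 East_Asian_Width class, following this report's
# reading: "A" (Ambiguous) is narrow outside an East Asian context and wide
# inside one; "N" (Neutral), "Na" and "H" are always narrow.
_NARROW = {"N", "Na", "H"}
_WIDE = {"W", "F"}


def uax11_width(eaw: str, east_asian_context: bool) -> int:
    """Return 1 or 2 columns for an East_Asian_Width class (hypothetical helper)."""
    if eaw in _WIDE:
        return 2
    if eaw in _NARROW:
        return 1
    if eaw == "A":
        return 2 if east_asian_context else 1
    raise ValueError(f"unknown East_Asian_Width class: {eaw!r}")


# Hypothetical table of the two ranges that utf8_gen.py forces to width 2,
# with the Unicode 12.1 classes cited in this report.
DISPUTED_RANGES = [
    (0x3248, 0x324F, "A"),  # circled numbers
    (0x4DC0, 0x4DFF, "N"),  # Yijing hexagram symbols
]

for first, last, eaw in DISPUTED_RANGES:
    print(f"U+{first:04X}..U+{last:04X} [{eaw}]: "
          f"non-East-Asian={uax11_width(eaw, False)}, "
          f"East-Asian={uax11_width(eaw, True)}, glibc=2")
```

Under this reading only the first range could justify width 2, and only in an
East Asian context; the Neutral range is narrow in every context.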

This bug relates to item 5 of bug 21750
(<https://sourceware.org/bugzilla/show_bug.cgi?id=21750>).  Part of the
rationale there for forcing a width of 2 was xterm's implementation, but
xterm uses wcwidth by default (unless you set mkWidth), so that argument is
not very convincing.  Another rationale was "glyphs for these characters are
quadratic in most fonts", which is a good point, but many other characters
have the same problem.
Should there be wcwidth bugs for those characters too?  Why should some ranges
receive special treatment?  The last rationale concerned application
compatibility.  Changing widths to track the Unicode database more closely
will break old versions of applications, but programs increasingly track that
database themselves, so the problem will resolve itself.  A concrete example
is vim, which needs its own table in order to behave consistently on platforms
without wcwidth.  Egmont Koblinger provided good rationales for a width of 1,
and I don't see why they were discounted.
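The vim example amounts to an application carrying its own width table instead
of calling wcwidth.  A minimal sketch of that approach, assuming a sorted list
of non-overlapping code-point ranges searched with `bisect`: the table and the
`app_width` helper are hypothetical and heavily truncated (this is not vim's
actual table), and they give the two disputed ranges width 1, as UAX 11
suggests for a non-East Asian context.

```python
import bisect

# Hypothetical, truncated width table: sorted, non-overlapping
# (first, last, width) ranges. A real table would be generated from
# EastAsianWidth.txt rather than written by hand.
WIDTH_TABLE = [
    (0x1100, 0x115F, 2),  # Hangul Jamo leading consonants (W)
    (0x3248, 0x324F, 1),  # circled numbers (A), narrow outside East Asian context
    (0x4DC0, 0x4DFF, 1),  # Yijing hexagram symbols (N)
    (0x4E00, 0x9FFF, 2),  # CJK Unified Ideographs (W)
]
_STARTS = [first for first, _, _ in WIDTH_TABLE]


def app_width(cp: int, default: int = 1) -> int:
    """Look up the column width of code point cp, falling back to default."""
    # Index of the last range whose start is <= cp, or -1 if none.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, width = WIDTH_TABLE[i]
        if first <= cp <= last:
            return width
    return default


print(app_width(0x3248), app_width(0x4DC0), app_width(0x4E00))
```

Because such a table ships with the application, its widths follow whatever
Unicode version the application was built against, independently of the
platform's libc.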
