From: "rob.ross at ymail dot com"
To: libc-locales@sourceware.org
Subject: [Bug localedata/24658] New: wcwidth inconsistencies with Unicode 12.1
Date: Mon, 10 Jun 2019 16:49:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=24658

Bug ID: 24658
Summary: wcwidth inconsistencies with Unicode 12.1
Product: glibc
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: rob.ross at ymail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---

For "en_US.utf8", the 2019-06-10 trunk closely follows the Unicode standard except for U+3248 to U+324F (Circled numbers with Ambiguous [A] width) and U+4DC0 to U+4DFF (Yijing hexagram symbols with Neutral [N] width), where wcwidth returns 2 instead of 1.

Those deviations were intentionally added to "localedata/unicode-gen/utf8_gen.py" starting at line 262. The rationale starting at line 263 refers to a source which only applies to the first range and depends on the definition of "context". The interpretation that glibc is a context, regardless of locale, is likely not what was intended. In particular, UAX 11 makes it clear that the "EastAsianWidth.txt" context is either "East Asian" or "non-East Asian". It also states that "narrow characters include N, Na, H, and A (when not in East Asian context)."

This bug relates to bug 21750, item 5. Part of the rationale there for forcing a width of 2 was based on xterm's implementation, but xterm defaults to using wcwidth (unless you set mkWidth), so it's not very convincing. Another rationale was "glyphs for these characters are quadratic in most fonts", which is a good point, but lots of characters have this problem. Should there be wcwidth bugs for those characters? Why should some ranges receive special treatment? The last rationale related to application compatibility. Changing widths to better track the Unicode database will break old versions of applications, but programs are increasingly tracking that database themselves, so the problem will resolve itself. A concrete example is vim, which needs its own table in order to function consistently on platforms without wcwidth.

Egmont Koblinger provided good rationales for a width of 1 and I don't see why they were discounted.
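
For illustration, the UAX 11 reading argued for above can be sketched as a small lookup from East Asian Width category to column width. This is only a sketch, not the logic in utf8_gen.py or glibc: the function name `column_width` and the `east_asian_context` flag are hypothetical, and mapping W and F to 2 columns is the conventional reading rather than something stated in this report.

```python
# Illustrative sketch only; not taken from glibc or utf8_gen.py.
# column_width and east_asian_context are hypothetical names.

NARROW = {"N", "Na", "H"}   # narrow in every context per the UAX 11 quote
WIDE = {"W", "F"}           # assumed here to occupy 2 columns


def column_width(eaw_category, east_asian_context=False):
    """Return 1 or 2 columns for an East Asian Width category."""
    if eaw_category in WIDE:
        return 2
    if eaw_category == "A":
        # Ambiguous: wide only when the context is East Asian.
        return 2 if east_asian_context else 1
    if eaw_category in NARROW:
        return 1
    raise ValueError("unknown East Asian Width category: %r" % eaw_category)


# Categories as given in this report (Unicode 12.1), not looked up at run time.
reported = {"U+3248..U+324F": "A", "U+4DC0..U+4DFF": "N"}

for code_range, category in reported.items():
    print(code_range, category, column_width(category))
```

Under this reading, both ranges come out at width 1 in a non-East Asian context such as "en_US.utf8", and only the Ambiguous range becomes 2 when `east_asian_context=True`.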