From: "nickblack at linux dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug libc/26207] New: wcwidth() returns -1 for numerous emoji added by unicode 13.0
Date: Sun, 05 Jul 2020 17:08:05 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26207

Bug ID: 26207
Summary: wcwidth() returns -1 for numerous emoji added by unicode 13.0
Product: glibc
Version: 2.33
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: libc
Assignee: unassigned at sourceware dot org
Reporter: nickblack at linux dot com
CC: drepper.fsp at gmail dot com
Target Milestone: ---

This is with the 2.33 candidate, commit ca3549c8. I see the same behavior on my vendor (Debian Unstable, glibc 2.30-8) glibc.
In a proper UTF-8 locale (LANG="en_US.UTF-8"), wcwidth() returns -1 for a number of emoji introduced by Unicode 13.0. I've prepared what I believe to be an exhaustive list:

U+01f972 U+01f978 U+01f90c U+01fac0 U+01fac1 U+01f9ac U+01f9a3 U+01f9ab
U+01f9a4 U+01fab6 U+01f9ad U+01fab2 U+01fab3 U+01fab0 U+01fab1 U+01fab4
U+01fad0 U+01fad2 U+01fad1 U+01fad3 U+01fad4 U+01fad5 U+01fad6 U+01f9cb
U+01faa8 U+01fab5 U+01f6d6 U+01f6fb U+01f6fc U+01fa84 U+01fa85 U+01fa86
U+01faa1 U+01faa2 U+01fa74 U+01fa96 U+01fa97 U+01fa98 U+01fa99 U+01fa83
U+01fa9a U+01fa9b U+01fa9d U+01fa9c U+01f6d7 U+01fa9e U+01fa9f U+01faa0
U+01faa4 U+01faa3 U+01faa5 U+01faa6 U+01faa7

While all of these values have the 17th bit set (they lie above U+FFFF), there are numerous codepoints with the 17th bit set for which glibc returns a correct wcwidth() value.

This was discovered while developing the "Mojibake" demo of Notcurses:

https://github.com/dankamongmen/notcurses/blob/master/src/demo/mojibake.c
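A minimal reproducer sketch for the behavior described above. This is not taken from the Notcurses demo. The helper name count_negative_widths and the WCWIDTH_DEMO_MAIN guard are hypothetical, the sample array holds only five of the listed codepoints, and the code assumes a 32-bit wchar_t as on glibc.

```c
#define _XOPEN_SOURCE 700   /* asks <wchar.h> to declare wcwidth() */
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Redundant prototype: harmless if <wchar.h> already declared wcwidth(),
 * and it keeps the file compiling if feature-test macros were fixed by
 * an earlier include. */
int wcwidth(wchar_t wc);

/* Hypothetical helper: print and count the codepoints in cps[0..n)
 * for which wcwidth() reports a negative width. */
static size_t count_negative_widths(const wchar_t *cps, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (wcwidth(cps[i]) < 0) {
            printf("U+%06lx: wcwidth() < 0\n", (unsigned long)cps[i]);
            bad++;
        }
    }
    return bad;
}

#ifdef WCWIDTH_DEMO_MAIN
int main(void)
{
    /* A small sample from the list in the report, not the full list. */
    static const wchar_t sample[] = {
        0x1f972, 0x1f978, 0x1f90c, 0x1fac0, 0x1f6d6
    };
    const size_t n = sizeof sample / sizeof sample[0];

    /* Pick up the locale from the environment, e.g. LANG=en_US.UTF-8. */
    if (setlocale(LC_ALL, "") == NULL) {
        fprintf(stderr, "setlocale() failed\n");
        return 2;
    }

    size_t bad = count_negative_widths(sample, n);
    printf("%zu of %zu sampled codepoints have a negative width\n", bad, n);
    return bad ? 1 : 0;
}
#endif
```

Built with something like `cc -DWCWIDTH_DEMO_MAIN repro.c` and run under LANG=en_US.UTF-8, the program exits nonzero if any sampled codepoint still has a negative width. The result depends on the installed glibc and its locale data, so no expected output is shown.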